Contrastive grammar

From LING073

A previous version of this document is available at Spring 2018/Contrastive grammar.

Some of the content that used to be here is now on the Lexical selection page.

Contrastive grammars

How to approach identifying linguistic differences

  • The last step we'll implement in our RBMT pipeline is structural transfer, which adjusts the output of lexical transfer and lexical selection. We can change, add, or remove tags, and we can reorder words and phrases.
  • Start by comparing the current output of your translation system to the ideal output. Focus on words marked with # in the output and words in the wrong order.
  • For a given way of expressing the same thing (a translation, or equivalent sentence), find differences in the parse (tags or order generated versus expected), ignoring the content of the lexical items.
  • I.e., first make sure that the full sentence in each language is tagged correctly and otherwise translates correctly (has correct analysis, disambiguation, lexical transfer, and lexical selection). Then compare the output of the lex mode to the analysis you want to generate (e.g., the one you get by running the correct translation through that language's transducer).
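The comparison described above (differences in tags or order, ignoring the lexical items themselves) can be sketched in Python. This is only an illustration, not part of the course tooling: the function name `tag_skeleton` is hypothetical, and the sketch assumes disambiguated input with one analysis per lexical unit and no `+` inside lemmas.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis$
LU_PATTERN = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG_PATTERN = re.compile(r"<([^>]+)>")


def tag_skeleton(stream):
    """Return one tag list per analysed word, dropping surface forms and lemmas.

    Joined analyses (lemma<tags>+lemma<tags>) contribute one list per part.
    Assumes each unit carries exactly one analysis (hypothetical helper).
    """
    skeleton = []
    for _surface, analysis in LU_PATTERN.findall(stream):
        for part in analysis.split("+"):
            skeleton.append(TAG_PATTERN.findall(part))
    return skeleton


fra = ("^je/je<prn><pers><p1><sg><nom>$ ^le/le<prn><pers><p3><m><acc>$ "
       "^mangerai/manger<v><tv><fut><p1><sg>$")
spa = ("^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ "
       "^comeré/comer<v><tv><fut><p1><sg>$")

# The stems differ, but the tag skeletons are identical.
print(tag_skeleton(fra) == tag_skeleton(spa))  # True
```

When the two skeletons are equal, as for this French and Spanish pair, structural transfer has nothing to change for that sentence.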

What constitutes a linguistic difference for structural transfer

  • You'll want to identify any differences in word order, tags used, etc.
    • An example of a set of similar languages might be:
      "Je le mangerai": ^je/je<prn><pers><p1><sg><nom>$ ^le/le<prn><pers><p3><m><acc>$ ^mangerai/manger<v><tv><fut><p1><sg>$
      "Yo lo comeré": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^comeré/comer<v><tv><fut><p1><sg>$
    • An example of a set of rather different languages might be:
      "Yiyemeyeceğim": ^yiyemeyeceğim/ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>$
      "I won't be able to eat it": ^I/I<prn><pers><p1><sg><subj>$ ^won't/will<vaux>+not<adv>$ ^be/be<v><iv><inf>$ ^able/able<adj>$ ^to/to<pr>$ ^eat/eat<v><tv><inf>$ ^it/it<prn><pers><p3><nt><obj>$
  • In the above example of similar languages, the analyses differ only in stems, not in tags or word order, so there is nothing to contrast. In the above example of different languages, there is quite a bit that can be contrasted, but almost too much to break down into individual points.
  • An example of languages with some minor differences might be:
    • "Je l'ai vu": ^je/je<prn><pers><p1><sg><nom>$ ^l'ai/le<prn><pers><p3><m><acc>+avoir<vaux><pres><p1><sg>$ ^vu/voir<v><tv><prc_past>$
      "Yo lo vi": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^vi/ver<v><tv><ifi><p1><sg>$
    • Here the difference is that one language uses the auxiliary with a participle where the other language uses a simple verb form.
  • Note that the following does not constitute a contrastive difference:
    • "Je mange la glace": ^je/je<prn><pers><p1><sg><nom>$ ^mange/manger<v><tv><pres><p1><sg>$ ^la/le<det><def><f>$ ^glace/glace<n><f>$
      "Yo como el hielo": ^yo/yo<prn><pers><p1><sg><nom>$ ^como/comer<v><tv><pres><p1><sg>$ ^el/el<det><def><m>$ ^hielo/hielo<n><m>$
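    • The only tag difference here is the gender of the noun (and of the determiner that agrees with it), which is a lexical property of each noun rather than a structural difference between the languages. This is still useful to note, as it can still be implemented as a transfer rule.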
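To locate where two analyses diverge, one option is to align their tag groups and report only the mismatching stretches. The sketch below is an illustration under stated assumptions: it uses Python's standard `difflib`, takes analyses written as `lemma<tags>` sequences (surface forms already removed), and the helper names `tag_groups` and `tag_differences` are hypothetical.

```python
import difflib
import re

# A lemma followed by its run of tags; the lemma itself is discarded.
UNIT = re.compile(r"[^<+\s]+((?:<[^>]+>)+)")


def tag_groups(analysis):
    """Return the tag string of each lemma in a lemma<tags> analysis line."""
    return [match.group(1) for match in UNIT.finditer(analysis)]


def tag_differences(source, target):
    """Return (source_groups, target_groups) pairs for each mismatching stretch."""
    a, b = tag_groups(source), tag_groups(target)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]


fra = ("je<prn><pers><p1><sg><nom> "
       "le<prn><pers><p3><m><acc>+avoir<vaux><pres><p1><sg> "
       "voir<v><tv><prc_past>")
spa = ("yo<prn><pers><p1><sg><nom> "
       "lo<prn><pers><p3><m><acc> "
       "ver<v><tv><ifi><p1><sg>")

for source_side, target_side in tag_differences(fra, spa):
    print(source_side, "->", target_side)
```

For the "Je l'ai vu" and "Yo lo vi" pair, the only mismatching stretch is the auxiliary plus participle on the French side against the single finite verb on the Spanish side; the two pronouns align and are not reported.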

The examples above using templates

Click to edit or view the source of this section to see how the TransferTest and TransferMorphTest templates are structured.

  • (fra) Je le mangerai → (spa) Yo lo comeré ("I'll eat it.")
    (fra) je<prn><pers><p1><sg><nom> le<prn><pers><p3><m><acc> manger<v><tv><fut><p1><sg> → (spa) yo<prn><pers><p1><sg><nom> lo<prn><pers><p3><m><acc> comer<v><tv><fut><p1><sg>
  • (tur) yiyemeyeceğim → (eng) I won't be able to eat it
    (tur) ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg> → (eng) I<prn><pers><p1><sg><subj> will<vaux>+not<adv> be<v><iv><inf> able<adj> to<pr> eat<v><tv><inf> it<prn><pers><p3><nt><obj>
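The morphological lines above are the stream-format analyses from the previous section with the surface forms removed. A minimal sketch of that conversion follows, assuming disambiguated output with exactly one analysis per lexical unit; the function name `to_morph_line` is hypothetical and is not part of Apertium or of the wiki templates.

```python
import re

# Capture only the analysis part of each ^surface/analysis$ unit.
LU = re.compile(r"\^[^/$]*/([^$]*)\$")


def to_morph_line(stream):
    """Drop the ^surface/ prefix and the closing $ from each lexical unit."""
    return " ".join(LU.findall(stream))


tur = "^yiyemeyeceğim/ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>$"
print(to_morph_line(tur))
# ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>
```

If a unit still carried several analyses separated by `/`, this sketch would keep all of them in the output, so it is best applied after disambiguation.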

What to focus on

To spell this out, you should focus on:

  • any difference in word order, within a phrase, and of phrases.
For example, do nouns and adjectives occur in the same order in your two languages? Adpositions and noun phrases? Subjects and predicates (e.g., verbs)?
Importantly, think of how words group together, and how the order of the groups is important. For example, in English, you can get a series of adjectives before a noun (in an expensive big fancy white house). Here a preposition comes before a noun phrase; within the noun phrase you get a determiner followed by an adjective phrase followed by a noun; and within the adjective phrase you get a whole bunch of adjectives.
  • any difference in how tags are used, including how they have to agree between words.
An example of this might be subject-verb agreement, or gender agreement between adjectives and nouns. Any type of grammatical marking in one language but not another is fair game.
But don't worry about subcategorisation, like transitivity in verbs or gender in nouns. (Note that in many languages of Europe, gender constitutes subcategory tags for nouns, but grammatical tags for adjectives.)
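A word-order difference can be spotted by reducing each analysis to its sequence of part-of-speech tags (the first tag on each lemma) and checking whether the two sequences contain the same categories in a different order. The sketch below is illustrative only: the English and Spanish analyses are made-up examples rather than output from a real transducer, and the helper names are hypothetical.

```python
import re

# A lemma, its first tag (captured as the part of speech), then any further tags.
UNIT = re.compile(r"[^<+\s]+<([^>]+)>(?:<[^>]+>)*")


def pos_sequence(analysis):
    """Return the first tag of every lemma in a lemma<tags> analysis line."""
    return [match.group(1) for match in UNIT.finditer(analysis)]


def reordered(source, target):
    """True if both sides have the same categories but in a different order."""
    a, b = pos_sequence(source), pos_sequence(target)
    return a != b and sorted(a) == sorted(b)


# Assumed, simplified analyses for "white house" and "casa blanca".
eng = "white<adj> house<n><sg>"
spa = "casa<n><f><sg> blanco<adj><f><sg>"

print(pos_sequence(eng))    # ['adj', 'n']
print(pos_sequence(spa))    # ['n', 'adj']
print(reordered(eng, spa))  # True
```

The same pair also shows a tag-usage difference: the Spanish adjective carries gender and number tags that the English one lacks, which is the kind of agreement marking described above.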

The assignment

This assignment is due at the end of the 11th week of class (this semester: Friday, April 22nd, 2021 at 23:59, i.e., by midnight)

Contrastive grammar

  1. Create a page on the wiki named "LanguageX and LanguageY/Contrastive Grammar". Add it to the category Category:Sp21_ContrastiveGrammars and the categories for the two languages. Add a link to the page under "Developed resources" on the main page.
  2. Using the ten sentences with all stems translated from the lexical transfer assignment, identify at least five differences between the two languages, trying to cover a range of phenomena (e.g., don't make everything about nouns).
    • These can be anything where the analyses of equivalent phrases look different when the roots are ignored: i.e., differences in tags used (will appear as # in full pipeline output), differences in basic word order, etc. It can also be where a word is added (e.g., adding a definite article in English where one doesn't occur in another language) or where a tag determines a new word (e.g., a negative morpheme on a verb being realised as "not" in English).
  3. Document these differences on the wiki.
    • You'll want on the order of two or three examples of each grammatical difference.
    • Make a section on the Contrastive Grammar page for "xyz-abc tests"
      • If you're doing two-way translation (i.e., you've joined forces with another group and each half of the group is working on a separate direction), then also add a section called "abc-xyz tests". In this case, each section will have equivalent tests, but in opposite directions.
    • Use the TransferMorphTest and TransferTest templates. For each example, use both templates. The use of these templates is demonstrated above and on the Transfer rules page. You'll scrape the content of these templates into a file later for testing, so make sure to use them correctly!