Contrastive grammar

From LING073

What constitutes a contrastive grammar for RBMT

  • Start by looking at material in the source language, i.e., the language you're translating from into your own language.
  • For a given way of expressing the same thing (a translation, or equivalent sentence), find differences in the parse, ignoring the content of the lexical items.
  • I.e., make sure the full version of the sentence in each language parses correctly (and has correct disambiguation), and then compare the parses side-by-side.
  • You'll want to identify any differences in word order, tags used, etc.
    • An example of a set of similar languages might be:
      "Je le mangerai": ^je/je<prn><pers><p1><sg><nom>$ ^le/le<prn><pers><p3><m><acc>$ ^mangerai/manger<v><tv><fut><p1><sg>$
      "Yo lo comeré": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^comeré/comer<v><tv><fut><p1><sg>$
    • An example of a set of rather different languages might be:
      "Yiyemeyeceğim": ^yiyemeyeceğim/ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>$
      "I won't be able to eat it": ^I/I<prn><pers><p1><sg><subj>$ ^won't/will<vaux>+not<adv>$ ^be/be<v><iv>$ ^able/able<adj>$ ^to eat/eat<v><tv><inf>$ ^it/it<prn><pers><p3><nt><obj>$
  • In the above example of similar languages, the tags are identical and only the stems differ, so there is nothing to contrast. In the above example of different languages, there is quite a bit that can be contrasted, but almost too much to break down into individual points.
  • An example of languages with some minor differences might be:
    • "Je l'ai vu": ^je/je<prn><pers><p1><sg><nom>$ ^l'ai/le<prn><pers><p3><m><acc>+avoir<vaux><pres><p1><sg>$ ^vu/voir<v><tv><prc_past>$
      "Yo lo vi": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^vi/ver<v><tv><ifi><p1><sg>$
    • Here the difference is that French uses an auxiliary with a participle where Spanish uses a simple verb form.
  • Note that the following does not constitute a contrastive difference:
    • "Je mange la glace": ^je/je<prn><pers><p1><sg><nom>$ ^mange/manger<v><tv><pres><p1><sg>$ ^la/le<det><def><f>$ ^glace/glace<n><f>$
      "Yo como el hielo": ^yo/yo<prn><pers><p1><sg><nom>$ ^como/comer<v><tv><pres><p1><sg>$ ^el/el<det><def><m>$ ^hielo/hielo<n><m>$
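    • The only tag difference here is the gender of the noun and its determiner (feminine "glace" vs. masculine "hielo"), which belongs to the individual nouns rather than to the grammar. It is still useful to note, as it can still be implemented as a transfer rule.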
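
The side-by-side comparison described above can be partly automated. The following is a minimal sketch, not part of the course tooling: it assumes each analysis is already disambiguated (one reading per ^...$ unit) and that "+" only ever joins sub-readings, never appears inside a lemma. The function names tag_skeleton and contrast are made up for illustration. It drops the lemmas, keeps the tag sequences, and reports where the two sequences diverge, using the "Je l'ai vu" / "Yo lo vi" pair from above.

```python
import re
from difflib import SequenceMatcher

# One lexical unit in Apertium stream format: ^surface/reading$
# (assumption: exactly one reading per unit, i.e. already disambiguated)
UNIT_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG_RE = re.compile(r"<([^<>]+)>")

FRA = ("^je/je<prn><pers><p1><sg><nom>$ "
       "^l'ai/le<prn><pers><p3><m><acc>+avoir<vaux><pres><p1><sg>$ "
       "^vu/voir<v><tv><prc_past>$")
SPA = ("^yo/yo<prn><pers><p1><sg><nom>$ "
       "^lo/lo<prn><pers><p3><m><acc>$ "
       "^vi/ver<v><tv><ifi><p1><sg>$")


def tag_skeleton(analysis):
    """Return one tuple of tags per sub-reading, with lemmas dropped."""
    skeleton = []
    for _surface, reading in UNIT_RE.findall(analysis):
        # '+' joins sub-readings inside one unit (e.g. clitic + auxiliary)
        for part in reading.split("+"):
            skeleton.append(tuple(TAG_RE.findall(part)))
    return skeleton


def contrast(analysis_a, analysis_b):
    """List the stretches where the two tag skeletons differ."""
    a = tag_skeleton(analysis_a)
    b = tag_skeleton(analysis_b)
    opcodes = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return [(op, a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in opcodes
            if op != "equal"]


if __name__ == "__main__":
    for op, left, right in contrast(FRA, SPA):
        print(op, left, right)
```

A script like this only points at places where the tags differ. It would also flag the <f> vs. <m> difference in the "glace" / "hielo" pair, so you still have to decide which differences are worth documenting.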

The assignment

This assignment is due at the end of the 8th week of class (this semester: Friday, March 17th, 2017 at noon)

Getting started

  1. Figure out who you'd like to work with on MT
    • Ideally you'll work with someone who's been working on a closely related or structurally extremely similar language. This may not be feasible in all cases.
    • Also good are languages that are spoken in close proximity to one another or share similar cultural influences, even if unrelated.
    • You should look for languages that mark similar things on the same parts of speech and/or that have similar word order.
    • You may work in groups of 2 or 3, or in some rare cases I may allow groups of 4. In groups of any size, each language should be both a source and destination of translation.
  2. Share the corpus, keyboard, and transducer repos with each other. I.e., give each other read access (not necessarily write access).
  3. Fork the repos of the language you're translating from. Do this through the GitHub interface.
  4. Clone the forked repos locally (in your ~/Source directory).
  5. You'll want to look at the corpus repo for this assignment, and you'll want the other stuff around for later. We'll also talk more about forking, pull requests, etc. later.
  6. Create a page on the wiki named "LanguageX and LanguageY" (either order, and just one page).
    • Add the page to the category Category:Sp17_TranslationPairs.
    • Put a note at the top along the lines of "Resources for machine translation between LanguageX and LanguageY", where the language names link back to your pages on them.
    • Add a link on each of the language pages to this new page, making a similar note as above.

Parallel corpus

  1. Create a repository on GitHub named ling073-xyz-abc-corpus. Make sure both collaborators have full access to it. Put a link to it on the wiki page in a section titled "Developed resources".
  2. Construct a parallel corpus of at least 500 characters in one of the languages (or 250 characters if it uses a syllabic writing system).
    • The corpus will ideally consist of sentences with the same meaning. You can extract sentences from translations of the same text (e.g., Bible translations or the Universal Declaration of Human Rights), or use phrasebook examples that are similar.
    • You can also try to use example sentences from your sources to try to construct grammatical sentences with similar meaning, but make note of this.
    • In extreme cases, you may include short phrases with the same meaning, but please speak with the professor about this as soon as possible.
  3. Populate two files xyz.sentences.txt and abc.sentences.txt with the parallel text. I recommend adding eng.sentences.txt to keep track of what the sentences mean.
  4. Include LICENSE, AUTHORS, and MANIFEST files as before (the latter to note origin and licensing of groups of sentences and any other notes).
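
Before committing the corpus, it may help to sanity-check the two sentence files. The sketch below is only a suggestion and rests on assumptions the assignment does not state: that each file holds one sentence per line, that line N of one file corresponds to line N of the other, and that "characters" means every character including spaces (combining marks are counted separately). The helper names read_sentences and check_parallel are hypothetical.

```python
from pathlib import Path

MIN_CHARS = 500  # use 250 for a syllabic writing system


def read_sentences(path):
    """Return the non-empty lines of a sentence file, stripped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_parallel(sentences_a, sentences_b, min_chars=MIN_CHARS):
    """Return a list of problems found (an empty list means none)."""
    problems = []
    if len(sentences_a) != len(sentences_b):
        problems.append(
            f"line counts differ: {len(sentences_a)} vs {len(sentences_b)}"
        )
    # The requirement is on at least one of the languages, so take the larger.
    sizes = [sum(len(s) for s in sents)
             for sents in (sentences_a, sentences_b)]
    if max(sizes) < min_chars:
        problems.append(
            f"corpus too short: {max(sizes)} characters, need {min_chars}"
        )
    return problems
```

With the file names from step 3, one way to use it is check_parallel(read_sentences("xyz.sentences.txt"), read_sentences("abc.sentences.txt")), printing whatever problems come back.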

Contrastive grammar

  1. Create a page on the wiki named "LanguageX and LanguageY/Contrastive Grammar". Add it to the category Category:Sp17_ContrastiveGrammars. Add a link to the page under "Developed resources" on the main page.
  2. Identify at least five differences between the two languages.
    • These can be anything where the analyses of equivalent phrases look different when the roots are ignored: i.e., differences in tags used, differences in basic word order, etc.
  3. Document these differences on the wiki.
    • Make two sections on the Contrastive Grammar page: one called "xyz-abc tests" and one called "abc-xyz tests".
    • Each section will have equivalent tests, but in opposite directions.
    • Use the TransferMorphTest and TransferTest templates as used on the Transfer rules page.