Difference between revisions of "Contrastive grammar"

From LING073

Revision as of 13:04, 18 April 2021

A previous version of this document is available at Spring 2018/Contrastive grammar.

Some of the content that used to be here is now on the Lexical selection page.

Contrastive grammars

How to approach identifying linguistic differences

  • The last step we'll be implementing in our RBMT pipeline is structural transfer, where we can adjust the output of lexical transfer and lexical selection. We can adjust (change, add, subtract) tags and the order of words and phrases.
  • Start by comparing the current output of your translation system to the ideal output. Focus on words marked with # in the output and words in the wrong order.
  • For a given way of expressing the same thing (a translation, or equivalent sentence), find differences in the parse (tags or order generated versus expected), ignoring the content of the lexical items.
  • That is, first make sure that the full sentence in each language is tagged correctly and otherwise translates correctly (correct analysis, disambiguation, lexical transfer, and lexical selection). Then compare the output of the lex mode to the analysis you want to generate (e.g., obtained by running the correct translation through that language's transducer).
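The comparison described above can be sketched in Python. This is only an illustration, not part of the Apertium tooling: the helper names tag_skeleton and differing_positions are hypothetical, and the sketch assumes disambiguated Apertium stream-format input (one analysis per ^surface/analysis$ unit). It drops the lemmas and reports the positions where the tag sequences differ.

```python
import re
from itertools import zip_longest

# One lexical unit in Apertium stream format: ^surface/analysis$
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")

def tag_skeleton(stream):
    """Return, per lexical unit, its tags with the lemmas dropped.

    Hypothetical helper: each unit becomes a tuple of sub-readings
    (split on '+'), and each sub-reading becomes a tuple of tag names.
    """
    skeleton = []
    for _surface, analysis in LU.findall(stream):
        parts = analysis.split("+")
        skeleton.append(tuple(tuple(TAG.findall(p)) for p in parts))
    return skeleton

def differing_positions(generated, expected):
    """List (index, generated_tags, expected_tags) wherever the two differ."""
    gen, exp = tag_skeleton(generated), tag_skeleton(expected)
    return [(i, g, e)
            for i, (g, e) in enumerate(zip_longest(gen, exp))
            if g != e]
```

A position-by-position comparison like this only flags tag differences directly; a reordered or inserted word shows up as a run of mismatches, which still has to be interpreted by hand.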

What constitutes a linguistic difference for structural transfer

  • You'll want to identify any differences in word order, tags used, etc.
    • An example from a pair of similar languages might be:
      "Je le mangerai": ^je/je<prn><pers><p1><sg><nom>$ ^le/le<prn><pers><p3><m><acc>$ ^mangerai/manger<v><tv><fut><p1><sg>$
      "Yo lo comeré": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^comeré/comer<v><tv><fut><p1><sg>$
    • An example from a pair of rather different languages might be:
      "Yiyemeyeceğim": ^yiyemeyeceğim/ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>$
      "I won't be able to eat it": ^I/I<prn><pers><p1><sg><subj>$ ^won't/will<vaux>+not<adv>$ ^be/be<v><iv><inf>$ ^able/able<adj>$ ^to/to<pr>$ ^eat/eat<v><tv><inf>$ ^it/it<prn><pers><p3><nt><obj>$
  • In the above example of similar languages, there are no differences in tags, only in stems, so there is nothing to contrast. In the above example of different languages, there is quite a bit that can be contrasted, but almost too much to break down into individual points.
  • An example of languages with some minor differences might be:
    • "Je l'ai vu": ^je/je<prn><pers><p1><sg><nom>$ ^l'ai/le<prn><pers><p3><m><acc>+avoir<vaux><pres><p1><sg>$ ^vu/voir<v><tv><prc_past>$
      "Yo lo vi": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^vi/ver<v><tv><ifi><p1><sg>$
    • Here the difference is that one language uses an auxiliary with a participle where the other language uses a simple verb form.
  • Note that the following does not constitute a contrastive difference:
    • "Je mange la glace": ^je/je<prn><pers><p1><sg><nom>$ ^mange/manger<v><tv><pres><p1><sg>$ ^la/le<det><def><f>$ ^glace/glace<n><f>$
      "Yo como el hielo": ^yo/yo<prn><pers><p1><sg><nom>$ ^como/comer<v><tv><pres><p1><sg>$ ^el/el<det><def><m>$ ^hielo/hielo<n><m>$
    • This is still useful to note, as it can still be implemented as a transfer rule.
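In the "glace"/"hielo" pair above, the only tags that differ are the gender tags (f versus m), which presumably follow from the lexical gender of each noun. The following is a minimal sketch of how such cases could be flagged automatically. The function names and the GENDER_TAGS inventory are assumptions made for illustration, not something defined on this page, and the sketch assumes disambiguated stream-format input with the same number of units on both sides.

```python
import re

# Capture only the analysis part of each ^surface/analysis$ unit
LU = re.compile(r"\^[^/$]*/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")

# Assumed inventory of gender tags; adjust for your language pair
GENDER_TAGS = {"m", "f", "nt", "mf"}

def tags_per_unit(stream):
    """Return the list of tag names for each lexical unit, lemmas dropped."""
    return [TAG.findall(analysis) for analysis in LU.findall(stream)]

def only_gender_differs(source, target):
    """True if the analyses differ, and every differing tag is a gender tag."""
    src, tgt = tags_per_unit(source), tags_per_unit(target)
    if len(src) != len(tgt):
        return False  # different number of words: not a gender-only difference
    found = False
    for s, t in zip(src, tgt):
        if len(s) != len(t):
            return False  # a tag was added or removed
        for a, b in zip(s, t):
            if a != b:
                if not {a, b} <= GENDER_TAGS:
                    return False  # some non-gender tag differs
                found = True
    return found
```

Pairs for which only_gender_differs returns True are candidates for noting separately rather than counting among the five contrastive differences.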

The examples above using templates

  • (fra) Je le mangerai → (spa) Yo lo comeré ("I'll eat it.")
    (fra) je<prn><pers><p1><sg><nom> le<prn><pers><p3><m><acc> manger<v><tv><fut><p1><sg> → (spa) yo<prn><pers><p1><sg><nom> lo<prn><pers><p3><m><acc> comer<v><tv><fut><p1><sg>
  • (tur) yiyemeyeceğim → (eng) I won't be able to eat it
    (tur) ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg> → (eng) I<prn><pers><p1><sg><subj> will<vaux>+not<adv> be<v><iv><inf> able<adj> to<pr> eat<v><tv><inf> it<prn><pers><p3><nt><obj>
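The analysis lines in these template examples are the stream-format analyses from the previous section with the surface forms and the ^ ... $ delimiters removed. A minimal sketch of that conversion follows. The helper name to_template_form is hypothetical, and the sketch assumes each unit has already been disambiguated to a single analysis (an ambiguous unit would keep its "/"-separated readings together).

```python
import re

# Capture only the analysis part of each ^surface/analysis$ unit
LU = re.compile(r"\^[^/$]*/([^$]*)\$")

def to_template_form(stream):
    """Join the analyses of all lexical units with single spaces."""
    return " ".join(LU.findall(stream))

# Usage with the Turkish example from this page
tur = "^yiyemeyeceğim/ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>$"
print(to_template_form(tur))
```

Generating these lines from actual analyser output, rather than typing them by hand, reduces the chance of stray characters ending up in the tests.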

The assignment

This assignment is due at the end of the 11th week of class (this semester: Friday, April 22nd, 2021 at 23:59, i.e., by midnight).

Contrastive grammar

  1. Create a page on the wiki named "LanguageX and LanguageY/Contrastive Grammar". Add it to the category Category:Sp21_ContrastiveGrammars and the categories for the two languages. Add a link to the page under "Developed resources" on the main page.
  2. Using the ten sentences with all stems translated from the lexical transfer assignment, identify at least five differences between the two languages, trying to cover a range of phenomena (e.g., don't make everything about nouns).
    • These can be anything where the analyses of equivalent phrases look different when the roots are ignored: i.e., differences in tags used (will appear as # in full pipeline output), differences in basic word order, etc. It can also be where a word is added (e.g., adding a definite article in English where one doesn't occur in another language) or where a tag determines a new word (e.g., a negative morpheme on a verb being realised as "not" in English).
  3. Document these differences on the wiki.
    • You'll want on the order of two or three examples of each grammatical difference.
    • Make a section on the Contrastive Grammar page called "xyz-abc tests".
      • If you're doing two-way translation (i.e., you've joined forces with another group and each half of the group is working on a separate direction), then also add a section called "abc-xyz tests". In this case, each section will have equivalent tests, but in opposite directions.
    • Use the TransferMorphTest and TransferTest templates. For each example, use both templates. The use of these templates is demonstrated above and on the Transfer rules page. You'll scrape the content of these templates into a file later for testing, so make sure to use them correctly!