Difference between revisions of "Spring 2018/Contrastive grammar"

From LING073
Revision as of 12:30, 28 February 2017

What constitutes a contrastive grammar for RBMT

  • Start by looking at material in the language you're translating from (into your language).
  • For a given way of expressing the same thing (a translation, or equivalent sentence), find differences in the parse, ignoring the content of the lexical items.
  • I.e., make sure the full version of the sentence in each language parses correctly (and has correct disambiguation), and then compare the parses side-by-side.
  • You'll want to identify any differences in word order, tags used, etc.
    • An example of a set of similar languages might be:
      "Je le mangerai": ^je/je<prn><pers><p1><sg><nom>$ ^le/le<prn><pers><p3><m><acc>$ ^mangerai/manger<v><tv><fut><p1><sg>$
      "Yo lo comeré": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^comeré/comer<v><tv><fut><p1><sg>$
    • An example of a set of rather different languages might be:
      "Yiyemeyeceğim": ^yiyemeyeceğim/ye<v><tv><abil><neg><fut>+i<cop><aor><p1><sg>$
      "I won't be able to eat it": ^I/I<prn><pers><p1><sg><subj>$ ^won't/will<vaux>+not<adv>$ ^be/be<v><iv>$ ^able/able<adj>$ ^to eat/eat<v><tv><inf>$ ^it/it<prn><pers><p3><nt><obj>$
  • In the similar-languages example above, the tags are identical and only the stems differ, so there is nothing to contrast. In the different-languages example, there is quite a bit that can be contrasted, but almost too much to break down into individual points.
  • An example of languages with some minor differences might be:
    • "Je l'ai vu": ^je/je<prn><pers><p1><sg><nom>$ ^l'ai/le<prn><pers><p3><m><acc>+avoir<vaux><pres><p1><sg>$ ^vu/voir<v><tv><prc_past>$
      "Yo lo vi": ^yo/yo<prn><pers><p1><sg><nom>$ ^lo/lo<prn><pers><p3><m><acc>$ ^vi/ver<v><tv><ifi><p1><sg>$
    • Here the difference is that one language uses an auxiliary with a participle where the other uses a simple verb form.
  • Note that the following does not constitute a contrastive difference:
    • "Je mange la glace": ^je/je<prn><pers><p1><sg><nom>$ ^mange/manger<v><tv><pres><p1><sg>$ ^la/le<det><def><f>$ ^glace/glace<n><f>$
      "Yo como el hielo": ^yo/yo<prn><pers><p1><sg><nom>$ ^como/comer<v><tv><pres><p1><sg>$ ^el/el<det><def><m>$ ^hielo/hielo<n><m>$
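    • The only difference here is the gender of the nouns (and of the determiners that agree with them), which is a property of the lexical items rather than of the structure. This is still worth noting, since it can be implemented as a transfer rule.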
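The side-by-side comparison described above can be partly automated. Below is a minimal Python sketch, assuming each sentence is already in disambiguated Apertium stream format (one analysis per `^...$` unit, as in the examples). It drops surface forms and lemmas, keeps only the tag sequences, and uses the standard-library `difflib.SequenceMatcher` to report the spans where they differ. The names `parse_stream` and `contrast` are hypothetical helpers, not part of Apertium.

```python
import difflib
import re

# One lexical unit in Apertium stream format: ^surface/analysis$
LU_RE = re.compile(r"\^(.*?)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_stream(stream):
    """Return one tuple of tags per reading, ignoring surface forms and lemmas.

    Assumes the stream is already disambiguated (one analysis per unit).
    '+'-joined readings such as l'ai -> le+avoir become separate items.
    """
    readings = []
    for unit in LU_RE.findall(stream):
        _surface, _, analysis = unit.partition("/")
        for part in analysis.split("+"):
            readings.append(tuple(TAG_RE.findall(part)))
    return readings


def contrast(src, trg):
    """List (operation, source tags, target tags) for every span that differs."""
    a, b = parse_stream(src), parse_stream(trg)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

Run on the "Je le mangerai" / "Yo lo comeré" pair, `contrast` returns an empty list. On "Je l'ai vu" / "Yo lo vi" it reports a single `replace` span: the French auxiliary plus participle against the Spanish `<ifi>` form. The "Je mange la glace" / "Yo como el hielo" pair is also flagged, because the gender tags differ, so deciding which flagged spans are genuinely contrastive is still up to you.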

The assignment

Getting started

  1. Figure out who you'd like to work with on MT
    • Ideally you'll work with someone who's been working on a closely related or structurally extremely similar language. This may not be feasible in all cases.
    • Also good are languages that are spoken in close proximity to one another or share similar cultural influences, even if unrelated.
    • You may work in groups of 2 or 3, or in some rare cases I may allow groups of 4. In groups of any size, each language should be both a source and destination of translation.
  2. Share the corpus, keyboard, and transducer repos with each other. I.e., give each other read access (not necessarily write access).
  3. Fork the repos of the language you're translating from. Do this through the GitHub interface.
  4. Clone the forked repos locally (in your ~/Source directory).
  5. You'll want to look at the corpus repo for this assignment, and keep the other repos around for later. We'll also talk more about forking, pull requests, etc. later.

Parallel corpus

  1. Construct a parallel corpus of at least 500 characters in one of the languages (or 250 characters if the language uses a syllabic writing system).
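For a quick check that one side of your corpus meets the minimum length, here is a minimal Python sketch. The assignment doesn't say whether spaces count, so the sketch assumes whitespace is skipped by default (a hypothetical `count_spaces` switch includes it). It also NFC-normalizes the text first so that, where a precomposed form exists, a letter typed as base plus combining accent counts once. `corpus_length` and `meets_minimum` are illustrative names, not part of any course tooling.

```python
import unicodedata


def corpus_length(text, count_spaces=False):
    """Count characters in one side of the corpus.

    Assumption: whitespace (spaces, newlines) is skipped unless count_spaces=True.
    """
    # Compose base letter + combining accent into one code point where possible.
    text = unicodedata.normalize("NFC", text)
    if count_spaces:
        return len(text)
    return sum(1 for ch in text if not ch.isspace())


def meets_minimum(text, syllabic=False, count_spaces=False):
    """Return (ok, length) against the 500-character (250 if syllabic) minimum."""
    minimum = 250 if syllabic else 500
    length = corpus_length(text, count_spaces=count_spaces)
    return length >= minimum, length
```

To use it on a file, read the text with `open(path, encoding="utf-8").read()` and pass it to `meets_minimum`, setting `syllabic=True` for a syllabic writing system.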

more later

Contrastive grammar

  1. Identify at least five differences between the two languages.
  2. Document each difference with transfer tests.

more later