Nuosu and Mandarin/Final Project

From LING073
Revision as of 21:33, 12 May 2022 by Ywang13 (talk | contribs) (Machine Translation)

Final Project Description

For our final project, we expanded our morphological transducer for Nuosu, aiming for corpus coverage above 80%, and improved the accuracy of our Nuosu-Mandarin machine translation system. Nuosu has about 2 million speakers, 40% of whom are bilingual in Nuosu and Mandarin, so we think a translation system between the two languages can benefit this population. We have been using an online Nuosu glossary for Nuosu and Mandarin/English translation. We think our morphological transducer and machine translation system could help generate more example sentences for that website.

Previous Work

Nuosu Transducer

Nuosu-Mandarin Machine Translation

Additions

Nuosu Transducer

  • We added 100+ stems to the lexd file, covering common nouns, verbs, and adjectives mentioned in A Grammar of Nuosu (Gerner 2013).
  • We attempted to implement the negation marker 'ap' using a twol rule; currently it only works for monosyllabic verbs and adjectives.

Nuosu-Mandarin Machine Translation

  • Added one structural transfer rule to handle sentence-level particles in Nuosu that are omitted in Mandarin.
  • Added one structural transfer rule to insert the Mandarin particle 'de', which can function as a predicate marker, possessive marker, adjectivizer, etc.
  • Added ~80 new entries to our bilingual dictionary, bringing the total to ~200.

Corpus Construction

  • Constructed a small corpus of 30 sentences. Most of the words in this corpus have been added to our transducer and bilingual dictionary, which makes it useful for evaluating the effectiveness of the structural transfer rules when the vocabulary is mostly known.
  • Constructed a large corpus of 100 sentences, randomly selected from the Bible and A Grammar of Nuosu (Gerner 2013). Evaluating over this corpus shows the general accuracy of our machine translation system.

Selected Example Sentences from the Large Corpus

  • (iii) ꋌꊋꋠꃅꀥ → (zho) 他用力地跑
    (iii) ꋌ<prn> ꊋꋠ<adj><advl><vblex> → (zho) 他<prn> 用力地<adv><vblex>
  • (iii) ꉪꊇꀃꑍꏦꎫꁧ → (zho) 我们今天去街上
    (iii) ꉪꊇ<prn> ꀃꑍ<n> ꏦꎫ<n><vblex> → (zho) 我们<prn> 今天<n><vblex> 街上<n>
  • (iii) ꉢꎧꀋꅝꀐ → (zho) 我没有喝红酒
    (iii) ꉢ<prn><n><neg><vblex><part> → (zho) 我<prn> 没有<neg><vblex> 红酒<n>
  • (iii) ꊾꆹꉬꇮꐯꀋꇊꌡ → (zho) 所有人差不多地像
    (iii) ꊾ<n><part> ꉬꇮ<adj><RECL> ꀋꇊ<adv><vblex> → (zho) 所有<adj><n> 差不多地<adv><vblex>
  • (iii) ꉢꃀꌠꀐ → (zho) 我是老人了
    (iii) ꉢ<prn> ꃀꌠ<n><part> → (zho) 我<prn> 现在<adv><pr> 老人<n>

Final Evaluation

Transducer

  • Coverage over the Bible: 78.5%
  • Precision: 97.60%
  • Recall: 86.46%
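
For readers unfamiliar with these figures, here is a minimal Python sketch of how such numbers are commonly computed. It assumes the analyser output is in Apertium stream format, where each token appears as ^form/analysis$ and an unknown token is echoed back as ^form/*form$. The function names and the sample strings are hypothetical, and this is not necessarily the script used for the results above.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text: str) -> float:
    """Fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown tokens are marked with a leading asterisk: ^form/*form$
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision and recall of predicted (form, analysis) pairs against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical analyser output: three known tokens, one unknown.
    sample = "^a/a<n>$ ^b/*b$ ^c/c<vblex>$ ^d/d<adj>$"
    print(f"coverage: {coverage(sample):.2%}")
```

Coverage only says that a token received some analysis; precision and recall additionally check the analyses against a hand-annotated gold set, which is why they are reported separately.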

Machine Translation

  • Small Corpus:
    • WER: 37.9%
    • PER: 17.8%
  • Large Corpus:
    • WER: 107.5%
    • PER: 80.9%

(We think the difference in WER and PER between the two corpora is due to the large number of nouns and pronouns in the Bible that are unknown to our system because they have not been added to our bilingual dictionary. In addition, the reference texts are polished translations taken from the Bible, whereas the test texts are direct translations produced by our machine translation system, which maps between lexical items.)
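
A WER above 100% is possible because the error count is divided by the length of the reference: if the system output needs more edits than the reference has words, the ratio exceeds 1. The sketch below shows one common definition of word error rate (WER) and position-independent error rate (PER) in Python. The function names are hypothetical, the inputs are assumed to be already tokenized (the tokenization used for the figures above is not stated here), and the evaluation tool behind those figures may differ in detail.

```python
from collections import Counter


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref: list, hyp: list) -> float:
    """Word error rate: edits needed to turn hyp into ref, per reference token."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref: list, hyp: list) -> float:
    """Position-independent error rate: like WER, but ignoring word order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    # Hypothetical tokens: right words, wrong order.
    print(wer(["a", "b", "c"], ["c", "a", "b"]), per(["a", "b", "c"], ["c", "a", "b"]))
    # Hypothetical tokens: nothing matches and the output is longer than the reference.
    print(wer(["a", "b"], ["x", "y", "z"]), per(["a", "b"], ["x", "y", "z"]))
```

Under these definitions, reordered but otherwise correct output is penalized by WER and not by PER, which is consistent with PER being lower than WER on both corpora.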

Future Work

  • Add more structural transfer rules to deal with compounding in Mandarin.
  • Reduce twol rule conflicts in the transducer; improve the implementation of the negation marker 'ap'.
  • Enhance translation accuracy in collaboration with native speakers.

Links to Public Git Repo and Poster

  • Summary Poster of the Project