Nuosu and Mandarin/Final Project

From LING073

Revision as of 20:19, 12 May 2022

Final Project Description

For our final project, we decided to expand our morphological transducer for Nuosu, aiming for corpus coverage above 80%, and to improve the accuracy of our Nuosu-Mandarin machine translation system. Since there are 2 million Nuosu speakers and 40% of them are bilingual in Nuosu and Mandarin, we think a translation system between the two languages can benefit this population. We have been using an online Nuosu glossary with Mandarin and English translations, and we think our morphological transducer and machine translation system could help generate more example sentences for that website.

Previous Work

Nuosu Transducer

Nuosu-Mandarin Machine Translation

Additions

Nuosu Transducer

  • We added 100+ stems to the lexd file, covering common nouns, verbs, and adjectives mentioned in A Grammar of Nuosu (Gerner 2013).
  • We attempted to implement the negation suffix 'ap' using a twol rule; currently the rule only works for monosyllabic verbs and adjectives.

Nuosu-Mandarin Machine Translation

  • Added one structural transfer rule to handle sentence-level particles in Nuosu that are omitted in Mandarin.
  • Added one structural transfer rule to insert the Mandarin particle 'de', which can function as a predicate marker, possessive marker, adjectivizer, etc.
  • Added ~80 new entries to our bilingual dictionary, bringing the total to ~200.
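To illustrate what the two transfer rules above do, here is a minimal Python sketch operating on lexical units. It is not our Apertium transfer file: the tag names (<part>, <prn>, <n>), the placeholder particle PRT, and the choice of pronoun + noun as the context for inserting 的 (only the possessive use of 'de') are all assumptions for illustration.

```python
from typing import List, Tuple

# A lexical unit is (lemma, tags), e.g. ("他", ["prn"]).
LexUnit = Tuple[str, List[str]]


def drop_sentence_particles(units: List[LexUnit]) -> List[LexUnit]:
    """Rule 1 (sketch): remove particles that Mandarin omits.
    Assumes such particles carry the tag <part>."""
    return [u for u in units if "part" not in u[1]]


def insert_de(units: List[LexUnit]) -> List[LexUnit]:
    """Rule 2 (sketch): insert 的 between a pronoun and a following noun,
    i.e. only the possessive use of 'de'; the real rule covers more contexts."""
    out: List[LexUnit] = []
    for i, (lemma, tags) in enumerate(units):
        out.append((lemma, tags))
        nxt = units[i + 1] if i + 1 < len(units) else None
        if "prn" in tags and nxt is not None and "n" in nxt[1]:
            out.append(("的", ["part"]))
    return out


def transfer(units: List[LexUnit]) -> List[LexUnit]:
    # Drop particles first: this sketch also tags the inserted 的 as <part>.
    return insert_de(drop_sentence_particles(units))


def to_stream(units: List[LexUnit]) -> str:
    """Render units in Apertium stream format, e.g. ^他<prn>$."""
    return " ".join(
        "^" + lemma + "".join(f"<{t}>" for t in tags) + "$" for lemma, tags in units
    )


if __name__ == "__main__":
    # "PRT" is a hypothetical placeholder for a sentence-level particle.
    sentence = [("他", ["prn"]), ("书", ["n"]), ("PRT", ["part"])]
    print(to_stream(transfer(sentence)))
```

The rule order matters in this sketch; in Apertium itself, rules like these are written declaratively in a .t1x transfer file rather than as Python functions.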

Corpus Construction

  • Constructed a small corpus of 30 sentences whose vocabulary has mostly been added to our transducer and bilingual dictionary; since most of its words are known, this corpus is useful for evaluating the effectiveness of the structural transfer rules.
  • Constructed a large corpus of 100 sentences randomly selected from the Bible and the grammar book; evaluating on this corpus shows the general accuracy of our machine translation system.

Selected Example Sentences from the Large Corpus

  • (iii) ꋌꊋꋠꃅꀥ → (zho) 他用力地跑
    (iii) ꋌ<prn> ꊋꋠ<adj><advl><vblex> → (zho) 他<prn> 用力地<adv><vblex>
  • (iii) ꉪꊇꀃꑍꏦꎫꁧ → (zho) 我们今天去街上
    (iii) ꉪꊇ<prn> ꀃꑍ<n> ꏦꎫ<n><vblex> → (zho) 我们<prn> 今天<n><vblex> 街上<n>
  • (iii) ꉢꎧꀋꅝꀐ → (zho) 我没有喝红酒
    (iii) ꉢ<prn><n><neg><vblex><part> → (zho) 我<prn> 没有<neg><vblex> 红酒<n>
  • (iii) ꊾꆹꉬꇮꐯꀋꇊꌡ → (zho) 所有人差不多地像
    (iii) ꊾ<n><part> ꉬꇮ<adj><RECL> ꀋꇊ<adv><vblex> → (zho) 所有<adj><n> 差不多地<adv><vblex>
  • (iii) ꉢꃀꌠꀐ → (zho) 我是老人了
    (iii) ꉢ<prn> ꃀꌠ<n><part> → (zho) 我<prn> 现在<adv><pr> 老人<n>

Final Evaluation

Transducer

  • Coverage over the Bible: 78.5%
  • Precision: 97.60%
  • Recall: 86.46%
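The transducer figures above can be computed along these lines. This is a sketch under assumed definitions: coverage as the share of corpus tokens that receive at least one analysis, precision as the share of produced analyses that appear in a hand-checked gold list, and recall as the share of gold analyses that the transducer produces. The data in the usage lines are toy values, not our evaluation set.

```python
from typing import Dict, List, Set, Tuple


def coverage(tokens: List[str], analyses: Dict[str, Set[str]]) -> float:
    """Share of corpus tokens that receive at least one analysis."""
    analysed = sum(1 for tok in tokens if analyses.get(tok))
    return analysed / len(tokens)  # assumes a non-empty corpus


def precision_recall(produced: Dict[str, Set[str]],
                     gold: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Precision and recall of produced analyses against a gold list,
    counted only over the word forms that appear in the gold list."""
    correct = sum(len(produced.get(w, set()) & g) for w, g in gold.items())
    n_produced = sum(len(produced.get(w, set())) for w in gold)
    n_gold = sum(len(g) for g in gold.values())
    return correct / n_produced, correct / n_gold


if __name__ == "__main__":
    # Toy data: "c" is unknown to the transducer.
    print(coverage(["a", "b", "c", "a"], {"a": {"a<n>"}, "b": {"b<vblex>"}}))
    produced = {"a": {"a<n>", "a<vblex>"}, "b": {"b<adj>"}}
    gold = {"a": {"a<n>"}, "b": {"b<adj>", "b<adv>"}, "c": {"c<prn>"}}
    print(precision_recall(produced, gold))
```

Here a spurious analysis (a<vblex>) lowers precision, while a missing one (b<adv>, c<prn>) lowers recall.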

Machine Translation

  • Small corpus (30 sentences):
    • WER: 37.9%
    • PER: 17.8%
  • Large corpus (100 sentences):
    • WER:
    • PER:
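WER (word error rate) is the word-level edit distance between the system output and a reference translation, divided by the reference length; PER (position-independent error rate) ignores word order and only counts which words match. The sketch below uses one common PER formulation and assumes the Mandarin sentences are already segmented into tokens, since Mandarin is not written with spaces; it is not necessarily identical to the tool used to produce the figures above.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if r == h)
            )
        prev = cur
    return prev[-1]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    return edit_distance(ref, hyp) / len(ref)


def per(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Order-insensitive: count the tokens shared by both bags of words."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


def corpus_wer(pairs: List[Tuple[List[str], List[str]]]) -> float:
    """Pool edits and reference lengths over all sentence pairs."""
    edits = sum(edit_distance(r, h) for r, h in pairs)
    return edits / sum(len(r) for r, _ in pairs)


if __name__ == "__main__":
    ref = ["我们", "今天", "去", "街上"]  # hypothetical reference
    hyp = ["我们", "去", "今天", "街上"]  # hypothetical system output
    print(wer(ref, hyp), per(ref, hyp))
```

For this hypothetical pair, WER is 0.5 but PER is 0.0: every word is present, only two are out of order. Under these definitions PER can never exceed WER, because reordering is penalized only by WER.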

Future Work

  • Add more structural transfer rules to deal with compounding in Mandarin.
  • Reduce rule conflicts in the twol file and improve the implementation of the negation suffix 'ap'.

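Link to Code and Poster

  • Morphological analyser, generator, and disambiguator for Nuosu
    • Morphological Transducer: https://github.swarthmore.edu/Ling073-sp22/ling073-iii
  • Machine Translation between Nuosu and Mandarin
    • Machine Translation System: https://github.swarthmore.edu/Ling073-sp22/ling073-iii-zho
  • Improved Mandarin Transducer
    • Apertium-zho: https://github.swarthmore.edu/Ling073-sp22/apertium-zho
  • Summary Poster of the Project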