English and Nivkh

From LING073
Jump to: navigation, search

Resources for machine translation between English and Nivkh

External resources

Developed resources

Initial Evaluation

niv → eng evaluation

Coverage using monolingual transducer:

Number of tokenised words in the corpus: 198
Coverage: 69.7%

Coverage using bilingual transducer:

Number of tokenised words in the corpus: 198
Coverage: 68.69%
Sentence Translation lexical transfer full translation
иф иғдь. he killed him. ^иф<prn><pers><p3><sg>/he<prn><p3><mf><sg>$ ^иғ<v><tv><ind>/kill<vblex><ind>$^.<sent>/.<sent>$ #he #kill.
иф иғныдь. he will kill him. ^иф<prn><pers><p3><sg>/he<prn><p3><mf><sg>$ ^иғ<v><tv><fut><ind>/kill<vblex><fut><ind>$^.<sent>/.<sent>$ #he #kill.
ӿоӻор озр видь. then, standing up, he went away. ^ӿоӻор<adv>/then<adv>$ ^оз<v><iv><cvb_nar><p3><sg>/stand<vblex><cvb_nar><p3><sg>$ ^ви<v><iv><ind>/go<vblex><ind>$^.<sent>/.<sent>$ then #stand #go.
умгу ооламотть. the woman kissed the child. ^умгу<n>/woman<n>$ ^оола<n>/child<n>$ ^мот<v><tv><ind>/kiss<vblex><ind>$^.<sent>/.<sent>$ #woman #child #kiss.
умгу леп пʼоолаардь. the woman fed her child bread. ^умгу<n>/woman<n>$ ^леп<n>/bread<n>$ ^пʼи<det><ref.><poss>/prpers<det><pos><p3><mf><pl>$ ^оола<n>/child<n>$ ^ар<v><tv><ind>/feed<vblex><ind>$^.<sent>/.<sent>$ #woman #bread #self #child #feed.
нивхӄʼатумгу пудь. the woman who scolded the man went out. ^нивх<n>/person<n>$ ^ӽат<v><tv>/scold<vblex>$ ^умгу<n>/woman<n>$ ^пʼу<v><iv><ind>/go out<vblex><ind>$^.<sent>/.<sent>$ #person #scold #woman #go out.
умгу ӿынивхӄʼатть. the woman scolded that man. ^умгу<n>/woman<n>$ ^ӿы<det><dem>/that<det><dem>$ ^нивх<n>/person<n>$ ^ӽат<v><tv><ind>/scold<vblex><ind>$^.<sent>/.<sent>$ #woman #this #person #scold.
ытык пʼызнивхордь. father met the man who had called him. ^*ытык/*ытык$ ^пʼи<prn><ref.>/self<n>$ ^ыз<v><tv>/call<vblex>$ ^нивх<n>/person<n>$ ^ор<v><tv><ind>/meet<vblex><ind>$^.<sent>/.<sent>$ #father #self #call #person #meet.
оола пʼӈафӄзадь. the child beat his friend. ^оола<n>/child<n>$ ^пʼи<det><ref.><poss>/prpers<det><pos><p3><mf><pl>$ ^ӈафӄ<n>/friend<n>$ ^за<v><tv><ind>/beat<vblex><ind>$^.<sent>/.<sent>$ #child #self #friend #beat.
ытык оолаамӽтадь. father praised the child. ^ытык<n>/father<n>$ ^оола<n>/child<n>$ ^амӽта<v><tv><ind>/praise<vblex><ind>$^.<sent>/.<sent>$ #father #child #praise.

Final Evaluation


  • Transducer and machine translation: added 101 stems to transducer and 100 to bilingual dictionary
  • Structural transfer: 2 rules (pronouns and word order)
  • Twol: 4 rules
    • Set /B/ to <п> after fricatives
      • тыфпарк/тыф<n>+варк<part>
      • тыфварк/*тыфварк
    • Set /B/ to <б> after sonorants
      • ^ӄанбарк/ӄан<n>+варк<part>$
      • канварк/*канварк$
    • Set /Q/ to <ӻ> after voiced stops
      • ^ыӈгӻана/ыӈг<v><tv><pot>$
    • Set /Q/ to <ӽ> after voiceless stops
      • ^ӽатӽана/ӽат<v><tv><pot>$

Monolingual Transducer

Precision and Recall against annotated corpus

Totals: 51 forms, 68 tp, 0 fp, 0 tn, 24 fn

Precision: 100.00000%

Recall: 73.91304%

Coverage over large corpus

Number of tokenised words in the corpus: 15696

Coverage: 33.50%

Number of words in large corpus: 76713

Number of stems in the transducer: Unique entries: 297

MT Pair

WER and PER over longer corpus

Word error rate (WER): 80.46 %

Position-independent word error rate (PER): 75.86 %

Percentage of unknown words: 26.92 %

Number of tokens


Number of words in reference: 348

Number of words in test: 208

Number of tokenised words in the corpus: 233


Number of tokenised words in the corpus: 15109

Trimmed Coverage


Coverage: 75.97%


Coverage: 30.28%