Difference between revisions of "Navajo and English"

From LING073
(Sentence Evaluation)
(Polished RBMT System)
 
(12 intermediate revisions by the same user not shown)
Line 6: Line 6:
 
*[https://wikis.swarthmore.edu/ling073/Navajo/Grammar Grammar]
 
*[https://wikis.swarthmore.edu/ling073/Navajo/Grammar Grammar]
 
*[https://wikis.swarthmore.edu/ling073/Navajo_and_English/Lexical_selection Lexical selection]
 
*[https://wikis.swarthmore.edu/ling073/Navajo_and_English/Lexical_selection Lexical selection]
 +
*[https://wikis.swarthmore.edu/ling073/Navajo_and_English/Contrastive_Grammar Contrastive Grammar]
 +
*[https://wikis.swarthmore.edu/ling073/Navajo_and_English/Structural_transfer Structural Transfer]
  
 
== External Resources  ==
 
== External Resources  ==
Line 12: Line 14:
 
*[https://github.com/apertium/apertium-eng English Transducer]
 
*[https://github.com/apertium/apertium-eng English Transducer]
 
*[https://github.swarthmore.edu/Ling073-sp22/ling073-nav-eng-corpus Corpus repository]
 
*[https://github.swarthmore.edu/Ling073-sp22/ling073-nav-eng-corpus Corpus repository]
 
 
CHANGE THIS
 
  
 
==NAV -> ENG Evaluation==
 
==NAV -> ENG Evaluation==
 
=== Coverage Analysis ===
 
=== Coverage Analysis ===
 
 
FOR FORMATTING PURPOSES ONLY
 
  
 
* Monolingual transducer coverage of small corpus:  407 / 1216 (~33.47%)
 
* Monolingual transducer coverage of small corpus:  407 / 1216 (~33.47%)
 
* Bilingual transducer coverage of small corpus:    533 / 1345 (~39.63%)
 
* Bilingual transducer coverage of small corpus:    533 / 1345 (~39.63%)
 
  
 
=== Sentence Evaluation ===
 
=== Sentence Evaluation ===
Line 31: Line 26:
 
   '''Original sentence:''' Dibé bikééʼ dínááh.
 
   '''Original sentence:''' Dibé bikééʼ dínááh.
 
   '''Intended Translation:''' Go after the sheep.
 
   '''Intended Translation:''' Go after the sheep.
   '''Biltrans Output:''' ^Rusa<n>/Deer<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^ne<det><dem>/@ne<det><dem>$ ^ores<v><iv><p3><sg>/stand<vblex><iv><p3><sg>
+
   '''Biltrans Output:''' ^Dibé<n>/Wood<n>$ ^*bikééʼ/*bikééʼ$ ^dínááh<v>/go<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Wood *bikééʼ #go.
 
   '''Translation Output:''' #Wood *bikééʼ #go.
 
==== 2. ====
 
==== 2. ====
 
   '''Original sentence:''' Nimá dóó nizhéʼé bíighah nídaah.
 
   '''Original sentence:''' Nimá dóó nizhéʼé bíighah nídaah.
 
   '''Intended Translation:''' Sit beside your mother and father.
 
   '''Intended Translation:''' Sit beside your mother and father.
   '''Biltrans Output:''' ^Ras<n>/Day<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^dares<n>/sun<n>$ ^i<prn><pers><p3><sg><spc>/the<det><def><sp>$ ^sam<v><iv><p3><sg>/hot<vblex><iv><p3><sg>$ ^kaku<adv>/very<adv>$ ^inja<cnjcoo>/so<cnjadv>$ ^mbrow<v><iv><p3><du>/thirsty<vblex><iv><p3><du>
+
   '''Biltrans Output:''' ^Má<n><px2sg>/Mother<n><px2sg>$ ^dóó<cnjcoo>/and<cnjcoo>$ ^zhéʼé<n><px2sg>/father<n><px2sg>$ ^*bíighah/*bíighah$ ^nídaah<v>/sit<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Mother and #father *bíighah #sit.
 
   '''Translation Output:''' #Mother and #father *bíighah #sit.
  
Line 42: Line 37:
 
   '''Original sentence:''' Chidí biyiʼ ayóo deesdoi.   
 
   '''Original sentence:''' Chidí biyiʼ ayóo deesdoi.   
 
   '''Intended Translation:''' It is very hot inside the vehicle.
 
   '''Intended Translation:''' It is very hot inside the vehicle.
   '''Biltrans Output:''' ^Ikak<n>/Snake<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^yasne<det><dem>/@yasne<det><dem>$ ^nas<v><tv><p3><sg>/smell<vblex><tv><p3><sg>$ ^i<prn><pers><p3><sg>/prpers<prn><subj><p3><m><sg>
+
   '''Biltrans Output:''' ^Chidí<n>/Automobile<n>$ ^*biyiʼ/*biyiʼ$ ^ayóo<adv>/remarkably<adv>$ ^deesdoi<v>/hot<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Automobile *biyiʼ *ayóo #hot.
 
   '''Translation Output:''' #Automobile *biyiʼ *ayóo #hot.
  
Line 48: Line 43:
 
   '''Original sentence:''' Kodi atooʼ hólǫ́.
 
   '''Original sentence:''' Kodi atooʼ hólǫ́.
 
   '''Intended Translation:''' Here is some stew.
 
   '''Intended Translation:''' Here is some stew.
   '''Biltrans Output:''' ^Vark<v><iv><p3><pc>/Lie<vblex><iv><p3><pc>/Live<vblex><iv><p3><pc>$ ^ro<pr>/at<pr>$ ^mnu<n>/village<n>$ ^i<prn><pers><p3><sg><spc>/the<det><def><sp>$ ^ne<det><dem>/@ne<det><dem>$^.<sent>/.<sent>$
+
   '''Biltrans Output:''' ^Kodi<adv>/Here<adv>$ ^atooʼ<n>/stew<n>$ ^*hólǫ́/*hólǫ́$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' Here #stew *hólǫ́.
 
   '''Translation Output:''' Here #stew *hólǫ́.
  
Line 54: Line 49:
 
   '''Original sentence:''' Atooʼ łaʼ naa deeshkááł.  
 
   '''Original sentence:''' Atooʼ łaʼ naa deeshkááł.  
 
   '''Intended Translation:''' I will give you some stew.
 
   '''Intended Translation:''' I will give you some stew.
   '''Biltrans Output:''' ^Mnai<v><iv><p3><sg>/Stop<vblex><iv><p3><sg>$ ^kwar<adv>/already<adv>$^?<sent>/?<sent>$^.<sent>/.<sent>$
+
   '''Biltrans Output:''' ^Atooʼ<n>/Stew<n>$ ^łaʼ<det>/some<det>$ ^naa<post>/around<pp>$ ^deeshkááł<v>/give<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Stew #some #around #give.
 
   '''Translation Output:''' #Stew #some #around #give.
  
Line 60: Line 55:
 
   '''Original sentence:''' Wóláchííʼ bighan binaa ałhéénílyeed.  
 
   '''Original sentence:''' Wóláchííʼ bighan binaa ałhéénílyeed.  
 
   '''Intended Translation:''' Go around the ant mound.
 
   '''Intended Translation:''' Go around the ant mound.
   '''Biltrans Output:''' ^Sampe<adv>/Then<adv>$ ^fur<v><tv><p1><pl><ex>/build<vblex><tv><p1><pl><ex>$ ^rum<n>/house<n>$ ^na<prn><pers><p3><pl><inan><spc>/the<det><def><sp>$ ^ne<det><dem>/@ne<det><dem>$ ^ra<pr>/until<pr>$ ^bro<v><iv><p3><pl><inan>/empty<adj><p3><pl><inan>$ ^romawa<n>/boy<n>$ ^si<prn><pers><p3><pl><an><spc>/the<det><def><sp>$ ^farkor<v><iv><p3><pl><an>/study<vblex><iv><p3><pl><an>$^.<sent>/.<sent>$^.<sent>/.<sent>$
+
   '''Biltrans Output:''' ^Wóláchííʼ<n>/Ant<n>$ ^ghan<n><px3sg>/house<n><px3sg>$ ^*binaa/*binaa$ ^ałhéénílyeed<v>/go<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Ant #house *binaa #go.  
 
   '''Translation Output:''' #Ant #house *binaa #go.  
  
Line 67: Line 62:
 
   '''Original sentence:''' Jooł nikídílniihí tsáskʼeh biyaa íímááz.  
 
   '''Original sentence:''' Jooł nikídílniihí tsáskʼeh biyaa íímááz.  
 
   '''Intended Translation:''' The basketball rolled underneath the bed.
 
   '''Intended Translation:''' The basketball rolled underneath the bed.
   '''Biltrans Output:''' ^Kain<v><iv><p1><pl><ex>/Sit<vblex><iv><p1><pl><ex>$ ^do<pr>/at<pr>$ ^Sepse<n>/Sepse<n>$ ^fa<pr>/to<pr>$ ^fafyar<v><iv><p1><pl><ex>/tell<vblex><iv><p1><pl><ex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
+
   '''Biltrans Output:''' ^Jooł<n>/Ball<n>$ ^nikídílniihí<adj>/basketball<adj>$ ^tsáskʼeh<n>/bed<n>$ ^*biyaa/*biyaa$ ^íímááz<v>/roll<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #basketball #Ball #bed *biyaa #roll.  
 
   '''Translation Output:''' #basketball #Ball #bed *biyaa #roll.  
  
Line 73: Line 68:
 
   '''Original sentence:''' Shiyázhí, hoghandi naanishísh ałtso íinilaa?
 
   '''Original sentence:''' Shiyázhí, hoghandi naanishísh ałtso íinilaa?
 
   '''Intended Translation:''' My child, did you finish your homework?
 
   '''Intended Translation:''' My child, did you finish your homework?
   '''Biltrans Output:''' ^Snewar<n>/Belly<n>$ ^det<det><pos><px3sg><sg><spc>/prpers<det><pos><px3sg><sg><spc>$ ^ba<v><iv><p3><sg>/big<adj><p3><sg>$^.<sent>/.<sent>$^.<sent>/.
+
   '''Biltrans Output:''' ^Yázhí<n><px1sg>/Little<n><px1sg>$^,<cm>/,<cm>$ ^hoghandi<adj>/home<adj>$ ^naanishísh<n>/work<n>$ ^ałtso<adj>/completed<adj>$ ^íinilaa<v>/finish<vblex>$^?<sent>/?<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Little, #home *naanishísh #completed #finish?
 
   '''Translation Output:''' #Little, #home *naanishísh #completed #finish?
  
Line 79: Line 74:
 
   '''Original sentence:''' Shiyázhí, nízhiʼ naaltsoos bikááʼ íníleeh.
 
   '''Original sentence:''' Shiyázhí, nízhiʼ naaltsoos bikááʼ íníleeh.
 
   '''Intended Translation:''' My child, write your name on the paper.
 
   '''Intended Translation:''' My child, write your name on the paper.
   '''Biltrans Output:''' ^Snai<v><tv><p3><sg>/Enlighten<vblex><tv><p3><sg>$ ^aya<prn><pers><p1><sg>/prpers<prn><subj><p1><mf><sg>$ ^ro<pr>/at<pr>$ ^marandan<n>/trip<n>$ ^det<det><pos><px1sg><sg><spc>/prpers<det><pos><px1sg><sg><spc>$^.<sent>/.<sent>$^.<sent>/.<sent>
+
   '''Biltrans Output:''' ^Yázhí<n><px1sg>/Little<n><px1sg>$^,<cm>/,<cm>$ ^hoghandi<adj>/home<adj>$ ^*naanishísh/*naanishísh$ ^ałtso<adj>/completed<adj>$ ^íinilaa<v>/finish<vblex>$^?<sent>/?<sent>$^.<sent>/.<sent>$
 
   '''Translation Output:''' #Little, *nízhiʼ #paper *bikááʼ #write.
 
   '''Translation Output:''' #Little, *nízhiʼ #paper *bikááʼ #write.
  
Line 85: Line 80:
 
   '''Original sentence:''' Naaltsoos tsitsʼaaʼ naaltsoos atseedzį́ biiʼ hadéébįįd.
 
   '''Original sentence:''' Naaltsoos tsitsʼaaʼ naaltsoos atseedzį́ biiʼ hadéébįįd.
 
   '''Intended Translation:''' The cardboard box is filled with newspapers.
 
   '''Intended Translation:''' The cardboard box is filled with newspapers.
   '''Biltrans Output:''' ^Ikak<n>/Snake<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^ne<det><dem>/@ne<det><dem>$ ^snonsnon<n>/name<n>$ ^det<det><pos><px3sg><sg><spc>/prpers<det><pos><px3sg><sg><spc>$ ^Kormsamba<n>/Kormsamba<n>$^.<sent>/.<sent>$^.<sent>/.<sent>$
+
   '''Biltrans Output:''' ^Naaltsoos<n>/Paper<n>$ ^*tsitsʼaaʼ/*tsitsʼaaʼ$ ^naaltsoos<n>/paper<n>$ ^*atseedzį́/*atseedzį́$ ^*biiʼ/*biiʼ$ ^hadéébįįd<v>/fill<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
   '''Translation Output:''' #Paper *tsitsʼaaʼ #paper *atseedzį́ *biiʼ #fill
+
   '''Translation Output:''' #Paper #box #paper *atseedzį́ *biiʼ #fill
  
 
==Additions==
 
==Additions==
===Disambiguation===
+
* ~20 noun stems
* Added a disambiguation rule that selects the article reading of "na" over the pronoun reading when it is preceded by a noun
+
* 39 twol rules
** Brought the ambiguity rate over the corpus from ~1.04 to ~1.03. Note that this is an increase over our original ambiguity score, because more words were added.
+
* 1 disambiguation rule
===Structural Transfer===
+
* 1 lexical selection rule
* Added a rule that inserts an implicit subject when no explicit one is present.
+
* 1 transfer rule
* Changed the rule to specify the pronoun type (subject/object).
 
* Added a rule which specified def/dem determiner phrases
 
* Added a rule that correctly translated the adverb endings on determiners
 
* These changes brought our WER from 72.22% to 27.78% and our PER from 63.89% to 22.22%
 
===Adding Stems===
 
* Added ~100 new stems
 
  
 
==Polished RBMT System==
 
==Polished RBMT System==
* Precision: 87.77293%
+
* Stems in transducer: 316
* Recall: 94.81132%
+
* Over nav.longer.txt
* Coverage over large corpus: 7072 / 14287 (~0.49499545040946314832)
+
** coverage: 2437 / 4819 (~0.50570657812824237394)
* Stems in transducer: 382
+
* Over nav.basic.txt:
* Over bhw.longer.txt:
+
** coverage: 762 / 1364 (~0.55865102639296187683)
** Word Error Rate (WER): 80.28 %
+
** Number of words in reference: 760
** Position-independent word error rate (PER): 71.83 %
+
** Number of words in test: 1188
** Percentage of unknown words: 15.54 %
+
** Percentage of unknown words: 00.00 %
** Number of position-independent correct words: 81/284
+
** Edit distance: 1176
** Coverage: 211 / 250 (0.844)
+
** Word Error Rate (WER): 154.74 %
* Over bhw.corpus.large.txt
+
** Position-independent word error rate (PER): 153.68 %
** Coverage: 5817 / 13768 (~0.42250145264381173736)
+
** Number of position-independent correct words: 20
 +
**Number of unknown words which were free rides: 0
 +
**Percentage of unknown words that were free rides: 0%
 +
 
 +
 
  
  

Latest revision as of 14:49, 8 May 2022

Resources for machine translation between Navajo and English

Developed Resources

External Resources

NAV -> ENG Evaluation

Coverage Analysis

  • Monolingual transducer coverage of small corpus: 407 / 1216 (~33.47%)
  • Bilingual transducer coverage of small corpus: 533 / 1345 (~39.63%)
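
Each coverage figure above is the number of tokens that received an analysis divided by the total number of tokens. A minimal sketch of that arithmetic, using the counts reported above; the helper name `coverage` is illustrative and is not part of any Apertium tool:

```python
def coverage(known: int, total: int) -> float:
    """Return the percentage of tokens that received an analysis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * known / total


# Counts taken from the coverage figures reported above.
mono = coverage(407, 1216)
bil = coverage(533, 1345)

print(f"monolingual: {mono:.2f}%")
print(f"bilingual:   {bil:.2f}%")
```

Rounded to two decimals, both values agree with the percentages listed above.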

Sentence Evaluation

1.

 Original sentence: Dibé bikééʼ dínááh.
 Intended Translation: Go after the sheep.
 Biltrans Output: ^Dibé<n>/Wood<n>$ ^*bikééʼ/*bikééʼ$ ^dínááh<v>/go<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: #Wood *bikééʼ #go.

2.

 Original sentence: Nimá dóó nizhéʼé bíighah nídaah.
 Intended Translation: Sit beside your mother and father.
 Biltrans Output: ^Má<n><px2sg>/Mother<n><px2sg>$ ^dóó<cnjcoo>/and<cnjcoo>$ ^zhéʼé<n><px2sg>/father<n><px2sg>$ ^*bíighah/*bíighah$ ^nídaah<v>/sit<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: #Mother and #father *bíighah #sit.

3.

 Original sentence: Chidí biyiʼ ayóo deesdoi.  
 Intended Translation: It is very hot inside the vehicle.
 Biltrans Output: ^Chidí<n>/Automobile<n>$ ^*biyiʼ/*biyiʼ$ ^ayóo<adv>/remarkably<adv>$ ^deesdoi<v>/hot<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$  
 Translation Output: #Automobile *biyiʼ *ayóo #hot.

4.

 Original sentence: Kodi atooʼ hólǫ́.
 Intended Translation: Here is some stew.
 Biltrans Output: ^Kodi<adv>/Here<adv>$ ^atooʼ<n>/stew<n>$ ^*hólǫ́/*hólǫ́$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: Here #stew *hólǫ́.

5.

 Original sentence: Atooʼ łaʼ naa deeshkááł. 
 Intended Translation: I will give you some stew.
 Biltrans Output: ^Atooʼ<n>/Stew<n>$ ^łaʼ<det>/some<det>$ ^naa<post>/around<pp>$ ^deeshkááł<v>/give<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: #Stew #some #around #give.

6.

 Original sentence: Wóláchííʼ bighan binaa ałhéénílyeed. 
 Intended Translation: Go around the ant mound.
 Biltrans Output: ^Wóláchííʼ<n>/Ant<n>$ ^ghan<n><px3sg>/house<n><px3sg>$ ^*binaa/*binaa$ ^ałhéénílyeed<v>/go<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: #Ant #house *binaa #go. 


7.

 Original sentence: Jooł nikídílniihí tsáskʼeh biyaa íímááz. 
 Intended Translation: The basketball rolled underneath the bed.
 Biltrans Output: ^Jooł<n>/Ball<n>$ ^nikídílniihí<adj>/basketball<adj>$ ^tsáskʼeh<n>/bed<n>$ ^*biyaa/*biyaa$ ^íímááz<v>/roll<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: #basketball #Ball #bed *biyaa #roll. 

8.

 Original sentence: Shiyázhí, hoghandi naanishísh ałtso íinilaa?
 Intended Translation: My child, did you finish your homework?
 Biltrans Output: ^Yázhí<n><px1sg>/Little<n><px1sg>$^,<cm>/,<cm>$ ^hoghandi<adj>/home<adj>$ ^naanishísh<n>/work<n>$ ^ałtso<adj>/completed<adj>$ ^íinilaa<v>/finish<vblex>$^?<sent>/?<sent>$^.<sent>/.<sent>$
 Translation Output: #Little, #home *naanishísh #completed #finish?

9.

 Original sentence: Shiyázhí, nízhiʼ naaltsoos bikááʼ íníleeh.
 Intended Translation: My child, write your name on the paper.
 Biltrans Output: ^Yázhí<n><px1sg>/Little<n><px1sg>$^,<cm>/,<cm>$ ^hoghandi<adj>/home<adj>$ ^*naanishísh/*naanishísh$ ^ałtso<adj>/completed<adj>$ ^íinilaa<v>/finish<vblex>$^?<sent>/?<sent>$^.<sent>/.<sent>$
 Translation Output: #Little, *nízhiʼ #paper *bikááʼ #write.

10.

 Original sentence: Naaltsoos tsitsʼaaʼ naaltsoos atseedzį́ biiʼ hadéébįįd.
 Intended Translation: The cardboard box is filled with newspapers.
 Biltrans Output: ^Naaltsoos<n>/Paper<n>$ ^*tsitsʼaaʼ/*tsitsʼaaʼ$ ^naaltsoos<n>/paper<n>$ ^*atseedzį́/*atseedzį́$ ^*biiʼ/*biiʼ$ ^hadéébįįd<v>/fill<vblex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 Translation Output: #Paper #box #paper *atseedzį́ *biiʼ #fill
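
In the outputs above, each biltrans unit has the form `^source/target$`. In Apertium's output conventions, a leading `*` marks a word the analyser did not recognise and a leading `#` marks a form the generator could not produce. The sketch below shows one way to tally these per sentence. It is a simplified illustration: the function names are hypothetical, and the regular expression ignores the escaped characters that the full stream format allows.

```python
import re

# Simplified unit matcher: assumes no escaped '^', '$' or '/' inside a unit.
UNIT = re.compile(r"\^(.*?)\$")


def parse_biltrans(line: str):
    """Split a biltrans line into (source, [targets], is_unknown) triples."""
    units = []
    for body in UNIT.findall(line):
        source, *targets = body.split("/")
        units.append((source, targets, source.startswith("*")))
    return units


def count_markers(translation: str):
    """Tally '*' (unknown) and '#' (not generated) tokens in final output."""
    tokens = translation.split()
    return {
        "unknown": sum(t.startswith("*") for t in tokens),
        "ungenerated": sum(t.startswith("#") for t in tokens),
        "total": len(tokens),
    }


# Sentence 1 from the evaluation above.
line = ("^Dibé<n>/Wood<n>$ ^*bikééʼ/*bikééʼ$ ^dínááh<v>/go<vblex>$"
        "^.<sent>/.<sent>$^.<sent>/.<sent>$")
units = parse_biltrans(line)
unknown = [src for src, _, is_unknown in units if is_unknown]
markers = count_markers("#Wood *bikééʼ #go.")

print(unknown)
print(markers)
```

For sentence 1 this reports one unknown word (`*bikééʼ`) and two ungenerated forms (`#Wood`, `#go.`), which matches the pattern across the ten sentences: most remaining errors are generation failures and unanalysed postpositions.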

Additions

  • ~20 noun stems
  • 39 twol rules
  • 1 disambiguation rule
  • 1 lexical selection rule
  • 1 transfer rule

Polished RBMT System

  • Stems in transducer: 316
  • Over nav.longer.txt:
    • Coverage: 2437 / 4819 (~50.57%)
  • Over nav.basic.txt:
    • Coverage: 762 / 1364 (~55.87%)
    • Number of words in reference: 760
    • Number of words in test: 1188
    • Percentage of unknown words: 0.00%
    • Edit distance: 1176
    • Word Error Rate (WER): 154.74%
    • Position-independent word error rate (PER): 153.68%
    • Number of position-independent correct words: 20
    • Number of unknown words which were free rides: 0
    • Percentage of unknown words that were free rides: 0%
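
A WER above 100% is possible here because the test output (1188 words) is much longer than the reference (760 words), so the edit distance exceeds the reference length. The sketch below reproduces the reported figures from the counts above. The WER formula is the standard edit distance divided by reference length. The PER formula is an assumption that happens to reproduce the reported value; it has not been checked against the evaluation script's source.

```python
# Counts taken from the nav.basic.txt figures reported above.
ref_words = 760
test_words = 1188
edit_distance = 1176
pi_correct = 20  # position-independent correct words

# WER: word-level edit distance relative to the reference length.
wer = 100.0 * edit_distance / ref_words

# Assumed PER: every word not matched position-independently counts as an
# error, measured against the longer of the two texts.
per = 100.0 * (max(ref_words, test_words) - pi_correct) / ref_words

print(f"WER: {wer:.2f}%")
print(f"PER: {per:.2f}%")
```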