Difference between revisions of "Dzongkha and English"

From LING073
[[Category:Sp21_TranslationPairs]] [[Category:Dzongkha]] [[Category:English]]
 

Revision as of 04:27, 25 May 2021

Resources for machine translation between Dzongkha and English.

External Resources

  • Find our dzo-eng machine translation repo here.
  • Find our dzo morphological transducer repo here.
  • Find the apertium-eng repo here.
  • Find our dzo-eng corpus here.

dzo → eng evaluation

Coverage Analysis

  • Monolingual transducer coverage: 36 / 36
  • Bilingual transducer coverage: 36 / 36
  • Total number of tokens in the dzo.sentences.txt file: 55
  • Total number of tokens not found in the dictionary (number of unknown words): 0
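As a rough illustration of how coverage figures like those above can be derived, the sketch below counts known and unknown tokens in analyser output written in the Apertium stream format, where an unknown word carries an analysis starting with `*`. The function name `coverage` and the sample string are hypothetical (the sample analysis is made up for illustration, not taken from the transducer), and the regular expression ignores backslash-escaped characters for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: backslash-escaped characters are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^$]*)\$")


def coverage(analysed):
    """Return (known, total) token counts for a string of analyser output."""
    known = 0
    total = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed):
        total += 1
        # An unknown word is emitted with an analysis that starts with '*'.
        if not analyses.startswith("*"):
            known += 1
    return known, total


# Hypothetical sample: one analysed noun and one unknown token.
sample = "^བྱི་ལི་/བྱི་ལི་<n>$ ^xyz/*xyz$"
known, total = coverage(sample)
print(f"Coverage: {known} / {total}")
```

Running the same count over the analysed dzo.sentences.txt would give the numerator and denominator of the coverage ratio directly.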

Sentence Evaluation

Sentence 1

 Original Sentence: བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
 Intended English Translation: The cat is in the box.
 Biltrans Output: ^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$
 Translation Output: #cat #the #box #is

Sentence 2

 Original Sentence: ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
 Intended English Translation: Our house is big.
 Biltrans Output: ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$^ཁྱིམ་<n>/house<n>$^སྦོམ་<adj>/big<adj>$^ཡོད་<vbser>/has<vbser>$
 Translation Output: #we #big #house #has

Sentence 3

 Original Sentence: ཁོ་དགེ་སློང་མེན་
 Intended English Translation: He is not a monk.
 Biltrans Output: ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^དགེ་སློང་<n>/monk<n>$^ཨིན་<vbser><neg>/is<vbser><neg>$
 Translation Output: #he #monk #is not

Sentence 4

 Original Sentence: མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
 Intended English Translation: She has two younger sisters.
 Biltrans Output: ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$^སྲིངམོ་<n>/sister<n>$^གཉིས་<num>/two<num>$^ཡོད་<vbser>/has<vbser>$
 Translation Output: #she #sister #two #has

Sentence 5

 Original Sentence: མོ་དེ་ཁོ་བ་རྒས་
 Intended English Translation: She is older than him.
 Biltrans Output: ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$^དེ་<det>/the<det><def>$^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^རྒས་<adj><comp>/older<adj><comp>$
 Translation Output: #she #the #he #older 

Sentence 6

 Original Sentence: ད་ཉིམ་ཤར་དོ་
 Intended English Translation: The sun is shining now.
 Biltrans Output: ^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$
 Translation Output: now #sun #shine

Sentence 7

 Original Sentence: ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
 Intended English Translation: I arrived on time.
 Biltrans Output: ^ང་<prn><p1><sg>/I<prn><p1><sg>$^དུས་ཚོད་<n><loc>/time<n><loc>$^ལྷོད་<v><iv>/arrive<vblex>$
 Translation Output: #I #time arrived

Sentence 8

 Original Sentence: སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
 Intended English Translation: There are a few story writers.
 Biltrans Output: ^སྲུང་<n>/story<n>$^འབྲི་<v><tv><vadj>/write<vblex><vadj>$^གཅིག་<num>/one<num>$^འདུག་<vbser>/is<vbser>$
 Translation Output: #story #write #one #is 

Sentence 9

 Original Sentence: རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
 Intended English Translation: [We] get jewels from the ocean.
 Biltrans Output: ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$^ནོར་བུ་<n>/pearl<n>$^འཐོབ་<v><iv>/get<vblex>$
 Translation Output: #ocean #pearl #get 

Sentence 10

 Original Sentence: བྱི་ཙི་ཚུ་ཟུང་གེ་
 Intended English Translation: Let's catch the rats.
 Biltrans Output: ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$^ཟུང་<v><tv>/catch<vblex>$^གེ་<vaux><adh>/shall<vaux><adh>$
 Translation Output: #rat #catch #shall
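Most words in the translation outputs above carry a `#` prefix, which Apertium uses to flag a lexical unit for which the generator could not produce a surface form. The following sketch tallies those marks across the ten outputs. The list is copied from the sentences above; splitting on whitespace is an assumption, so a multiword unit such as "#is not" is counted as two tokens.

```python
# Translation outputs copied from Sentences 1-10 above.
OUTPUTS = [
    "#cat #the #box #is",
    "#we #big #house #has",
    "#he #monk #is not",
    "#she #sister #two #has",
    "#she #the #he #older",
    "now #sun #shine",
    "#I #time arrived",
    "#story #write #one #is",
    "#ocean #pearl #get",
    "#rat #catch #shall",
]


def count_generation_marks(outputs):
    """Return (marked, total): whitespace tokens starting with '#' and all tokens."""
    marked = 0
    total = 0
    for line in outputs:
        for token in line.split():
            total += 1
            if token.startswith("#"):
                marked += 1
    return marked, total


marked, total = count_generation_marks(OUTPUTS)
print(f"{marked} of {total} tokens carry a generation mark")
```

A count like this gives a quick check on how much of the remaining error comes from generation rather than from lexical lookup or transfer.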

Additions

  • Added 100 more stems in the bilingual dictionary:
    • Initial # of stems: 66
    • Final # of stems: 166
  • Added two more lexical selection rules.
  • Added two more structural transfer rules.

Final Evaluation

Dzongkha monolingual transducer:

  • Precision and recall against the annotated.basic corpus: not available (the evaluation produced an error)
  • Coverage over the large corpus: 3121/6718 (~0.465)
  • Size of the large corpus: 131018 characters
  • # of stems in the transducer: 255 lexicon entries

Dzo-Eng MT:

  • WER and PER over test phrases: 83.87 %
  • WER and PER over longer corpus: 97.44 %
  • Proportion of stems translated correctly in the longer corpus: 36/36
  • Trimmed coverage over longer corpus: 36/36
  • Trimmed coverage over large corpus:
  • # of tokens in the longer corpus: 36
  • # of tokens in the large corpus: 8187
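For readers unfamiliar with the two error rates reported above, here is a minimal sketch of how WER and PER are commonly defined. It is not the evaluation script used for these figures. WER is the word-level edit distance divided by the reference length; PER here follows one common bag-of-words formulation that ignores word order. The function names are hypothetical, and the example hypothesis is the Sentence 3 output with its `#` marks stripped, used purely for illustration.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate (one common bag-of-words definition)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Illustrative pair only: reference and hypothesis are lowercased, '#' removed.
reference = "he is not a monk".split()
hypothesis = "he monk is not".split()
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"PER: {per(reference, hypothesis):.2%}")
```

Because PER disregards order, it is never higher than WER under these definitions, so a gap between the two points to word-order errors rather than wrong lexical choices.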