Difference between revisions of "Dzongkha and English"

From LING073
(dzo → eng evaluation)
(Final Evaluation)
 
(26 intermediate revisions by the same user not shown)
Line 12: Line 12:
  
 
=== Coverage Analysis ===
 
=== Coverage Analysis ===
* Monolingual transducer coverage: 35 / 36 (~0.972)
+
* Monolingual transducer coverage: 36 / 36  
* Bilingual transducer coverage: 35 / 36 (~0.972)
+
* Bilingual transducer coverage: 36 / 36
 +
* Total number of tokens in the dzo.sentences.txt file: 55
 +
* Total number of tokens not found in the dictionary (number of unknown words): 0
  
 
=== Sentence Evaluation ===
 
=== Sentence Evaluation ===
Line 20: Line 22:
 
   '''Original Sentence:''' བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
 
   '''Original Sentence:''' བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
 
   '''Intended English Translation:''' The cat is in the box.
 
   '''Intended English Translation:''' The cat is in the box.
   '''Biltrans Output:''' ^བྱི་ལི་<n>/cat<n>$ ^དེ་<det>/the<det>$ ^སྒྲོམ་<n><loc>/box<n><loc>$ ^འདུག་<vbser>/is<vbser>
+
   '''Biltrans Output:''' ^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$
 
   '''Translation Output:''' #cat #the #box #is
 
   '''Translation Output:''' #cat #the #box #is
  
Line 26: Line 28:
 
   '''Original Sentence:''' ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
 
   '''Original Sentence:''' ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
 
   '''Intended English Translation:''' Our house is big.
 
   '''Intended English Translation:''' Our house is big.
   '''Biltrans Output:''' ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$ ^ཁྱིམ་<n>/house<n>$ ^སྦོམ་<adj>/big<adj>$ ^ཡོད་<vbser>/has<vbser>
+
   '''Biltrans Output:''' ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$^ཁྱིམ་<n>/house<n>$^སྦོམ་<adj>/big<adj>$^ཡོད་<vbser>/has<vbser>$
   '''Translation Output:''' #we #house #big #has
+
   '''Translation Output:''' #we #big #house #has
  
 
==== Sentence 3 ====
 
==== Sentence 3 ====
 
   '''Original Sentence:''' ཁོ་དགེ་སློང་མེན་
 
   '''Original Sentence:''' ཁོ་དགེ་སློང་མེན་
 
   '''Intended English Translation:''' He is not a monk.
 
   '''Intended English Translation:''' He is not a monk.
   '''Biltrans Output:''' ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$ ^དགེ་སློང་<n>/monk<n>$ ^ཨིན་<vbser><neg>/is<vbser><neg>
+
   '''Biltrans Output:''' ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^དགེ་སློང་<n>/monk<n>$^ཨིན་<vbser><neg>/is<vbser><neg>$
   '''Translation Output:'''  #he #monk #is
+
   '''Translation Output:'''  #he #monk #is not
  
 
==== Sentence 4 ====
 
==== Sentence 4 ====
 
   '''Original Sentence:''' མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
 
   '''Original Sentence:''' མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
 
   '''Intended English Translation:''' She has two younger sisters.
 
   '''Intended English Translation:''' She has two younger sisters.
   '''Biltrans Output:''' ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$ ^སྲིངམོ་<n>/sister<n>$ ^གཉིས་<num>/two<num>$ ^ཡོད་<vbser>/has<vbser>
+
   '''Biltrans Output:''' ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$^སྲིངམོ་<n>/sister<n>$^གཉིས་<num>/two<num>$^ཡོད་<vbser>/has<vbser>$
 
   '''Translation Output:''' #she #sister #two #has
 
   '''Translation Output:''' #she #sister #two #has
  
Line 44: Line 46:
 
   '''Original Sentence:''' མོ་དེ་ཁོ་བ་རྒས་
 
   '''Original Sentence:''' མོ་དེ་ཁོ་བ་རྒས་
 
   '''Intended English Translation:''' She is older than him.
 
   '''Intended English Translation:''' She is older than him.
   '''Biltrans Output:''' ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$ ^དེ་<det>/the<det>$ ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$ ^རྒས་<adj><comp>/older<adj><comp>
+
   '''Biltrans Output:''' ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$^དེ་<det>/the<det><def>$^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^རྒས་<adj><comp>/older<adj><comp>$
 
   '''Translation Output:''' #she #the #he #older  
 
   '''Translation Output:''' #she #the #he #older  
  
 
==== Sentence 6 ====
 
==== Sentence 6 ====
 
   '''Original Sentence:''' ད་ཉིམ་ཤར་དོ་
 
   '''Original Sentence:''' ད་ཉིམ་ཤར་དོ་
   '''Intended English Translation:''' The sun is shining.
+
   '''Intended English Translation:''' The sun is shining now.
   '''Biltrans Output:'''  
+
   '''Biltrans Output:''' ^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$
   '''Translation Output:'''
+
   '''Translation Output:''' now #sun #shine
  
 
==== Sentence 7 ====
 
==== Sentence 7 ====
 
   '''Original Sentence:''' ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
 
   '''Original Sentence:''' ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
   '''Intended English Translation:''' I arrived just in time.
+
   '''Intended English Translation:''' I arrived on time.
   '''Biltrans Output:''' ^ད་<adv>/now<adv>$ ^ཉིམ་<n>/sun<n>$ ^ཤར་<v><iv><pres>/shine<v><iv><pres>
+
   '''Biltrans Output:''' ^ང་<prn><p1><sg>/I<prn><p1><sg>$^དུས་ཚོད་<n><loc>/time<n><loc>$^ལྷོད་<v><iv>/arrive<vblex>$
   '''Translation Output:'''  now #sun #shine
+
   '''Translation Output:'''  #I #time arrived
  
 
==== Sentence 8 ====
 
==== Sentence 8 ====
 
   '''Original Sentence:''' སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
 
   '''Original Sentence:''' སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
 
   '''Intended English Translation:''' There are a few story writers.
 
   '''Intended English Translation:''' There are a few story writers.
   '''Biltrans Output:''' ^སྲུང་<n>/story<n>$ ^འབྲི་<v><tv><vadj>/write<v><tv><vadj>$ ^དག་པ་གཅིག་<adj>/few<adj>$ ^འདུག་<vbser>/is<vbser>
+
   '''Biltrans Output:''' ^སྲུང་<n>/story<n>$^འབྲི་<v><tv><vadj>/write<vblex><vadj>$^གཅིག་<num>/one<num>$^འདུག་<vbser>/is<vbser>$
   '''Translation Output:''' #story #write #few #is  
+
   '''Translation Output:''' #story #write #one #is  
  
 
==== Sentence 9 ====
 
==== Sentence 9 ====
 
   '''Original Sentence:''' རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
 
   '''Original Sentence:''' རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
 
   '''Intended English Translation:''' [We] get jewel from the ocean.
 
   '''Intended English Translation:''' [We] get jewel from the ocean.
   '''Biltrans Output:''' ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$ ^ནོར་བུ་<n>/pearl<n>$ ^འཐོབ་<v><iv>/get<v><iv>
+
   '''Biltrans Output:''' ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$^ནོར་བུ་<n>/pearl<n>$^འཐོབ་<v><iv>/get<vblex>$
 
   '''Translation Output:''' #ocean #pearl #get  
 
   '''Translation Output:''' #ocean #pearl #get  
  
Line 74: Line 76:
 
   '''Original Sentence:''' བྱི་ཙི་ཚུ་ཟུང་གེ་
 
   '''Original Sentence:''' བྱི་ཙི་ཚུ་ཟུང་གེ་
 
   '''Intended English Translation:''' Let's catch the rats.
 
   '''Intended English Translation:''' Let's catch the rats.
   '''Biltrans Output:''' ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$ ^ཟུང་<v><tv>/catch<v><tv>$ ^གེ་<vaux>/let<vaux>
+
   '''Biltrans Output:''' ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$^ཟུང་<v><tv>/catch<vblex>$^གེ་<vaux><adh>/shall<vaux><adh>$
   '''Translation Output:''' #rat #catch #let
+
   '''Translation Output:''' #rat #catch #shall
  
 +
== Additions ==
 +
 +
* Added 100 more stems in the bilingual dictionary:
 +
** Initial # of stems: 66
 +
** Final # of stems: 166
 +
* Added two more lexical selection rules.
 +
* Added two more structural transfer rules.
 +
 +
== Final Evaluation ==
 +
 +
Dzongkha monolingual transducer:
 +
* Precision and recall against the annotated.basic corpus: We get an error.
 +
* Coverage over the large corpus: 3121/6718 (~0.465)
 +
* # of words in the large corpus: 131018 characters
 +
* # of stems in the transducer: 255 lexicon entries
 +
 +
Dzo-Eng MT:
 +
* WER and PER over test phrases: 83.87 %
 +
* WER and PER over longer corpus: 97.44 %
 +
* Proportion of stems translated correctly in the longer corpus: 101
 +
* Trimmed coverage over longer corpus: 101/132
 +
* Trimmed coverage over large corpus: 9698/21689
 +
* # of tokens in the longer corpus: 132
 +
* # of tokens in the large corpus: 21689
  
 
[[Category:Sp21_TranslationPairs]] [[Category:Dzongkha]] [[Category:English]]
 
[[Category:Sp21_TranslationPairs]] [[Category:Dzongkha]] [[Category:English]]

Latest revision as of 04:32, 25 May 2021

Resources for machine translation between Dzongkha and English.

External Resources

  • Find our dzo-eng machine translation repo here.
  • Find our dzo morphological transducer repo here.
  • Find the apertium-eng repo here.
  • Find our dzo-eng corpus here.

dzo → eng evaluation

Coverage Analysis

  • Monolingual transducer coverage: 36 / 36
  • Bilingual transducer coverage: 36 / 36
  • Total number of tokens in the dzo.sentences.txt file: 55
  • Total number of tokens not found in the dictionary (number of unknown words): 0
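The coverage figures above are ratios of known tokens to all tokens. A minimal sketch of how such a ratio can be computed from analyser output in Apertium stream format, where an analysis beginning with `*` marks an unknown form. The function name `coverage` and the sample string (including the made-up token `xyz`) are assumptions for illustration, not taken from the project repositories.

```python
import re

# One lexical unit in Apertium stream format looks like ^surface/analysis$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed):
    """Return (known, total) lexical units found in analyser output."""
    known = 0
    total = 0
    for unit in LEXICAL_UNIT.findall(analysed):
        _surface, _, analyses = unit.partition("/")
        total += 1
        # An analysis that starts with '*' means the transducer has no entry.
        if analyses and not analyses.startswith("*"):
            known += 1
    return known, total


# Hypothetical analyser output: one known noun and one unknown token.
sample = "^བྱི་ལི་/བྱི་ལི་<n>$ ^xyz/*xyz$"
known, total = coverage(sample)
print(f"{known} / {total} (~{known / total:.3f})")
```

Running the same function over the analysed test sentences would reproduce a figure in the "known / total" form used in the list above, provided the input is unambiguous analyser output.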

Sentence Evaluation

Sentence 1

 Original Sentence: བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
 Intended English Translation: The cat is in the box.
 Biltrans Output: ^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$
 Translation Output: #cat #the #box #is
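The Biltrans Output line pairs each source lemma and its tags with a target lemma and its tags. As a rough sketch, the line for Sentence 1 can be split into those pairs as follows. The helper names `split_reading` and `parse_biltrans` are hypothetical, and the code assumes exactly one translation per lexical unit, which holds for the outputs shown on this page.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")   # one lexical unit: ^source/target$
TAG = re.compile(r"<([^>]+)>")    # one tag: <n>, <det>, <loc>, ...


def split_reading(reading):
    """Split 'lemma<t1><t2>' into ('lemma', ['t1', 't2'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG.findall(reading)


def parse_biltrans(line):
    """Return a list of ((src_lemma, src_tags), (tgt_lemma, tgt_tags))."""
    pairs = []
    for unit in UNIT.findall(line):
        # Assumption: a single target reading follows the first '/'.
        source, _, target = unit.partition("/")
        pairs.append((split_reading(source), split_reading(target)))
    return pairs


line = "^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$"
for (src, src_tags), (tgt, tgt_tags) in parse_biltrans(line):
    print(src, src_tags, "->", tgt, tgt_tags)
```

Laid out this way, it is easy to see that the target side of དེ་ carries an extra `def` tag that the source side lacks.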

Sentence 2

 Original Sentence: ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
 Intended English Translation: Our house is big.
 Biltrans Output: ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$^ཁྱིམ་<n>/house<n>$^སྦོམ་<adj>/big<adj>$^ཡོད་<vbser>/has<vbser>$
 Translation Output: #we #big #house #has

Sentence 3

 Original Sentence: ཁོ་དགེ་སློང་མེན་
 Intended English Translation: He is not a monk.
 Biltrans Output: ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^དགེ་སློང་<n>/monk<n>$^ཨིན་<vbser><neg>/is<vbser><neg>$
 Translation Output: #he #monk #is not

Sentence 4

 Original Sentence: མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
 Intended English Translation: She has two younger sisters.
 Biltrans Output: ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$^སྲིངམོ་<n>/sister<n>$^གཉིས་<num>/two<num>$^ཡོད་<vbser>/has<vbser>$
 Translation Output: #she #sister #two #has

Sentence 5

 Original Sentence: མོ་དེ་ཁོ་བ་རྒས་
 Intended English Translation: She is older than him.
 Biltrans Output: ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$^དེ་<det>/the<det><def>$^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^རྒས་<adj><comp>/older<adj><comp>$
 Translation Output: #she #the #he #older 

Sentence 6

 Original Sentence: ད་ཉིམ་ཤར་དོ་
 Intended English Translation: The sun is shining now.
 Biltrans Output: ^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$
 Translation Output: now #sun #shine

Sentence 7

 Original Sentence: ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
 Intended English Translation: I arrived on time.
 Biltrans Output: ^ང་<prn><p1><sg>/I<prn><p1><sg>$^དུས་ཚོད་<n><loc>/time<n><loc>$^ལྷོད་<v><iv>/arrive<vblex>$
 Translation Output: #I #time arrived

Sentence 8

 Original Sentence: སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
 Intended English Translation: There are a few story writers.
 Biltrans Output: ^སྲུང་<n>/story<n>$^འབྲི་<v><tv><vadj>/write<vblex><vadj>$^གཅིག་<num>/one<num>$^འདུག་<vbser>/is<vbser>$
 Translation Output: #story #write #one #is 

Sentence 9

 Original Sentence: རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
 Intended English Translation: [We] get jewels from the ocean.
 Biltrans Output: ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$^ནོར་བུ་<n>/pearl<n>$^འཐོབ་<v><iv>/get<vblex>$
 Translation Output: #ocean #pearl #get 

Sentence 10

 Original Sentence: བྱི་ཙི་ཚུ་ཟུང་གེ་
 Intended English Translation: Let's catch the rats.
 Biltrans Output: ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$^ཟུང་<v><tv>/catch<vblex>$^གེ་<vaux><adh>/shall<vaux><adh>$
 Translation Output: #rat #catch #shall
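Most tokens in the Translation Output lines above carry a leading `#`, which in Apertium output marks a lexical unit for which the generator could not produce a surface form. A small sketch for counting such tokens per sentence is shown below. The function name `generation_failures` is an assumption, and splitting on whitespace is a simplification that treats `#is not` as two tokens.

```python
def generation_failures(output):
    """Return (marked, total) for whitespace-separated tokens.

    A token counts as marked when it starts with '#'.
    """
    tokens = output.split()
    marked = [token for token in tokens if token.startswith("#")]
    return len(marked), len(tokens)


# Translation Output lines copied from Sentences 1, 3, 6 and 7.
outputs = {
    1: "#cat #the #box #is",
    3: "#he #monk #is not",
    6: "now #sun #shine",
    7: "#I #time arrived",
}
for number, text in outputs.items():
    marked, total = generation_failures(text)
    print(f"Sentence {number}: {marked} of {total} tokens marked with '#'")
```

A count like this gives a quick view of which sentences already produce some fully generated words (for example `now` and `arrived`) and which do not.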

Additions

  • Added 100 more stems in the bilingual dictionary:
    • Initial # of stems: 66
    • Final # of stems: 166
  • Added two more lexical selection rules.
  • Added two more structural transfer rules.
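One way to arrive at stem counts like those above is to count the `<e>` entries in the bilingual dictionary, which is an XML file. The sketch below does this on a hypothetical two-entry dictionary. The sample entries and the function name `count_entries` are illustrative assumptions, and the count equals the number of stems only if each stem has exactly one entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry bilingual dictionary; the real file is not shown here.
SAMPLE_DIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>བྱི་ལི་<s n="n"/></l><r>cat<s n="n"/></r></p></e>
    <e><p><l>ཁྱིམ་<s n="n"/></l><r>house<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def count_entries(dix_xml):
    """Count <e> elements anywhere in a dictionary given as an XML string."""
    root = ET.fromstring(dix_xml)
    return sum(1 for _ in root.iter("e"))


print(count_entries(SAMPLE_DIX))
```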

Final Evaluation

Dzongkha monolingual transducer:

  • Precision and recall against the annotated.basic corpus: not available (the evaluation produced an error).
  • Coverage over the large corpus: 3121/6718 (~0.465)
  • Size of the large corpus: 131018 characters
  • # of stems in the transducer: 255 lexicon entries

Dzo-Eng MT:

  • WER and PER over test phrases: 83.87 %
  • WER and PER over longer corpus: 97.44 %
  • Number of stems translated correctly in the longer corpus: 101
  • Trimmed coverage over longer corpus: 101/132 (~0.765)
  • Trimmed coverage over large corpus: 9698/21689 (~0.447)
  • # of tokens in the longer corpus: 132
  • # of tokens in the large corpus: 21689
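For readers unfamiliar with the WER and PER metrics listed above, the following is a minimal sketch of one common formulation of each: WER as the word-level edit distance divided by the reference length, and PER as a bag-of-words mismatch that ignores word order. The function names are assumptions, the evaluation tool used for the figures on this page may compute the metrics differently, and the reference is assumed to be non-empty.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # reference word missing
                               current[j - 1] + 1,       # extra hypothesis word
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def wer(ref, hyp):
    """Word error rate: edit distance normalised by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: word order is ignored."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 1: intended translation versus system output with '#' stripped.
ref = "the cat is in the box".split()
hyp = "#cat #the #box #is".replace("#", "").split()
print(f"WER: {wer(ref, hyp):.2%}  PER: {per(ref, hyp):.2%}")
```

Under these definitions Sentence 1 has an edit distance of 4 against a 6-word reference, so the sketch reports a WER of 4/6 and a PER of 2/6. The gap between the two reflects that most of the right words are present but in Dzongkha word order.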