Dzongkha and English

From LING073
Latest revision as of 04:32, 25 May 2021

Resources for machine translation between Dzongkha and English.

External Resources

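  • Find our dzo-eng machine translation repo here: https://github.swarthmore.edu/Ling073-sp21/ling073-dzo-eng
  • Find our dzo morphological transducer repo here: https://github.swarthmore.edu/Ling073-sp21/ling073-dzo
  • Find the apertium-eng repo here: https://github.com/apertium/apertium-eng
  • Find our dzo-eng corpus here: https://github.swarthmore.edu/Ling073-sp21/ling073-dzo-eng-corpus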

dzo → eng evaluation

Coverage Analysis

  • Monolingual transducer coverage: 36 / 36
  • Bilingual transducer coverage: 36 / 36
  • Total number of tokens in the dzo.sentences.txt file: 55
  • Total number of tokens not found in the dictionary (number of unknown words): 0
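
The coverage figures above can be reproduced in spirit with a short script. The following is a minimal sketch, assuming the analyser output is in Apertium stream format (`^surface/analysis$`) and that an analysis beginning with `*` marks an unknown form. The `coverage` function and the sample string (including the unknown form `xyz`) are illustrative and are not taken from the repos; escaped characters in the stream are ignored for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis[/analysis...]$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return (known, total) token counts for a string of analyser output."""
    units = LEXICAL_UNIT.findall(analysed_text)
    total = len(units)
    # An analysis that starts with '*' means the analyser did not know the form.
    unknown = sum(1 for _surface, analyses in units if analyses.startswith("*"))
    return total - unknown, total


# Hypothetical sample: one known noun and one made-up unknown form.
sample = "^བྱི་ལི་/བྱི་ལི་<n>$ ^xyz/*xyz$"
known, total = coverage(sample)
print(f"coverage: {known} / {total}")
```

Counting lexical units rather than whitespace-separated words matters for Dzongkha, where word boundaries are not marked by spaces.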

Sentence Evaluation

Sentence 1

 Original Sentence: བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
 Intended English Translation: The cat is in the box.
 Biltrans Output: ^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$
 Translation Output: #cat #the #box #is
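
To make the Biltrans output easier to read, each `^source/target$` unit can be split into lemmas and tags. This is a minimal sketch, assuming one target reading per unit (only the first is kept if there are several); `parse_biltrans` and `split_reading` are hypothetical helper names, not part of Apertium.

```python
import re

# One biltrans unit: ^source-reading/target-reading[/more-targets]$
UNIT = re.compile(r"\^([^/$]+)/([^$]+)\$")
TAG = re.compile(r"<([^>]+)>")


def split_reading(reading):
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG.findall(reading)


def parse_biltrans(line):
    """Return a list of ((src_lemma, src_tags), (tgt_lemma, tgt_tags)) pairs."""
    pairs = []
    for source, targets in UNIT.findall(line):
        first_target = targets.split("/")[0]  # assumption: keep only the first translation
        pairs.append((split_reading(source), split_reading(first_target)))
    return pairs


# Biltrans output of Sentence 1, copied from above.
line = "^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$"
for (src_lemma, src_tags), (tgt_lemma, tgt_tags) in parse_biltrans(line):
    print(src_lemma, src_tags, "->", tgt_lemma, tgt_tags)
```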

Sentence 2

 Original Sentence: ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
 Intended English Translation: Our house is big.
 Biltrans Output: ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$^ཁྱིམ་<n>/house<n>$^སྦོམ་<adj>/big<adj>$^ཡོད་<vbser>/has<vbser>$
 Translation Output: #we #big #house #has

Sentence 3

 Original Sentence: ཁོ་དགེ་སློང་མེན་
 Intended English Translation: He is not a monk.
 Biltrans Output: ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^དགེ་སློང་<n>/monk<n>$^ཨིན་<vbser><neg>/is<vbser><neg>$
 Translation Output:  #he #monk #is not

Sentence 4

 Original Sentence: མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
 Intended English Translation: She has two younger sisters.
 Biltrans Output: ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$^སྲིངམོ་<n>/sister<n>$^གཉིས་<num>/two<num>$^ཡོད་<vbser>/has<vbser>$
 Translation Output: #she #sister #two #has

Sentence 5

 Original Sentence: མོ་དེ་ཁོ་བ་རྒས་
 Intended English Translation: She is older than him.
 Biltrans Output: ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$^དེ་<det>/the<det><def>$^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^རྒས་<adj><comp>/older<adj><comp>$
 Translation Output: #she #the #he #older 

Sentence 6

 Original Sentence: ད་ཉིམ་ཤར་དོ་
 Intended English Translation: The sun is shining now.
 Biltrans Output: ^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$
 Translation Output: now #sun #shine

Sentence 7

 Original Sentence: ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
 Intended English Translation: I arrived on time.
 Biltrans Output: ^ང་<prn><p1><sg>/I<prn><p1><sg>$^དུས་ཚོད་<n><loc>/time<n><loc>$^ལྷོད་<v><iv>/arrive<vblex>$
 Translation Output:  #I #time arrived

Sentence 8

 Original Sentence: སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
 Intended English Translation: There are a few story writers.
 Biltrans Output: ^སྲུང་<n>/story<n>$^འབྲི་<v><tv><vadj>/write<vblex><vadj>$^གཅིག་<num>/one<num>$^འདུག་<vbser>/is<vbser>$
 Translation Output: #story #write #one #is 

Sentence 9

 Original Sentence: རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
 Intended English Translation: [We] get jewels from the ocean.
 Biltrans Output: ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$^ནོར་བུ་<n>/pearl<n>$^འཐོབ་<v><iv>/get<vblex>$
 Translation Output: #ocean #pearl #get 

Sentence 10

 Original Sentence: བྱི་ཙི་ཚུ་ཟུང་གེ་
 Intended English Translation: Let's catch the rats.
 Biltrans Output: ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$^ཟུང་<v><tv>/catch<vblex>$^གེ་<vaux><adh>/shall<vaux><adh>$
 Translation Output: #rat #catch #shall
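
In Apertium output, a leading `#` marks a lexical unit for which the generator could not produce a surface form. The sketch below tallies those markers over the ten Translation Outputs listed above. It splits on whitespace, so `#is not` counts as two tokens; that is a simplification, and `count_ungenerated` is a hypothetical helper name.

```python
# The ten Translation Outputs from the sentence evaluation above.
OUTPUTS = [
    "#cat #the #box #is",
    "#we #big #house #has",
    "#he #monk #is not",
    "#she #sister #two #has",
    "#she #the #he #older",
    "now #sun #shine",
    "#I #time arrived",
    "#story #write #one #is",
    "#ocean #pearl #get",
    "#rat #catch #shall",
]


def count_ungenerated(outputs):
    """Return (tokens starting with '#', all tokens) over whitespace-split outputs."""
    tokens = [tok for line in outputs for tok in line.split()]
    marked = sum(1 for tok in tokens if tok.startswith("#"))
    return marked, len(tokens)


marked, total = count_ungenerated(OUTPUTS)
print(f"{marked} of {total} output tokens carry a generation-failure marker")
```

A tally like this helps separate generation problems (missing forms on the English side) from transfer problems such as word order.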

Additions

  • Added 100 more stems in the bilingual dictionary:
    • Initial # of stems: 66
    • Final # of stems: 166
  • Added two more lexical selection rules.
  • Added two more structural transfer rules.
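
Entry counts such as the ones above can be checked by counting `<e>` elements in the bilingual dictionary. This is a sketch under assumptions: the inline XML is a hypothetical two-entry bidix fragment built from words in the sentences above, not the contents of the actual repo, and an `<e>` entry is treated as one stem, which may not match how the figures above were counted.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry bilingual dictionary fragment (illustration only).
SAMPLE_BIDIX = """<dictionary>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>བྱི་ལི་<s n="n"/></l><r>cat<s n="n"/></r></p></e>
    <e><p><l>ཁྱིམ་<s n="n"/></l><r>house<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def count_entries(bidix_xml):
    """Count <e> entries anywhere in a dictionary given as an XML string."""
    root = ET.fromstring(bidix_xml)
    return sum(1 for _ in root.iter("e"))


print(count_entries(SAMPLE_BIDIX), "entries")
```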

Final Evaluation

Dzongkha monolingual transducer:

  • Precision and recall against the annotated.basic corpus: not available (the evaluation fails with an error).
  • Coverage over the large corpus: 3121/6718 (~0.465)
  • Size of the large corpus: 131018 characters
  • # of stems in the transducer: 255 lexicon entries

Dzo-Eng MT:

  • WER and PER over test phrases: 83.87 %
  • WER and PER over longer corpus: 97.44 %
  • Number of stems translated correctly in the longer corpus: 101
  • Trimmed coverage over longer corpus: 101/132
  • Trimmed coverage over large corpus: 9698/21689
  • # of tokens in the longer corpus: 132
  • # of tokens in the large corpus: 21689
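
For readers unfamiliar with the two error rates, the following minimal sketch shows one common formulation: WER as word-level edit distance divided by reference length, and PER as a position-independent variant that ignores word order. The tool used to produce the figures above is not stated here and may define PER differently. The example pair is Sentence 1 with punctuation, capitalisation, and `#` markers stripped by hand.

```python
from collections import Counter


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a hypothesis word
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate (one common formulation)."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())  # bag-of-words overlap
    return (max(len(ref), len(hyp)) - matches) / len(ref)


reference = "the cat is in the box"
hypothesis = "cat the box is"
print(f"WER = {wer(reference, hypothesis):.2%}, PER = {per(reference, hypothesis):.2%}")
```

Because PER disregards order, it is never higher than WER under this formulation, so a large gap between the two points to word-order errors rather than lexical ones.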