Dzongkha and English
From LING073
Latest revision as of 04:32, 25 May 2021
Resources for machine translation between Dzongkha and English.
External Resources
- Find our dzo-eng machine translation repo here.
- Find our dzo morphological transducer repo here.
- Find the apertium-eng repo here.
- Find our dzo-eng corpus here.
dzo → eng evaluation
Coverage Analysis
- Monolingual transducer coverage: 36 / 36
- Bilingual transducer coverage: 36 / 36
- Total number of tokens in the dzo.sentences.txt file: 55
- Total number of tokens not found in the dictionary (number of unknown words): 0
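The coverage figures above are ratios of known tokens to all tokens. As a rough sketch of how such a ratio can be computed from analyser output, the snippet below counts lexical units in Apertium stream format and treats any unit whose analysis starts with `*` as unknown. The function name `coverage` and the sample string are hypothetical, the regular expression ignores escaped `^` and `$` characters, and this is not necessarily how the numbers on this page were produced.

```python
import re

# Apertium stream format: each lexical unit looks like ^surface/analysis1/analysis2$.
# An unknown word is conventionally emitted as ^surface/*surface$.
# Simplification (assumption): escaped \^ and \$ inside tokens are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text):
    """Return (known, total) lexical-unit counts for analyser output."""
    known = 0
    total = 0
    for unit in LEXICAL_UNIT.findall(analysed_text):
        _surface, _, analyses = unit.partition("/")
        total += 1
        # A unit counts as known if it has at least one non-starred analysis.
        if analyses and not analyses.startswith("*"):
            known += 1
    return known, total


# Hypothetical sample: one analysed noun and one unknown word.
sample = "^བྱི་ལི་/བྱི་ལི་<n>$ ^xyz/*xyz$"
known, total = coverage(sample)
print(f"{known} / {total}")
```

Reporting the pair rather than a single float keeps the output in the same "known / total" form used in the list above.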
Sentence Evaluation
Sentence 1
- Original Sentence: བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
- Intended English Translation: The cat is in the box.
- Biltrans Output: ^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$
- Translation Output: #cat #the #box #is
Sentence 2
- Original Sentence: ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
- Intended English Translation: Our house is big.
- Biltrans Output: ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$^ཁྱིམ་<n>/house<n>$^སྦོམ་<adj>/big<adj>$^ཡོད་<vbser>/has<vbser>$
- Translation Output: #we #big #house #has
Sentence 3
- Original Sentence: ཁོ་དགེ་སློང་མེན་
- Intended English Translation: He is not a monk.
- Biltrans Output: ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^དགེ་སློང་<n>/monk<n>$^ཨིན་<vbser><neg>/is<vbser><neg>$
- Translation Output: #he #monk #is not
Sentence 4
- Original Sentence: མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
- Intended English Translation: She has two younger sisters.
- Biltrans Output: ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$^སྲིངམོ་<n>/sister<n>$^གཉིས་<num>/two<num>$^ཡོད་<vbser>/has<vbser>$
- Translation Output: #she #sister #two #has
Sentence 5
- Original Sentence: མོ་དེ་ཁོ་བ་རྒས་
- Intended English Translation: She is older than him.
- Biltrans Output: ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$^དེ་<det>/the<det><def>$^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^རྒས་<adj><comp>/older<adj><comp>$
- Translation Output: #she #the #he #older
Sentence 6
- Original Sentence: ད་ཉིམ་ཤར་དོ་
- Intended English Translation: The sun is shining now.
- Biltrans Output: ^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$
- Translation Output: now #sun #shine
Sentence 7
- Original Sentence: ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
- Intended English Translation: I arrived on time.
- Biltrans Output: ^ང་<prn><p1><sg>/I<prn><p1><sg>$^དུས་ཚོད་<n><loc>/time<n><loc>$^ལྷོད་<v><iv>/arrive<vblex>$
- Translation Output: #I #time arrived
Sentence 8
- Original Sentence: སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
- Intended English Translation: There are a few story writers.
- Biltrans Output: ^སྲུང་<n>/story<n>$^འབྲི་<v><tv><vadj>/write<vblex><vadj>$^གཅིག་<num>/one<num>$^འདུག་<vbser>/is<vbser>$
- Translation Output: #story #write #one #is
Sentence 9
- Original Sentence: རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
- Intended English Translation: [We] get jewel from the ocean.
- Biltrans Output: ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$^ནོར་བུ་<n>/pearl<n>$^འཐོབ་<v><iv>/get<vblex>$
- Translation Output: #ocean #pearl #get
Sentence 10
- Original Sentence: བྱི་ཙི་ཚུ་ཟུང་གེ་
- Intended English Translation: Let's catch the rats.
- Biltrans Output: ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$^ཟུང་<v><tv>/catch<vblex>$^གེ་<vaux><adh>/shall<vaux><adh>$
- Translation Output: #rat #catch #shall
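The Biltrans Output lines in the sentences above are in Apertium stream format: each `^...$` unit holds a source reading, a `/`, and one or more target readings, and every reading is a lemma followed by `<tag>` symbols. The sketch below splits such a line into lemmas and tag lists so the source and target sides can be compared. The helper names `split_reading` and `parse_biltrans` are hypothetical, and the parsing is simplified (it does not handle escaped characters).

```python
import re

UNIT = re.compile(r"\^(.*?)\$")   # one ^...$ lexical unit
TAG = re.compile(r"<([^>]+)>")    # one <tag> symbol


def split_reading(reading):
    """Split 'lemma<t1><t2>' into ('lemma', ['t1', 't2'])."""
    return TAG.sub("", reading), TAG.findall(reading)


def parse_biltrans(line):
    """Return a list of (source_reading, [target_readings]) per lexical unit."""
    units = []
    for body in UNIT.findall(line):
        # The first field is the source side; the rest are target translations.
        source, *targets = body.split("/")
        units.append((split_reading(source), [split_reading(t) for t in targets]))
    return units


# Biltrans output of Sentence 6 from this page.
line = "^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$"
for (src_lemma, src_tags), targets in parse_biltrans(line):
    print(src_lemma, src_tags, "->", targets)
```

Listing the tags side by side makes mismatches easy to spot, for example the source verb carrying `v` and `iv` where the target carries `vblex`.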
Additions
- Added 100 more stems in the bilingual dictionary:
- Initial # of stems: 66
- Final # of stems: 166
- Added two more lexical selection rules.
- Added two more structural transfer rules.
Final Evaluation
Dzongkha monolingual transducer:
- Precision and recall against the annotated.basic corpus: not available (the evaluation gives an error).
- Coverage over the large corpus: 3121/6718 (~0.465)
- Size of the large corpus: 131018 characters
- # of stems in the transducer: 255 lexicon entries
Dzo-Eng MT:
- WER and PER over test phrases: 83.87 %
- WER and PER over longer corpus: 97.44 %
- Proportion of stems translated correctly in the longer corpus: 101
- Trimmed coverage over longer corpus: 101/132 (~0.765)
- Trimmed coverage over large corpus: 9698/21689 (~0.447)
- # of tokens in the longer corpus: 132
- # of tokens in the large corpus: 21689
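For readers unfamiliar with the WER and PER figures above, here is a minimal sketch of both metrics using textbook definitions: WER is the word-level edit distance divided by the reference length, and PER (position-independent error rate) ignores word order and compares bags of words. The function names are hypothetical, the PER formula is one common variant, and the evaluation script actually used for this page may normalise text or define PER differently, so the sketch is not expected to reproduce the reported percentages.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance relative to the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate (assumed variant): order is ignored."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 1 from this page, lower-cased, without punctuation,
# and with the '#' marks stripped from the translation output.
ref = "the cat is in the box".split()
hyp = [t.lstrip("#") for t in "#cat #the #box #is".split()]
print(f"WER = {wer(ref, hyp):.2%}, PER = {per(ref, hyp):.2%}")
```

On this pair PER comes out lower than WER because the output contains several correct words in the wrong order, which PER does not penalise.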