Difference between revisions of "Dzongkha and English"
From LING073
Latest revision as of 04:32, 25 May 2021
Resources for machine translation between Dzongkha and English.
External Resources
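- Find our dzo-eng machine translation repo: https://github.swarthmore.edu/Ling073-sp21/ling073-dzo-eng
- Find our dzo morphological transducer repo: https://github.swarthmore.edu/Ling073-sp21/ling073-dzo
- Find the apertium-eng repo: https://github.com/apertium/apertium-eng
- Find our dzo-eng corpus: https://github.swarthmore.edu/Ling073-sp21/ling073-dzo-eng-corpus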
dzo → eng evaluation
Coverage Analysis
- Monolingual transducer coverage: 36 / 36
- Bilingual transducer coverage: 36 / 36
- Total number of tokens in the dzo.sentences.txt file: 55
- Total number of tokens not found in the dictionary (number of unknown words): 0
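Coverage figures like the ones above are usually the share of tokens that receive at least one analysis. The following is a minimal sketch, assuming the analyser output is in Apertium stream format, where an unrecognised form appears as ^form/*form$. The function name `coverage` and the sample string are hypothetical and are not taken from the project repositories.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text):
    """Return (known, total) token counts for analyser output."""
    known = 0
    total = 0
    for unit in LEXICAL_UNIT.findall(analysed_text):
        _surface, _, analyses = unit.partition("/")
        total += 1
        # An analysis that starts with '*' marks an unknown word.
        if not analyses.startswith("*"):
            known += 1
    return known, total


# Hypothetical sample: one known noun and one unknown token.
sample = "^བྱི་ལི་/བྱི་ལི་<n>$ ^foo/*foo$"
known, total = coverage(sample)
print(f"{known} / {total}")
```

Counting unknown words is then `total - known`, which corresponds to the "tokens not found in the dictionary" figure.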
Sentence Evaluation
Sentence 1
Original Sentence: བྱི་ལི་དེ་སྒྲོམ་ནང་འདུག་
Intended English Translation: The cat is in the box.
Biltrans Output: ^བྱི་ལི་<n>/cat<n>$^དེ་<det>/the<det><def>$^སྒྲོམ་<n><loc>/box<n><loc>$^འདུག་<vbser>/is<vbser>$
Translation Output: #cat #the #box #is
Sentence 2
Original Sentence: ང་བཅས་ཀྱི་ཁྱིམ་སྦོམ་ཡོད་
Intended English Translation: Our house is big.
Biltrans Output: ^ང་བཅས་<prn><p1><pl><gen>/we<prn><p1><pl><gen>$^ཁྱིམ་<n>/house<n>$^སྦོམ་<adj>/big<adj>$^ཡོད་<vbser>/has<vbser>$
Translation Output: #we #big #house #has
Sentence 3
Original Sentence: ཁོ་དགེ་སློང་མེན་
Intended English Translation: He is not a monk.
Biltrans Output: ^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^དགེ་སློང་<n>/monk<n>$^ཨིན་<vbser><neg>/is<vbser><neg>$
Translation Output: #he #monk #is not
Sentence 4
Original Sentence: མོ་ལུ་སྲིངམོ་གཉིས་ཡོད་
Intended English Translation: She has two younger sisters.
Biltrans Output: ^མོ་<prn><p3><sg><f><dat>/she<prn><p3><sg><f><dat>$^སྲིངམོ་<n>/sister<n>$^གཉིས་<num>/two<num>$^ཡོད་<vbser>/has<vbser>$
Translation Output: #she #sister #two #has
Sentence 5
Original Sentence: མོ་དེ་ཁོ་བ་རྒས་
Intended English Translation: She is older than him.
Biltrans Output: ^མོ་<prn><p3><sg><f>/she<prn><p3><sg><f>$^དེ་<det>/the<det><def>$^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$^རྒས་<adj><comp>/older<adj><comp>$
Translation Output: #she #the #he #older
Sentence 6
Original Sentence: ད་ཉིམ་ཤར་དོ་
Intended English Translation: The sun is shining now.
Biltrans Output: ^ད་<adv>/now<adv>$^ཉིམ་<n>/sun<n>$^ཤར་<v><iv><pres><prog>/shine<vblex><pres><prog>$
Translation Output: now #sun #shine
Sentence 7
Original Sentence: ང་དུས་ཚོད་ཁར་ལྷོད་ཅི་
Intended English Translation: I arrived on time.
Biltrans Output: ^ང་<prn><p1><sg>/I<prn><p1><sg>$^དུས་ཚོད་<n><loc>/time<n><loc>$^ལྷོད་<v><iv>/arrive<vblex>$
Translation Output: #I #time arrived
Sentence 8
Original Sentence: སྲུང་འབྲི་མི་དག་པ་གཅིག་འདུག་
Intended English Translation: There are a few story writers.
Biltrans Output: ^སྲུང་<n>/story<n>$^འབྲི་<v><tv><vadj>/write<vblex><vadj>$^གཅིག་<num>/one<num>$^འདུག་<vbser>/is<vbser>$
Translation Output: #story #write #one #is
Sentence 9
Original Sentence: རྒྱ་མཚོ་ལས་ནོར་བུ་འཐོབ་
Intended English Translation: [We] get jewel from the ocean.
Biltrans Output: ^རྒྱ་མཚོ་<n><abl>/ocean<n><abl>$^ནོར་བུ་<n>/pearl<n>$^འཐོབ་<v><iv>/get<vblex>$
Translation Output: #ocean #pearl #get
Sentence 10
Original Sentence: བྱི་ཙི་ཚུ་ཟུང་གེ་
Intended English Translation: Let's catch the rats.
Biltrans Output: ^བྱི་ཙི་<n><adj><qnt>/rat<n><adj><qnt>$^ཟུང་<v><tv>/catch<vblex>$^གེ་<vaux><adh>/shall<vaux><adh>$
Translation Output: #rat #catch #shall
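The Biltrans outputs in the sentences above pair each Dzongkha reading with its English counterpart in the form ^source<tags>/target<tags>$. As a rough illustration, the sketch below splits such a line into lemma and tag lists. The helper names `split_reading` and `parse_biltrans` are hypothetical, and the code assumes one translation per unit, as in every example on this page.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")   # one ^...$ lexical unit
TAG = re.compile(r"<([^>]+)>")    # one <tag>


def split_reading(reading):
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG.findall(reading)


def parse_biltrans(line):
    """Return a list of (source, target) pairs, each a (lemma, tags) tuple."""
    pairs = []
    for unit in UNIT.findall(line):
        # Assumption: a single '/' separates source from one target reading.
        source, _, target = unit.partition("/")
        pairs.append((split_reading(source), split_reading(target)))
    return pairs


# Biltrans output of Sentence 3, copied from this page.
line = ("^ཁོ་<prn><p3><sg><m>/he<prn><p3><sg><m>$"
        "^དགེ་སློང་<n>/monk<n>$"
        "^ཨིན་<vbser><neg>/is<vbser><neg>$")
for (src_lemma, src_tags), (tgt_lemma, tgt_tags) in parse_biltrans(line):
    print(src_lemma, src_tags, "->", tgt_lemma, tgt_tags)
```

In Apertium output, a leading # conventionally marks a lexical unit that the target-language generator could not produce, which is why most words in the Translation Output lines carry it even though the bilingual lookup succeeded.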
Additions
- Added 100 more stems in the bilingual dictionary:
  - Initial # of stems: 66
  - Final # of stems: 166
- Added two more lexical selection rules.
- Added two more structural transfer rules.
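Stem counts like those above can be approximated by counting entries in the bilingual dictionary. This is a minimal sketch, assuming the usual Apertium .dix layout in which each entry is an <e> element containing a <p> pair with <l> (left) and <r> (right) sides. The function name `bidix_pairs` and the inline sample dictionary are hypothetical, and counting <e> elements is only an approximation of a stem count.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry bilingual dictionary in Apertium .dix layout.
SAMPLE_BIDIX = """<dictionary>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>བྱི་ལི་<s n="n"/></l><r>cat<s n="n"/></r></p></e>
    <e><p><l>ཁྱིམ་<s n="n"/></l><r>house<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def bidix_pairs(dix_xml):
    """Return (left lemma, right lemma) for every <e> entry with a <p> pair."""
    root = ET.fromstring(dix_xml)
    pairs = []
    for entry in root.iter("e"):
        left = entry.find("p/l")
        right = entry.find("p/r")
        if left is not None and right is not None:
            # .text is the lemma text that precedes the first <s> tag element.
            pairs.append((left.text or "", right.text or ""))
    return pairs


pairs = bidix_pairs(SAMPLE_BIDIX)
print(len(pairs), "entries")
```

Running the same count before and after a round of additions gives the initial and final figures.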
Final Evaluation
Dzongkha monolingual transducer:
- Precision and recall against the annotated.basic corpus: not available (the evaluation produced an error).
- Coverage over the large corpus: 3121/6718 (~0.465)
- Size of the large corpus: 131018 characters
- # of stems in the transducer: 255 lexicon entries
Dzo-Eng MT:
- WER and PER over test phrases: 83.87 %
- WER and PER over longer corpus: 97.44 %
- Stems translated correctly in the longer corpus: 101
- Trimmed coverage over longer corpus: 101/132 (~0.765)
- Trimmed coverage over large corpus: 9698/21689 (~0.447)
- # of tokens in the longer corpus: 132
- # of tokens in the large corpus: 21689
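For readers unfamiliar with the two error rates reported above: WER is the word-level edit distance between the system output and the reference, divided by the reference length, and PER is its position-independent counterpart, which ignores word order. The sketch below uses one common definition of each; it is not necessarily the exact formula used by the evaluation script behind these figures. The function names are hypothetical, and the example strings are Sentence 1 with punctuation, casing and # marks removed by hand.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate; assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Words shared by both sides, counted with multiplicity, ignoring order.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


reference = "the cat is in the box"
hypothesis = "cat the box is"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"PER = {per(reference, hypothesis):.2%}")
```

Because PER disregards order, it is never higher than WER for the same pair, so the gap between the two gives a rough sense of how much of the error comes from word order rather than word choice.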