Difference between revisions of "Wamesa and Tongan"
Revision as of 20:32, 27 April 2017
This page is a resource for machine translation between Wamesa and Tongan.
Initial Evaluation
wad → ton evaluation
On the tests file, 77.47% of the words translate correctly, and the unknown-word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are left in.
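For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER (word error rate) and PER (position-independent error rate) can be computed, with and without the unknown-word stars. It is not the evaluation script used for the figures above; the function names and the placeholder tokens are assumptions made for illustration, and the PER formula is one common definition that other tools may vary slightly.

```python
from collections import Counter


def strip_stars(tokens):
    """Remove the leading '*' that marks an unknown word."""
    return [t.lstrip("*") for t in tokens]


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Edit distance divided by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Like WER, but word order is ignored (bag-of-words comparison)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Placeholder tokens, not real Wamesa or Tongan words.
reference = ["a", "b", "c", "d"]
hypothesis = ["b", "a", "c", "*d"]

print(wer(reference, hypothesis), per(reference, hypothesis))
print(wer(reference, strip_stars(hypothesis)),
      per(reference, strip_stars(hypothesis)))
```

Stripping the stars lets an untranslated word count as correct when it happens to match the reference, which is why the two settings can give different scores.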
ton → wad evaluation
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7%
Percentage of stems translated correctly: 93%
Percentage of unknown words in test file: 0%
Results with unknown-word marks (stars) removed: WER 100%, PER 100%, number of position-independent correct words: 1
Results with unknown-word marks (stars) not removed: WER 100%, PER 100%
- There is still one bug that we are trying to fix: siʻaku is translated as #<poss>, but it should translate to nothing, since Wamesa has no equivalent. The percentages above were computed with the incorrect #<poss> removed from the ton-wad.tests.txt file. With it left in, the percentage rises to 114.29%.
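An error rate above 100% is possible because WER counts inserted words as errors but normalises by the reference length only. As a sketch of the standard definition, with S substitutions, D deletions, I insertions, and N reference words:

```latex
\mathrm{WER} = \frac{S + D + I}{N},
\qquad
\frac{7}{7} = 100\%,
\qquad
\frac{8}{7} \approx 114.29\%
```

The second and third terms are only an illustration: the reported figures are consistent with a seven-word reference in which the spurious #<poss> adds one extra error, but the reference length is an inference and is not stated on this page.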
Final Evaluation
Wamesa Transducer
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the wad.annotated.basic.txt corpus are both 100%. There are 940 words in the wad.large.txt corpus, and the coverage over it is 26.17%.
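Coverage here means the share of corpus tokens that receive at least one analysis. The sketch below shows one naive way to compute it, assuming the analyser output is in the Apertium stream format, where an unknown token appears as `^form/*form$`. The `coverage` function and the sample string are hypothetical and are not taken from the project's own scripts.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis(/analysis...)$
TOKEN = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Fraction of tokens whose analysis is not marked unknown with '*'."""
    units = TOKEN.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _, analysis in units if not analysis.startswith("*"))
    return known / len(units)


# Placeholder forms and tags, not real Wamesa data.
sample = "^foo/foo<n>$ ^bar/*bar$ ^baz/baz<v><pres>$ ^qux/*qux$"
print(f"{coverage(sample):.2%}")  # 2 of 4 tokens analysed -> 50.00%
```

Naive coverage of this kind says nothing about whether the analyses are correct; that is what the precision and recall figures measure.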
Tongan Transducer
wad → ton evaluation
WER and PER over the wad.longer.txt corpus were both 100%. The coverage over the large corpus was 26.17%, and was the same for the longer corpus.
There are 934 tokens in the longer corpus and 940 in the large corpus.
ton → wad evaluation
Documentation
Contrastive Grammar
Lexical Selection
Structural Transfer
Developed Resources
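ton→wad translator: https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git
wad→ton translator: https://github.swarthmore.edu/twarner2/ling073-wad-ton.git
Bidirectional Wamesa/Tongan Translator: https://github.swarthmore.edu/mcostag1/ling073-ton-wad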