Wamesa and Tongan

From LING073
Revision as of 19:32, 27 April 2017 by Twarner2


This page is a resource for machine translation between Wamesa and Tongan.

Initial Evaluation

wad → ton evaluation

On the tests file, 77.47% of the words translate correctly, and the unknown-word rate is 28.57%. WER and PER are both 71.43% when the unknown-word marks (*) are removed, and both 100% when the marks are kept.
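WER and PER compare the system output with a reference translation word by word. The sketch below shows one common way to compute them, and why stripping the * marks changes the score: a token such as *foo never matches foo in the reference. It assumes whitespace tokenisation and the usual textbook definitions (WER as word-level edit distance over reference length, PER as bag-of-words mismatch over reference length). The function names are hypothetical, and the evaluation tool used for this page may differ in details.

```python
from collections import Counter


def strip_unknown_marks(tokens: list[str]) -> list[str]:
    """Remove a leading '*' (the unknown-word mark) from each token."""
    return [t[1:] if t.startswith("*") and len(t) > 1 else t for t in tokens]


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate: minimum substitutions + deletions + insertions,
    divided by the number of reference words (can exceed 1.0)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    n, m = len(reference), len(hypothesis)
    # At the start of iteration i, prev[j] is the edit distance
    # between reference[:i-1] and hypothesis[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[m] / n


def per(reference: list[str], hypothesis: list[str]) -> float:
    """Position-independent error rate: like WER, but ignoring word order."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # Words shared by both sides, counted with multiplicity
    matches = sum((Counter(reference) & Counter(hypothesis)).values())
    return (max(len(reference), len(hypothesis)) - matches) / len(reference)
```

For example, comparing the reference ["a", "b"] with the output ["*a", "b"] gives a WER of 0.5, while applying strip_unknown_marks to the output first gives 0.0.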

ton → wad evaluation

Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7%

Percentage of stems translated correctly: 93%

Percentage of unknown words in test file: 0%
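Apertium marks problem tokens in its output with a one-character prefix: * for words the source analyser does not know, @ for words missing from the bilingual dictionary, and # for words the target generator could not produce (the #<poss> output for siʻaku on this page is an instance of the last). A rough sketch of how an unknown-word percentage could be counted from translated output, assuming whitespace-separated tokens; the function names are hypothetical, and this is not necessarily how the figures above were produced.

```python
from collections import Counter

# Apertium's output marks (one-character prefixes on a token)
MARK_MEANINGS = {
    "*": "unknown to the source analyser",
    "@": "missing from the bilingual dictionary",
    "#": "could not be generated in the target language",
}


def mark_counts(output: str) -> Counter:
    """Count tokens of translated output by the mark they start with."""
    counts: Counter = Counter()
    for token in output.split():  # split() never yields empty tokens
        if token[0] in MARK_MEANINGS:
            counts[token[0]] += 1
    return counts


def unknown_word_rate(output: str) -> float:
    """Fraction of tokens carrying the '*' unknown-word mark."""
    tokens = output.split()
    if not tokens:
        return 0.0
    return mark_counts(output)["*"] / len(tokens)
```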

Results with unknown-word marks (stars) removed: WER 100%, PER 100%, 1 position-independent correct word

Results with unknown-word marks (stars) kept: WER 100%, PER 100%

  • One bug remains that we are still trying to fix: siʻaku is translated as #<poss>, but it should produce no output, since Wamesa has no equivalent. The figures above were computed with the incorrect #<poss> removed from the ton-wad.tests.txt file; if it is left in, the error rate rises to 114.29%.
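An error rate above 100% is possible because WER counts insertions as well as substitutions and deletions, all divided by the number of words N in the reference. Since the output can be longer than the reference, the insertion count I is not bounded by N. As a worked check, assuming the ton→wad reference has N = 7 words (a value the reported figures are consistent with, but which this page does not state), seven errors give 100%, and the spurious #<poss> token adds one insertion:

```latex
\mathrm{WER} = \frac{S + D + I}{N}, \qquad
\frac{7}{7} = 100\%, \qquad
\frac{7 + 1}{7} \approx 114.29\%
```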

Final Evaluation

Wamesa Transducer

There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the wad.annotated.basic.txt corpus are both 100%. There are 940 words in the wad.large.txt corpus, and the coverage over it is 26.17%.
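For a morphological transducer, precision is usually the share of the analyses it produces that appear in the hand annotation, recall is the share of annotated analyses it produces, and coverage is the share of corpus tokens that receive at least one analysis. The sketch below assumes each token's analyses have already been collected into a set of strings (the tag strings in any example are hypothetical); it illustrates the definitions and is not the script used for the numbers above. As an arithmetic check, 26.17% of 940 tokens corresponds to 246 analysed tokens.

```python
def precision_recall(produced: list[set[str]], gold: list[set[str]]) -> tuple[float, float]:
    """Precision and recall of produced analyses against gold annotations,
    pooled over all tokens. Both lists are aligned token by token."""
    if len(produced) != len(gold):
        raise ValueError("produced and gold must cover the same tokens")
    correct = sum(len(p & g) for p, g in zip(produced, gold))
    n_produced = sum(len(p) for p in produced)
    n_gold = sum(len(g) for g in gold)
    precision = correct / n_produced if n_produced else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


def coverage(produced: list[set[str]]) -> float:
    """Fraction of tokens that received at least one analysis."""
    if not produced:
        return 0.0
    return sum(1 for p in produced if p) / len(produced)
```

Pooling counts over all tokens (micro-averaging) is one design choice; averaging per token instead would weight highly ambiguous words differently.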

Tongan Transducer

wad → ton evaluation

WER and PER over the wad.longer.txt corpus were both 100%. Coverage over the large corpus was 26.17%, and coverage over the longer corpus was the same.
There are 934 tokens in the longer corpus and 940 in the large corpus.

ton → wad evaluation


Contrastive Grammar
Lexical Selection
Structural Transfer

Developed Resources

ton→wad translator
wad→ton translator
Bidirectional Wamesa/Tongan Translator