Difference between revisions of "Wamesa and Tongan"
From LING073
Revision as of 09:09, 27 April 2017
This page is a resource for machine translation between Wamesa and Tongan.
Initial Evaluation
wad → ton evaluation
On the tests file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the stars marking unknown words are removed, and 100% when the stars are left in.
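For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and PER can be computed, with and without the stars that mark unknown words. The evaluation script behind the figures above is not named on this page, so the definitions below are the common textbook ones and may differ in detail from what was actually run. The tokens are placeholders, not real Wamesa or Tongan words, and all function names are hypothetical.

```python
from collections import Counter

def strip_stars(tokens):
    """Remove the leading '*' that marks an unknown word."""
    return [t.lstrip("*") for t in tokens]

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # delete a reference word
                            curr[j - 1] + 1,     # insert a hypothesis word
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent error rate in percent (word order is ignored)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matches
    return 100.0 * errors / len(ref)

# Placeholder data: the reference translation and a system output
# in which one word was unknown and therefore starred.
reference = "a b c d".split()
output = "b a *c x".split()

wer_with_stars = wer(reference, output)
wer_without_stars = wer(reference, strip_stars(output))
per_with_stars = per(reference, output)
per_without_stars = per(reference, strip_stars(output))
```

In this toy case the starred word counts as an error until the star is stripped, which is why the scores with stars left in are never better than the scores with stars removed.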
ton → wad evaluation
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7%
Percentage of stems translated correctly: 93%
Percentage of unknown words in test file: 0%
Results with unknown word marks (stars) removed: WER 100%, PER 100%, number of position-independent correct words: 1
Results with unknown word marks (stars) not removed: WER 100%, PER 100%
- There is still one bug that we are trying to fix: siʻaku is being translated as #<poss>, but it should translate to nothing, since Wamesa has no equivalent. The percentages above were computed with the incorrect #<poss> removed from the ton-wad.tests.txt file; with it left in, the error rate rises to 114.29%.
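An error rate above 100% is possible because inserted words count as errors without lengthening the reference. The short sketch below reproduces the reported figures under one assumption that this page does not state: that the reference translation is 7 words long. That length is inferred only from the percentages themselves (7/7 = 100%, 8/7 ≈ 114.29%), and the helper name is hypothetical.

```python
REF_LEN = 7  # assumed reference length, inferred from the reported figures

def error_rate(errors, ref_len=REF_LEN):
    """Error rate in percent: errors divided by the reference length."""
    return round(100.0 * errors / ref_len, 2)

# Every reference word wrong: 7 errors against 7 reference words.
rate_without_poss = error_rate(7)

# One spurious extra token (such as #<poss>) adds one insertion error,
# while the reference length stays the same.
rate_with_poss = error_rate(8)
```

Under that assumption, the single extra #<poss> token accounts for the whole jump from 100% to 114.29%.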
Final Evaluation
wad → ton evaluation
ton → wad evaluation
Documentation
Contrastive Grammar
Lexical Selection
Structural Transfer