Wamesa and Tongan/Structural transfer
Revision as of 16:42, 26 April 2017
Pre-Evaluation
wad→ton
On the tests file, 77.47% of the words translate correctly, and the unknown-word rate is 28.57%. The WER and PER are both 71.43% when unknown words (marked with *) are removed, and 100% when they are kept.
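For readers unfamiliar with the two metrics, here is a minimal Python sketch of how a word error rate (WER) and a position-independent error rate (PER) can be computed for one reference/hypothesis pair. The function names are hypothetical, and the PER formula is one common definition. It is not necessarily the exact calculation performed by the evaluation script that produced the figures on this page.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (word order ignored)."""
    matched = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matched) / len(ref)
```

Because both rates are divided by the reference length, a hypothesis that is longer than the reference can score above 100%.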
ton→wad
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7%
Percentage of stems translated correctly: 93%.
The WER and PER are both 100%, whether or not unknown words are removed.
There was one bug in our pre-evaluation data: siʻaku was translated to #<poss>, but it should translate to nothing, as Wamesa has no equivalent. The percentages above were computed with the incorrect #<poss> removed from the ton-wad.tests.txt file; with it left in, the percentage rises to 114.29%.
(We resolved this: I had forgotten to add the <poss> tag in my lexc file.)
Examples
ton→wad
- Correct Translation
- (ton) ʻoku ʻikai ke puke → (wad) pota va
- (ton) ʻoku<vaux><pres> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg>
- tagger
- ^ʻoku<vaux><pres>$ ^ʻikai<neg>$ ^ke<prn><prepd><p2><sg>$ ^puke<adj>$^.<sent>$
- biltrans
- ^ʻoku<vaux><pres>/$ ^ʻikai<neg>/va<neg>$ ^ke<prn><prepd><p2><sg>/$ ^puke<adj>/pota<adj>$^.<sent>/.<sent>$
- chunker
- apertium-transfer: Rule 1 ʻikai<neg>/va<neg>
- apertium-transfer: Rule 2 .<sent>/.<sent>
- ^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ ^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$
- interchunk
- ^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ ^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$
- postchunk
- ^pota<adj>$ ^va<neg>$^.<sent>$
- (ton-wad)
- pota va#
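The tagger, biltrans and postchunk lines above use the Apertium stream format, where each lexical unit is written as ^...$ and alternative readings are separated by /. The following is a minimal Python sketch for pulling lemmas and tags out of such a line, for example to see which source words have no target-side entry after biltrans. The function names are hypothetical, and the sketch assumes flat units only: it does not handle the nested {...} chunks in the chunker and interchunk output, nor escaped characters.

```python
import re

UNIT_RE = re.compile(r"\^(.*?)\$")   # one ^...$ lexical unit
TAG_RE = re.compile(r"<([^>]+)>")    # one <tag>


def parse_reading(reading):
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG_RE.findall(reading)


def parse_stream(line):
    """Return, for each lexical unit, its list of '/'-separated readings."""
    return [[parse_reading(r) for r in unit.split("/")]
            for unit in UNIT_RE.findall(line)]


# The biltrans line from the example above.
biltrans = ("^ʻoku<vaux><pres>/$ ^ʻikai<neg>/va<neg>$ "
            "^ke<prn><prepd><p2><sg>/$ ^puke<adj>/pota<adj>$^.<sent>/.<sent>$")

units = parse_stream(biltrans)
# Source lemmas whose target reading is empty.
untranslated = [unit[0][0] for unit in units if unit[1][0] == ""]
print(untranslated)  # ['ʻoku', 'ke']
```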
wad→ton
- Correct Translation
- (wad) pimunapat → (ton) siʻaku puaka
- (wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg>
- tagger
- ^pimuna<n><p1><sg><poss><.sent>$
- biltrans
- ^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$
- chunker
- apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>
- ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$
- interchunk
- ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$
- postchunk
- ^siʻaku$ ^puaka<n><sg>$
- (wad-ton)
- #siʻaku puaka
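The final output above carries a leading # on siʻaku. By the usual Apertium convention, a leading * marks a word the analyser did not recognise, @ marks a lemma missing from the bilingual dictionary, and # marks a form the generator could not produce. The Python sketch below shows one way to find such marked tokens and to compute the share of *-marked (unknown) words in an output line. The function names are hypothetical, and only marks at the start of a whitespace-separated token are considered.

```python
# Debug marks and their conventional meaning (assumed here, see lead-in).
MARKS = {
    "*": "unknown word",
    "@": "missing bilingual entry",
    "#": "generation failure",
}


def marked_tokens(output):
    """Return (token, meaning) pairs for tokens that start with a mark."""
    return [(tok, MARKS[tok[0]]) for tok in output.split() if tok[0] in MARKS]


def strip_marked(output, mark="*"):
    """Drop tokens starting with the given mark, e.g. before scoring."""
    return [tok for tok in output.split() if not tok.startswith(mark)]


def marked_rate(output, mark="*"):
    """Percentage of tokens that start with the given mark."""
    tokens = output.split()
    if not tokens:
        return 0.0
    return 100.0 * sum(tok.startswith(mark) for tok in tokens) / len(tokens)


print(marked_tokens("#siʻaku puaka"))  # [('#siʻaku', 'generation failure')]
```

Scoring "with unknown words removed" then amounts to running strip_marked on the output before comparing it with the reference.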
Post-Evaluation
wad→ton
On the tests file, 100% of the words translate correctly, and the unknown-word rate is 28.57%. The WER and PER are both 100%, whether or not unknown words (marked with *) are removed.
ton→wad
On the tests file, 100% of the words translate correctly. The WER and PER are both 100%, whether or not unknown words are removed.