Wamesa and Tongan/Structural transfer

From LING073
Revision as of 21:54, 4 May 2017 by Twarner2 (Talk | contribs)

Pre-Evaluation

wad→ton

On the tests file, 77.47% of the words translate correctly, and the unknown-word rate is 28.57%. The WER and PER are both 71.43% when unknown words (marked with *) are removed, and 100% when they are kept.
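
For reference, the two error rates quoted here can be sketched in a few lines of Python. This is a minimal illustration using the common textbook definitions: WER as word-level edit distance divided by the reference length, and PER as the same idea with word order ignored. It is an assumption that these match what the evaluation script used for this page computes. The function names `edit_distance`, `wer` and `per`, and the sample word lists, are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent (assumed definition: edits / reference length)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (assumed bag-of-words definition)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matches
    return 100.0 * errors / len(ref)


if __name__ == "__main__":
    # Hypothetical example: the right words in the wrong order.
    reference = ["siʻaku", "puaka"]
    hypothesis = ["puaka", "siʻaku"]
    print(wer(reference, hypothesis), per(reference, hypothesis))
```

Under these definitions the denominator is the reference length while extra words in the output still count as errors, so a rate above 100% is possible.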

ton→wad

Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7%

Percentage of stems translated correctly: 93%.

The WER and PER are both 100%, whether or not unknown words are removed.

There was one bug in our pre-evaluation of the data: siʻaku was being translated to #<poss>, but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; with it left in, the percentage increases to 114.29%.

(We resolved this: the <poss> tag was missing from the lexc file.)


Examples

ton→wad

  • Correct Translation
    • (ton) ʻoku ʻikai ke puke → (wad) pota va
    • (ton) ʻoku<vaux><pres> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg>
  • tagger
    • ^ʻoku<vaux><pres>$ ^ʻikai<neg>$ ^ke<prn><prepd><p2><sg>$ ^puke<adj>$^.<sent>$
  • biltrans
    • ^ʻoku<vaux><pres>/$ ^ʻikai<neg>/va<neg>$ ^ke<prn><prepd><p2><sg>/$ ^puke<adj>/pota<adj>$^.<sent>/.<sent>$
  • chunker
    • apertium-transfer: Rule 1 ʻikai<neg>/va<neg>
    • apertium-transfer: Rule 2 .<sent>/.<sent>
    • ^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ ^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$
  • interchunk
    • ^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ ^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$
  • postchunk
    • ^pota<adj>$ ^va<neg>$^.<sent>$
  • (ton-wad)
    • pota va#
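
The biltrans line in the trace above is in Apertium stream format: each lexical unit sits between ^ and $, tags are written in angle brackets, and a slash separates the source reading from the target reading. As a rough aid for reading such traces, here is a minimal Python sketch that splits a biltrans line into (source, target) pairs. It is a simplification and not part of the Apertium toolchain: it assumes at most one target reading per unit and no escaped characters, and the names `parse_reading` and `parse_biltrans` are hypothetical.

```python
import re

LEXICAL_UNIT = re.compile(r"\^(.*?)\$")  # text between ^ and $
TAG = re.compile(r"<([^>]+)>")           # a single <tag>


def parse_reading(reading):
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG.findall(reading)


def parse_biltrans(line):
    """Return a list of (source, target) pairs; target is None when empty."""
    units = []
    for body in LEXICAL_UNIT.findall(line):
        source, _, target = body.partition("/")
        units.append((parse_reading(source),
                      parse_reading(target) if target else None))
    return units


if __name__ == "__main__":
    line = ("^ʻoku<vaux><pres>/$ ^ʻikai<neg>/va<neg>$ "
            "^ke<prn><prepd><p2><sg>/$ ^puke<adj>/pota<adj>$^.<sent>/.<sent>$")
    for source, target in parse_biltrans(line):
        print(source, "->", target)
```

Run on the biltrans line from this example, the units for ʻoku and ke come back with no target reading, which is consistent with only pota, va and the full stop surviving into the later stages.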

wad→ton

  • Correct Translation
    • (wad) pimunapat → (ton) siʻaku puaka
    • (wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg>
  • tagger
    • ^pimuna<n><p1><sg><poss><.sent>$
  • biltrans
    • ^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$
  • chunker
    • apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>
      ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$
  • interchunk
    • ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$
  • postchunk
    • ^siʻaku$ ^puaka<n><sg>$
  • (wad-ton)
    • #siʻaku puaka
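
In both traces the postchunk output is simply the contents of each chunk with the chunk wrapper removed. The following Python sketch mimics only that unwrapping step, to make the relationship between the interchunk and postchunk lines easier to see. It is not how apertium-postchunk is implemented, since the real tool also applies any postchunk rules defined for the pair; the function name `unwrap_chunks` is hypothetical.

```python
import re

# A chunk looks like ^name<TAGS>{ ...inner lexical units... }$
CHUNK = re.compile(r"\^[^{}^$]*\{(.*?)\}\$")


def unwrap_chunks(line):
    """Replace every chunk with its inner lexical units and trim the ends."""
    return CHUNK.sub(r"\1", line).strip()


if __name__ == "__main__":
    # Interchunk lines copied from the two examples on this page.
    print(unwrap_chunks("^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$"))
    print(unwrap_chunks("^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ "
                        "^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$"))
```

The empty neg chunk in the ton→wad trace contributes nothing to the result, which is why only three lexical units reach the generator.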

Post-Evaluation

wad→ton

On the tests file, 100% of the words translate correctly, and the unknown-word rate is 28.57%. The WER and PER are both 100%, whether or not unknown words (marked with *) are removed.

ton→wad

On the tests file, 100% of the words translate correctly. The WER and PER are both 100%, whether or not unknown words are removed.