Khasi and Wôpanâak/Structural Transfer
From LING073
Pre-Evaluation
- Wam-Kha: Word error rate (WER): 93.75 %
- Wam-Kha: Number of position-independent correct words: 2
- Wam-Kha: Position-independent word error rate (PER): 93.75 %
- Kha-Wam: Word error rate (WER): 100.00 %
- Kha-Wam: Number of position-independent correct words: 2
- Kha-Wam: Position-independent word error rate (PER): 100.00 %
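As a rough illustration of what these two metrics measure, here is a minimal sketch in Python. It assumes the textbook definitions (WER as word-level edit distance over reference length, PER as the same ratio with word order ignored); the evaluation script that produced the figures above may normalize differently, and the function names here are our own. The hypothesis sentence in the usage example is made up for illustration.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent (assumed definition)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (assumed definition)."""
    # Words that occur in both sentences, regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matches
    return 100.0 * errors / len(ref)


# Reference sentence from this page; the hypothesis is a made-up system output.
reference = "u dngiem u phet".split()
hypothesis = "u dngiem #phet".split()
print(wer(reference, hypothesis))  # 50.0
print(per(reference, hypothesis))  # 50.0
```

Because PER ignores word order, it can never be higher than WER under these definitions.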
Coverage tests
Khasi
- Number of tokenised words in the corpus: 129
- Coverage: 100.00%
Wôpanâak
- Number of tokenised words in the corpus: 55
- Coverage: 89.09%
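The coverage figure is the share of corpus tokens that receive at least one analysis. The sketch below assumes analyser output in the Apertium stream format, where an unknown word appears as `^word/*word$`; the `coverage` function and the sample string (including the unknown token `xyzzy`) are hypothetical, not our actual coverage script or corpus.

```python
import re

# One lexical unit in the stream format: ^surface/analysis$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed):
    """Percentage of lexical units with at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed)
    # An analysis beginning with * marks an unknown word.
    known = [u for u in units if "/*" not in u]
    return 100.0 * len(known) / len(units)


# Hypothetical analyser output: three known tokens and one unknown token.
sample = ("^masqak/masq<n><aa><pl>$ "
          "^qaqeewak/qaqee<vblex><iv><aa><p3><pl>$ "
          "^xyzzy/*xyzzy$ "
          "^masq/masq<n><aa><sg>$")
print(coverage(sample))  # 75.0

# The Wôpanâak figure above is consistent with 49 of 55 tokens being analysed.
print(round(100.0 * 49 / 55, 2))  # 89.09
```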
Example Sentences
u dngiem u phet "The bear runs"
- tagger: ^article<art><m><sg>$ ^dngiem<n><m>$ ^article<art><m><sg>$ ^phet<vblex><iv>$
- biltrans: ^article<art><m><sg>/$ ^dngiem<n><m>/masq<n><aa>$ ^article<art><m><sg>/$ ^phet<vblex><iv>/qaqee<vblex><iv>$
- chunker: ^n<SN><aa><sg>{^masq<n><2><3>$}$ ^default<default>{^qaqee<vblex><iv>$}$
- postchunk: ^masq<n><aa><sg>$ ^qaqee<vblex><iv>$
- kha-wam: masq #qaqee (There is not yet a structural transfer rule for subject agreement on verbs)
masqak qaqeewak "The bears run"
- tagger: ^masq<n><aa><pl>$ ^qaqee<vblex><iv><aa><p3><pl>$
- biltrans: ^masq<n><aa><pl>/dngiem<n><m><pl>$ ^qaqee<vblex><iv><aa><p3><pl>/phet<vblex><iv><aa><p3><pl>$
- chunker: ^art<SA><m><pl>{^article<art><3>$}$ ^n<SN><m><pl>{^dngiem<n><2>$}$ ^default<default>{^phet<vblex><iv><aa><p3><pl>$}$
- postchunk: ^article<art><pl>$ ^dngiem<n><m>$ ^phet<vblex><iv><aa><p3><pl>$
- wam-kha: ki dngiem #phet (Verb agreement rules are not implemented in this direction either)
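To make the pipeline output above easier to read, here is a small sketch that splits a tagger-style line into lemmas and tag lists. It is only an illustration: `parse_units` is a hypothetical helper, and it handles just the flat `^lemma<tag>...$` form, not biltrans lines with `/` or chunker lines with `{...}`.

```python
import re

UNIT = re.compile(r"\^([^$]*)\$")   # body of one ^...$ lexical unit
TAG = re.compile(r"<([^>]+)>")      # one <tag>


def parse_units(stream):
    """Return (lemma, [tags]) pairs for flat tagger-style output."""
    parsed = []
    for body in UNIT.findall(stream):
        lemma = body.split("<", 1)[0]   # text before the first tag
        tags = TAG.findall(body)
        parsed.append((lemma, tags))
    return parsed


# Tagger output for "masqak qaqeewak", copied from the example above.
tagger_line = "^masq<n><aa><pl>$ ^qaqee<vblex><iv><aa><p3><pl>$"
for lemma, tags in parse_units(tagger_line):
    print(lemma, tags)
# masq ['n', 'aa', 'pl']
# qaqee ['vblex', 'iv', 'aa', 'p3', 'pl']
```

Seen this way, the verb carries the tags `aa`, `p3` and `pl` all the way to generation, which is why it surfaces as `#phet`.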
Post-Evaluation
WER test
- The score is still 13/17 correct, because we unfortunately counted outputs containing # as correct. We have in fact fixed 7 problems: if forms with # signs were counted as errors, the score would have been 2/17 on the first run and 9/17 on the second.
- We also didn't realize that WER is measured in the same test as PER, so we hand-counted the correct and incorrect items in the tests file.
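The difference between the two ways of counting can be sketched as follows. This is a hypothetical helper, not the script we used, and the reference strings are placeholders rather than the real expected translations; only the two outputs containing # are taken from the examples on this page.

```python
def count_correct(outputs, references, ignore_hash=False):
    """Count outputs that match their reference exactly.

    With ignore_hash=True the generation-failure marker # is stripped
    first, which reproduces the lenient counting described above.
    """
    correct = 0
    for out, ref in zip(outputs, references):
        if ignore_hash:
            out = out.replace("#", "")
        if out == ref:
            correct += 1
    return correct


# Two outputs from this page plus one made-up clean output.
outputs = ["masq #qaqee", "ki dngiem #phet", "masq"]
# Placeholder references (NOT the real expected translations).
references = ["masq qaqee", "ki dngiem phet", "masq"]

print(count_correct(outputs, references, ignore_hash=True))   # 3 (lenient)
print(count_correct(outputs, references, ignore_hash=False))  # 1 (strict)
```

Under the strict count, a fix only registers once the # disappears from the output, which is how the 2/17 and 9/17 figures above were obtained.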
PER test
- Still 23.53% unknown words
Coverage tests
Khasi
Wôpanâak