Khasi and Wôpanâak/Structural Transfer

From LING073
Revision as of 16:25, 23 April 2017 by Nfeldba1 (talk | contribs) (Pre-Evaluation)

Pre-Evaluation

  • Wam-Kha: Word error rate (WER): 93.75 %
  • Wam-Kha: Number of position-independent correct words: 2
  • Wam-Kha: Position-independent word error rate (PER): 93.75 %
  • Kha-Wam: Word error rate (WER): 100.00 %
  • Kha-Wam: Number of position-independent correct words: 2
  • Kha-Wam: Position-independent word error rate (PER): 100.00 %
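For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. WER is the word-level edit distance divided by the reference length; PER ignores word order and compares the two sentences as bags of words. The exact formula used by the evaluation script may differ, so this is an illustration rather than a reimplementation of it. The function names and the toy tokens `a b c d` are invented for the example.

```python
from collections import Counter


def word_edit_distance(hyp, ref):
    """Levenshtein distance over word lists (insert, delete, substitute cost 1)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hyp, ref):
    """Word error rate as a percentage of the reference length."""
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


def position_independent_correct(hyp, ref):
    """Number of words shared by both sentences, ignoring order."""
    return sum((Counter(hyp) & Counter(ref)).values())


def per(hyp, ref):
    """Position-independent error rate (one common formulation, assumed here)."""
    errors = max(len(hyp), len(ref)) - position_independent_correct(hyp, ref)
    return 100.0 * errors / len(ref)


# Toy data: invented tokens, not taken from the Khasi or Wôpanâak test files.
reference = "a b c d".split()
hypothesis = "b a c".split()
print(wer(hypothesis, reference))
print(per(hypothesis, reference))
```

Because PER ignores order, it can never exceed WER under this formulation; the two are equal when reordering is not what causes the errors.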

Coverage tests

Khasi

  • Number of tokenised words in the corpus: 129
  • Coverage: 100.00%

Wôpanâak

  • Number of tokenised words in the corpus: 55
  • Coverage: 89.09%
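Coverage here is the share of tokens that receive at least one analysis. As a rough sketch, it can be computed from analyser output by counting lexical units whose analysis is marked unknown with `*`, which is how Apertium flags unanalysed words. The helper names and the toy stream are invented for this example. The figure of 49 known tokens is inferred from 89.09% of 55 and is not stated in the test output.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis...$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_stream):
    """Return (known, total) token counts for an analysed stream."""
    units = LEXICAL_UNIT.findall(analysed_stream)
    # An unknown word appears as ^word/*word$
    known = sum(1 for unit in units if "/*" not in unit)
    return known, len(units)


def percent(known, total):
    """Coverage as a percentage rounded to two decimals."""
    return round(100.0 * known / total, 2)


# Toy stream: "xyz" is an invented token standing in for an unknown word.
toy = "^masq/masq<n><aa><sg>$ ^xyz/*xyz$"
known, total = coverage(toy)
print(known, total, percent(known, total))

# Consistency check for the Wôpanâak figure (49 is inferred, not reported).
print(percent(49, 55))
```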

Example Sentences

u dngiem u phet "The bear runs"

  • tagger: ^article<art><m><sg>$ ^dngiem<n><m>$ ^article<art><m><sg>$ ^phet<vblex><iv>$
  • biltrans: ^article<art><m><sg>/$ ^dngiem<n><m>/masq<n><aa>$ ^article<art><m><sg>/$ ^phet<vblex><iv>/qaqee<vblex><iv>$
  • chunker: ^n<SN><aa><sg>{^masq<n><2><3>$}$ ^default<default>{^qaqee<vblex><iv>$}$
  • postchunk: ^masq<n><aa><sg>$ ^qaqee<vblex><iv>$
  • kha-wam: masq #qaqee (There is not yet a structural transfer rule for subject agreement on verbs, so the generator cannot produce the verb form)
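Conceptually, the missing rule would copy the subject noun's animacy and number onto the verb before generation. The sketch below is plain Python, not Apertium's transfer rule format, and the function names are invented. It assumes noun tags of the shape `<n><animacy><number>` and a third-person subject, and it is checked only against the plural verb analysis `qaqee<vblex><iv><aa><p3><pl>` that appears in the Wôpanâak tagger output on this page. That the singular follows the same pattern is an assumption.

```python
def add_subject_agreement(noun_tags, verb_tags, person="p3"):
    """Append the subject's animacy, a person tag, and number to the verb tags.

    Assumes noun_tags is ["n", animacy, number], e.g. ["n", "aa", "sg"].
    """
    _, animacy, number = noun_tags
    return verb_tags + [animacy, person, number]


def render(lemma, tags):
    """Format a lemma and tag list as an Apertium lexical unit."""
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


# Plural subject: reproduces the analysed verb form shown for "qaqeewak".
print(render("qaqee", add_subject_agreement(["n", "aa", "pl"], ["vblex", "iv"])))
# ^qaqee<vblex><iv><aa><p3><pl>$
```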

masqak qaqeewak "The bears run"

  • tagger: ^masq<n><aa><pl>$ ^qaqee<vblex><iv><aa><p3><pl>$
  • biltrans: ^masq<n><aa><pl>/dngiem<n><m><pl>$ ^qaqee<vblex><iv><aa><p3><pl>/phet<vblex><iv><aa><p3><pl>$
  • chunker: ^art<SA><m><pl>{^article<art><3>$}$ ^n<SN><m><pl>{^dngiem<n><2>$}$ ^default<default>{^phet<vblex><iv><aa><p3><pl>$}$
  • postchunk: ^article<art><pl>$ ^dngiem<n><m>$ ^phet<vblex><iv><aa><p3><pl>$
  • wam-kha: ki dngiem #phet (Verb agreement rules are not implemented in this direction either)
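In this direction the verb reaches the generator still carrying the Wôpanâak agreement tags `<aa><p3><pl>`, whereas the Khasi tagger output above analyses phet as just `<vblex><iv>`. A rule for this direction would therefore at least need to drop those tags. The sketch below illustrates that step in plain Python (not Apertium's rule format) with invented helper names. Whether Khasi also needs an agreement marker before the verb, as the second u in "u dngiem u phet" suggests, is outside this sketch.

```python
import re

# A lexical unit without alternatives: ^lemma<tag1><tag2>...$
UNIT = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")


def parse_unit(unit):
    """Split a lexical unit into its lemma and list of tags."""
    match = UNIT.fullmatch(unit)
    if match is None:
        raise ValueError(f"not a simple lexical unit: {unit!r}")
    lemma, tag_string = match.groups()
    return lemma, re.findall(r"<([^>]+)>", tag_string)


def keep_tags(unit, allowed):
    """Rebuild the unit keeping only tags in `allowed`, preserving order."""
    lemma, tags = parse_unit(unit)
    kept = [t for t in tags if t in allowed]
    return "^" + lemma + "".join(f"<{t}>" for t in kept) + "$"


print(keep_tags("^phet<vblex><iv><aa><p3><pl>$", allowed={"vblex", "iv"}))
# ^phet<vblex><iv>$
```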

Post-Evaluation

WER test

  • The count is still 13/17 correct, because in both runs we counted outputs containing # as correct. We did fix 7 problems, however: had we not counted forms marked with #, the score would have been 2/17 on the first run and 9/17 on the second.
  • We also did not realize that WER is measured by the same test as PER, so we hand-counted the correct and incorrect items in the tests file.
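The difference between the two ways of counting can be sketched as follows. In Apertium output, `#` marks a form the generator could not produce, so a strict count treats any output containing it as wrong, while a lenient count ignores the marker. The function name and the toy pairs are invented for illustration and are not the actual test items.

```python
def count_correct(pairs, strict):
    """Count outputs that match their expected translation.

    pairs:  list of (system_output, expected) strings
    strict: if True, any output containing '#' counts as incorrect;
            if False, '#' markers are ignored before comparing.
    """
    correct = 0
    for output, expected in pairs:
        if strict:
            ok = "#" not in output and output == expected
        else:
            ok = output.replace("#", "") == expected
        correct += int(ok)
    return correct


# Toy data with invented tokens.
pairs = [("a b", "a b"), ("a #b", "a b"), ("c", "d")]
print(count_correct(pairs, strict=False))
print(count_correct(pairs, strict=True))
```

On the toy data the lenient count is 2 and the strict count is 1, mirroring the gap between 13/17 and the stricter figures above.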

PER test

  • Still 23.53% unknown words

Coverage tests

Khasi

Wôpanâak