Miyako/Transducer

From LING073
Revision as of 23:42, 17 February 2017 by Doldham1 (talk | contribs) (Evaluation)

The code for the transducer can be found here.

Evaluation

  • There are currently 42 stems in the transducer.
  • The current coverage is 19.12%.
  • There are currently 81 tests passing in mvi.yaml and 2 passing in commonwords.yaml.
  • The current top unknown words are:
    あい
    たるが
    ひー
    ひらいん
    くとぅー
    どぅみっさすたい
    あたい
    みんぬ
    つっち
    あいな
    っふぃ
    いつ
    うでぃー
    かでぃぬ
    みゃーくん
    あみゃあ
    っま

Notes

The following tests still do not work:

  • Two verb tests: one fails because of an apostrophe problem, and the other because I still need lexicons for class 2 stem 1 and stem 2 verbs.
  • Most of the final particles, because the form of the tests needs to be fixed.
  • All of the parts of speech, because I have not added them to the transducer yet.

The first time I ran aq-covtest, 15.19% of my corpus was covered. I discovered that a number of the top unknown forms were in fact postpositions, which led me to realise that in part of my corpus the postpositions were written separately from the words they attach to. After resolving that problem, coverage rose to 16.91%. Adding あみ<n><gen> ↔ あみぬ and みず<n><gen> ↔ みずぬ to the transducer then increased coverage to 19.12%.
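
To make the coverage figures above concrete, here is a minimal sketch of how a naive, token-level coverage percentage and a list of top unknown words could be computed. It is not how aq-covtest works internally; the functions `coverage` and `top_unknown`, the dictionary standing in for the transducer, and the five-token toy corpus are all hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def coverage(tokens, analyses):
    """Return the percentage of tokens that have at least one analysis."""
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in analyses)
    return 100.0 * known / len(tokens)


def top_unknown(tokens, analyses, n=5):
    """Return the n most frequent tokens that have no analysis."""
    counts = Counter(tok for tok in tokens if tok not in analyses)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical stand-in for the transducer: surface form -> analysis.
analyses = {
    "あみぬ": "あみ<n><gen>",
    "みずぬ": "みず<n><gen>",
}

# Toy corpus (not the real one), already split into tokens.
tokens = ["あみぬ", "あい", "みずぬ", "あい", "ひー"]

print(coverage(tokens, analyses))        # 40.0
print(top_unknown(tokens, analyses, 2))  # ['あい', 'ひー']
```

In this simplified model, each surface form added to the transducer moves every occurrence of that form in the corpus from unknown to known, which is why adding a couple of frequent genitive forms can raise coverage by a noticeable amount.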