Miyako/Transducer

From LING073
Revision as of 15:36, 22 February 2017 by Doldham1 (talk | contribs) (Initial analysis of morphological generation)


The code for the transducer can be found here.

Analyser Evaluation

Evaluation

  • There are currently 42 stems in the transducer.
  • The current coverage is 19.12%.
  • There are currently 81 tests passing in mvi.yaml and 2 passing in commonwords.yaml.
  • The current top unknown words are:
    あい
    たるが
    ひー
    ひらいん
    くとぅー
    どぅみっさすたい
    あたい
    みんぬ
    つっち
    あいな
    っふぃ
    いつ
    うでぃー
    かでぃぬ
    みゃーくん
    あみゃあ
    っま
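
The coverage figure and the unknown-word list above can be reproduced in outline with a short script. This is a minimal sketch, not how aq-covtest itself works: it assumes the analyser output is in Apertium stream format, where each token appears as ^surface/analysis$ and an unknown form has an analysis starting with *. The function name coverage_report and the sample string are hypothetical, made up for illustration, and escaped characters in the stream are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped ^, / or $ characters inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage_report(analysed_text, top_n=5):
    """Return (coverage percentage, most frequent unknown surface forms)."""
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # An analysis beginning with '*' marks a form the transducer does not know.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = 100.0 * known / total if total else 0.0
    return coverage, unknown.most_common(top_n)


# Hypothetical sample output, not taken from the real corpus.
sample = (
    "^あみぬ/あみ<n><gen>$ ^あい/*あい$ ^みずぬ/みず<n><gen>$ "
    "^あい/*あい$ ^ひー/*ひー$"
)
coverage, top_unknown = coverage_report(sample)
print(f"{coverage:.2f}%")
print(top_unknown)
```

Counting tokens rather than distinct word forms means a frequent unknown word lowers coverage more than a rare one, which is why fixing the most frequent unknown forms first tends to pay off.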

Notes

As of submission of the analyser, the following tests did not work:

  • Two verb tests: one because of an apostrophe problem, and one because I still need lexicons for class 2 stem 1 and stem 2 verbs.
  • Most of the final particles, because the form of the tests needs to be fixed.
  • All of the parts of speech, because I didn't put them in.

The first time I ran aq-covtest, 15.19% of my corpus was covered. I discovered that a number of the top unknown forms were in fact postpositions, which led me to realise that in part of my corpus the postpositions were written separately from the words they attach to. Once I had resolved that problem, coverage rose to 16.91%. Adding あみ<n><gen> ↔ あみぬ and みず<n><gen> ↔ みずぬ to the transducer then increased coverage to 19.12%.

Generator Evaluation

Initial analysis of morphological generation

  • 81 morphological analysis tests pass and 11 fail.
  • Coverage was 19.12%; after I fixed the issues with the corpus, it dropped to 14.57%.
  • 81 morphological generation tests pass and 53 fail.
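
To compare the two test suites, the pass and fail counts above can be turned into pass rates. The snippet below is a small sketch of that arithmetic; the helper name pass_rate is hypothetical, and the counts are the ones reported in the list.

```python
def pass_rate(passed, failed):
    """Percentage of tests that pass, rounded to two decimal places."""
    total = passed + failed
    return round(100.0 * passed / total, 2) if total else 0.0


# Counts reported above: analysis 81 pass / 11 fail, generation 81 pass / 53 fail.
analysis_rate = pass_rate(81, 11)
generation_rate = pass_rate(81, 53)
print(f"analysis: {analysis_rate}%")
print(f"generation: {generation_rate}%")
```

On these counts the analysis tests pass at roughly 88% and the generation tests at roughly 60%.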