Miyako/Transducer
From LING073
The code for the transducer can be found here.
Evaluation
- There are currently 42 stems in the transducer.
- The current coverage is 19.12%.
- There are currently 81 tests passing in mvi.yaml and 2 passing in commonwords.yaml.
- The current list of top unknown words is:
- ー
- い
- あい
- たるが
- ひー
- ひらいん
- くとぅー
- どぅみっさすたい
- あたい
- が
- みんぬ
- つっち
- あいな
- っふぃ
- いつ
- うでぃー
- かでぃぬ
- みゃーくん
- あみゃあ
- っま
Notes
The following tests still do not work:
- Two verb tests: one fails because of an apostrophe problem, and the other because I still need lexicons for class 2 stem 1 and stem 2 verbs.
- Most of the final particles, because the form of the tests needs to be fixed.
- All of the parts of speech, because I have not put them in yet.
The first time I ran aq-covtest, 15.19% of my corpus was covered. I discovered that a number of the top unknown forms were in fact postpositions, which led me to realise that in part of my corpus the postpositions were written separately from the words they attach to. Once I had resolved that problem, coverage rose to 16.91%. Adding あみ<n><gen> ↔ あみぬ and みず<n><gen> ↔ みずぬ to the transducer then raised coverage to 19.12%.
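As a rough illustration of what a coverage figure like those above measures, the following Python sketch treats coverage as the share of corpus tokens that receive at least one analysis, and collects the remaining tokens as a ranked list of unknown words. It is not how aq-covtest is implemented: the `coverage` function, the toy token list, and the use of a plain set of surface forms in place of the transducer are all assumptions made for this example.

```python
from collections import Counter


def coverage(tokens, known_forms):
    """Return (percent of tokens covered, Counter of unknown tokens).

    known_forms is a stand-in for the transducer: a token counts as
    covered if it appears in this set (an assumption for the sketch).
    """
    unknown = Counter(t for t in tokens if t not in known_forms)
    covered = len(tokens) - sum(unknown.values())
    percent = 100.0 * covered / len(tokens) if tokens else 0.0
    return percent, unknown


# Hypothetical toy corpus, not the real Miyako corpus.
tokens = ["あみぬ", "みずぬ", "が", "あみぬ", "い", "が", "あい", "みずぬ"]

# Before: no forms are recognised at all.
before, unknown_before = coverage(tokens, set())

# After: the two genitive forms mentioned above are recognised.
after, unknown_after = coverage(tokens, {"あみぬ", "みずぬ"})

print(f"coverage: {before:.2f}% -> {after:.2f}%")
print("top unknown words:", unknown_after.most_common(3))
```

In this toy corpus, recognising two frequent surface forms moves coverage from 0% to 50%, which mirrors why adding a few common forms or fixing a tokenisation problem can shift the real figure noticeably.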