The code for the transducer can be found here.
- There are currently 42 stems in the transducer.
- The current coverage is 19.12%.
- There are currently 81 tests passing in mvi.yaml and 2 passing in commonwords.yaml.
- The current list of top unknown words is
As of submission of the analyser, the following tests did not work:
- One verb because of an apostrophe problem, and one because I need lexicons for class 2 stem 1 and stem 2 verbs.
- Most of the final particles, because the form of the tests needs to be fixed.
- All of the parts of speech, because I didn't put them in.
The first time I ran aq-covtest, 15.19% of my corpus was covered. I discovered that a number of the top unknown forms were in fact postpositions, which led me to realise that in part of my corpus the postpositions were separated from the words they belong to. Once I had resolved that problem, coverage rose to 16.91%. Adding あみ<n><gen> ↔ あみぬ and みず<n><gen> ↔ みずぬ to the transducer then increased coverage to 19.12%.
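To make the coverage figures concrete, here is a minimal sketch of a naive coverage calculation: the share of corpus tokens that receive at least one analysis. It is only an illustration, not how aq-covtest itself is implemented. The names `coverage`, `toy_analyse` and `TOY_LEXICON` are hypothetical, the dictionary stands in for the real transducer, and the four-token corpus is made up.

```python
def coverage(tokens, analyse):
    """Return the percentage of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if analyse(token))
    return 100.0 * known / len(tokens)


# Hypothetical stand-in for the transducer: surface form -> list of analyses.
# Only the two genitive pairs mentioned in the text are included.
TOY_LEXICON = {
    "あみぬ": ["あみ<n><gen>"],
    "みずぬ": ["みず<n><gen>"],
}


def toy_analyse(token):
    """Look a surface form up in the toy lexicon; unknown forms get no analyses."""
    return TOY_LEXICON.get(token, [])


# Made-up corpus: two forms the toy lexicon knows, two it does not.
corpus = ["あみぬ", "みずぬ", "あみ", "みず"]
print(f"{coverage(corpus, toy_analyse):.2f}%")  # prints "50.00%"
```

Because coverage is counted per token, adding a single frequent form such as a genitive can move the figure noticeably, while changes to the corpus itself (for example to its tokenisation) change both the numerator and the denominator.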
Initial analysis of morphological generation
- 81 morphological analysis tests pass and 11 fail.
- The current coverage is 19.12%. (Then I fixed the issues with the corpus and it dropped to 14.57%.)
- 81 morphological generation tests pass and 53 fail.
Final analysis of morphological generation
- 88 morphological generation tests pass and 4 fail.
- I added 5 twol rules.
- The current coverage is 16.02%.