Difference between revisions of "Berik/Transducer"
From LING073
Revision as of 11:44, 1 March 2018
Code: https://github.swarthmore.edu/jspring1/ling073-bkl
Evaluation
- Current corpus coverage: 42.84%
- Number of stems: 112
  - Does not include pronouns (4?) and verbs (3?)
- Top unknown words:
  - aa (64)
  - jeta (42)
  - ge (31)
  - taterisi (11)
  - aane (11)
  - jebe (9)
  - bosna (9)
  - anes (9)
  - asal (8)
  - asala (8)
  - Aamai (8)
  - temawer (8)
  - Jepga (8)
  - aaiserem (8)
  - enggame (7)
  - Taterisi (7)
  - Sanbagiri (6)
- Analyzer tests passing: 85/117 (72.65%)
- Generator tests passing: 85/152 (55.92%)
- Corpus tests passing: 14/33 (42.42%)
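For reference, the figures above can be reproduced with simple arithmetic. The following is a minimal Python sketch, not the tooling actually used for this page: `coverage_report` and `pass_rate` are hypothetical helper names, the token list is toy data rather than the real corpus, and a plain set of known surface forms stands in for the transducer. Matching is assumed to be case-sensitive, since the list above counts "taterisi" and "Taterisi" separately.

```python
from collections import Counter


def coverage_report(tokens, known_forms, top_n=5):
    """Return (coverage in percent, the top_n most frequent unknown tokens).

    `known_forms` is a stand-in for the transducer: a token counts as
    covered if it appears in this set (case-sensitive).
    """
    unknown = Counter(t for t in tokens if t not in known_forms)
    known_count = len(tokens) - sum(unknown.values())
    percent = round(100 * known_count / len(tokens), 2) if tokens else 0.0
    return percent, unknown.most_common(top_n)


def pass_rate(passed, total):
    """Percentage of passing tests, rounded to two decimals."""
    return round(100 * passed / total, 2)


# Toy data for illustration only (not the real corpus).
tokens = ["jei", "aa", "gam", "aa", "jewer", "jeta", "jam", "aa"]
known_forms = {"jei", "gam", "jewer", "jam"}

percent, top_unknown = coverage_report(tokens, known_forms)
print(percent, top_unknown)

# The test percentages reported above follow from the raw counts.
print(pass_rate(85, 117), pass_rate(85, 152), pass_rate(14, 33))
```

With the counts reported above, `pass_rate` gives 72.65, 55.92 and 42.42, matching the listed percentages.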
Notes
Initial corpus coverage was 21.98%.
Coverage was raised to 34.65% by adding the following forms:
- jamere<locl> ↔ jamere
- Yesus<n> ↔ Yesus
- Yusuf<n> ↔ Yusuf
- Maria<n> ↔ Maria
- Daud<n> ↔ Daud
- angtane<n> ↔ angtane
- raja<n> ↔ raja
- taman<n> ↔ taman
- kapka<adj> ↔ kapka
- se<imp> ↔ se
- je<prn><pos> ↔ jemna
- je<prn><subj> ↔ jam
- je<prn>+wer<post> ↔ jewer
- gam<part> ↔ gam
Adding "jei" as a form of "je" then raised coverage from 34.65% to 42.84%:
- je<prn> ↔ jei
- je<prn><dem><part> ↔ jeiserem
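Each pair above reads in two directions: the generator maps the analysis on the left to the surface form on the right, and the analyzer maps the surface form back. A minimal Python sketch of how such pairs could be checked in both directions follows. It uses a hard-coded lookup table in place of the compiled transducer, so `PAIRS`, `generate`, `analyse` and `run_tests` are illustrative names only, not part of the actual test setup.

```python
# A few analysis -> surface form pairs copied from the notes above.
# In the real setup these would be checked against the transducer;
# here a plain dict is an assumed stand-in.
PAIRS = {
    "je<prn><pos>": "jemna",
    "je<prn><subj>": "jam",
    "je<prn>+wer<post>": "jewer",
    "gam<part>": "gam",
    "je<prn>": "jei",
    "je<prn><dem><part>": "jeiserem",
}


def generate(analysis):
    """Generator direction: analysis -> surface form (None if unknown)."""
    return PAIRS.get(analysis)


def analyse(surface):
    """Analyzer direction: surface form -> sorted list of analyses.

    A list is returned because one surface form may have several analyses.
    """
    return sorted(a for a, s in PAIRS.items() if s == surface)


def run_tests(pairs):
    """Count how many pairs pass in each direction."""
    gen_ok = sum(generate(a) == s for a, s in pairs.items())
    ana_ok = sum(a in analyse(s) for a, s in pairs.items())
    return gen_ok, ana_ok


print(generate("je<prn>"))
print(analyse("jewer"))
print(run_tests(PAIRS))
```

Counting the two directions separately mirrors the separate analyzer and generator figures reported under Evaluation.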