Difference between revisions of "User:Qfeng1/Final project"
From LING073
Revision as of 14:33, 14 May 2019
Our project extends the monolingual transducer for the Chechen language that we built in class. Our goal is a coverage rate above 85% on the large corpus we extracted from Chechen-language Wikipedia pages.
Major Steps
- Expand the morphology (in the lexc and twol files)
- Generate a frequency-ranked list of the unknown words in the corpus and add the missing stems to the lexc file
- Measure the level of ambiguity in the large corpus and write more disambiguation rules to increase the accuracy of the tagger
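The second and third steps can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the project. It assumes the analysed corpus is in Apertium stream format, where each token is written as `^surface/analysis1/analysis2$` and an unknown token appears as `^surface/*surface$`. The function names and the sample string are hypothetical placeholders, not real Chechen analyses.

```python
import re
from collections import Counter

# Assumption: Apertium stream format, one token per ^...$ unit.
# Group 1 is the surface form, group 2 is everything after the first slash.
TOKEN_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def top_unknown_words(analysed_text, n=10):
    """Return the n most frequent unanalysed surface forms with their counts."""
    counts = Counter()
    for surface, analyses in TOKEN_RE.findall(analysed_text):
        # An unknown token carries a single "analysis" that starts with '*'.
        if analyses.startswith("*"):
            counts[surface.lower()] += 1
    return counts.most_common(n)


def mean_ambiguity(analysed_text):
    """Return the average number of analyses per known token (0.0 if none)."""
    per_token = [
        len(analyses.split("/"))
        for _, analyses in TOKEN_RE.findall(analysed_text)
        if not analyses.startswith("*")
    ]
    return sum(per_token) / len(per_token) if per_token else 0.0


# Hypothetical placeholder input; the tags are illustrative only.
sample = "^aaa/*aaa$ ^bbb/bbb<n><sg>/bbb<v><inf>$ ^aaa/*aaa$ ^ccc/ccc<n><pl>$"
print(top_unknown_words(sample))
print(mean_ambiguity(sample))
```

The words at the top of the unknown list are the ones whose stems are most worth adding to the lexc file, and a mean ambiguity well above 1 suggests that more disambiguation rules are needed.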
Evaluation
# of stems in transducer | # of words in Wiki corpus | Coverage Rate | Precision | Recall |
608 | 14,093,835 | ~80.10% | ~75.34% | ~56.97% |
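The figures above can be related to each other with the usual definitions, sketched below. The page does not state how precision and recall were measured, so the sketch assumes the tagger output was compared against hand-annotated analyses; the true-positive, false-positive and false-negative counts are hypothetical inputs. Only the corpus size, the current coverage rate and the 85% target come from this page.

```python
import math


def coverage(known_tokens, total_tokens):
    """Share of corpus tokens that receive at least one analysis."""
    return known_tokens / total_tokens


def precision(true_pos, false_pos):
    """Share of the analyses produced that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of the correct analyses that were actually produced."""
    return true_pos / (true_pos + false_neg)


def tokens_needed(total_tokens, current_rate, target_rate):
    """Extra tokens that must be analysed to move from current to target coverage."""
    return math.ceil((target_rate - current_rate) * total_tokens)


# Corpus size and rates taken from the table and the stated goal.
print(tokens_needed(14_093_835, 0.8010, 0.85))
```

Because the reported coverage rate is approximate, the result is only an estimate: roughly 690,000 more corpus tokens would need an analysis to reach the 85% goal.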