Difference between revisions of "User:Qfeng1/Final project"
From LING073
Revision as of 14:34, 14 May 2019
Our project expands the monolingual transducer for the Chechen language that we built in class. Our goal is a coverage rate above 85% on the large corpus we extracted from Chechen-language Wikipedia pages.
Major Steps
- To expand the morphology (in the lexc and twol files)
- To generate a frequency-ranked list of the most common unknown words in the corpus and add the corresponding stems to the lexc file
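The second step can be sketched roughly as follows. This is only an illustration, not the script we actually ran: it assumes the analyser output is in the Apertium stream format, where each token appears as ^surface/analysis$ and an unanalysed token is marked with an asterisk (^surface/*surface$). The function name top_unknown_words and the sample tokens and tags are hypothetical.

```python
import re
from collections import Counter

# Assumption: analysed text is in Apertium stream format, e.g.
#   known token:   ^surface/lemma<tag>$
#   unknown token: ^surface/*surface$
TOKEN_RE = re.compile(r"\^([^/$]+)/([^$]*)\$")


def top_unknown_words(analysed_text, n=10):
    """Return the n most frequent unanalysed surface forms with their counts."""
    counts = Counter()
    for surface, analyses in TOKEN_RE.findall(analysed_text):
        # An analysis string starting with '*' marks an unknown word.
        if analyses.startswith("*"):
            counts[surface.lower()] += 1
    return counts.most_common(n)


# Hypothetical sample: one known token and three unknown tokens.
sample = "^aaa/aaa<n><sg>$ ^xxx/*xxx$ ^yyy/*yyy$ ^xxx/*xxx$"
print(top_unknown_words(sample))
```

The words at the top of this list are the ones whose stems give the largest coverage gain when added to the lexc file.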
Evaluation
# of stems in transducer | # of words in Wiki corpus | Coverage Rate | Precision | Recall |
608 | 14,093,835 | ~80.10% | ~75.34% | ~56.97% |
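To relate the coverage figure to the 85% goal, here is a small sketch of the arithmetic. It assumes coverage rate means the share of corpus tokens that receive at least one analysis, and it uses the approximate rate from the table, so the result is only a rough estimate. The function names are hypothetical.

```python
import math


def coverage_rate(known_tokens, total_tokens):
    """Share of corpus tokens that receive at least one analysis."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return known_tokens / total_tokens


def tokens_needed(total_tokens, current_rate, target_rate):
    """Additional tokens that must be analysed to reach target_rate."""
    return max(0, math.ceil(total_tokens * (target_rate - current_rate)))


TOTAL_TOKENS = 14_093_835   # words in the Wiki corpus (from the table)
CURRENT_RATE = 0.8010       # approximate current coverage (from the table)
TARGET_RATE = 0.85          # project goal

print(tokens_needed(TOTAL_TOKENS, CURRENT_RATE, TARGET_RATE))
```

Under these assumptions, roughly 690,000 more corpus tokens need an analysis to reach the goal, which is why the most frequent unknown words are targeted first.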