Difference between revisions of "User:Qfeng1/Final project"
From LING073
Revision as of 18:35, 14 May 2019
Our project expands the monolingual transducer for the Chechen language that we built in class. The main goal is to increase the transducer's coverage over a large corpus. Our corpus was extracted from all Chechen-language Wikipedia pages, so it includes a wide range of vocabulary.
Major Steps
- Expand the morphology (in the lexc and twol files)
- Generate a list of the most frequent unknown words in the corpus and add the corresponding stems to the lexc file
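The second step can be sketched in Python as follows. This is only a minimal illustration, not the script we actually used: the function name `top_unknown_words` and the `is_known` callback are hypothetical, and we assume the corpus has already been tokenized and that some wrapper around the transducer can report whether a token receives an analysis.

```python
from collections import Counter


def top_unknown_words(tokens, is_known, n=10):
    """Return the n most frequent tokens that the analyser cannot analyse.

    tokens   -- iterable of surface forms from the corpus
    is_known -- hypothetical callback: True if the transducer analyses the token
    """
    # Count only the tokens the transducer fails on.
    counts = Counter(tok for tok in tokens if not is_known(tok))
    # most_common sorts by descending frequency.
    return counts.most_common(n)


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    corpus = ["w1", "w2", "w2", "w3", "w3", "w3", "w1"]
    known = {"w1"}  # stand-in for forms the transducer already covers
    for form, freq in top_unknown_words(corpus, lambda t: t in known, n=2):
        print(form, freq)
```

The stems behind the highest-frequency unknown forms are the ones worth adding to the lexc file first, since each of them removes the most unanalysed tokens from the corpus.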
Code at GitHub
https://github.com/sfeng233/Chechen_Transducer/blob/master/AUTHORS
Evaluation
# of stems in transducer | # of words in Wiki corpus | Coverage Rate | Precision | Recall |
608 | 14,093,835 | ~80.10% | ~75.34% | ~56.97% |
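For reference, the three measures can be computed roughly as in the sketch below. The definitions are assumptions on our part and the evaluation script actually used may differ: coverage is taken as the share of corpus tokens that receive at least one analysis, while precision and recall compare the analyses the transducer produces against a hand-annotated gold set. The function names and the tags in the example are hypothetical.

```python
def coverage_rate(n_analysed, n_total):
    """Share of corpus tokens that received at least one analysis."""
    return n_analysed / n_total if n_total else 0.0


def precision_recall(predicted, gold):
    """Compare transducer output with gold annotations.

    predicted, gold -- dicts mapping a surface form to a set of analyses.
    Only forms present in the gold set are scored.
    """
    # Analyses that the transducer produced and the gold set confirms.
    correct = sum(len(predicted.get(form, set()) & analyses)
                  for form, analyses in gold.items())
    n_predicted = sum(len(predicted.get(form, set())) for form in gold)
    n_gold = sum(len(analyses) for analyses in gold.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder forms and tags, not real Chechen analyses.
    gold = {"form1": {"stem1<n><sg>", "stem1<n><pl>"}, "form2": {"stem2<v>"}}
    predicted = {"form1": {"stem1<n><sg>"},
                 "form2": {"stem2<v>", "stem2<n>", "stem2<adj>"}}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2%} recall={r:.2%}")
    print(f"coverage={coverage_rate(8, 10):.2%}")
```

Under these definitions, adding stems mainly raises coverage, whereas precision and recall depend on how accurately the morphology is modelled.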
Further Improvement
- More verb morphology needs to be added:
  - proverb
  - deictic
  - agreement with noun class (when it applies and when it does not)