Our project expands the monolingual transducer for the Chechen language that we built in class. Our goal is a coverage rate of over 85% on the large corpus we extracted from Chechen-language Wikipedia pages.
- To expand the morphology (in the lexc and twol files)
- To generate a list of the most frequent unknown words in the corpus and add more word stems to the lexc file
- To measure the level of ambiguity in the large corpus and write more disambiguation rules to increase the accuracy of the tagger
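
The coverage goal and the unknown-word list above can be illustrated with a minimal Python sketch. It assumes the corpus has already been tokenised and that some predicate reports whether the transducer returns at least one analysis for a token. The names `coverage_report`, `is_known`, and `known_stems` are hypothetical, and the toy tokens are placeholders rather than real Chechen words. In practice the predicate would be backed by the analyser's output, not by a Python set.

```python
from collections import Counter


def coverage_report(tokens, is_known, top_n=10):
    """Return (coverage, most frequent unknown tokens).

    tokens   -- list of corpus tokens (strings)
    is_known -- hypothetical predicate: True if the analyser
                produces at least one analysis for the token
    top_n    -- how many unknown tokens to report
    """
    # Count every token the analyser cannot handle.
    unknown = Counter(t for t in tokens if not is_known(t))
    total = len(tokens)
    covered = total - sum(unknown.values())
    # Coverage = analysed tokens / all tokens (0.0 for an empty corpus).
    coverage = covered / total if total else 0.0
    return coverage, unknown.most_common(top_n)


# Toy data: placeholder tokens standing in for a real corpus.
known_stems = {"a", "b", "c"}
tokens = ["a", "b", "x", "x", "c", "y", "a", "x"]

coverage, top_unknown = coverage_report(tokens, known_stems.__contains__)
print(f"coverage: {coverage:.1%}")
print(top_unknown)
```

Sorting unknown tokens by frequency means that each stem added to the lexc file covers as many corpus tokens as possible, which is why the list is ranked rather than alphabetical.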
| # of stems in transducer | # of words in Wiki corpus | Coverage rate | Precision | Recall |