Difference between revisions of "User:Qfeng1/Final project"

From LING073

Revision as of 14:33, 14 May 2019

Our project expands the monolingual transducer for the Chechen language that we built in class. Our goal is a coverage rate of over 85% on the large corpus we extracted from Chechen-language Wikipedia pages.

Major Steps

  • To expand the morphology (in the lexc and twol files)
  • To generate a list of the most frequent unknown words in the corpus and add the corresponding stems to the lexc file
  • To measure the level of ambiguity in the large corpus and write more disambiguation rules to increase the accuracy of the tagger
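
The second and third steps can be sketched in Python as below. This is only a minimal illustration, not the script used for the project: it assumes the analyser output is in Apertium stream format, where each token looks like `^surface/analysis1/analysis2$` and an unknown word is marked with a leading `*`. The function name `summarize` and the sample tokens (`aaa`, `bbb`, `ccc`) are hypothetical placeholders, not real Chechen data, and escaped characters in the stream are ignored for simplicity.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def summarize(analysed_text, top_n=10):
    """Return (coverage, mean ambiguity, most frequent unknown words)."""
    total = known = analyses_of_known = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        if analyses.startswith("*"):
            # Unknown word: the analyser only echoes the surface form.
            unknown[surface.lower()] += 1
        else:
            known += 1
            # Each "/"-separated reading is one competing analysis.
            analyses_of_known += len(analyses.split("/"))
    coverage = known / total if total else 0.0
    ambiguity = analyses_of_known / known if known else 0.0
    return coverage, ambiguity, unknown.most_common(top_n)


# Hypothetical analyser output: 4 tokens, 2 of them unknown.
sample = "^aaa/aaa<n><sg>$ ^bbb/*bbb$ ^ccc/ccc<n>/ccc<v>$ ^bbb/*bbb$"
print(summarize(sample))  # (0.5, 1.5, [('bbb', 2)])
```

The most frequent unknown words are the natural candidates for new stems in the lexc file, and a mean ambiguity well above 1 suggests where additional disambiguation rules would help.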

Evaluation

# of stems in transducer | # of words in Wiki corpus | Coverage Rate | Precision | Recall
608                      | 14,093,835                | ~80.10%       | ~75.34%   | ~56.97%
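
The page does not say how precision and recall were computed. One common way to define them for a morphological analyser, shown here purely as an assumption, is to compare the set of analyses the transducer produces for each token against a hand-annotated gold set. The function name `precision_recall` and the sample analyses are hypothetical.

```python
def precision_recall(gold, predicted):
    """Compare per-token sets of analyses against a gold standard.

    gold and predicted are parallel lists; each item is the set of
    analyses for one token (an empty set means no analysis).
    """
    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, predicted):
        tp += len(gold_set & pred_set)   # correct analyses produced
        fp += len(pred_set - gold_set)   # produced but not in gold
        fn += len(gold_set - pred_set)   # in gold but not produced
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical three-token example (placeholder analyses).
gold = [{"a<n>"}, {"b<v>"}, {"c<n>", "c<v>"}]
predicted = [{"a<n>", "a<v>"}, set(), {"c<n>"}]
p, r = precision_recall(gold, predicted)
print(f"precision={p:.2%} recall={r:.2%}")
```

Under this definition, a recall that is lower than precision means the transducer misses more gold analyses than it produces spurious ones, which is consistent with a lexicon that still needs more stems.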

Further Improvement