Difference between revisions of "Khasi/Final Project"

From LING073
(Pre-Final Project)

Revision as of 23:55, 1 May 2017

Pre-Final Project

Number of tokenised words in the corpus: 57847
Coverage: 57.26%
Top unknown words in the corpus:
206 kam
200 kum
179 Bah
172 kiwei
171 bynta
170 baroh
166 lah
159 pat
147 mynta
130 noh
125 paidbah
124 ne
122 ïoh
119 por
118 wan
118 Shillong
117 namar
117 Khasi
112 katei
111 Jylla
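For readers unfamiliar with these figures, the sketch below shows one way a coverage percentage and a ranked list of unknown words can be computed. It is not the tool used for this project: the function name `coverage_report`, the `is_known` callback, and the toy lexicon and tokens are all hypothetical, and the real numbers above presumably come from running the Khasi analyser over the full corpus.

```python
from collections import Counter

def coverage_report(tokens, is_known, top_n=20):
    """Return (coverage percentage, top unknown tokens with counts).

    `is_known` is a hypothetical callback standing in for the analyser:
    it should return True when a token receives at least one analysis.
    """
    unknown = Counter(tok for tok in tokens if not is_known(tok))
    total = len(tokens)
    covered = total - sum(unknown.values())
    pct = 100.0 * covered / total if total else 0.0
    return pct, unknown.most_common(top_n)

# Toy data for illustration only, not taken from the Khasi corpus.
lexicon = {"ka", "u"}
tokens = ["ka", "kam", "u", "kam"]

pct, top = coverage_report(tokens, lambda t: t in lexicon)
print(f"Coverage: {pct:.2f}%")
for word, count in top:
    print(count, word)
```

Counting is case-sensitive here, which matches the lists on this page, where capitalised and lowercase forms such as Jylla and jylla are reported separately.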

Step 1: added all the above words (and accompanying disambiguations) except for Jylla

Coverage: 62.61%
Top unknown words in the corpus:
111 Jylla
109 tang
108 ym
106 shuh
102 haduh
100 skul
98 sorkar
97 Seng
96 briew
90 M
88 jylla
88 kala
87 ri
87 lang
85 E
83 kine
82 seng
77 Meghalaya
77 tarik
74 naduh
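As a rough sanity check on Step 1, the gain in coverage can be compared with the token counts of the words that were added. The sketch below assumes the corpus size stayed at 57847 tokens between the two runs (the Step 1 report does not restate it) and uses the 19 counts from the Pre-Final Project list, leaving out Jylla. The variable names are illustrative.

```python
TOTAL_TOKENS = 57847  # from the Pre-Final Project report; assumed unchanged in Step 1

# Frequencies of the 19 words added in Step 1 (every listed word except Jylla).
added_counts = [206, 200, 179, 172, 171, 170, 166, 159, 147, 130,
                125, 124, 122, 119, 118, 118, 117, 117, 112]

newly_covered = sum(added_counts)
expected_gain = 100.0 * newly_covered / TOTAL_TOKENS  # in percentage points
observed_gain = 62.61 - 57.26                         # from the two reports

print(f"Tokens covered by the added words: {newly_covered}")
print(f"Expected gain from those words alone: {expected_gain:.2f} points")
print(f"Observed gain: {observed_gain:.2f} points")
```

The listed words account for 2772 tokens, or about 4.79 percentage points, while the reported coverage rose by 5.35 points. The page does not explain the difference; one possibility is that the new entries also analyse forms not shown in the top-20 list, such as differently capitalised variants.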