Difference between revisions of "Chechen/Transducer"

From LING073

Revision as of 04:17, 22 February 2019

Transducer repository on GitHub: https://github.swarthmore.edu/Ling073-sp19/ling073-che.git

Note

All 73 tests in the che.yaml file pass.

Hard-coded Forms

  • йоӏ<n><cl_j><abs><pl> ↔ мехкарий: girl, girls
  • стаг<n><cl_j2><abs><pl> ↔ нах: person, people
  • The two forms above are hard-coded because they are irregular plurals that do not follow any regular rule.
  • а<conj> ↔ а: and
  • This form is also hard-coded because conjunctions undergo no morphological change.
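As a rough illustration of what hard-coding means here, the three pairs listed above can be treated as a plain two-way lookup that bypasses any morphological rule. This is only a sketch in Python: the names HARD_CODED, SURFACE_TO_ANALYSIS, generate, and analyse are hypothetical, and the transducer's actual source is not reproduced on this page.

```python
# Hypothetical illustration only; the real transducer is not written in Python.
# Each entry pairs an analysis (lemma plus tags) with its surface form,
# copied from the Hard-coded Forms list.
HARD_CODED = {
    "йоӏ<n><cl_j><abs><pl>": "мехкарий",
    "стаг<n><cl_j2><abs><pl>": "нах",
    "а<conj>": "а",
}

# Reverse direction: surface form -> analysis.
SURFACE_TO_ANALYSIS = {surface: analysis for analysis, surface in HARD_CODED.items()}


def generate(analysis):
    """Return the hard-coded surface form, or None if there is no entry."""
    return HARD_CODED.get(analysis)


def analyse(surface):
    """Return the hard-coded analysis for a surface form, or None."""
    return SURFACE_TO_ANALYSIS.get(surface)
```

Because every entry is listed explicitly, both directions stay in sync as long as no two analyses share a surface form, which holds for the three entries here.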

Additional Top Words

Top Words | Coverage Before | Coverage After
а<conj> ↔ а | 1.59% | 15.73%
ала<v><iv><past><rem><wit> ↔ элира | 15.73% | 17.03%
латта<v><iv><pres> ↔ лаьтта | 17.03% | 18.23%
Дала<np><abs><sg> ↔ Дала | 18.23% | 20.45%
масса<n><cl_j2><erg><sg> ↔ массо | 20.45% | 21.27%
иза<prn><pers><p3><pl><abs> ↔ иза | 21.27% | 22.09%
уьш<prn><pers><p3><pl><abs> ↔ уьш | 22.09% | 22.62%
и<prn><dem><dst> ↔ и | 22.62% | 23.44%
тайпа<n><cl_d><dat><sg> ↔ тайпана | 23.44% | 24.17%
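The gain contributed by each added word is the difference between its two coverage columns. The short Python sketch below does that arithmetic on the figures in the table; ROWS and gains are hypothetical names introduced only for this illustration.

```python
# Coverage figures copied from the table: (analysis, before %, after %).
ROWS = [
    ("а<conj>", 1.59, 15.73),
    ("ала<v><iv><past><rem><wit>", 15.73, 17.03),
    ("латта<v><iv><pres>", 17.03, 18.23),
    ("Дала<np><abs><sg>", 18.23, 20.45),
    ("масса<n><cl_j2><erg><sg>", 20.45, 21.27),
    ("иза<prn><pers><p3><pl><abs>", 21.27, 22.09),
    ("уьш<prn><pers><p3><pl><abs>", 22.09, 22.62),
    ("и<prn><dem><dst>", 22.62, 23.44),
    ("тайпа<n><cl_d><dat><sg>", 23.44, 24.17),
]


def gains(rows):
    """Return (analysis, gain in percentage points) pairs, rounded to 2 places."""
    return [(analysis, round(after - before, 2)) for analysis, before, after in rows]


for analysis, gain in gains(ROWS):
    print(f"{analysis}: +{gain:.2f} points")
```

On these figures, а<conj> alone accounts for 14.14 of the 22.58 percentage points gained in total.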

Evaluation

  • Total number of stems in the transducer: 55
  • Current coverage over the combined corpus: 24.17%
  • The current list of top unknown words returned by aq-covtest (frequency, then word):
      • 64 т
      • 60 у
      • 37 х
      • 22 Веза
      • 21 д
      • 20 Т
      • 20 аккха
      • 18 е
      • 18 ехь
      • 18 ера
      • 15 ду
      • 12 к
      • 12 хир
      • 12 аьлла
      • 11 хи
      • 11 дерриге
      • 11 адам
      • 10 де
      • 10 Иштта
      • 10 цу
  • Number of tests that pass in the yaml files:
      yaml file | # of passed tests | # of total tests
      che.yaml | 73 | 73
      common words.yaml | 8 | 34
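The coverage figure and the unknown-word list can both be read as simple token counts. The Python sketch below is not how aq-covtest is implemented. It assumes an already tokenised corpus and a hypothetical predicate is_known standing in for a transducer lookup, and the toy corpus and lexicon are made up purely to show the arithmetic.

```python
from collections import Counter


def coverage_report(tokens, is_known, top_n=20):
    """Return (coverage percentage, most frequent unknown tokens).

    tokens   -- iterable of corpus tokens (assumed already tokenised)
    is_known -- hypothetical callable: True if the transducer analyses the token
    """
    tokens = list(tokens)
    # Count every token the lookup fails on.
    unknown = Counter(t for t in tokens if not is_known(t))
    known_count = len(tokens) - sum(unknown.values())
    coverage = 100.0 * known_count / len(tokens) if tokens else 0.0
    return coverage, unknown.most_common(top_n)


# Toy usage with made-up data; the "lexicon" is just a set of surface forms.
toy_lexicon = {"а", "и", "иза"}
toy_corpus = "а ду и цу ду иза а ду".split()
pct, top_unknown = coverage_report(toy_corpus, toy_lexicon.__contains__)
print(f"{pct:.2f}%", top_unknown)
```

In this toy run four of the eight tokens are known, so coverage is 50.00%, and ду (3 occurrences) ranks above цу (1 occurrence) in the unknown list.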