Dhivehi/Transducer
From LING073
Revision as of 00:56, 20 March 2019 by Xluo1 (talk | contribs) (→Final Evaluation of Morphological Generation)
Analyzer Evaluation
- div.yaml: 155/265 tests passed
- commonwords.yaml: 6/30 tests passed
- Number of Stems: 56
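To compare these counts with later runs, it helps to express them as percentages. The following is a minimal Python sketch of that arithmetic. The helper name `pass_rate` is our own and is not part of the test tooling.

```python
def pass_rate(passed, total):
    """Return the share of passed tests as a percentage, rounded to 2 places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * passed / total, 2)


# Counts copied from the list above.
results = {"div.yaml": (155, 265), "commonwords.yaml": (6, 30)}

for name, (passed, total) in results.items():
    print(f"{name}: {passed}/{total} = {pass_rate(passed, total)}%")
```

With the figures above this gives 58.49% for div.yaml and 20.0% for commonwords.yaml.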
Top Unknown Words
Word | Occurrences | Gloss | Tags |
---|---|---|---|
އަދި | 50 | "and" | އަދި<cnjcoo> ↔ އަދި |
ނުވަތަ | 47 | "or" | ނުވަތަ<cnjcoo> ↔ ނުވަތަ |
ޙައްޤު | 35 | | ޙައްޤު<unk> ↔ ޙައްޤު |
ކޮންމެ | 32 | "some, any" | <unk> ↔ ކޮންމެ |
ލިބިގެންވެއެވެ | 30 | | <unk> ↔ ލިބިގެންވެއެވެ |
ވަނަ | 30 | suffix "-vana" (-th) ordinal | <unk> ↔ ވަނަ |
މާތްﷲ | 30 | | <unk> ↔ މާތްﷲ |
މާއްދާ | 29 | "article" | މާއްދާ<n><nHum><sg><def><dir> ↔ މާއްދާ |
މީހަކަށްމެ | 28 | | <unk> ↔ މީހަކަށްމެ |
ހަމަ | 25 | "just" | ހަމަ<adj> ↔ ހަމަ |
ދެން | 24 | | <unk> ↔ ދެން |
އެއްވެސް | 20 | | <unk> ↔ އެއްވެސް |
ހުރިހާ | 16 | | <unk> ↔ ހުރިހާ |
ލިބިގަތުމުގެ | 14 | | <unk> ↔ ލިބިގަތުމުގެ |
ނުވާނެއެވެ | 14 | | <unk> ↔ ނުވާނެއެވެ |
އެއީ | 14 | | <unk> ↔ އެއީ |
އޭނާގެ | 14 | | <unk> ↔ އޭނާގެ |
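A frequency list like the one above can be built by counting the tokens the analyzer leaves unanalyzed. The sketch below is one possible way to do that in Python, not necessarily how this table was produced. It assumes the analyzer output is in Apertium stream format, where an unknown token appears as `^surface/*surface$`. The function name `top_unknown` and the sample string are hypothetical, and escaped characters in the stream are ignored for simplicity.

```python
import re
from collections import Counter

# Assumption: Apertium stream format, e.g. ^surface/analysis1/analysis2$,
# with unknown tokens marked by a leading "*" in the analysis part.
TOKEN_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def top_unknown(stream, n=10):
    """Return the n most frequent unanalyzed surface forms with their counts."""
    counts = Counter()
    for surface, analyses in TOKEN_RE.findall(stream):
        if analyses.startswith("*"):
            counts[surface] += 1
    return counts.most_common(n)


# Hypothetical sample output with placeholder words.
sample = "^foo/*foo$ ^bar/bar<n><sg>$ ^foo/*foo$ ^baz/*baz$"
print(top_unknown(sample, 2))
```

For the placeholder sample, `foo` is counted twice and `baz` once, while `bar` is skipped because it has an analysis.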
Notes
Open issues behind the tests that still fail:
- We're not sure how to handle the "+".
Adding Unknown Words
Adding the unknown words އަދި<cnjcoo> ↔ އަދި, ނުވަތަ<cnjcoo> ↔ ނުވަތަ, and ހަމަ<adj> ↔ ހަމަ raised the coverage from 2.32% to 6.62%.
Adding the unknown word މާއްދާ<n><sg> ↔ މާއްދާ raised the coverage further, from 6.62% to 7.90%.
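The coverage figures can be read as the share of corpus tokens that receive at least one analysis. The Python sketch below illustrates that definition under two assumptions: that coverage is computed per token (not per distinct word), and that the analyzer output is in Apertium stream format with unknown tokens written as `^surface/*surface$`. The function name `coverage` and the sample strings are hypothetical placeholders, not data from the corpus.

```python
import re

# Assumption: Apertium stream format; a leading "*" marks an unknown token.
TOKEN_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream):
    """Return the percentage of tokens that have at least one analysis."""
    tokens = TOKEN_RE.findall(stream)
    if not tokens:
        return 0.0
    known = sum(1 for _, analyses in tokens if not analyses.startswith("*"))
    return round(100.0 * known / len(tokens), 2)


# Hypothetical output before and after adding one stem for the word "a".
before = "^a/*a$ ^b/*b$ ^c/c<n>$ ^d/*d$"
after = "^a/a<cnjcoo>$ ^b/*b$ ^c/c<n>$ ^d/*d$"
print(coverage(before), coverage(after))
```

Because the measure is token-based, adding a single high-frequency word such as a conjunction can raise coverage more than adding many rare stems.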
Generator Evaluation
Initial Evaluation of Morphological Generation
Current Analyzer Evaluation
- Current number of tests passed (div.yaml): 155/265
- Current corpus coverage (div.corpus.basic.txt): 7.90%
Current Generator Evaluation
- Current number of tests passed (div.yaml): 81/144
Final Evaluation of Morphological Generation
- Current number of tests passed: 94/120 (morph-test.py -cl div.yaml)
- Number of twol rules added: 20
- Current corpus coverage (div.corpus.basic.txt): 24.14%
Note
- I added punctuation to the lexc file and the coverage jumped from 8.51% to 23.82%.