Dhivehi/Transducer

From LING073

Analyzer Evaluation

  • div.yaml: 155/265 tests passed
  • commonwords.yaml: 6/30 tests passed
  • Number of Stems: 56

Top Unknown Words

Word | Occurrences | Gloss | Tags
އަދި | 50 | "and" | އަދި<cnjcoo> ↔ އަދި
ނުވަތަ | 47 | "or" | ނުވަތަ<cnjcoo> ↔ ނުވަތަ
ޙައްޤު | 35 | | ޙައްޤު<unk> ↔ ޙައްޤު
ކޮންމެ | 32 | "some, any" | <unk> ↔ ކޮންމެ
ލިބިގެންވެއެވެ | 30 | | <unk> ↔ ލިބިގެންވެއެވެ
ވަނަ | 30 | ordinal suffix "-vana" ("-th") | <unk> ↔ ވަނަ
މާތްﷲ | 30 | | <unk> ↔ މާތްﷲ
މާއްދާ | 29 | "article" | މާއްދާ<n><nHum><sg><def><dir> ↔ މާއްދާ
މީހަކަށްމެ | 28 | | <unk> ↔ މީހަކަށްމެ
ހަމަ | 25 | "just" | ހަމަ<adj> ↔ ހަމަ
ދެން | 24 | | <unk> ↔ ދެން
އެއްވެސް | 20 | | <unk> ↔ އެއްވެސް
ހުރިހާ | 16 | | <unk> ↔ ހުރިހާ
ލިބިގަތުމުގެ | 14 | | <unk> ↔ ލިބިގަތުމުގެ
ނުވާނެއެވެ | 14 | | <unk> ↔ ނުވާނެއެވެ
އެއީ | 14 | | <unk> ↔ އެއީ
އޭނާގެ | 14 | | <unk> ↔ އޭނާގެ

Notes

The tests that still fail are:

  1. Those involving "+": we're not sure how to deal with it yet.

Adding Unknown Words

Adding the unknown words އަދި<cnjcoo> ↔ އަދި, ނުވަތަ<cnjcoo> ↔ ނުވަތަ, and ހަމަ<adj> ↔ ހަމަ raised the coverage from 2.32% to 6.62%.

Adding the unknown word މާއްދާ<n><sg> ↔ މާއްދާ raised the coverage from 6.62% to 7.90%.
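For reference, the coverage figures above can be read as naive token coverage: the share of corpus tokens that receive at least one analysis. The sketch below is only an illustration of that idea, not the script used for this page. It assumes the analyzer output is in an Apertium-style stream format, where each token appears as ^surface/analysis$ and an unknown token is marked with a leading "*" in its analysis. The function name naive_coverage and the sample string are hypothetical.

```python
import re

# Assumed token format: ^surface/analysis1/analysis2$
# Unknown tokens are assumed to look like ^surface/*surface$
TOKEN_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed: str) -> float:
    """Return the percentage of tokens that have at least one analysis."""
    tokens = TOKEN_RE.findall(analysed)
    if not tokens:
        return 0.0
    # A token counts as known if its analysis does not start with "*".
    known = sum(1 for _, analyses in tokens if not analyses.startswith("*"))
    return 100.0 * known / len(tokens)


# Hypothetical sample: two known words and two unknown words.
sample = "^އަދި/އަދި<cnjcoo>$ ^ދެން/*ދެން$ ^ހަމަ/ހަމަ<adj>$ ^އެއީ/*އެއީ$"
print(f"{naive_coverage(sample):.2f}%")
```

Under this definition, adding a single high-frequency word to the lexicon moves the figure by that word's share of all tokens in the corpus, which is why frequent function words are the first candidates to add.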

Generator Evaluation

Initial Evaluation of Morphological Generation

Current Analyzer Evaluation

  • Current number of tests passed (div.yaml): 155/265
  • Current corpus coverage (div.corpus.basic.txt): 7.90%

Current Generator Evaluation

  • Current number of tests passed (div.yaml): 81/144

Final Evaluation of Morphological Generation

  • Current number of tests passed: 94/120 (morph-test.py -cl div.yaml)
  • Number of twol rules added: 20
  • Current corpus coverage (div.corpus.basic.txt): 24.14%
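To make the initial and final figures easier to compare, the short sketch below converts the test counts into pass rates and computes the change in coverage. It assumes that the final 94/120 figure refers to the same kind of generation tests as the initial 81/144. Since the totals differ (144 vs. 120), the two rates are not measured over the same set of tests and should be compared with caution. The helper name pass_rate is hypothetical.

```python
def pass_rate(passed: int, total: int) -> float:
    """Percentage of tests passed."""
    return 100.0 * passed / total


# Figures taken from the evaluations above.
# Assumption: both counts refer to generation tests in div.yaml.
initial_rate = pass_rate(81, 144)
final_rate = pass_rate(94, 120)

# Coverage change on div.corpus.basic.txt, in percentage points.
coverage_gain = 24.14 - 7.90

print(f"initial generator pass rate: {initial_rate:.2f}%")
print(f"final pass rate: {final_rate:.2f}%")
print(f"coverage gain: {coverage_gain:.2f} points")
```

On these numbers the pass rate rises from about 56% to about 78%, and coverage gains a little over 16 percentage points.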

Note

  • I added punctuation to the lexc file and the coverage jumped from 8.51% to 23.82%.

GitHub

Dhivehi Transducer