Central Kurdish/Transducer

From LING073
Revision as of 16:06, 20 March 2021 by Rkamal1 (talk | contribs)


Code

GitHub Repository

Analyser Evaluation

Stems

The number of stems in each category is listed below (55 in total):

  • 8 N-Stems
  • 4 Definite/Plural
  • 4 Verbs_Inf (infinitives)
  • 4 V-Stems_1
  • 4 V-Stems_2
  • 6 Subject_Prn
  • 4 Imperatives
  • 6 Prns
  • 3 Adj-Stem
  • 2 Comparatives
  • 3 Prepositions
  • 3 Conjunctions
  • 2 Adverbs
  • 2 Npast

Coverage

The total coverage over the ckb.basic corpus is 12.02%. This is an increase of 3 percentage points, gained by adding three common nouns ("water", "earth", and "god", all tagged <n>), two prepositions ("which" and "on", tagged <pr>), and one conjunction ("so", tagged <conjcoo>). The current top unknown words are:

  • بوو
  • هەموو
  • با
  • ئەمە
  • فەرمووی

The total coverage over the Wikipedia corpus is 21.48%.
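
As a rough illustration of how coverage figures like these can be computed, the sketch below counts known and unknown tokens in analyser output. It assumes the Apertium stream format, in which each token appears as ^surface/analysis$ and an unknown token is marked with a leading asterisk (^surface/*surface$). The function name `coverage`, the sample string, and the tags on the sample words are hypothetical and are not taken from the repository.

```python
import re
from collections import Counter

# Assumption: analyser output uses the Apertium stream format, where each
# token is ^surface/analysis1/analysis2$ and unknown tokens look like
# ^surface/*surface$.
TOKEN = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str):
    """Return (coverage percentage, Counter of unknown surface forms)."""
    total = 0
    unknown = Counter()
    for surface, analyses in TOKEN.findall(stream):
        total += 1
        # A leading '*' in the analysis field marks an unanalysed token.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    pct = 100.0 * known / total if total else 0.0
    return pct, unknown


# Hypothetical sample: two analysed nouns and one unknown word seen twice.
sample = "^ئاو/ئاو<n>$ ^بوو/*بوو$ ^خوا/خوا<n>$ ^بوو/*بوو$"
pct, unknown = coverage(sample)
print(f"{pct:.2f}%")
print(unknown.most_common(5))
```

Sorting the unknown forms by frequency, as `most_common` does here, is one way to produce a "top unknown words" list like the one above.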

Tests

The transducer currently passes 76/101 (75%) tests on the main yaml file and 3/6 (50%) on the commonwords file. It seems to do well with noun morphology and most verb morphology.

Generator Evaluation

Initial Evaluation of Morphological Generation

Notes

The failing morphological analysis tests fail for the following reasons:

  • Some words containing the letter 'ە' appear to be encoded inconsistently in Unicode, which causes otherwise straightforward tests to fail.
  • The izafa enclitic was not implemented. All other grammar points were attempted in some way.
  • Some verbs, particularly هاتن (to come), have different imperative and non-past stems. Because a single lexicon supplies the stem for both forms, this difference could not be accounted for.
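
The suspected encoding problem with 'ە' could be checked by printing the code points of a failing word. The sketch below uses Python's standard `unicodedata` module. It assumes, without confirmation from the test files, that the mismatch is between ARABIC LETTER AE (U+06D5) and the visually similar sequence ARABIC LETTER HEH (U+0647) followed by ZERO WIDTH NON-JOINER (U+200C). The helper names `codepoints` and `normalise_ae` are hypothetical.

```python
import unicodedata

AE = "\u06D5"              # ARABIC LETTER AE
HEH_ZWNJ = "\u0647\u200C"  # ARABIC LETTER HEH + ZERO WIDTH NON-JOINER


def codepoints(word: str):
    """List (U+XXXX, Unicode name) for every character in the word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in word]


def normalise_ae(word: str) -> str:
    """Replace HEH + ZWNJ with AE (assumed to be the intended letter)."""
    return word.replace(HEH_ZWNJ, AE)


# Two spellings that can look the same on screen but differ as strings.
a = "\u0628" + AE        # BEH + AE
b = "\u0628" + HEH_ZWNJ  # BEH + HEH + ZWNJ
for w in (a, b):
    print(codepoints(w))
print(a == b, a == normalise_ae(b))
```

If the dump shows different code points for a test form and the corresponding lexicon entry, normalising both to one spelling before comparison may be enough to make those tests pass.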