Central Kurdish/Transducer
From LING073
Code
Analyser Evaluation
Stems
Stem counts by category are listed below (55 entries in total):
- 8 N-Stems
- 4 Definite/Plural
- 4 Verbs_Inf (infinitives)
- 4 V-Stems_1
- 4 V-Stems_2
- 6 Subject_Prn
- 4 Imperatives
- 6 Prns
- 3 Adj-Stem
- 2 Comparatives
- 3 Prepositions
- 3 Conjunctions
- 2 Adverbs
- 2 Npast
Coverage
The total coverage over the ckb.basic corpus is 12.02%, an increase of 3 percentage points after adding six entries: three common nouns for "water", "earth", and "god" (all tagged just <n>), two prepositions for "which" and "on" (<pr>), and one conjunction for "so" (<conjcoo>). The current top unknown words are:
- بوو
- هەموو
- با
- ئەمە
- فەرمووی
The total coverage over the Wikipedia corpus is 21.48%.
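As a rough illustration of how coverage figures like these can be derived, the sketch below counts lexical units in analyser output and tallies the unknown ones. It assumes the Apertium stream format, in which each token appears as `^surface/analysis$` and an unrecognised form is marked with a leading asterisk. The function name `coverage` and the sample string are made up for illustration and are not taken from the real corpora or from the scripts used for this page.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return (coverage fraction, Counter of unknown surface forms)."""
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # An unrecognised form is marked with a leading asterisk.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    return (known / total if total else 0.0), unknown


# Hypothetical analyser output, invented for this example.
sample = "^ئاو/ئاو<n>$ ^بوو/*بوو$ ^هەموو/*هەموو$ ^بوو/*بوو$"
ratio, unknown = coverage(sample)
print(f"coverage: {ratio:.2%}")
print(unknown.most_common(2))
```

Sorting the unknown forms by frequency, as `most_common` does here, is one way to produce a "top unknown words" list like the one above.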
Tests
The transducer currently passes 76/101 (75%) tests on the main yaml file and 3/6 (50%) on the commonwords file. It seems to do well with noun morphology and most verb morphology.
Generator Evaluation
Initial Evaluation of Morphological Generation
Notes
The remaining 25 morphological analysis tests (of 101 in the main yaml file) fail for the following reasons:
- Some words containing the letter 'ە' fail otherwise straightforward tests, possibly because the letter is encoded inconsistently in Unicode (for example, with different code points in the test file and in the lexicon).
- The izafa enclitic was skipped (not implemented). All other grammar points were attempted in some way.
- Some verbs, particularly هاتن (to come), have different imperative/non-past stems. Because only one lexicon was used for both types of verbs, this could not be accounted for.
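The encoding issue in the first note can be investigated by listing the code points of a failing word. The sketch below is a minimal example, not a confirmed diagnosis: it assumes the mismatch is between U+06D5 (ARABIC LETTER AE, the usual encoding of 'ە') and the look-alike sequence U+0647 (HEH) followed by U+200C (ZERO WIDTH NON-JOINER), which is one common way the letter gets typed. The helper names `describe` and `normalise_ae` are hypothetical, and whether this is the actual cause of the failing tests has not been verified.

```python
import unicodedata

AE = "\u06d5"    # ARABIC LETTER AE, the Kurdish letter 'ە'
HEH = "\u0647"   # ARABIC LETTER HEH
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def describe(word):
    """List each code point of a word with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in word]


def normalise_ae(word):
    """Map the sequence HEH + ZWNJ to AE (an assumed convention)."""
    return word.replace(HEH + ZWNJ, AE)


# 'ئەمە' written with AE, and a look-alike variant written with HEH + ZWNJ.
canonical = "\u0626\u06d5\u0645\u06d5"
variant = "\u0626\u0647\u200c\u0645\u0647\u200c"

for line in describe(variant):
    print(line)
print(variant == canonical)                # the raw strings differ
print(normalise_ae(variant) == canonical)  # equal after normalisation
```

If the test file and the lexicon turn out to use different encodings, applying one normalisation to both before compiling would be one possible fix. A bare U+0647 is deliberately left alone here, since it may be a genuine letter rather than a mistyped 'ە'.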