Difference between revisions of "Central Kurdish/Transducer"
From LING073
[https://github.swarthmore.edu/Ling073-sp21/ling073-ckb GitHub Repository]
Latest revision as of 14:48, 21 March 2021
Code
Analyser Evaluation
Stems
The number of stems in each category is listed below:
- 8 N-Stems
- 4 Definite/Plural
- 4 Verbs_Inf (infinitives)
- 4 V-Stems_1
- 4 V-Stems_2
- 6 Subject_Prn
- 4 Imperatives
- 6 Prns
- 3 Adj-Stem
- 2 Comparatives
- 3 Prepositions
- 3 Conjunctions
- 2 Adverbs
- 2 Npast
Coverage
The total coverage over the ckb.basic corpus is 12.02%. This is an increase of 3 percentage points, gained by adding three common nouns ("water", "earth", and "god", all tagged just <n>), two prepositions ("which" and "on", tagged <pr>), and one conjunction ("so", tagged <conjcoo>). The current top unknown words are:
- بوو
- هەموو
- با
- ئەمە
- فەرمووی
The total coverage over the Wikipedia corpus is 21.48%.
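As a rough illustration of how a coverage figure and a list of top unknown words like those above can be derived, the following Python sketch counts known and unknown tokens in analyser output. It assumes the output is in Apertium stream format, where each token appears as ^surface/analysis$ and an unrecognised form is marked with a leading asterisk. The function name coverage and the sample string are hypothetical and are not taken from the project's repository or its actual output.

```python
import re
from collections import Counter

# Matches one Apertium stream-format unit: ^surface/analysis1/analysis2$
UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def coverage(stream, top_n=5):
    """Return (coverage fraction, most common unknown surface forms)."""
    total = 0
    unknown = Counter()
    for surface, analyses in UNIT.findall(stream):
        total += 1
        # An analysis starting with '*' marks a form the analyser did not recognise.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    fraction = known / total if total else 0.0
    return fraction, unknown.most_common(top_n)


# Hypothetical sample, invented for illustration only.
sample = "^ئاو/ئاو<n>$ ^بوو/*بوو$ ^لە/لە<pr>$ ^بوو/*بوو$"
fraction, top_unknown = coverage(sample)
print(f"{fraction:.2%}", top_unknown)
```

This is a naive token-level measure: every token counts once, so a frequent unknown word such as بوو lowers coverage more than a rare one, which is why adding a handful of common words can raise the figure by several percentage points.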
Tests
The transducer currently passes 80/101 (79%) of the tests in the main yaml file and 3/6 (50%) of the tests in the commonwords file. It appears to handle noun morphology and most verb morphology well.
Generator Evaluation
Initial Evaluation of Morphological Generation
Morphological Analysis
- 85 passes, 16 fails, 101 total (84%)
- 21.48% coverage over Wikipedia corpus
Morphological Generation
- 85 passes, 49 fails, 134 total (63%)
Final Evaluation of Morphological Generation
- 85 passes, 33 fails, 118 total (72%)
- Number of twol tests added: 3
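The percentages reported in this section follow from passes divided by the total number of tests (passes plus fails), rounded to the nearest whole percent. The short Python sketch below rechecks that arithmetic using the pass and fail counts listed above. The helper name pass_rate is hypothetical, and it assumes at least one test was run.

```python
def pass_rate(passes, fails):
    """Return (total tests, pass percentage rounded to a whole number)."""
    total = passes + fails
    return total, round(100 * passes / total)


# Pass/fail counts as reported in this section.
reported = {
    "analysis": (85, 16),
    "generation (initial)": (85, 49),
    "generation (final)": (85, 33),
}

for name, (passes, fails) in reported.items():
    total, percent = pass_rate(passes, fails)
    print(f"{name}: {passes}/{total} ({percent}%)")
```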
Notes
The remaining 16 morphological analysis tests fail for the following reasons:
- The izafa enclitic and the demonstrative adjectives were skipped (not implemented). All other grammar points were attempted in some way.
- Some verbs, particularly هاتن (to come), have different imperative and non-past stems. Because a single lexicon was used for both types of verbs, this difference could not be accounted for.