Dzongkha/Transducer
Code
Find our GitHub repo here.
Analyser Evaluation
- Initial Coverage: 1/361
- Remaining unknown forms: 360
- Current Coverage: 302/952
- Remaining unknown forms: 650
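Each coverage figure is the ratio of analysed forms to total forms, so the two measurements are easier to compare as percentages. A minimal sketch of that arithmetic follows; the helper name `coverage` is illustrative and not part of the project's tooling.

```python
def coverage(known: int, total: int) -> float:
    """Return the percentage of forms that received an analysis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * known / total

# Figures reported above; note that the two runs use different totals.
initial = coverage(1, 361)
current = coverage(302, 952)

print(f"Initial coverage: {initial:.2f}%")  # Initial coverage: 0.28%
print(f"Current coverage: {current:.2f}%")  # Current coverage: 31.72%
```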
Lexicon Info
- Lexicons: 21
- Lexicon entries: 112
- Patterns: 5
- Pattern entries: 13
Counts For Individual Lexicons
- N-Stems: 24
- Cases: 6
- Vbser-Stems: 7
- Tenses: 5
- Vaux-Stems: 2
- Neg: 1
- V-Stems: 19
- Adj-Stems: 5
- Adj-Sup: 1
- Adj-Comp: 1
- Prns: 9
- Punctuation: 24
- All anonymous lexicons: 8
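The per-lexicon counts can be cross-checked against the "Lexicon entries" total under Lexicon Info. A small sketch of that check, with the numbers copied from the list above:

```python
# Entry counts per lexicon, copied from the list above.
lexicon_entries = {
    "N-Stems": 24,
    "Cases": 6,
    "Vbser-Stems": 7,
    "Tenses": 5,
    "Vaux-Stems": 2,
    "Neg": 1,
    "V-Stems": 19,
    "Adj-Stems": 5,
    "Adj-Sup": 1,
    "Adj-Comp": 1,
    "Prns": 9,
    "Punctuation": 24,
    "All anonymous lexicons": 8,
}

total_entries = sum(lexicon_entries.values())
print(total_entries)  # 112, matching "Lexicon entries: 112"
```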
Current Top Unknown Words
TOP UNKNOWN WORDS:
31 ^།/*།$
26 ^།།/*།།$
12 ^མས/*མས$
11 ^པ/*པ$
8 ^ར/*ར$
8 ^་/*་$
7 ^གིས་/*གིས་$
6 ^རུ/*རུ$
6 ^དེ་ལས་/*དེ་ལས་$
5 ^སྤྱ/*སྤྱ$
5 ^ལ/*ལ$
4 ^ས/*ས$
4 ^ར་/*ར་$
4 ^ཡ/*ཡ$
4 ^ནུག/*ནུག$
4 ^ཌོརན་འདི་/*ཌོརན་འདི་$
4 ^ཌ/*ཌ$
3 ^ེ་ན/*ེ་ན$
3 ^སངྱས་རྡོ་རྗེ་གིས་/*སངྱས་རྡོ་རྗེ་གིས་$
3 ^ལཱ་/*ལཱ་$
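Each entry in the list pairs a frequency with an unknown form written as `^form/*form$`. The sketch below shows one way to read such entries into (count, form) pairs for further tallying. It assumes one entry per line, and the function name `parse_unknowns` is hypothetical.

```python
import re

# One entry of the frequency list: "<count> ^<form>/*<form>$"
LINE_RE = re.compile(r"^\s*(\d+)\s+\^(.+?)/\*(.+?)\$\s*$")

def parse_unknowns(lines):
    """Yield (count, form) pairs from lines of the unknown-word list."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield int(m.group(1)), m.group(2)

# A few entries taken from the list above.
sample = [
    "31 ^།/*།$",
    "26 ^།།/*།།$",
    "7 ^གིས་/*གིས་$",
]

pairs = list(parse_unknowns(sample))
total_tokens = sum(count for count, _ in pairs)
print(total_tokens)  # 64
```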
Tests Passed
61/61 of dzo.yaml tests pass.
commonwords.yaml is empty due to a tokenization problem.
Generator Evaluation
Initial Evaluation of Morphological Generation
Number of passes and fails for the analysis tests:
- Total passes: 61, Total fails: 0, Total: 61
Number of passes and fails for the generation test:
- Total passes: 61, Total fails: 39, Total: 100
Current coverage info: unavailable; we were unable to work on this because of tokenization problems.
Final Evaluation of Morphological Generation
Number of passing and failing tests after adding our first set of twol rules:
- Total passes: 61, Total fails: 22, Total: 83
Number of twol rules we added: 9
Current coverage info: unavailable; we were unable to work on this because of tokenization problems.
Notes
As of 2021/3/16, all 61 tests pass after the lexd-U update.
Before that update, 8 of the 61 tests in the original dzo.yaml file did not pass.
4 of those tests were for honorific nouns and gendered nouns, both of which are non-productive and are not part of the morphology.
The remaining 4 failed tests were as follows:
Test 9: Dative Suffix (Surface/Analysis)
[4/4][FAIL] ཡིག་ཚང་ནང་ལུ་ => Missing results: ཡིག་ཚང་<n><loc><dat>
Test 14: Genitive Suffixes (Surface/Analysis)
[6/7][FAIL] མདའ་ཡི་ => Missing results: མདའ་<n><gen>
Test 24: Past Tense Suffixes (འདས་པ་) (Surface/Analysis)
[4/4][FAIL] སྦྱངས་ཡི་ => Missing results: སྦྱངས་<v><tv><past>
Test 28: Quantitative Adjectives (གྲངས་ཀྱི་ཁྱད་ཚིག།) (Surface/Analysis)
[2/2][FAIL] བྱི་ལི་ཚུ་ => Missing results: བྱི་ལི་<n><adj><qnt>
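Each failure line above pairs a surface form with the analysis that was expected but missing. As a rough sketch, such lines can be split into their parts to see which tags are involved. The regular expression is inferred only from the four lines shown here, so it is an assumption about the output format, not a description of the test tool itself.

```python
import re

# Shape of the failure lines listed above:
# "[i/n][FAIL] <surface> => Missing results: <analysis>"
FAIL_RE = re.compile(
    r"^\[(\d+)/(\d+)\]\[FAIL\]\s+(\S+)\s+=>\s+Missing results:\s+(\S+)\s*$"
)
TAG_RE = re.compile(r"<([^<>]+)>")

def parse_failure(line):
    """Split one failure line into surface form, lemma, and tag list."""
    m = FAIL_RE.match(line)
    if m is None:
        return None
    index, total, surface, analysis = m.groups()
    return {
        "index": int(index),
        "total": int(total),
        "surface": surface,
        "lemma": analysis.split("<", 1)[0],  # text before the first tag
        "tags": TAG_RE.findall(analysis),
    }

line = "[4/4][FAIL] ཡིག་ཚང་ནང་ལུ་ => Missing results: ཡིག་ཚང་<n><loc><dat>"
result = parse_failure(line)
print(result["tags"])  # ['n', 'loc', 'dat']
```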
Test 9 failed because the analyzer could not output an analysis with two case tags, even though the Cases lexicon is written twice ("Cases? Cases?") in the PATTERNS section; the cause was not identified.
Tests 14, 24, and 28 failed even though the hfst-expand output contains the respective analyses; why morphtest rejected them was not determined.
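To make the expectation behind Test 9 concrete: a pattern with two optional case slots ("Cases? Cases?") should license zero, one, or two case tags after the stem. The sketch below enumerates those tag sequences in plain Python. It is not lexd and does not model how lexd compiles the pattern. Only `<loc>`, `<dat>`, and `<gen>` appear in the tests above, and treating them as a subset of the Cases lexicon is an assumption made for illustration.

```python
from itertools import product

# Assumed subset of the Cases lexicon (the real lexicon has 6 entries).
CASES = ["<loc>", "<dat>", "<gen>"]

def expand_optional_slots(tags, slots=2):
    """Enumerate tag strings for `slots` optional positions."""
    options = [""] + tags  # "" stands for a skipped optional slot
    seen = set()
    for combo in product(options, repeat=slots):
        seen.add("".join(combo))
    return sorted(seen)

expansions = expand_optional_slots(CASES)
print("<loc><dat>" in expansions)  # True: the sequence Test 9 expects
```

Under these assumptions the two-tag sequence is among the expected outputs, which is why its absence from the analyzer was reported as a failure.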