Dzongkha/Transducer

From LING073

Code

Find our GitHub repo here.

Analyser Evaluation

  • Initial Coverage: 1 / 361 (about 0.3%)
  • Remaining unknown forms (initial): 360
  • Current Coverage: 302 / 952 (about 31.7%)
  • Remaining unknown forms (current): 650
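
The coverage figures above are simple ratios of analysed forms to total forms. As a minimal sketch of that arithmetic (the function name `coverage` is hypothetical, and it assumes each figure is read as known / total), the percentages and the remaining-unknown counts can be recomputed like this:

```python
def coverage(known: int, total: int) -> tuple:
    """Return (coverage as a percentage, number of unknown forms)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * known / total, total - known


# Figures taken from the list above.
initial_pct, initial_unknown = coverage(1, 361)
current_pct, current_unknown = coverage(302, 952)

print(f"initial: {initial_pct:.2f}% ({initial_unknown} unknown)")
print(f"current: {current_pct:.2f}% ({current_unknown} unknown)")
```

The unknown counts it produces (360 and 650) agree with the numbers reported in the list.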

Lexicon Info

  • Lexicons: 21
  • Lexicon entries: 112
  • Patterns: 5
  • Pattern entries: 13

Counts For Individual Lexicons

  • N-Stems: 24
  • Cases: 6
  • Vbser-Stems: 7
  • Tenses: 5
  • Vaux-Stems: 2
  • Neg: 1
  • V-Stems: 19
  • Adj-Stems: 5
  • Adj-Sup: 1
  • Adj-Comp: 1
  • Prns: 9
  • Punctuation: 24
  • All anonymous lexicons: 8

Current Top Unknown Words

TOP UNKNOWN WORDS:

    31 ^།/*།$
    26 ^།།/*།།$
    12 ^མས/*མས$
    11 ^པ/*པ$
     8 ^ར/*ར$
     8 ^་/*་$
     7 ^གིས་/*གིས་$
     6 ^རུ/*རུ$
     6 ^དེ་ལས་/*དེ་ལས་$
     5 ^སྤྱ/*སྤྱ$
     5 ^ལ/*ལ$
     4 ^ས/*ས$
     4 ^ར་/*ར་$
     4 ^ཡ/*ཡ$
     4 ^ནུག/*ནུག$
     4 ^ཌོརན་འདི་/*ཌོརན་འདི་$
     4 ^ཌ/*ཌ$
     3 ^ེ་ན/*ེ་ན$
     3 ^སངྱས་རྡོ་རྗེ་གིས་/*སངྱས་རྡོ་རྗེ་གིས་$
     3 ^ལཱ་/*ལཱ་$
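
Each entry above is a token in Apertium stream format whose only reading is the unknown marker (`^form/*form$`), preceded by its frequency. The following is a rough sketch of how such a frequency list could be rebuilt from analyser output. The function name `top_unknown` and the sample string are illustrative assumptions, not the script actually used for this page:

```python
import re
from collections import Counter

# One stream-format token whose single reading starts with '*' (unknown).
UNKNOWN_TOKEN = re.compile(r"\^([^/^$]+)/\*([^/^$]+)\$")


def top_unknown(stream: str, n: int = 20) -> list:
    """Count unknown tokens in analyser output, most frequent first."""
    counts = Counter(m.group(0) for m in UNKNOWN_TOKEN.finditer(stream))
    return counts.most_common(n)


# Illustrative input: three unknown tokens and one analysed token.
sample = "^།/*།$ ^མས/*མས$ ^།/*།$ ^ཡིག་ཚང་ནང་ལུ་/ཡིག་ཚང་<n><loc><dat>$"

for token, freq in top_unknown(sample):
    print(f"{freq:>6} {token}")
```

Tokens that received a real analysis are skipped, because their reading does not begin with `*`.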

Tests Passed

61/61 of dzo.yaml tests pass.

commonwords.yaml is empty due to a tokenization problem.

Generator Evaluation

Initial Evaluation of Morphological Generation

Number of passes and fails for the analysis tests:

  • Total passes: 61, Total fails: 0, Total: 61

Number of passes and fails for the generation test:

  • Total passes: 61, Total fails: 39, Total: 100

Current coverage info: not available; we were unable to work on this due to tokenization problems.

Final Evaluation of Morphological Generation

Number of passing and failing tests after adding our first set of twol rules:

  • Total passes: 61, Total fails: 22, Total: 83

Number of twol rules we added: 9

Current coverage info: not available; we were unable to work on this due to tokenization problems.

Notes

As of 2021/3/16, all 61 tests pass following the lexd-U update.

Previously, 8 of the 61 tests in the original dzo.yaml file did not pass. Four of these were for honorific nouns and gendered nouns, both of which are non-productive and not part of the morphology. The remaining four failing tests were:

Test 9: Dative Suffix (Surface/Analysis) [4/4][FAIL] ཡིག་ཚང་ནང་ལུ་ => Missing results: ཡིག་ཚང་<n><loc><dat>

Test 14: Genitive Suffixes (Surface/Analysis) [6/7][FAIL] མདའ་ཡི་ => Missing results: མདའ་<n><gen>

Test 24: Past Tense Suffixes (འདས་པ་) (Surface/Analysis) [4/4][FAIL] སྦྱངས་ཡི་ => Missing results: སྦྱངས་<v><tv><past>

Test 28: Quantitative Adjectives (གྲངས་ཀྱི་ཁྱད་ཚིག།) (Surface/Analysis) [2/2][FAIL] བྱི་ལི་ཚུ་ => Missing results: བྱི་ལི་<n><adj><qnt>
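
Each "Missing results" entry above is an analysis string of the form lemma followed by angle-bracketed tags. As a small sketch for inspecting which tag sequence each failing test expects (the helper `split_analysis` is hypothetical and not part of the test tooling), the strings can be split like this:

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def split_analysis(analysis: str):
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    first = analysis.find("<")
    if first == -1:
        return analysis, []
    return analysis[:first], TAG.findall(analysis[first:])


# The four missing analyses reported above.
missing = [
    "ཡིག་ཚང་<n><loc><dat>",
    "མདའ་<n><gen>",
    "སྦྱངས་<v><tv><past>",
    "བྱི་ལི་<n><adj><qnt>",
]

for analysis in missing:
    lemma, tags = split_analysis(analysis)
    print(lemma, tags)
```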

Test 9 failed because the analyser could not output an analysis containing two case tags, even though the Cases lexicon is referenced twice ("Cases? Cases?") in the PATTERNS section. The cause has not been identified.
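
To make the intended reading of "Cases? Cases?" concrete, here is a minimal Python sketch that treats each optional slot as an independent choice of nothing or one case entry. The two-entry `CASES` table is a hypothetical miniature lexicon, with surface forms inferred from the Test 9 example rather than copied from the real Cases lexicon. The sketch models the intended behaviour only and makes no claim about how lexd itself compiles repeated references to the same lexicon:

```python
from itertools import product

# Hypothetical miniature lexicon: (analysis tag, surface form).
CASES = [("<loc>", "ནང་"), ("<dat>", "ལུ་")]

# An optional slot is either empty or exactly one case entry.
OPTIONAL = [("", "")] + CASES


def expand(stem_lemma: str, stem_surface: str) -> set:
    """Enumerate (analysis, surface) pairs for a noun stem plus two optional case slots."""
    pairs = set()
    for (tag1, form1), (tag2, form2) in product(OPTIONAL, repeat=2):
        analysis = stem_lemma + "<n>" + tag1 + tag2
        surface = stem_surface + form1 + form2
        pairs.add((analysis, surface))
    return pairs


pairs = expand("ཡིག་ཚང་", "ཡིག་ཚང་")
for analysis, surface in sorted(pairs):
    print(surface, "=>", analysis)
```

Under this reading, the pair expected by Test 9 (two different case entries in sequence) is among the generated forms.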

Tests 14, 24, and 28 failed even though the output of hfst-expand contains the respective analyses. Why the morphtest rejected them has not been identified.