Difference between revisions of "Magahi/Transducer"

From LING073

Revision as of 22:56, 16 March 2021

Code

Evaluation

  • Total number of stems in the transducer. You can use the following method, or count the stems manually.
    • Lexicons: 14
      Lexicon entries: 74
      Patterns: 1
      Pattern entries: 6
      Counts for individual lexicons:
      NounRoot: 7
      Case: 7
      Punctuation: 23
      VerbRoot: 3
      Tense: 4
      Subject: 3
      ObjectHonorific: 4
      Aspect: 6
      PersonalPronoun: 10
      Postposition: 4
      All anonymous lexicons: 3
  • lexd -x apertium-xyz.xyz.lexd > /dev/null (then add the counts for the relevant individual lexicons)
  • Current coverage over your combined corpus: 26.84%
  • The current list of top unknown words returned by aq-covtest
    • TOP UNKNOWN WORDS:
      58322 ^में/*में$
      47253 ^से/*से$
      34108 ^आउ/*आउ$
      31801 ^हे/*हे$
      29917 ^हल/*हल$
      21874 ^पर/*पर$
      21698 ^कि/*कि$
      17942 ^ऊ/*ऊ$
      17754 ^1/*1$
      16677 ^ई/*ई$
      16539 ^हइ/*हइ$
      15114 ^तो/*तो$
      14681 ^गेल/*गेल$
      13642 ^2/*2$
      13306 ^हम/*हम$
      13166 ^हो/*हो$
      12158 ^न/*न$
      10945 ^सब/*सब$
       9535 ^नयँ/*नयँ$
       9269 ^ओकरा/*ओकरा$
  • Tests
    • mag.yaml Total passes: 39, Total fails: 15, Total: 54
    • commonwords.yaml Total passes: 4, Total fails: 11, Total: 15
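
The "add the counts for the relevant individual lexicons" step in the first Evaluation item can be sketched in Python as follows. This is only an illustration: the statistics text is copied from the lexd output above, while the helper names (`parse_lexicon_counts`, `count_stems`) and the choice of which lexicons count as stems (`STEM_LEXICONS`) are assumptions made for the example, not something the page specifies.

```python
import re

# Statistics text copied from the `lexd -x` output listed above.
LEXD_STATS = """\
Lexicons: 14
Lexicon entries: 74
Patterns: 1
Pattern entries: 6
Counts for individual lexicons:
NounRoot: 7
Case: 7
Punctuation: 23
VerbRoot: 3
Tense: 4
Subject: 3
ObjectHonorific: 4
Aspect: 6
PersonalPronoun: 10
Postposition: 4
All anonymous lexicons: 3
"""


def parse_lexicon_counts(stats: str) -> dict:
    """Map each lexicon name to its entry count.

    Only lines after the 'Counts for individual lexicons:' header are read,
    so the summary lines (Lexicons, Patterns, ...) are skipped.
    """
    counts = {}
    in_counts = False
    for line in stats.splitlines():
        line = line.strip()
        if line.startswith("Counts for individual lexicons"):
            in_counts = True
            continue
        if not in_counts:
            continue
        match = re.fullmatch(r"(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# ASSUMPTION: these are the lexicons that hold stems (as opposed to affixes
# or punctuation). Adjust the list to match the actual transducer design.
STEM_LEXICONS = ["NounRoot", "VerbRoot", "PersonalPronoun", "Postposition"]


def count_stems(stats: str, stem_lexicons=STEM_LEXICONS) -> int:
    """Sum the entry counts of the lexicons treated as stem lexicons."""
    counts = parse_lexicon_counts(stats)
    return sum(counts[name] for name in stem_lexicons)


counts = parse_lexicon_counts(LEXD_STATS)
stem_total = count_stems(LEXD_STATS)
print("Stems:", stem_total)
```

As a sanity check, the per-lexicon counts add up to the reported "Lexicon entries: 74", which suggests the parse picked up every line.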

Notes

At the moment, the pronoun tests other than those for the personal pronouns don't pass, because those pronouns are quite irregular and would require a lot of hard-coding. The same is true of the auxiliary. The forms of "घोरा" don't work because a phonological process affects the long vowel at the end of the word, and we weren't sure how to handle it. The instrumental also doesn't work, because lexd doesn't handle the combining characters needed to write a long nasal e. We also didn't implement the future imperative: our source had only one example of it, and the general ending it gives disagrees with that example, so we're not sure exactly how it works.
Adding the danda (|), the postpositions (ke, mẽ, par, se), and the noun magahī raised our basic coverage from 21.16% to 26.84%, a gain of 5.68 percentage points.
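
For reference, a coverage figure like the one above can be approximated from analyser output with a short Python sketch. It assumes that coverage means the share of tokens that receive at least one analysis, and that the output is in the Apertium stream format seen in the unknown-word list (`^surface/analysis$`, with unknown words marked by a leading `*`). The function name `coverage`, the sample sentence, and the tags in it are hypothetical; this is not how aq-covtest itself is implemented, and it ignores escaped characters in the stream.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed: str) -> float:
    """Return the percentage of lexical units with at least one analysis.

    A unit counts as unknown when its analysis starts with '*',
    as in ^में/*में$.
    """
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed):
        total += 1
        if not analyses.startswith("*"):
            known += 1
    return 100.0 * known / total if total else 0.0


# Hypothetical analyser output; the tags are made up for illustration.
sample = "^हम/हम<prn><pers>$ ^में/*में$ ^से/*से$ ^घर/घर<n>$"
sample_coverage = coverage(sample)
print(f"{sample_coverage:.2f}%")

# Gain reported in the notes, in percentage points.
gain = round(26.84 - 21.16, 2)
print(gain)
```

Because the count is per token, fixing a handful of very frequent words (such as the postpositions) moves the figure far more than adding many rare stems.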