Difference between revisions of "Berik/Transducer"

From LING073

Revision as of 11:18, 1 March 2018

Code: https://github.swarthmore.edu/jspring1/ling073-bkl

Evaluation

  • Current corpus coverage: 34.65%
  • Number of stems: 112
    • Does not include pronouns (approximately 4) or verbs (approximately 3)
  • Top unknown words:
    • jei (84)
    • aa (64)
    • jeta (42)
    • jeiserem (36)
    • Jei (35)
    • ge (31)
    • taterisi (11)
    • aane (11)
    • jebe (9)
    • bosna (9)
    • anes (9)
    • asal (8)
    • asala (8)
    • Aamai (8)
    • temawer (8)
    • Jepga (8)
    • aaiserem (8)
    • enggame (7)
    • Taterisi (7)
    • Sanbagiri (6)
  • Analyzer tests passing: 85/117 (72.65%)
  • Generator tests passing: 85/152 (55.92%)
  • Corpus tests passing: 14/33 (42.42%)
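
The figures above can be reproduced in spirit with a small script. The following is a minimal sketch, not the tooling actually used for this page: the function names (coverage_report, pass_rate), the toy token list, and the toy set of known forms are all assumptions made for illustration. It treats tokens as case-sensitive, which matches the list above counting "jei" and "Jei" separately.

```python
from collections import Counter


def coverage_report(tokens, is_known, top_n=20):
    """Return (coverage percentage, most frequent unknown tokens).

    tokens   -- list of surface forms from a corpus
    is_known -- callable returning True if the analyser recognises a token
    """
    if not tokens:
        return 0.0, []
    # Count every token the analyser fails to recognise.
    unknown = Counter(t for t in tokens if not is_known(t))
    known_count = len(tokens) - sum(unknown.values())
    return 100.0 * known_count / len(tokens), unknown.most_common(top_n)


def pass_rate(passed, total):
    """Percentage of passing tests, rounded to two decimal places."""
    return round(100.0 * passed / total, 2)


# Hypothetical toy corpus and known-form set, for illustration only.
toy_tokens = ["jei", "gam", "jei", "aa", "Yesus", "jam"]
toy_known = {"gam", "Yesus", "jam"}

pct, top_unknown = coverage_report(toy_tokens, toy_known.__contains__)
print(f"coverage: {pct:.2f}%")
print("top unknown:", top_unknown)

# The test counts reported above, recomputed as percentages.
print(pass_rate(85, 117), pass_rate(85, 152), pass_rate(14, 33))
```

In practice is_known would wrap a call to the compiled analyser rather than a set lookup; the set is used here only to keep the sketch self-contained.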

Notes

Initial corpus coverage was 21.98%.

Coverage was raised to 34.65% (a gain of 12.67 percentage points) by adding the following analysis ↔ surface form pairs:

  • jamere<locl> ↔ jamere
  • Yesus<n> ↔ Yesus
  • Yusuf<n> ↔ Yusuf
  • Maria<n> ↔ Maria
  • Daud<n> ↔ Daud
  • angtane<n> ↔ angtane
  • raja<n> ↔ raja
  • taman<n> ↔ taman
  • kapka<adj> ↔ kapka
  • se<imp> ↔ se
  • je<prn><pos> ↔ jemna
  • je<prn><subj> ↔ jam
  • je<prn>+wer<post> ↔ jewer
  • gam<part> ↔ gam
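
To make the direction of each pair concrete, the list above can be modelled as a two-way lookup table. This is only an illustrative sketch: the real transducer is defined in the repository linked at the top of the page, and the names PAIRS, analyse, and generate are assumptions introduced here, not part of that code.

```python
from collections import defaultdict

# (analysis, surface form) pairs copied from the list above.
PAIRS = [
    ("jamere<locl>", "jamere"),
    ("Yesus<n>", "Yesus"),
    ("Yusuf<n>", "Yusuf"),
    ("Maria<n>", "Maria"),
    ("Daud<n>", "Daud"),
    ("angtane<n>", "angtane"),
    ("raja<n>", "raja"),
    ("taman<n>", "taman"),
    ("kapka<adj>", "kapka"),
    ("se<imp>", "se"),
    ("je<prn><pos>", "jemna"),
    ("je<prn><subj>", "jam"),
    ("je<prn>+wer<post>", "jewer"),
    ("gam<part>", "gam"),
]

# Generation direction: analysis -> surface form.
GENERATE = dict(PAIRS)

# Analysis direction: surface form -> list of analyses
# (a list, because one surface form may in general be ambiguous).
ANALYSES = defaultdict(list)
for analysis, surface in PAIRS:
    ANALYSES[surface].append(analysis)


def analyse(surface):
    """Return all analyses for a surface form (empty list if unknown)."""
    return ANALYSES.get(surface, [])


def generate(analysis):
    """Return the surface form for an analysis (None if unknown)."""
    return GENERATE.get(analysis)


print(analyse("jewer"))          # analyses for a pronoun + postposition form
print(generate("je<prn><pos>"))  # surface form of the possessive pronoun
print(analyse("jei"))            # still unknown, so no analyses
```

A plain lookup table like this covers only the listed forms; a real transducer generalises over stems and affixes instead of enumerating pairs.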