Berik/Transducer

From LING073
Revision as of 11:44, 1 March 2018 by Dswanso1

Code: https://github.swarthmore.edu/jspring1/ling073-bkl

Evaluation

  • Current corpus coverage: 42.84%
  • Number of stems: 112
    • Excludes pronouns (probably 4) and verbs (probably 3)
  • Top unknown words (occurrence count in parentheses):
    • aa (64)
    • jeta (42)
    • ge (31)
    • taterisi (11)
    • aane (11)
    • jebe (9)
    • bosna (9)
    • anes (9)
    • asal (8)
    • asala (8)
    • Aamai (8)
    • temawer (8)
    • Jepga (8)
    • aaiserem (8)
    • enggame (7)
    • Taterisi (7)
    • Sanbagiri (6)
  • Analyzer tests passing: 85/117 (72.65%)
  • Generator tests passing: 85/152 (55.92%)
  • Corpus tests passing: 14/33 (42.42%)
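The coverage and pass-rate figures above are simple ratios. Below is a minimal Python sketch, assuming that coverage is counted over corpus tokens and that the analyzer returns an empty list for a word it cannot analyze. `coverage`, `pass_rate`, and the toy analyzer are hypothetical helpers for illustration, not code from the linked repository.

```python
from collections import Counter


def coverage(tokens, analyze):
    """Percent of tokens with at least one analysis, plus a Counter of unknown tokens.

    `analyze` is a hypothetical stand-in for the transducer: it takes a surface
    form and returns a (possibly empty) list of analyses.
    """
    unknown = Counter(t for t in tokens if not analyze(t))
    if not tokens:
        return 0.0, unknown
    known = len(tokens) - sum(unknown.values())
    return 100.0 * known / len(tokens), unknown


def pass_rate(passed, total):
    """Test pass rate as a percentage rounded to two decimals."""
    return round(100.0 * passed / total, 2)


# Toy analyzer that only knows two forms (hypothetical, not the real transducer).
KNOWN_FORMS = {"jam": ["je<prn><subj>"], "gam": ["gam<part>"]}


def toy_analyze(form):
    return KNOWN_FORMS.get(form, [])


pct, unknown = coverage(["aa", "jam", "gam", "jeta", "aa"], toy_analyze)
print(pct)                     # 40.0
print(unknown.most_common(1))  # [('aa', 2)]
print(pass_rate(85, 117), pass_rate(85, 152), pass_rate(14, 33))  # 72.65 55.92 42.42
```

Sorting the unknown-token Counter with `most_common()` is one way to produce a "top unknown words" list like the one above.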

Notes

Initial corpus coverage was 21.98%.

Coverage was raised to 34.65% by adding the following entries (analysis ↔ surface form):

  • jamere<locl> ↔ jamere
  • Yesus<n> ↔ Yesus
  • Yusuf<n> ↔ Yusuf
  • Maria<n> ↔ Maria
  • Daud<n> ↔ Daud
  • angtane<n> ↔ angtane
  • raja<n> ↔ raja
  • taman<n> ↔ taman
  • kapka<adj> ↔ kapka
  • se<imp> ↔ se
  • je<prn><pos> ↔ jemna
  • je<prn><subj> ↔ jam
  • je<prn>+wer<post> ↔ jewer
  • gam<part> ↔ gam
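These analyses write each lemma followed by multicharacter tags in angle brackets, and `+` joins two analyzed pieces (as in `je<prn>+wer<post>`). The small Python sketch below splits such strings into lemma/tag pairs, which can help when reading test output. It assumes Apertium-style notation. `parse_analysis` is a hypothetical helper, not part of the project.

```python
import re

# Matches one <tag> and captures the tag name.
TAG = re.compile(r"<([^<>]+)>")


def parse_analysis(analysis):
    """Split an analysis like 'je<prn>+wer<post>' into (lemma, [tags]) pairs.

    Assumes '+' separates joined pieces and that each piece is a lemma
    followed by zero or more <tag> symbols.
    """
    pieces = []
    for segment in analysis.split("+"):
        lemma = segment.split("<", 1)[0]  # text before the first tag
        tags = TAG.findall(segment)       # every <...> tag, in order
        pieces.append((lemma, tags))
    return pieces


print(parse_analysis("je<prn><pos>"))       # [('je', ['prn', 'pos'])]
print(parse_analysis("je<prn>+wer<post>"))  # [('je', ['prn']), ('wer', ['post'])]
```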

Adding "jei" as a form of "je", along with "jeiserem", raised coverage to 42.84%.

  • je<prn> ↔ jei
  • je<prn><dem><part> ↔ jeiserem
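Each entry in these notes is an analysis/surface pair, and the same pairs work in both directions. The analyzer maps a surface form to its analyses, and the generator maps an analysis back to surface forms. The Python sketch below models this with two dictionaries built from the pairs listed above. It also shows how adding the two `jei` entries changes coverage on a made-up token list. This is only an illustration. The actual analyzer in the linked repository is a compiled transducer, not a Python dictionary, and the token list is hypothetical rather than drawn from the corpus.

```python
# Analysis/surface pairs copied from the notes above.
BASE_PAIRS = [
    ("jamere<locl>", "jamere"),
    ("Yesus<n>", "Yesus"),
    ("Yusuf<n>", "Yusuf"),
    ("Maria<n>", "Maria"),
    ("Daud<n>", "Daud"),
    ("angtane<n>", "angtane"),
    ("raja<n>", "raja"),
    ("taman<n>", "taman"),
    ("kapka<adj>", "kapka"),
    ("se<imp>", "se"),
    ("je<prn><pos>", "jemna"),
    ("je<prn><subj>", "jam"),
    ("je<prn>+wer<post>", "jewer"),
    ("gam<part>", "gam"),
]
JEI_PAIRS = [
    ("je<prn>", "jei"),
    ("je<prn><dem><part>", "jeiserem"),
]


def build_lexicon(pairs):
    """Index pairs both ways: surface -> analyses (analyzer), analysis -> surfaces (generator)."""
    by_surface, by_analysis = {}, {}
    for analysis, surface in pairs:
        by_surface.setdefault(surface, []).append(analysis)
        by_analysis.setdefault(analysis, []).append(surface)
    return by_surface, by_analysis


def percent_known(tokens, by_surface):
    """Percentage of tokens that receive at least one analysis."""
    known = sum(1 for t in tokens if t in by_surface)
    return 100.0 * known / len(tokens)


before, _ = build_lexicon(BASE_PAIRS)
after, generator = build_lexicon(BASE_PAIRS + JEI_PAIRS)

print(after["jam"])          # ['je<prn><subj>']
print(generator["je<prn>"])  # ['jei']

tokens = ["jei", "jam", "jeiserem", "aa"]  # hypothetical tokens, not corpus data
print(percent_known(tokens, before))  # 25.0
print(percent_known(tokens, after))   # 75.0
```

Because a surface form can have several analyses (and an analysis several surface forms), both indexes map to lists rather than single strings.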