Difference between revisions of "Magahi/Final Project"

From LING073
Revision as of 21:15, 20 May 2021

What we did

We improved our morphological transducer by repeatedly finding the most frequent unknown words in the corpus and working out how to analyze them.

This consisted of adding new lemmas to our lexd file, as well as new patterns to cover compound verbs and adjectives.
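
The "top unknown words" step can be sketched as follows. This is a minimal illustration, not the script we actually ran: it assumes the analyzer output is in Apertium stream format, where an unanalyzed token appears as `^form/*form$` (as in the lists under Evaluation). The function name `top_unknown` and the sample string are hypothetical.

```python
import re
from collections import Counter

# An unknown token in Apertium stream format looks like ^घर/*घर$ :
# the surface form, a slash, then the same form prefixed with '*'.
UNKNOWN = re.compile(r"\^([^/$]+)/\*[^$]*\$")


def top_unknown(analysed_text, n=20):
    """Return the n most frequent unanalyzed surface forms with their counts."""
    counts = Counter(m.group(1) for m in UNKNOWN.finditer(analysed_text))
    return counts.most_common(n)


# Hypothetical analyzer output: three unknown tokens and one analyzed token.
sample = "^घर/*घर$ ^हे/*हे$ ^हे/*हे$ ^कविता/कविता<n>$"
print(top_unknown(sample, 2))
```

Each round, the forms at the top of such a list are the ones worth adding to the lexicon first, since they recover the most tokens per entry.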

Evaluation

Initial Evaluation

  • Coverage: 94567 / 187701 (~50.38%)
  • Totals: 142 forms, 339 tp, 12 fp, 0 tn, 65 fn
  • Precision: 96.58120%
  • Recall: 83.91089%
  • Unknown words

   2785 ^हे/*हे$
   960 ^नञ/*नञ$
   793 ^ऊ/*ऊ$
   655 ^अपपन/*अपपन$
   478 ^हममर/*हममर$
   437 ^तो/*तो$
   429 ^कवि/*कवि$
   424 ^न/*न$
   414 ^तऽ/*तऽ$
   397 ^ले/*ले$
   396 ^कुमार/*कुमार$
   375 ^जे/*जे$
   367 ^हऽ/*हऽ$
   352 ^सिंह/*सिंह$
   344 ^की/*की$
   317 ^लेल/*लेल$
   309 ^साहित/*साहित$
   290 ^घर/*घर$
   280 ^कविता/*कविता$
   255 ^उनखर/*उनखर$

  • Remaining unknown forms: 93134
  • Total number of forms: 362
  • Lexical forms (not morphology): 273

Final Evaluation

  • Coverage: 150272 / 187701 (~80.06%)
  • Totals: 142 forms, 340 tp, 103 fp, 0 tn, 64 fn
  • Precision: 76.74944% (lower than before because massively expanding the lexicon introduced ambiguity; the earlier figure was artificially high because our entire lexicon was based on that story)
  • Recall: 84.15842%
  • Unknown words

    58 ^डॉ०/*डॉ०$
    21 ^हौले/*हौले$
    20 ^गाड/*गाड$
    19 ^सुनावे/*सुनावे$
    19 ^छो/*छो$
    18 ^हलूं/*हलूं$
    18 ^विदवान/*विदवान$
    18 ^तोरे/*तोरे$
    18 ^जुगाड़/*जुगाड़$
    18 ^गते/*गते$
    17 ^होते/*होते$
    17 ^हिंछा/*हिंछा$
    17 ^हाँथ/*हाँथ$
    17 ^सथान/*सथान$
    17 ^सजल/*सजल$
    17 ^वाह/*वाह$
    17 ^योगदान/*योगदान$
    17 ^मुनचुन/*मुनचुन$
    17 ^महाकवि/*महाकवि$
    17 ^मरदाना/*मरदाना$

  • Remaining unknown forms: 37429
  • Total number of forms: 1172
  • Lexical forms (not morphology): 1012
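
For reference, the figures in both evaluations follow from the raw counts using the standard definitions: precision = tp / (tp + fp), recall = tp / (tp + fn), and coverage = known tokens / total tokens. The short sketch below recomputes them; the function names are our own and are not part of any evaluation tool.

```python
def precision(tp, fp):
    """Fraction of the analyses we produced that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of the expected analyses that we produced."""
    return tp / (tp + fn)


def coverage(known, total):
    """Fraction of corpus tokens that received at least one analysis."""
    return known / total


# Counts copied from the Initial and Final Evaluation lists above.
runs = {
    "initial": dict(tp=339, fp=12, fn=65, known=94567, total=187701),
    "final": dict(tp=340, fp=103, fn=64, known=150272, total=187701),
}

for name, r in runs.items():
    print(
        f"{name}: "
        f"precision {100 * precision(r['tp'], r['fp']):.5f}%, "
        f"recall {100 * recall(r['tp'], r['fn']):.5f}%, "
        f"coverage {100 * coverage(r['known'], r['total']):.2f}%, "
        f"unknown tokens {r['total'] - r['known']}"
    )
```

The precision drop is visible directly in the counts: true positives barely change (339 to 340) while false positives rise from 12 to 103.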