Difference between revisions of "Berik/Transducer"

From LING073
(Evaluation)
 
(6 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  Code: [https://github.swarthmore.edu/jspring1/ling073-bkl https://github.swarthmore.edu/jspring1/ling073-bkl]

- == Evaluation ==
+ === General Evaluation ===

- * Current corpus coverage: 34.65%
+ * Current corpus coverage: 44.80%
  * Number of stems: 112
  ** Does not include pronouns (4?) and verbs (3?)
  * Top unknown words:
- ** jei (84)
  ** aa (64)
  ** jeta (42)
- ** jeiserem (36)
- ** Jei (35)
  ** ge (31)
  ** taterisi (11)
Line 27: Line 24:
  ** Taterisi (7)
  ** Sanbagiri (6)
- * Analyzer tests passing: 85/117 (72.65%)
+
- * Generator tests passing: 85/152 (55.92%)
  * Corpus tests passing: 14/33 (42.42%)
+
+ === Analyzer Evaluation ===
+
+ * Analyzer tests passing: 92/118 (77.97%)
+
+ === Generator Evaluation ===
+
+ * Generator tests passing: 92/136 (67.65%)

  == Notes ==
Line 50: Line 54:
  * {{morphTest|je{{tag|prn}}+wer{{tag|post}}|jewer}}
  * {{morphTest|gam{{tag|part}}|gam}}
+
+ Adding "jei" as a form of "je" raised coverage to 42.84%.
+ * {{morphTest|je{{tag|prn}}|jei}}
+ * {{morphTest|je{{tag|prn}}{{tag|dem}}{{tag|part}}|jeiserem}}
+
+ === Certain tests still do not work ===
+
+ Distance
+ * {{morphTest|disultena{{tag|v}}{{tag|tv}}{{tag|dst}}|disultetna}}
+ * {{morphTest|gwerana{{tag|v}}{{tag|tv}}{{tag|dst}}|gwerantetna}}
+
+ Gender
+ * {{morphTest|sarbana{{tag|v}}{{tag|tv}}{{tag|f}}|sarbili}}
+ * {{morphTest|eyebana{{tag|v}}{{tag|tv}}{{tag|f}}|eyebili}}
+ * {{morphTest|gwebana{{tag|v}}{{tag|tv}}{{tag|f}}|gwebili}}
+ * {{morphTest|gerbana{{tag|v}}{{tag|tv}}{{tag|f}}|golbili}}
+ * {{morphTest|damtana{{tag|v}}{{tag|tv}}{{tag|f}}|domola}}
+ * {{morphTest|saftana{{tag|v}}{{tag|tv}}{{tag|f}}|sofola}}
+
+ Number
+ * {{morphTest|jila{{tag|v}}{{tag|iv}}{{tag|du}}|ge jila}}
+ * {{morphTest|sofwa{{tag|v}}{{tag|iv}}{{tag|du}}|ge sofwa}}
+ * {{morphTest|nasona{{tag|v}}{{tag|iv}}{{tag|du}}|ge nasona}}
+ * {{morphTest|fina{{tag|v}}{{tag|iv}}{{tag|du}}|ge fina}}
+ * {{morphTest|jila{{tag|v}}{{tag|iv}}{{tag|pl}}|ge jalbili}}
+ * {{morphTest|sofwa{{tag|v}}{{tag|iv}}{{tag|pl}}|ge sofwabili}}
+ * {{morphTest|nasona{{tag|v}}{{tag|iv}}{{tag|pl}}|ge nasbawena}}
+ * {{morphTest|fina{{tag|v}}{{tag|iv}}{{tag|pl}}|ge fibili}}
+
+ These tests fail because we have not implemented the relevant forms; they were not frequent enough in our corpus to prioritize over more common forms.
+
+ [[Category:Sp18_Transducers]]

Latest revision as of 21:05, 5 March 2018

Code: https://github.swarthmore.edu/jspring1/ling073-bkl

General Evaluation

  • Current corpus coverage: 44.80%
  • Number of stems: 112
    • Does not include pronouns (4?) and verbs (3?)
  • Top unknown words:
    • aa (64)
    • jeta (42)
    • ge (31)
    • taterisi (11)
    • aane (11)
    • jebe (9)
    • bosna (9)
    • anes (9)
    • asal (8)
    • asala (8)
    • Aamai (8)
    • temawer (8)
    • Jepga (8)
    • aaiserem (8)
    • enggame (7)
    • Taterisi (7)
    • Sanbagiri (6)
  • Corpus tests passing: 14/33 (42.42%)
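
The coverage figure and the unknown-word counts above can be illustrated with a small sketch. It assumes that coverage means the share of corpus tokens that receive at least one analysis, and that counting is case-sensitive (the list above counts "taterisi" and "Taterisi" separately). The function `coverage_report`, the `is_known` callback, and the toy data are hypothetical stand-ins, not the project's actual tooling or corpus.

```python
from collections import Counter


def coverage_report(tokens, is_known, top_n=5):
    """Return (coverage percent, most frequent unknown tokens).

    `is_known` is a hypothetical callable standing in for the analyzer:
    it returns True when a token receives at least one analysis.
    """
    if not tokens:
        return 0.0, []
    # Case-sensitive counts of tokens the analyzer does not recognise.
    unknown = Counter(tok for tok in tokens if not is_known(tok))
    known_count = len(tokens) - sum(unknown.values())
    coverage = round(100.0 * known_count / len(tokens), 2)
    return coverage, unknown.most_common(top_n)


# Toy data for illustration only; this is not the real Berik corpus.
known_forms = {"jei", "jam", "gam"}
tokens = ["jei", "aa", "jam", "aa", "Taterisi", "taterisi", "gam", "aa"]
coverage, top_unknown = coverage_report(tokens, known_forms.__contains__)
```

Lowercasing tokens before counting would merge pairs such as "taterisi" and "Taterisi", which may be worth considering when deciding which unknown forms to add next.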

Analyzer Evaluation

  • Analyzer tests passing: 92/118 (77.97%)

Generator Evaluation

  • Generator tests passing: 92/136 (67.65%)
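
As a quick check of the arithmetic, the percentages reported for the three test suites follow directly from the passed/total counts. The helper name `pass_rate` is an assumption made for this sketch.

```python
def pass_rate(passed, total):
    """Percentage of passing tests, rounded to two decimal places."""
    if total == 0:
        return 0.0
    return round(100.0 * passed / total, 2)


# Counts taken from the evaluation sections above.
results = {"analyzer": (92, 118), "generator": (92, 136), "corpus": (14, 33)}
rates = {name: pass_rate(p, t) for name, (p, t) in results.items()}
```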

Notes

Initial corpus coverage was 21.98%.

Coverage was raised to 34.65% by adding the following forms:

  • jamere<locl> ↔ jamere
  • Yesus<n> ↔ Yesus
  • Yusuf<n> ↔ Yusuf
  • Maria<n> ↔ Maria
  • Daud<n> ↔ Daud
  • angtane<n> ↔ angtane
  • raja<n> ↔ raja
  • taman<n> ↔ taman
  • kapka<adj> ↔ kapka
  • se<imp> ↔ se
  • je<prn><pos> ↔ jemna
  • je<prn><subj> ↔ jam
  • je<prn>+wer<post> ↔ jewer
  • gam<part> ↔ gam

Adding "jei" as a form of "je" raised coverage to 42.84%.

  • je<prn> ↔ jei
  • je<prn><dem><part> ↔ jeiserem
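
The coverage figures reported on this page can be lined up to see how much each step contributed, in percentage points. This is only a sketch of the arithmetic; the stage labels are our own wording, and the page does not say what accounts for the final step from 42.84% to 44.80%.

```python
# Coverage figures as reported on this page, in chronological order.
stages = [
    ("initial", 21.98),
    ("first batch of added forms", 34.65),
    ("jei as a form of je", 42.84),
    ("latest revision", 44.80),
]

# Gain in percentage points contributed by each step after the first.
gains = [
    (label, round(value - previous_value, 2))
    for (_, previous_value), (label, value) in zip(stages, stages[1:])
]
```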

Certain tests still do not work

Distance

  • disultena<v><tv><dst> ↔ disultetna
  • gwerana<v><tv><dst> ↔ gwerantetna

Gender

  • sarbana<v><tv><f> ↔ sarbili
  • eyebana<v><tv><f> ↔ eyebili
  • gwebana<v><tv><f> ↔ gwebili
  • gerbana<v><tv><f> ↔ golbili
  • damtana<v><tv><f> ↔ domola
  • saftana<v><tv><f> ↔ sofola

Number

  • jila<v><iv><du> ↔ ge jila
  • sofwa<v><iv><du> ↔ ge sofwa
  • nasona<v><iv><du> ↔ ge nasona
  • fina<v><iv><du> ↔ ge fina
  • jila<v><iv><pl> ↔ ge jalbili
  • sofwa<v><iv><pl> ↔ ge sofwabili
  • nasona<v><iv><pl> ↔ ge nasbawena
  • fina<v><iv><pl> ↔ ge fibili

These tests fail because we have not implemented the relevant forms; they were not frequent enough in our corpus to prioritize over more common forms.
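
The analysis ↔ surface pairs listed above share a regular shape: one or more lemmas, each followed by tags in angle brackets and joined with "+", then the surface form. The sketch below shows one way to split such a line into its parts, for example to group failing tests by tag. The function `parse_pair` and its regular expressions are hypothetical helpers written for this page, not part of the transducer or of the wiki's test templates.

```python
import re

# "analysis ↔ surface", as in the rendered test lists above.
PAIR_RE = re.compile(r"^(?P<analysis>.+?)\s*↔\s*(?P<surface>.+)$")
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_pair(line):
    """Split a test line into ([(lemma, [tags]), ...], surface form)."""
    cleaned = line.strip().lstrip("•").strip()
    match = PAIR_RE.match(cleaned)
    if match is None:
        raise ValueError(f"not a test pair: {line!r}")
    morphemes = []
    # "+" separates morphemes that each carry their own tags.
    for part in match["analysis"].split("+"):
        lemma = part.split("<", 1)[0]
        morphemes.append((lemma, TAG_RE.findall(part)))
    return morphemes, match["surface"].strip()


plural = parse_pair("  • jila<v><iv><pl> ↔ ge jalbili")
compound = parse_pair("je<prn>+wer<post> ↔ jewer")
```

Grouping the failing tests by their last tag (`dst`, `f`, `du`, `pl`) would reproduce the Distance, Gender, and Number groupings used above.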