Magahi/Transducer

From LING073

Latest revision as of 16:27, 21 March 2021

Code

Code on GitHub: https://github.swarthmore.edu/Ling073-sp21/ling073-mag

Analyser Evaluation

Evaluation

  • Total number of stems in the transducer: 74
  • Current coverage over our combined corpus: 33.63%
  • The current list of top unknown words returned by aq-covtest:
    • TOP UNKNOWN WORDS:
 34108 ^आउ/*आउ$
 31801 ^हे/*हे$
 29917 ^हल/*हल$
 21698 ^कि/*कि$
 17942 ^ऊ/*ऊ$
 17754 ^1/*1$
 16677 ^ई/*ई$
 16539 ^हइ/*हइ$
 15114 ^तो/*तो$
 14681 ^गेल/*गेल$
 13642 ^2/*2$
 13166 ^हो/*हो$
 12158 ^न/*न$
 10945 ^सब/*सब$
  9535 ^नयँ/*नयँ$
  9269 ^ओकरा/*ओकरा$
  9129 ^3/*3$
  8973 ^हलइ/*हलइ$
  8948 ^हमरा/*हमरा$
  8908 ^भी/*भी$
  • Tests
    • mag.yaml Total passes: 39, Total fails: 15, Total: 54
    • commonwords.yaml Total passes: 4, Total fails: 11, Total: 15
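Each line in the aq-covtest list above gives a token's frequency, followed by the token in Apertium stream format, ^surface/analysis$. An analysis starting with * marks a word the analyser could not handle. The sketch below parses these lines to total the unknown tokens and to separate bare numerals such as 1, 2 and 3 from actual words. It is a rough sketch: the helper names parse_unknowns and Unknown are hypothetical, and it assumes exactly this one-token-per-line layout.

```python
import re
from typing import NamedTuple

# One aq-covtest line: "<count> ^<surface>/*<surface>$".
# The '*' prefix on the analysis marks an unknown (unanalysed) token.
LINE_RE = re.compile(r"^\s*(\d+)\s+\^([^/]+)/\*([^$]*)\$\s*$")


class Unknown(NamedTuple):
    count: int
    surface: str


def parse_unknowns(text: str) -> list[Unknown]:
    """Parse aq-covtest unknown-word lines, skipping lines that don't match."""
    items = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            items.append(Unknown(int(m.group(1)), m.group(2)))
    return items


# A few lines copied from the list above.
SAMPLE = """\
34108 ^आउ/*आउ$
31801 ^हे/*हे$
17754 ^1/*1$
13642 ^2/*2$
9129 ^3/*3$
8908 ^भी/*भी$
"""

if __name__ == "__main__":
    unknowns = parse_unknowns(SAMPLE)
    # str.isdigit() is True for "1", "2", "3" and False for Devanagari words.
    numerals = [u for u in unknowns if u.surface.isdigit()]
    print("unknown tokens in sample:", sum(u.count for u in unknowns))
    print("of which numerals:", sum(u.count for u in numerals))
```

On these six sample lines, the script counts 115,342 unknown tokens, 40,525 of which are numerals.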

Notes

At the moment, the pronoun tests other than those for the personal pronouns don't pass, because the remaining pronouns are highly irregular and would require a lot of hard coding. The same is true of the auxiliary. The forms of "घोरा" don't work because of a phonological process affecting the word-final long vowel that we weren't sure how to model. The instrumental also doesn't work, because lexd had trouble with the combining characters needed to write a long nasal e. Finally, we didn't implement the future imperative: our source had only one example of it, and the general ending it gives disagrees with that example, so we're not sure exactly how it works.
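Python's standard unicodedata module offers a quick way to see why a long nasal e needs combining characters. This is only an illustration, and it assumes the long nasal e was romanised as e with a macron and a tilde (ē̃). Unicode has a precomposed ẽ (U+1EBD) and a precomposed ē (U+0113), but no single code point carrying both marks. As a result, one combining character survives even NFC normalisation. The helper name describe is hypothetical.

```python
import unicodedata

COMBINING_MACRON = "\u0304"
COMBINING_TILDE = "\u0303"


def describe(s: str) -> list[str]:
    """Return the Unicode name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


# Short nasal e: NFC composes e + tilde into the single code point U+1EBD.
short_nasal = unicodedata.normalize("NFC", "e" + COMBINING_TILDE)

# Long nasal e (assumed spelling: e + macron + tilde): NFC composes e + macron
# into U+0113, but there is no precomposed "e with macron and tilde", so the
# tilde remains a separate combining character.
long_nasal = unicodedata.normalize("NFC", "e" + COMBINING_MACRON + COMBINING_TILDE)

if __name__ == "__main__":
    print(len(short_nasal), describe(short_nasal))
    print(len(long_nasal), describe(long_nasal))
```

The short form normalises to one code point. The long form remains two: U+0113 followed by U+0303 COMBINING TILDE.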

Adding the danda (|), the postpositions (ke, mẽ, par, se), and the noun magahī raised our basic coverage from 21.16% to 31.71%. Improving the transliteration transducer raised it further to 33.63%.
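Assume aq-covtest's figure is naive token coverage, that is, the share of corpus tokens that receive at least one analysis. Under that assumption, the two improvements described above work out as follows, in percentage points (pp):

```latex
\begin{aligned}
C &= \frac{\text{tokens with at least one analysis}}{\text{total tokens}} \times 100\% \\
\Delta C_{\text{new entries}} &= 31.71\% - 21.16\% = 10.55\ \text{pp} \\
\Delta C_{\text{transliteration}} &= 33.63\% - 31.71\% = 1.92\ \text{pp} \\
\Delta C_{\text{total}} &= 10.55\ \text{pp} + 1.92\ \text{pp} = 12.47\ \text{pp}
\end{aligned}
```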

Since working on the generator and the transliteration transducer, we have fixed some of these issues. We changed our transliteration to better match standard Devanagari transliteration conventions, which fixed the nasal e problem. We also worked out what was wrong with the future imperative and fixed it. At first we didn't fully understand how the twol files worked, so our oblique forms were analysed correctly but generated incorrectly; we have since fixed this with twol rules. Finally, we hadn't realised that morpheme boundary tags are removed automatically, so we added them to our lexd file.

Generator Evaluation

Initial Evaluation of Morphological Generation

Analysis

morph-test -csi tests/mag.yaml

  • Total passes: 39, Total fails: 15, Total: 54, Percentage: 72%
  • Current coverage over our combined corpus: 33.63%

Generation

morph-test -cl tests/mag.yaml

  • Total passes: 39, Total fails: 15, Total: 54, Percentage: 72%

Later Evaluation

Analysis

morph-test -csi tests/mag.yaml

  • Total passes: 41, Total fails: 13, Total: 54, Percentage: 76%

Generation

morph-test -cl tests/mag.yaml

  • Total passes: 41, Total fails: 13, Total: 54, Percentage: 76%
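The percentages above follow directly from the pass and total counts. The sketch below recomputes them from summary lines in the format shown on this page. The regex assumes exactly the "Total passes: N, Total fails: M, Total: T" wording, and the names pass_rate and RESULTS are hypothetical.

```python
import re

SUMMARY_RE = re.compile(r"Total passes: (\d+), Total fails: (\d+), Total: (\d+)")


def pass_rate(summary: str) -> float:
    """Return the pass percentage from a morph-test style summary line."""
    m = SUMMARY_RE.search(summary)
    if m is None:
        raise ValueError(f"unrecognised summary: {summary!r}")
    passes, fails, total = (int(g) for g in m.groups())
    # Sanity check: passes and fails should account for every test.
    if passes + fails != total:
        raise ValueError("passes + fails != total")
    return 100 * passes / total


# Summary lines copied from this page.
RESULTS = {
    "mag.yaml, initial": "Total passes: 39, Total fails: 15, Total: 54",
    "mag.yaml, later": "Total passes: 41, Total fails: 13, Total: 54",
    "commonwords.yaml": "Total passes: 4, Total fails: 11, Total: 15",
}

if __name__ == "__main__":
    for name, line in RESULTS.items():
        print(f"{name}: {pass_rate(line):.1f}%")
```

This gives 72.2% and 75.9% for mag.yaml, which round to the 72% and 76% reported above, and 26.7% for commonwords.yaml.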