Siberian Yupik/Transducer

From LING073

Latest revision as of 17:30, 18 May 2022

Transducer and Generator

  • ling073-ess repository (https://github.swarthmore.edu/Ling073-sp22/ling073-ess)

Corpus Coverage

When I test the transducer over my corpus, I get 28% coverage. Because I used almost exclusively twol rules, there was never a stage at which the transducer had only lexd rules for me to test coverage with. As a result, I had no points of comparison showing which additions to the transducer made large improvements in corpus coverage. In general, I only added rules that attach a single suffix to a verb or a noun, whereas almost all written words in Siberian Yupik carry multiple postbases, endings, and other affixes. This means that in the precision and recall tests, both precision and recall were essentially zero for correct analyses of full words. Getting full words to analyze correctly would take significant further work to implement all of the postbases, endings, and other morphological rules, many of which are not well suited to the lexd and twol systems.
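The coverage and precision/recall numbers above measure different things, which is how a nonzero coverage figure can sit alongside near-zero precision and recall. The sketch below is a minimal illustration, assuming the analyzer is wrapped as a Python callable that returns a list of analysis strings for a token; the names `coverage`, `precision_recall`, and `analyze` are hypothetical and not taken from the ling073-ess repository. Under the usual definition, a token counts as covered if it receives at least one analysis, whether or not that analysis is correct.

```python
def coverage(tokens, analyze):
    """Share of corpus tokens that receive at least one analysis (correct or not)."""
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if analyze(tok))
    return covered / len(tokens)


def precision_recall(gold, predicted):
    """Precision and recall of predicted analyses against gold analyses.

    gold, predicted: dicts mapping each word to a set of full analysis strings.
    """
    # An analysis counts as correct only if it exactly matches a gold analysis.
    correct = sum(len(preds & gold.get(word, set())) for word, preds in predicted.items())
    n_predicted = sum(len(p) for p in predicted.values())
    n_gold = sum(len(g) for g in gold.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall
```

Because only exact matches count toward precision and recall, an analysis that captures a single suffix of a word carrying several postbases can raise coverage while contributing nothing to either score.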

Failed Morphtests

5 out of 50 morphtests failed, for a success rate of 90%. None of the inputs return more than one possible output. Almost all of the words tested with one inflection also succeed with the other inflections that apply to them. For example, even though there is only a morphtest for "riigte" in the absolutive singular, the correct output is produced when riigte is given the tags for the ablative-modalis singular or the absolutive plural. I did not write out morphtests for every combination, but I tested a good number of them.
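A morphtest pairs a tagged analysis with its expected surface form and passes when the generator produces that form. The sketch below shows one way to tally such tests, assuming the generator is wrapped as a hypothetical Python callable `generate` that maps an analysis string to a list of surface forms; `run_morphtests` is likewise an illustrative name, not a course or Apertium tool.

```python
def run_morphtests(tests, generate):
    """Run (analysis, expected_surface) pairs through a generator callable.

    Returns the failing cases and the pass rate.
    """
    failures = []
    for analysis, expected in tests:
        outputs = generate(analysis)
        # A test passes if the expected form is among the generated forms.
        if expected not in outputs:
            failures.append((analysis, expected, outputs))
    passed = len(tests) - len(failures)
    rate = passed / len(tests) if tests else 0.0
    return failures, rate


# Usage with a stand-in generator that reproduces one of the failures listed below.
fake_outputs = {"qiya<v><iv><sg3>": ["qiyauq"]}
failures, rate = run_morphtests(
    [("qiya<v><iv><sg3>", "qiyaaq")],
    lambda analysis: fake_outputs.get(analysis, []),
)
# failures == [("qiya<v><iv><sg3>", "qiyaaq", ["qiyauq"])], rate == 0.0
```

With 50 tests and 5 failures, the same tally gives 45 / 50 = 0.9, matching the 90% figure above.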

  • iye<n><abl><sg><impers> ↔ iiymeng "from an eye"
    • Outputs as "iymeng" because I could not add vowel duplication.
  • taqegh<n><abs><pl><impers> ↔ taaqghet "veins"
    • Outputs as "taqghet" because I could not add vowel duplication.
  • esleqe<v><iv><sg1> ↔ esleqtenga "I am full"
    • Outputs as "esleqenga" because the +t addition and the ~sf semifinal e deletion did not interact correctly.
  • tagi<v><iv><sg3> ↔ tagiiq "he/she/it came"
    • Outputs as "tagiuq" because I could not get vowel assimilation to work across a morpheme boundary when the +t and +g rules were also in use. The +t and +g symbols counted as consonants, which kept the vowels from being treated as adjacent. This is difficult to fix because +t and +g needed to be in the consonant set for the e-addition rules to work properly.
  • qiya<v><iv><sg3> ↔ qiyaaq "he/she/it cried"
    • Outputs as "qiyauq" for the same reason as above: vowel assimilation failed across the morpheme boundary because the intervening control characters had to count as consonants for other rules.
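The blocking effect behind the "tagiuq" and "qiyauq" failures can be modeled in a few lines. This is a toy Python model, not twol and not the actual rules in the repository. The underlying forms, with a control symbol sitting between stem and ending, are hypothetical simplifications, and the vowel set is reduced for the sketch. It shows that when control symbols act like consonants, an ending-initial u never sees the stem vowel. When the assimilation rule's own context is allowed to look past them, it does. An untested suggestion along these lines would be to change only the assimilation rule's context, so that +t and +g can stay in the consonant set for the e-addition rules.

```python
VOWELS = set("aeiu")      # toy vowel inventory for this sketch
CONTROL = {"+t", "+g"}    # control symbols that surface as nothing
BOUNDARY = ">"            # hypothetical morpheme boundary marker


def assimilate(symbols, skip_controls):
    """Toy rule: an ending-initial 'u' copies the vowel to its left, provided
    only boundaries (and, if skip_controls is True, control symbols) intervene.

    When skip_controls is False, control symbols behave like consonants and
    block the rule.
    """
    out = list(symbols)
    for i, sym in enumerate(symbols):
        # Only a 'u' directly after a morpheme boundary is a candidate.
        if sym != "u" or i == 0 or symbols[i - 1] != BOUNDARY:
            continue
        j = i - 1
        while j >= 0 and (symbols[j] == BOUNDARY or (skip_controls and symbols[j] in CONTROL)):
            j -= 1
        if j >= 0 and symbols[j] in VOWELS:
            out[i] = symbols[j]
    # Surface form: drop boundaries and control symbols.
    return "".join(s for s in out if s != BOUNDARY and s not in CONTROL)


# Hypothetical underlying form for qiya<v><iv><sg3>.
form = ["q", "i", "y", "a", ">", "+g", ">", "u", "q"]
print(assimilate(form, skip_controls=False))  # qiyauq (control symbol blocks the rule)
print(assimilate(form, skip_controls=True))   # qiyaaq (rule looks past the control symbol)
```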