User:Ekenast1/Final project

From LING073

Siberian Yupik Transducer Expansion

Transducer GitHub

Final Poster

For my project I substantially expanded the set of suffixes my transducer supports by implementing nearly all of the noun suffixes in my grammar book and many of the verb suffixes. This came to more than 350 new suffixes and required me to revise part of the tagging system I had been using. Previously I was not tagging noun suffixes with the number of the noun itself, and I was not tagging verb suffixes with the person and number of the verb's object (mostly because I didn't understand how it worked, and partially because I had not yet dealt with any forms that required tags of that sort). I added these tags and reorganized my existing tags more systematically. For nouns the tag order is <possessor person and number><number of noun><noun suffix type>; for verbs it is <subject person and number><object person and number, or intransitive><verb suffix type>. While doing all of this I also had to change my twol file so that all the rules still applied correctly.
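As a rough illustration of the tag order described above, the sketch below composes analysis strings slot by slot. The tag names (`3SgPoss`, `Pl`, `Loc`, `1Sg`, `3SgObj`, `Intr`, `Ind`), the `STEM` placeholder, and the helper functions are all hypothetical; they are not the actual tags used in the transducer.

```python
# Hypothetical helpers showing the tag order only; the tag names are
# placeholders, not the transducer's real tag inventory.

def noun_tags(possessor: str, noun_number: str, suffix_type: str) -> str:
    """Noun order: <possessor person and number><number of noun><suffix type>."""
    return f"<{possessor}><{noun_number}><{suffix_type}>"


def verb_tags(subject: str, obj: str, suffix_type: str) -> str:
    """Verb order: <subject person and number><object person and number,
    or intransitive><suffix type>."""
    return f"<{subject}><{obj}><{suffix_type}>"


def analysis(stem: str, pos: str, tags: str) -> str:
    """Join a stem, its part-of-speech tag, and the ordered suffix tags."""
    return f"{stem}<{pos}>{tags}"


if __name__ == "__main__":
    # Possessed plural noun with a hypothetical case suffix.
    print(analysis("STEM", "n", noun_tags("3SgPoss", "Pl", "Loc")))  # STEM<n><3SgPoss><Pl><Loc>
    # Transitive verb: the object slot carries person and number.
    print(analysis("STEM", "v", verb_tags("1Sg", "3SgObj", "Ind")))  # STEM<v><1Sg><3SgObj><Ind>
    # Intransitive verb: the object slot is filled by an intransitive tag.
    print(analysis("STEM", "v", verb_tags("1Sg", "Intr", "Ind")))    # STEM<v><1Sg><Intr><Ind>
```

Keeping every slot filled, including an explicit intransitive marker in the object position, means all verb analyses share the same shape, which is one reason a fixed order like this is easier to keep consistent across hundreds of suffixes.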

Evaluation

The grammar book I was referencing had a couple of tables showing the correct forms of the suffixes applied to specific words. I tested my transducer on those words and found that all of the nouns I tested generated correctly. The verbs also mostly worked, but some twol rule issues that had caused problems earlier in the semester came up again here. Some operations in Siberian Yupik happen only after another operation has been carried out, and because twol applies all of its rules in parallel, these ordered interactions are difficult to get working.
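To make the ordering problem concrete, here is a minimal sketch contrasting an ordered cascade of rewrite rules with a naive parallel application in which every rule checks its context against the original input only. The two rules (t becomes s before i; i becomes e after s) are hypothetical toy rules, not actual Siberian Yupik phonology. The parallel version is also a simplification: real twol rules can refer to surface symbols in their contexts, which is the usual way to encode this kind of dependency, but writing those contexts correctly is exactly the difficult part.

```python
from typing import Callable, List, Optional

# A rule looks at position i of a string and returns a replacement
# character, or None if it does not apply there.
Rule = Callable[[str, int], Optional[str]]


def rule_t_to_s(s: str, i: int) -> Optional[str]:
    # Hypothetical toy rule: t -> s before i.
    if s[i] == "t" and i + 1 < len(s) and s[i + 1] == "i":
        return "s"
    return None


def rule_i_to_e(s: str, i: int) -> Optional[str]:
    # Hypothetical toy rule: i -> e after s.
    if s[i] == "i" and i > 0 and s[i - 1] == "s":
        return "e"
    return None


def apply_rule(s: str, rule: Rule) -> str:
    # Apply one rule at every position, reading contexts from the same string.
    return "".join(r if (r := rule(s, i)) is not None else s[i] for i in range(len(s)))


def sequential(s: str, rules: List[Rule]) -> str:
    # Ordered cascade: each rule sees the output of the previous rule.
    for rule in rules:
        s = apply_rule(s, rule)
    return s


def parallel(s: str, rules: List[Rule]) -> str:
    # Naive parallel application: every context is checked against the
    # original input; the first rule that applies at a position wins.
    out = []
    for i in range(len(s)):
        replacement = s[i]
        for rule in rules:
            r = rule(s, i)
            if r is not None:
                replacement = r
                break
        out.append(replacement)
    return "".join(out)


rules = [rule_t_to_s, rule_i_to_e]
print(sequential("ti", rules))  # se
print(parallel("ti", rules))    # si
```

In the cascade, the first rule creates the s that triggers the second rule. In the naive parallel version, the second rule never sees that s, so the input ends up as si instead of se.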

I also tested corpus coverage and got the same questionable results I have been getting all semester. Siberian Yupik words carry postbases and endings in addition to the suffixes, and because I have not really implemented those, my transducer struggles to fully analyze most words in the corpus. Before this project I had 27% coverage over my corpus; now I have 30%. This small increase is likely also because the lexicon in my monolingual transducer is small, so there were few words for the new suffixes to apply to.
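For reference, coverage here means the share of corpus words that receive at least one analysis. The sketch below computes it over tokens. The original figures do not say whether tokens or types were counted, so that is an assumption. The `analyze` argument is a hypothetical stand-in for running the transducer; it does not reflect any particular HFST command.

```python
from collections import Counter
from typing import Callable, Iterable, List


def coverage(tokens: Iterable[str], analyze: Callable[[str], List[str]]) -> float:
    """Fraction of tokens that get at least one analysis.

    Each distinct word is analyzed once and weighted by its frequency.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for word, n in counts.items() if analyze(word))
    return covered / total


# Toy data only: placeholder words, not the real corpus or lexicon.
toy_analyses = {"w1": ["w1<n>"]}
toy_corpus = ["w1", "w2", "w1", "w3"]
print(coverage(toy_corpus, lambda w: toy_analyses.get(w, [])))  # 0.5
```

Because every covered word must also have its stem in the lexicon, adding suffixes alone can only raise coverage for words whose stems are already listed. This fits the modest move from 27% to 30%.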