User:Tfeshba1/Final Project

From LING073


I chose to expand my tagger and try to improve its performance. My Ladino transducer can be found here, my Ladino corpus can be found here, my Ladino-English translator can be found here, and my Ladino-English parallel corpus can be found here.


By adding more stems and grammar, I was able to increase coverage over my corpus, the Ladino Wikipedia, to ~0.5469 (about 54.7%). This is not as high as I would have liked, but since the corpus contains many unique proper nouns and Ladino doesn't have a standardized orthography, I likely wouldn't have been able to get it much higher without hand-adding hundreds of new stems.
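As a rough illustration of what a coverage figure like this measures, here is a minimal sketch that counts the fraction of tokens receiving at least one analysis. It assumes the analyzer output is in Apertium stream format, where an unknown token is marked with a leading `*` in its analysis. The function name `naive_coverage` and the sample words and tags are hypothetical, and this is not necessarily the script used to produce the number above.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (sketch only: escaped characters inside units are not handled)
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(analysed_text):
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # The first field is the surface form; the rest are analyses.
        analyses = unit.split("/")[1:]
        # An unknown token has a single analysis beginning with '*'.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / len(units)


# Hypothetical analyzer output: three known tokens and one unknown token.
sample = (
    "^la/el<det><def><f><sg>$ ^kaza/kaza<n><f><sg>$ "
    "^Xyz/*Xyz$ ^es/ser<vbser><pri><p3><sg>$"
)
print(naive_coverage(sample))  # 0.75
```

Because every token counts equally, each unanalyzed proper noun or spelling variant lowers the figure, which is consistent with the explanation above.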


I added some more disambiguation rules and fixed some of the earlier ones.

Precision and Recall

With my randomly selected words, I got:

Precision: 100.00000%

Recall: 0.47733%

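While disappointing, these numbers make sense. Perfect precision means that every analysis the analyzer did produce was correct, while the low recall means that it missed many of the expected analyses. There were a lot of unique proper nouns in the random words, most of which would have no reason to be in the analyzer. Many of the remaining words were uncommon nouns and verbs.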
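To make the relationship between the two numbers concrete, here is a minimal sketch of one common way to compute precision and recall for a morphological analyzer: compare the analyses it produces against hand-annotated gold analyses for each word. The function name, the example words, and the tags are all hypothetical, and the exact procedure used for the figures above may differ.

```python
def precision_recall(gold, predicted):
    """Compare predicted analyses against gold analyses, word by word.

    Both arguments map a surface form to a set of analysis strings.
    Words missing from `predicted` are treated as having no analyses.
    """
    true_pos = false_pos = false_neg = 0
    for word, gold_analyses in gold.items():
        pred_analyses = predicted.get(word, set())
        true_pos += len(gold_analyses & pred_analyses)   # correct analyses
        false_pos += len(pred_analyses - gold_analyses)  # wrong analyses
        false_neg += len(gold_analyses - pred_analyses)  # missed analyses
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical gold annotations for four randomly selected words.
gold = {
    "kaza": {"kaza<n><f><sg>"},
    "komer": {"komer<vblex><inf>"},
    "Selanik": {"Selanik<np><top><sg>"},
    "livro": {"livro<n><m><sg>"},
}
# Hypothetical analyzer output: correct where it answers, silent elsewhere.
predicted = {
    "kaza": {"kaza<n><f><sg>"},
    "komer": {"komer<vblex><inf>"},
}
print(precision_recall(gold, predicted))  # (1.0, 0.5)
```

In this toy case the analyzer never outputs a wrong analysis, so precision is perfect, but the two words it cannot analyze pull recall down. That is the same pattern as the results above.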

To Do

There are still a lot of stems that should be added to the analyzer, and more stems to disambiguate in the rlx file. With more time, both can be expanded. However, because Ladino orthography is so irregular, there will likely still be many stems that the analyzer doesn't catch. A robust spellrelax may help with this, so it would also need significant development.