Difference between revisions of "User:Tfeshba1/Final Project"

From LING073

Revision as of 11:48, 20 May 2021

Overview

I chose to expand my tagger and try to improve its performance.

Analyzer

By adding more stems and grammar, I was able to increase coverage over my corpus, the Ladino Wikipedia, to ~0.5469 (about 54.7%). This is not as high as I would have liked, but since the corpus contains many unique proper nouns and Ladino doesn't have a standardized orthography, it's likely I wouldn't have been able to get it much higher without hand-adding hundreds of new stems.
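
As a rough illustration of what the coverage figure means, here is a minimal sketch that counts the share of tokens receiving at least one analysis. It assumes the analyzer output is in Apertium stream format, where an unknown word appears as `^form/*form$`. The function name `coverage` and the sample analyses are made up for illustration, and this is not necessarily how the number above was computed.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters inside a unit (e.g. "\/") are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analyzed_text):
    """Return the fraction of tokens that received at least one analysis."""
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analyzed_text):
        total += 1
        # An unknown word is echoed back with a leading asterisk.
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0


# Hypothetical analyzer output: three known tokens and one unknown one.
sample = "^la/el<det><def><f><sg>$ ^kaza/kaza<n><f><sg>$ ^Xyz/*Xyz$ ^en/en<pr>$"
print(coverage(sample))  # 0.75
```

Counting tokens rather than unique word forms means that frequent words weigh more heavily, so adding a handful of common stems can move this number more than adding many rare ones.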

Disambiguation

I added some more disambiguation rules and fixed some of the earlier ones.

Precision and Recall

With my randomly selected words, I got:

Precision: 100.00000%

Recall: 0.47733%

While disappointing, these numbers make sense. There were a lot of unique proper nouns in the random words, most of which would have no reason to be in the analyzer. Many of the other words were uncommon nouns and verbs.
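
The combination of perfect precision and low recall can be sketched as follows. This assumes one common definition, in which precision is the share of produced analyses that are correct and recall is the share of gold-standard analyses that the analyzer produced. The function name `precision_recall` and the sample words and tags are hypothetical, and the evaluation script actually used may count things differently.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Compare predicted analyses against gold analyses, form by form."""
    true_pos = false_pos = false_neg = 0
    for form, gold_analyses in gold.items():
        # A form the analyzer does not know contributes no predictions.
        pred = predicted.get(form, set())
        true_pos += len(gold_analyses & pred)
        false_pos += len(pred - gold_analyses)
        false_neg += len(gold_analyses - pred)
    produced = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / produced if produced else 0.0
    recall = true_pos / expected if expected else 0.0
    return precision, recall


# Hypothetical gold standard of four words; the analyzer only knows one.
gold = {
    "kaza": {"kaza<n><f><sg>"},
    "Selanik": {"Selanik<np><top>"},
    "komer": {"komer<v><inf>"},
    "djente": {"djente<n><f><sg>"},
}
predicted = {"kaza": {"kaza<n><f><sg>"}}

print(precision_recall(gold, predicted))  # (1.0, 0.25)
```

In this toy case every analysis the analyzer returns is correct, so precision is 1.0, but three of the four gold analyses are missing, so recall is only 0.25. Unknown proper nouns and rare stems lower recall in the same way without affecting precision.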

To Do

There are still a lot of stems that should be added to the analyzer, and more stems to disambiguate in the rlx file. With more time, both can be expanded. However, due to the irregularity of Ladino orthography, there will likely still be many stems that aren't caught by the analyzer. A robust spellrelax may help to combat this, so it would need to be developed significantly as well.