Initial Level of Ambiguity
The only ambiguity currently present in my analyses is between el as a pronoun and el as a determiner. I found 26 cases of this el/el ambiguity in my basic corpus text file.
Yo komo el limon. [I eat the lemon]
^Yo/yo<prn><pers><p1><sg><nom>$ ^komo/komer<v><tv><pres><p1><sg>$ ^el/el<det><def><m><sg>/el<prn><pers><p3><m><sg><nom>$ ^limon/limón<n><m><sg>$^./.<sent>$
El kanta. [He sings]
^El/el<det><def><m><sg>/el<prn><pers><p3><m><sg><nom>$ ^kanta/kantar<v><iv><pres><p3><sg>$^./.<sent>$
I wrote these examples myself so that I could start on disambiguation sooner, because I still need to add essential words like 'de', 'en', and 'i', along with other words that are very common in sentences from my corpus. Once I have added those, I may repeat this with sentences from my corpus.
The rules I created are:
- If there is a masculine singular noun immediately to the right, el cannot be a pronoun and is a determiner - remove the pronoun reading
- REMOVE Pronoun IF (1 (n m sg)) ;
- If the word immediately to the right is not a masculine singular noun, el cannot be a determiner and is a pronoun - remove the determiner reading
- REMOVE Determiner IF (NOT 1 (n m sg)) ;
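To make the effect of the two rules concrete, here is a minimal Python sketch that imitates them on the example analyses above. It is not how the CG tool itself works; the function names (parse_cohorts, tags, is_masc_sg_noun, disambiguate) are hypothetical, and it assumes each input string is a single sentence and that a reading is never removed if it is the only one left.

```python
import re

# One cohort of Apertium stream format: ^surface/reading1/reading2$
COHORT_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse_cohorts(stream):
    """Return a list of (surface, [readings]) pairs."""
    return [(m.group(1), m.group(2).split("/")) for m in COHORT_RE.finditer(stream)]


def tags(reading):
    """Extract the tags of one reading, e.g. 'el<det><def>' -> ['det', 'def']."""
    return re.findall(r"<([^>]+)>", reading)


def is_masc_sg_noun(readings):
    """True if any reading carries all of the tags n, m and sg."""
    return any({"n", "m", "sg"} <= set(tags(r)) for r in readings)


def disambiguate(cohorts):
    """Imitate the two REMOVE rules on every cohort of one sentence."""
    result = []
    for i, (surface, readings) in enumerate(cohorts):
        # Readings of the word immediately to the right (position 1), if any.
        next_readings = cohorts[i + 1][1] if i + 1 < len(cohorts) else []
        if is_masc_sg_noun(next_readings):
            unwanted = "prn"  # REMOVE Pronoun IF (1 (n m sg))
        else:
            unwanted = "det"  # REMOVE Determiner IF (NOT 1 (n m sg))
        kept = [r for r in readings if unwanted not in tags(r)]
        # Assumption: the last remaining reading is never removed.
        result.append((surface, kept or readings))
    return result


sentence = ("^Yo/yo<prn><pers><p1><sg><nom>$ ^komo/komer<v><tv><pres><p1><sg>$ "
            "^el/el<det><def><m><sg>/el<prn><pers><p3><m><sg><nom>$ "
            "^limon/limón<n><m><sg>$^./.<sent>$")
for surface, readings in disambiguate(parse_cohorts(sentence)):
    print(surface, readings)
```

In this sketch, el before limon keeps only its determiner reading, while El before kanta keeps only its pronoun reading.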
The git repository for lad disambiguation rules can be found here.
Final Evaluation of Ambiguity
Ambiguity before disambiguation: ~1.0208
Ambiguity after disambiguation: ~1.0021
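A figure of this kind can be computed from analyser output with a few lines of Python. The sketch below assumes that ambiguity means the average number of readings per token, which this page does not state explicitly; the function name mean_ambiguity is hypothetical.

```python
import re

# One cohort of Apertium stream format: ^surface/reading1/reading2$
COHORT_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def mean_ambiguity(stream):
    """Average number of readings per cohort (assumed definition of ambiguity)."""
    counts = [len(m.group(2).split("/")) for m in COHORT_RE.finditer(stream)]
    if not counts:
        raise ValueError("no cohorts found in input")
    return sum(counts) / len(counts)


# The undisambiguated analysis of "El kanta." has 3 tokens and 4 readings.
sample = ("^El/el<det><def><m><sg>/el<prn><pers><p3><m><sg><nom>$ "
          "^kanta/kantar<v><iv><pres><p3><sg>$^./.<sent>$")
print(round(mean_ambiguity(sample), 4))  # 1.3333
```

Running the same function over the analysed corpus before and after the disambiguation step would give the two values to compare.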