Difference between revisions of "Waray/Disambiguation"
From LING073
(Created page with "==Morphological Disambiguation== ===Initial Level(s) of Ambiguity=== ===Differentiating a Pronoun from a Determiner=== There are two readings of 'nira': while common nouns us...") |
(→Morphological Disambiguation) |
||
(28 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
==Morphological Disambiguation== | ==Morphological Disambiguation== | ||
− | ===Initial | + | ===Github Repository=== |
+ | * [https://github.swarthmore.edu/Ling073-sp21/ling073-war/blob/master/apertium-war.war.rlx Link to <code>.rlx</code> file] | ||
+ | * [https://github.swarthmore.edu/Ling073-sp21/ling073-war Link to parent repository] | ||
+ | |||
+ | ===Initial Evaluation of Ambiguity=== | ||
+ | Level of ambiguity before disambiguation: ~1.05008077544426494346 | ||
===Differentiating a Pronoun from a Determiner=== | ===Differentiating a Pronoun from a Determiner=== | ||
− | There are two readings of 'nira': while common nouns usually proceed the word ' | + | There are two readings of 'hira' ('they') and its Class II possessive variation 'nira' ('their'): while common nouns usually proceed the word 'hira', the following words can help determine whether or not the word is a pronoun or a determiner. In the determiner case, 'hira' is usually followed by at least two—usually proper—nouns (a compound subject). |
+ | |||
====Pronoun Reading==== | ====Pronoun Reading==== | ||
− | + | ; Kwarta nira [''Their money''] * | |
+ | : Before Disambiguation: | ||
+ | ::* ^Kwarta/kwarta<n>$ '''^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$'''^./.<sent>$ | ||
+ | : After Disambiguation: | ||
+ | ::* ^Kwarta<n>$ ^'''hira<prn><pers><p3><pl><pos>$'''^.<sent>$ | ||
+ | |||
+ | ; Ngan amo ini an gintikangan pagbuhat nira [''They have begun to do this''] | ||
+ | : Before Disambiguation: | ||
+ | ::* ^Ngan/ngan<det><pl>$ ^amo/*amo$ ^ini/ini<prn><det><dem>$ ^an/an<det><nom>$ ^gintikangan/*gintikangan$ ^pagbuhat/*pagbuhat$ '''^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$'''^./.<sent>$ | ||
+ | : After Disambiguation: | ||
+ | ::* ^Ngan<det><pl>$ ^*amo$ ^ini<prn><det><dem>$ ^an<det><nom>$ ^*gintikangan$ ^*pagbuhat$ '''^hira<prn><pers><p3><pl><pos>$'''^.<sent>$ | ||
====Determiner Reading==== | ====Determiner Reading==== | ||
− | + | ; Kwarta nira Rico ngan Bobong [''Rico and Bobong's money OR Money of Rico and Bobong''] * | |
− | * Mgbantay pa hiya ha balay nira ni Rhoda ngan Romulo | + | : Before Disambiguation: |
+ | ::* ^Kwarta/kwarta<n>$ '''^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$''' ^Rico/Rico<np><ant><m>$ ^ngan/ngan<det><pl>$ ^Bobong/Bobong<np><ant><m>$^./.<sent>$ | ||
+ | :; After Disambiguation: | ||
+ | ::* ^Kwarta<n>$ '''^ni<det><pos>+ra<det><pl>$''' ^Rico<np><ant><m>$ ^ngan<det><pl>$ ^Bobong<np><ant><m>$^.<sent>$ | ||
+ | |||
+ | ; Mgbantay pa hiya ha balay nira ni Rhoda ngan Romulo [''He will still guard Rhoda and Romulo’s house''] | ||
+ | : Before Disambiguation: | ||
+ | ::* ^Mgbantay/*Mgbantay$ ^pa/pa<adv>$ ^hiya/hiya<prn><pers><p3><sg><nom>$ ^ha/ha<det><obl>$ ^balay/balay<n>$ '''^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$''' ^ni/ni<det><pos>$ ^Rhoda/Rhoda<np><ant><f>$ ^ngan/ngan<det><pl>$ ^Romulo/Romulo<np><ant><man>$^./.<sent>$ | ||
+ | : After Disambiguation: | ||
+ | ::* ^*Mgbantay$ ^pa<adv>$ ^hiya<prn><pers><p3><sg><nom>$ ^ha<det><obl>$ ^balay<n>$ '''^ni<det><pos>+ra<det><pl>$''' ^ni<det><pos>$ ^Rhoda<np><ant><f>$ ^ngan<det><pl>$ ^Romulo<np><ant><man>$^.<sent>$ | ||
+ | |||
+ | ''* Not in corpus'' | ||
+ | |||
+ | ===Added Rules=== | ||
+ | # If word precedes at least one proper noun, select determiner reading | ||
+ | #: SELECT Determiner IF (0 (prn)) (1 (np)) ; | ||
+ | # If word precedes a determiner and at least one proper noun, select determiner reading | ||
+ | #:SELECT Determiner IF (0 (prn)) (1 (det)) (2 (np)) ; | ||
+ | # Pronoun case | ||
+ | #: REMOVE Determiner IF (0 (prn)) (1 EOS) ; | ||
+ | |||
+ | ===Final Evaluation of Ambiguity=== | ||
+ | ''(As of Apr 27, 2021)'' | ||
+ | Level of ambiguity after disambiguation: ~1.05008077544426494346 ** | ||
+ | |||
+ | * Total number of forms: 136 | ||
+ | * Number of unique forms: 133 | ||
+ | |||
+ | ''** The ambiguity is the same because there is not yet enough context in the lexicon for the disambiguation to properly disambiguate'' | ||
+ | |||
[[Category:Sp21_Disambiguation]] [[Category:Waray]] | [[Category:Sp21_Disambiguation]] [[Category:Waray]] |
Latest revision as of 14:30, 29 April 2021
Morphological Disambiguation
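GitHub Repository
- Link to .rlx file: https://github.swarthmore.edu/Ling073-sp21/ling073-war/blob/master/apertium-war.war.rlx
- Link to parent repository: https://github.swarthmore.edu/Ling073-sp21/ling073-war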
Initial Evaluation of Ambiguity
Level of ambiguity before disambiguation: ~1.05008077544426494346
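The page does not say how this figure was computed. A minimal sketch, assuming the usual definition (mean number of analyses per form in the analyser output), might look like the following. The function names `parse_units` and `ambiguity_level` are hypothetical, and the parser is a simplification that ignores escaped `/`, `^` and `$` characters in the stream format.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped delimiters inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_units(stream):
    """Return a list of (surface, [analyses]) pairs from analyser output."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def ambiguity_level(stream):
    """Mean number of analyses per form (assumed definition).

    An unknown form such as ^amo/*amo$ counts as one analysis.
    """
    units = parse_units(stream)
    if not units:
        return 0.0
    return sum(len(analyses) for _, analyses in units) / len(units)


# "Kwarta nira." from the Pronoun Reading example on this page.
example = (
    "^Kwarta/kwarta<n>$ "
    "^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$"
    "^./.<sent>$"
)
print(ambiguity_level(example))
```

For this single sentence there are three forms and four analyses, so the sketch reports 4/3; over a whole corpus the same ratio would be taken across all forms.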
Differentiating a Pronoun from a Determiner
There are two readings of 'hira' ('they') and its Class II possessive variant 'nira' ('their'): pronoun and determiner. A common noun usually precedes the word in either reading, so it is the words that follow that help determine whether it is a pronoun or a determiner. In the determiner case, the word is usually followed by at least two nouns, usually proper nouns (a compound subject).
Pronoun Reading
- Kwarta nira [Their money] *
- Before Disambiguation:
- ^Kwarta/kwarta<n>$ ^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$^./.<sent>$
- After Disambiguation:
- ^Kwarta<n>$ ^hira<prn><pers><p3><pl><pos>$^.<sent>$
- Ngan amo ini an gintikangan pagbuhat nira [They have begun to do this]
- Before Disambiguation:
- ^Ngan/ngan<det><pl>$ ^amo/*amo$ ^ini/ini<prn><det><dem>$ ^an/an<det><nom>$ ^gintikangan/*gintikangan$ ^pagbuhat/*pagbuhat$ ^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$^./.<sent>$
- After Disambiguation:
- ^Ngan<det><pl>$ ^*amo$ ^ini<prn><det><dem>$ ^an<det><nom>$ ^*gintikangan$ ^*pagbuhat$ ^hira<prn><pers><p3><pl><pos>$^.<sent>$
Determiner Reading
- Kwarta nira Rico ngan Bobong [Rico and Bobong's money OR Money of Rico and Bobong] *
- Before Disambiguation:
- ^Kwarta/kwarta<n>$ ^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$ ^Rico/Rico<np><ant><m>$ ^ngan/ngan<det><pl>$ ^Bobong/Bobong<np><ant><m>$^./.<sent>$
- After Disambiguation:
- ^Kwarta<n>$ ^ni<det><pos>+ra<det><pl>$ ^Rico<np><ant><m>$ ^ngan<det><pl>$ ^Bobong<np><ant><m>$^.<sent>$
- Mgbantay pa hiya ha balay nira ni Rhoda ngan Romulo [He will still guard Rhoda and Romulo’s house]
- Before Disambiguation:
- ^Mgbantay/*Mgbantay$ ^pa/pa<adv>$ ^hiya/hiya<prn><pers><p3><sg><nom>$ ^ha/ha<det><obl>$ ^balay/balay<n>$ ^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$ ^ni/ni<det><pos>$ ^Rhoda/Rhoda<np><ant><f>$ ^ngan/ngan<det><pl>$ ^Romulo/Romulo<np><ant><man>$^./.<sent>$
- After Disambiguation:
- ^*Mgbantay$ ^pa<adv>$ ^hiya<prn><pers><p3><sg><nom>$ ^ha<det><obl>$ ^balay<n>$ ^ni<det><pos>+ra<det><pl>$ ^ni<det><pos>$ ^Rhoda<np><ant><f>$ ^ngan<det><pl>$ ^Romulo<np><ant><man>$^.<sent>$
* Not in corpus
Added Rules
- If the word precedes at least one proper noun, select the determiner reading
- SELECT Determiner IF (0 (prn)) (1 (np)) ;
- If the word precedes a determiner followed by at least one proper noun, select the determiner reading
- SELECT Determiner IF (0 (prn)) (1 (det)) (2 (np)) ;
- Pronoun case: if the word is followed by the end of the sentence, remove the determiner reading
- REMOVE Determiner IF (0 (prn)) (1 EOS) ;
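To make the intended effect of the three rules concrete, here is a rough Python emulation applied to the example sentences above. It is only an illustration, not how the constraint grammar is actually executed. It assumes that the `Determiner` set matches any reading containing `<det>` and that `EOS` matches a following `<sent>` token; neither set definition is shown on this page. The names `parse_units`, `cohort_has` and `disambiguate` are hypothetical.

```python
import re

LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_units(stream):
    """Split analyser output into dicts with a surface form and its readings."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        surface, *readings = body.split("/")
        units.append({"surface": surface, "readings": readings})
    return units


def cohort_has(units, i, tag):
    """True if position i exists and any of its readings carries <tag>."""
    if not 0 <= i < len(units):
        return False
    return any(f"<{tag}>" in r for r in units[i]["readings"])


def disambiguate(stream):
    """Apply rough equivalents of the three rules, left to right."""
    units = parse_units(stream)
    for i, unit in enumerate(units):
        det = [r for r in unit["readings"] if "<det>" in r]
        other = [r for r in unit["readings"] if "<det>" not in r]
        # Nothing to decide unless both kinds of reading are present
        # and the (0 (prn)) condition holds.
        if not det or not other or not cohort_has(units, i, "prn"):
            continue
        if cohort_has(units, i + 1, "np"):
            unit["readings"] = det       # rule 1: SELECT Determiner
        elif cohort_has(units, i + 1, "det") and cohort_has(units, i + 2, "np"):
            unit["readings"] = det       # rule 2: SELECT Determiner
        elif cohort_has(units, i + 1, "sent"):
            unit["readings"] = other     # rule 3: REMOVE Determiner (assumed EOS)
    return units


pronoun = (
    "^Kwarta/kwarta<n>$ "
    "^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$^./.<sent>$"
)
determiner = (
    "^Kwarta/kwarta<n>$ "
    "^nira/hira<prn><pers><p3><pl><pos>/ni<det><pos>+ra<det><pl>$ "
    "^Rico/Rico<np><ant><m>$ ^ngan/ngan<det><pl>$ ^Bobong/Bobong<np><ant><m>$"
    "^./.<sent>$"
)
for sentence in (pronoun, determiner):
    print(disambiguate(sentence)[1]["readings"])
```

In the first sentence 'nira' is followed directly by the sentence-final full stop, so only the pronoun reading is kept; in the second it is followed by the proper noun 'Rico', so only the determiner reading is kept.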
Final Evaluation of Ambiguity
(As of Apr 27, 2021)
Level of ambiguity after disambiguation: ~1.05008077544426494346 **
- Total number of forms: 136
- Number of unique forms: 133
** The level of ambiguity is unchanged because the lexicon does not yet provide enough context for the rules to disambiguate properly