Amis/Disambiguation

From LING073

ling073-ami GitHub repository: https://github.swarthmore.edu/Ling073-sp23/ling073-ami

initial evaluation of ambiguity

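To remove stray characters from the test output, run the command-line measurements through a pipeline like this one. The sed step puts each token on its own line and drops the characters between tokens, so wc -l counts tokens: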
cat ami.corpus.basic.txt | apertium-destxt | hfst-proc ./ami.automorf.hfst | apertium-retxt | sed 's/$\W*\^/$\n^/g' | wc -l

For example, to list all of the ambiguous tokens (those with more than one analysis), run:
cat ami.corpus.basic.txt | apertium-destxt | hfst-proc ./ami.automorf.hfst | apertium-retxt | sed 's/$\W*\^/$\n^/g' | grep '/.*/'
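To see which forms cause most of the remaining ambiguity, the grep output can be reduced to surface forms and counted. The following is a minimal sketch, assuming GNU grep/sed and the one-token-per-line stream from the pipeline above on stdin; the function name list_ambiguous is hypothetical.

```shell
# Sketch: rank the surface forms that still get more than one analysis,
# most frequent first. Input: one token per line (^surface/analysis1/...$),
# i.e. the output of the sed step in the pipeline above.
list_ambiguous() {
  grep '/.*/' |                        # keep tokens with two or more analyses
    sed 's/^\^\([^/]*\)\/.*$/\1/' |    # keep only the surface form before the first /
    sort | uniq -c | sort -rn          # count each form and rank by count
}
# Usage: cat ami.corpus.basic.txt | apertium-destxt | hfst-proc ./ami.automorf.hfst \
#          | apertium-retxt | sed 's/$\W*\^/$\n^/g' | list_ambiguous | head
```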

The ambiguity of a corpus is the average number of analyses per token. Our initial ambiguity is 2066 analyses / 1941 tokens ≈ 1.064.
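Both counts can be taken from the same one-token-per-line stream in a single pass. The sketch below assumes each token line has the Apertium form ^surface/analysis1/analysis2$, so the number of analyses equals the number of unescaped / separators. The function name ambiguity is hypothetical, and this is not necessarily how the numbers on this page were produced.

```shell
# Sketch: ambiguity = analyses / tokens, read from the one-token-per-line
# stream produced by the sed step above.
ambiguity() {
  awk '
    /^\^/ {                              # only lines that hold a token
      line = $0
      gsub(/\\\//, "", line)             # drop escaped slashes (\/) inside forms
      n = gsub(/\//, "/", line)          # each remaining / starts one analysis
      tokens++
      analyses += n
    }
    END {
      if (tokens == 0) { print "no tokens"; exit 1 }
      printf "tokens=%d analyses=%d ambiguity=%.3f\n", tokens, analyses, analyses / tokens
    }'
}
# Usage: cat ami.corpus.basic.txt | apertium-destxt | hfst-proc ./ami.automorf.hfst \
#          | apertium-retxt | sed 's/$\W*\^/$\n^/g' | ambiguity
```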

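The list of all forms has 6283 entries, of which 6273 are unique.

Diffing the list of all forms against the list of unique forms shows the 10 extra entries, i.e. the 10 forms that occur twice: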
40,41d39
< ako
< aku
68d65
< iso
70,71d66
< isu
< ita
4827,4828d4821
< nami
< namo
4831,4832d4823
< namu
< nangra
5020d5010
< ningra
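A comparison in the format above can be reproduced with sort and diff. This is a minimal bash sketch, assuming the forms are stored one per line in a plain-text file; how that list was generated is not shown on this page, and both the function name compare_forms and the file name forms.txt are hypothetical.

```shell
# Sketch (bash): compare the list of all forms with the list of unique forms.
compare_forms() {
  local forms="$1"
  echo "all forms: $(wc -l < "$forms")"
  echo "unique forms: $(sort -u "$forms" | wc -l)"
  # Lines marked "<" are the extra copies of forms that occur more than once.
  diff <(sort "$forms") <(sort -u "$forms")
  # diff exits with 1 when the lists differ; only 2 signals a real error.
  [ $? -le 1 ]
}
# Usage (hypothetical file name): compare_forms forms.txt
```

If only the duplicated forms are needed, sort forms.txt | uniq -d lists each of them once without the diff line numbers.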


Sentences

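These sentences test genitive vs. possessive disambiguation. The possessive reading only appears directly after a noun marker (<nm>), as in "ku namu" in the second sentence; in any other position namu should get the genitive reading.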

echo "Ulahen namu ku ina namu" | apertium -d. ami-morph
^Ulahen/ulah<v><uv><futi>$ ^namu/amu<prn><pers><pl><p2><gen>/amu<prn><pers><pl><p2><pos>$ ^ku/u<nm><un><nom>$ ^ina/ina<n>$ ^namu/amu<prn><pers><pl><p2><gen>/amu<prn><pers><pl><p2><pos>$

echo "Ngaʼay ho ku namu" | apertium -d. ami-morph
^Ngaʼay/ngaʼay<adj>$ ^ho/ho<asp><impf>$ ^ku/u<nm><un><nom>$ ^namu/amu<prn><pers><pl><p2><pos>/amu<prn><pers><pl><p2><gen>$
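In an Apertium language module a rule like this is typically written in Constraint Grammar. The awk function below is only a sketch that simulates the heuristic on the analyser output so its effect on sentences like the two above can be checked. It is not the project's actual rule. It assumes one token per line (the sed step from the measurement pipeline) and no escaped slashes in the affected tokens. The name disambiguate_gen_pos is hypothetical.

```shell
# Sketch: a token with both <gen> and <pos> readings keeps only <pos> if the
# previous token has an <nm> reading, and only <gen> otherwise.
disambiguate_gen_pos() {
  awk '
    /^\^/ && /<gen>/ && /<pos>/ {
      body = $0
      sub(/^\^/, "", body)                   # drop the leading ^
      sub(/\$$/, "", body)                   # drop the trailing $
      n = split(body, parts, "/")            # parts[1] is the surface form
      want = (prev ~ /<nm>/) ? "<pos>" : "<gen>"
      out = "^" parts[1]
      kept = 0
      for (i = 2; i <= n; i++)
        if (index(parts[i], want)) { out = out "/" parts[i]; kept++ }
      prev = $0
      print (kept ? out "$" : $0)            # leave the token alone if nothing matched
      next
    }
    { prev = $0; print }
  '
}
# Usage: echo "Ngaʼay ho ku namu" | apertium -d. ami-morph | sed 's/$\W*\^/$\n^/g' | disambiguate_gen_pos
```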

final evaluation of ambiguity

The number of tokens is 2114 and the total number of analyses is 2239, so the final ambiguity is 2239/2114 ≈ 1.059.
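Written as a formula, the ambiguity A can be split into one analysis per token plus the surplus analyses per token. The values below are computed from the counts in the initial and final evaluations on this page:

```latex
\begin{aligned}
A &= \frac{N_{\text{analyses}}}{N_{\text{tokens}}} = 1 + \frac{N_{\text{analyses}} - N_{\text{tokens}}}{N_{\text{tokens}}} \\
A_{\text{initial}} &= \frac{2066}{1941} = 1 + \frac{125}{1941} \approx 1.064 \\
A_{\text{final}} &= \frac{2239}{2114} = 1 + \frac{125}{2114} \approx 1.059
\end{aligned}
```

Both evaluations have 125 surplus analyses, so the drop from 1.064 to 1.059 reflects the larger token count in the final evaluation.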