Magahi/Disambiguation
Latest revision as of 22:09, 4 April 2021
Repository
https://github.swarthmore.edu/Ling073-sp21/ling073-mag
Initial Evaluation of Ambiguity
./disambiguation-test.sh
basic corpus (mag.corpus.basic.txt):
Ambiguity before disambiguation: ~1.11354466858789625360
Ambiguity after disambiguation: ~1.04495677233429394813
full corpus (mag.corpus.full.txt):
Ambiguity before disambiguation: ~1.04468066337332392378
Ambiguity after disambiguation: ~1.03625617501764290755
Corpus
Forms in corpus with more than one analysis
cat corpus.txt | lt-proc /path/to/mag.automorf.bin | sed 's/$\W*\^/$\n^/g' | grep '\/.*\/'
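A corpus ambiguity figure like the ones above can be approximated as the mean number of analyses per token in the analyser output. The following is a minimal sketch, not the contents of disambiguation-test.sh (which is not shown here): the function name mean_ambiguity is hypothetical, unknown words such as ^ham/*ham$ are counted as one analysis, and escaped slashes inside a token are not handled.

```shell
# Mean number of analyses per token for Apertium stream-format text on stdin.
# Assumption: tokens look like ^surface/analysis1/analysis2$ and contain no
# escaped "/" characters.
mean_ambiguity() {
    # Put each ^...$ unit on its own line, then count "/"-separated fields.
    grep -o '\^[^$]*\$' |
    awk -F'/' '{ tokens++; analyses += NF - 1 }
               END { if (tokens) printf "%.4f\n", analyses / tokens }'
}
```

It would be used as lt-proc /path/to/mag.automorf.bin < corpus.txt | mean_ambiguity, with the same placeholder path as in the command above.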
Transducer
Number of all forms (form:analysis pairs): 4293
hfst-expand mag.automorf.hfst | wc -l
Number of unique forms: 2985
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u | wc -l
Ambiguity of approximately 1.44 (4293/2985 ≈ 1.438).
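The ratio can be checked directly from the two counts. A small sketch using awk; the helper name ambiguity_ratio is hypothetical and only illustrates the division.

```shell
# Ratio of form:analysis pairs to unique surface forms, to three decimals.
ambiguity_ratio() {
    awk -v total="$1" -v unique="$2" 'BEGIN { printf "%.3f\n", total / unique }'
}

ambiguity_ratio 4293 2985   # prints 1.438
```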
Multiple Analyses:
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort > /tmp/totalforms
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u > /tmp/uniqforms
diff /tmp/totalforms /tmp/uniqforms
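The same list of forms with more than one analysis can likely be produced without temporary files, since uniq -d prints only lines that are repeated in sorted input. This is a sketch under the assumption that the expanded transducer yields one form:analysis pair per line, as the cut commands above imply; the function name ambiguous_forms is hypothetical.

```shell
# Print each surface form that has more than one analysis, once.
# Input on stdin: lines of the shape form:analysis.
ambiguous_forms() {
    cut -f1 -d':' | sort | uniq -d
}
```

It would be called as hfst-expand mag.automorf.hfst | ambiguous_forms; replacing uniq -d with uniq -c shows how many analyses each form has.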
Sentences
^laṛikā/laṛikā<n>/laṛikā<n><obl>/laṛikā<n><obl>$ ^i/<prn><dem>/<prn><dem><att>/<prn><pers><prx><p3><sg>$ ^likho/likh<v><pres><s_p3>/likh<v><pres><o_p2><hi>$^./.<sent>$
Readings bolded in the wiki source: laṛikā<n>, <prn><dem>, likh<v><pres><s_p3>
^ham/*ham$ ^laṛikā/laṛikā<n>/laṛikā<n><obl>/laṛikā<n><obl>$ ^ke/ke<post>$ ^dekhi/dekh<v><pres><s_p1>$^./.<sent>$
Reading bolded in the wiki source: laṛikā<n><obl>
Final Evaluation of Ambiguity
basic corpus (mag.corpus.basic.txt):
Ambiguity before disambiguation: ~1.06512968299711815562
Ambiguity after disambiguation: ~1.03400576368876080692
full corpus (mag.corpus.full.txt):
Ambiguity before disambiguation: ~1.04150494001411432604
Ambiguity after disambiguation: ~1.02999294283697953423