Difference between revisions of "Magahi/Disambiguation"

From LING073
= Repository =
[https://github.swarthmore.edu/Ling073-sp21/ling073-mag Repository]

= Initial Evaluation of Ambiguity =
 ./disambiguation-test.sh
basic corpus (mag.corpus.basic.txt):<br>
Ambiguity before disambiguation: ~1.11354466858789625360<br>
Ambiguity after disambiguation: ~1.04495677233429394813<br>
full corpus (mag.corpus.full.txt):<br>
Ambiguity before disambiguation: ~1.04468066337332392378<br>
Ambiguity after disambiguation: ~1.03625617501764290755

== Corpus ==
Forms in the corpus with more than one analysis:
 cat corpus.txt | lt-proc /path/to/mag.automorf.bin | sed 's/$\W*\^/$\n^/g' | grep '\/.*\/'

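The per-token output of the pipeline above can also be turned into a single ambiguity figure. The following is a minimal sketch, assuming ambiguity means the mean number of analyses per token and that an unknown token (marked with <code>*</code>) counts as one analysis; whether <code>disambiguation-test.sh</code> computes it the same way is not confirmed here. The <code>sample</code> text is a hypothetical stand-in for real <code>lt-proc</code> output, already split to one token per line.

```shell
# Hypothetical sample of lt-proc output (Apertium stream format), one token per line.
sample='^ham/*ham$
^laṛikā/laṛikā<n>/laṛikā<n><obl>$
^ke/ke<post>$
^dekhi/dekh<v><pres><s_p1>$'

# Each token has the shape ^surface/analysis1/analysis2/...$, so the number of
# analyses is the number of "/"-separated fields minus one (the surface form).
# Assumption: an unknown token such as *ham is counted as a single analysis.
mean_ambiguity=$(printf '%s\n' "$sample" |
  awk -F'/' '/^\^/ { tokens++; analyses += NF - 1 }
             END { printf "%.2f", analyses / tokens }')

echo "$mean_ambiguity"
```

With this sample, only <code>laṛikā</code> has two analyses, so five analyses are spread over four tokens.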
== Transducer ==
Total number of forms (one line per form:analysis pair): 4293
 hfst-expand mag.automorf.hfst | wc -l

Number of unique forms: 2985
 hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u | wc -l

Ambiguity of about 1.44 (4293/2985 ≈ 1.438).

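The ratio can be checked directly from the two counts reported above. This is only a worked-arithmetic sketch; the variable names are illustrative and not part of the repository.

```shell
# Counts reported in this section.
total_forms=4293    # lines printed by hfst-expand
unique_forms=2985   # distinct surface forms

# Transducer ambiguity = analyses per unique surface form.
transducer_ambiguity=$(awk -v t="$total_forms" -v u="$unique_forms" \
  'BEGIN { printf "%.4f", t / u }')

echo "$transducer_ambiguity"
```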
Forms with multiple analyses:
 hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort > /tmp/totalforms
 hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u > /tmp/uniqforms
 diff /tmp/totalforms /tmp/uniqforms

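As a possible alternative to comparing two temporary files, the repeated forms can be listed in one pass with <code>uniq -d</code>. This is a sketch rather than the method used in the repository; the <code>pairs</code> text is a hypothetical stand-in for <code>hfst-expand</code> output, assumed to be <code>form:analysis</code> lines as the <code>cut -f1 -d':'</code> step implies.

```shell
# Hypothetical stand-in for `hfst-expand mag.automorf.hfst` output.
pairs='laṛikā:laṛikā<n>
laṛikā:laṛikā<n><obl>
ke:ke<post>
dekhi:dekh<v><pres><s_p1>'

# Keep the surface form, sort so duplicates are adjacent, then print each
# form that occurs more than once (that is, has more than one analysis).
ambiguous_forms=$(printf '%s\n' "$pairs" | cut -f1 -d':' | sort | uniq -d)

echo "$ambiguous_forms"
```

Replacing <code>uniq -d</code> with <code>uniq -c</code> would instead show how many analyses each form has.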
= Sentences =
* <code>^laṛikā/<b>laṛikā<n></b>/laṛikā<n><obl>/laṛikā<n><obl>$ ^i/<b><prn><dem></b>/<prn><dem><att>/<prn><pers><prx><p3><sg>$ ^likho/<b>likh<v><pres><s_p3></b>/likh<v><pres><o_p2><hi>$^./.<sent>$</code>
* <code>^ham/*ham$ ^laṛikā/laṛikā<n>/<b>laṛikā<n><obl></b>/laṛikā<n><obl>$ ^ke/ke<post>$ ^dekhi/dekh<v><pres><s_p1>$^./.<sent>$</code>

[[Category:Sp21_Disambiguation]][[Category:Magahi]]

= Final Evaluation of Ambiguity =
basic corpus (mag.corpus.basic.txt):<br>
Ambiguity before disambiguation: ~1.06512968299711815562<br>
Ambiguity after disambiguation: ~1.03400576368876080692<br>

full corpus (mag.corpus.full.txt):<br>
Ambiguity before disambiguation: ~1.04150494001411432604<br>
Ambiguity after disambiguation: ~1.02999294283697953423

Latest revision as of 22:09, 4 April 2021
