Difference between revisions of "Magahi/Disambiguation"
From LING073
Revision as of 20:20, 4 April 2021
Initial Evaluation of Ambiguity
Corpus
Forms in corpus with more than one analysis
cat corpus.txt | lt-proc /path/to/mag.automorf.bin | sed 's/$\W*\^/$\n^/g' | grep '\/.*\/'
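To get a count rather than a listing, the same pipeline can end in `grep -c`. The sketch below is illustrative only: it uses a made-up line in the usual `^surface/analysis1/analysis2$` stream format in place of real `lt-proc` output, and the forms `aaa`, `bbb`, `ccc` are placeholders, not Magahi data. It assumes GNU sed, since `\W` and `\n` in the replacement are GNU extensions.

```shell
# Hypothetical stand-in for: cat corpus.txt | lt-proc /path/to/mag.automorf.bin
analysed='^aaa/aaa<n>/aaa<v>$ ^bbb/bbb<n>$ ^ccc/ccc<adj>/ccc<n>$'

# Put one lexical unit per line, then count the units that contain
# at least two slashes, i.e. at least two analyses.
ambiguous_count=$(printf '%s\n' "$analysed" \
  | sed 's/$\W*\^/$\n^/g' \
  | grep -c '/.*/')

echo "$ambiguous_count"
```

Note that this counts tokens, so a form that occurs several times in the corpus is counted each time.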
Transducer
Number of all forms: 4293
hfst-expand mag.automorf.hfst | wc -l
Number of unique forms: 2985
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u | wc -l
Ambiguity of approximately 1.44 analyses per unique form (4293/2985).
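The ratio can be recomputed from the two counts rather than by hand. A minimal sketch, with the counts above hard-coded; in practice they could be captured from the two `hfst-expand ... | wc -l` commands. The variable names are only illustrative.

```shell
# Counts reported above for mag.automorf.hfst
total=4293    # all form:analysis pairs
unique=2985   # distinct surface forms

# Average number of analyses per unique form, rounded to two decimals.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
ratio=$(LC_ALL=C awk -v t="$total" -v u="$unique" 'BEGIN { printf "%.2f", t / u }')

echo "$ratio"
```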
Multiple Analyses:
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort > /tmp/totalforms
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u > /tmp/uniqforms
diff /tmp/totalforms /tmp/uniqforms
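An alternative to the `diff` approach that avoids temporary files is to count repeated forms with `uniq -c`. This is a sketch under assumptions, not the method used on this page: the `form:analysis` lines below are invented placeholders standing in for `hfst-expand mag.automorf.hfst` output.

```shell
# Hypothetical stand-in for `hfst-expand mag.automorf.hfst` output
expanded=$(printf '%s\n' \
  'aaa:aaa<n><sg>' \
  'aaa:aaa<v><imp>' \
  'bbb:bbb<n><sg>' \
  'ccc:ccc<adj>' \
  'ccc:ccc<n><sg>' \
  'ccc:ccc<adv>')

# Keep the surface form, count how often each occurs, and print
# only the forms with more than one analysis as "form<TAB>count".
ambiguous=$(printf '%s\n' "$expanded" \
  | cut -f1 -d':' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '$1 > 1 { print $2 "\t" $1 }')

printf '%s\n' "$ambiguous"
```

Unlike `diff`, which shows each surplus line, this reports every ambiguous form once together with its number of analyses.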