Magahi/Disambiguation

From LING073

Revision as of 20:20, 4 April 2021

Initial Evaluation of Ambiguity

Corpus

Tokens in the corpus that receive more than one analysis (one line per occurrence):

cat corpus.txt | lt-proc /path/to/mag.automorf.bin | sed 's/$\W*\^/$\n^/g' | grep '/.*/'
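The command above prints every ambiguous occurrence, so frequent ambiguous words repeat many times. A possible extension, sketched below, collapses that output into one line per surface form with its corpus frequency. It assumes GNU sed (the `\n` in the replacement and `\W` are GNU extensions, as in the original command), and the function name `ambiguous_forms` and the sample tags in the test are hypothetical.

```shell
# ambiguous_forms (hypothetical helper): read lt-proc stream output on stdin and
# print each surface form that received more than one analysis, with the number
# of times it occurs, most frequent first. Assumes GNU sed.
ambiguous_forms() {
  sed 's/$\W*\^/$\n^/g' |          # one ^form/analysis...$ unit per line
    grep '/.*/' |                  # two or more slashes = two or more analyses
    sed 's|^\^\([^/]*\)/.*|\1|' |  # keep only the surface form
    sort | uniq -c | sort -rn      # count occurrences, most frequent first
}
```

Usage: `lt-proc /path/to/mag.automorf.bin < corpus.txt | ambiguous_forms | head`. Forms near the top of this list are likely the ones where disambiguation rules would pay off most.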

Transducer

Total number of analyses (form:analysis pairs): 4293

hfst-expand mag.automorf.hfst | wc -l

Number of unique surface forms: 2985

hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u | wc -l

Ambiguity: 4293/2985 ≈ 1.44 analyses per surface form.
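The two counts and the ratio can also be computed in a single pass over the `hfst-expand` output. The sketch below assumes each line has the `form:analysis` shape that the `cut -f1 -d':'` commands above rely on, and splits on the first colon the same way; the function name `ambiguity_stats` and the sample lines in the test are hypothetical.

```shell
# ambiguity_stats (hypothetical helper): read "form:analysis" lines on stdin and
# print: <number of analyses> <number of unique surface forms> <ratio>
ambiguity_stats() {
  awk -F':' '
    { total++                      # every line is one form:analysis pair
      if (!seen[$1]++) unique++ }  # count each surface form only once
    END {
      # guard against empty input to avoid division by zero
      printf "%d %d %.2f\n", total, unique, (unique ? total / unique : 0)
    }'
}
```

Usage: `hfst-expand mag.automorf.hfst | ambiguity_stats`. With the counts reported above this should print `4293 2985 1.44`.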

Surface forms with multiple analyses (a form with n analyses shows up n-1 times in the diff output):

hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort > /tmp/totalforms
hfst-expand mag.automorf.hfst | cut -f1 -d':' | sort -u > /tmp/uniqforms
diff /tmp/totalforms /tmp/uniqforms
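Because the diff lists each extra analysis as a separate line, it does not directly say how ambiguous a given form is. One alternative, sketched below, counts analyses per surface form with `uniq -c` and needs no temporary files. It uses the same `cut -f1 -d':'` split as above; the function name `multi_analysis_forms` and the sample entries in the test are hypothetical.

```shell
# multi_analysis_forms (hypothetical helper): read "form:analysis" lines on stdin
# and print each surface form with more than one analysis, preceded by its
# number of analyses, most ambiguous first.
multi_analysis_forms() {
  cut -f1 -d':' |     # surface form, as in the commands above
    sort | uniq -c |  # number of analyses per surface form
    awk '$1 > 1' |    # keep only ambiguous forms
    sort -rn          # most ambiguous first
}
```

Usage: `hfst-expand mag.automorf.hfst | multi_analysis_forms`. If only the list of ambiguous forms is needed, `cut -f1 -d':' | sort | uniq -d` prints each of them exactly once.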