Shan/Disambiguation
Initial Evaluation
In the last coverage test on a large file, the ambiguity was 2052616 / 1743796 (~1.177).
So there is some ambiguity in the transducer.
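As a rough illustration of how such a ratio can be computed, here is a minimal Python sketch. It assumes the ratio is total readings divided by total tokens, and that an unknown word (marked with `*`) counts as a single reading. Neither assumption is confirmed by the test itself. The names `count_analyses` and `ambiguity` are hypothetical helpers, not part of any Apertium tool, and the regular expression ignores escaped characters in the stream format.

```python
import re

# Matches one lexical unit in Apertium stream format: ^surface/reading1/reading2$
# Simplification: escaped characters inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def count_analyses(stream):
    """Return (total readings, total tokens) for a string of analyser output."""
    analyses = 0
    tokens = 0
    for match in LEXICAL_UNIT.finditer(stream):
        parts = match.group(1).split("/")
        # parts[0] is the surface form; everything after it is a reading.
        # Assumption: an unknown word (*surface) is counted as one reading.
        analyses += max(len(parts) - 1, 1)
        tokens += 1
    return analyses, tokens


def ambiguity(analyses, tokens):
    """Mean number of readings per token."""
    return analyses / tokens


# The second example sentence from this page.
example = "^ၶုၼ်/*ၶုၼ်$ ^ၽ/ၽ<qst>$ ^ီ/*ီ$ ^တူၼ်ႈ/တူၼ်ႈ<clf>/တူၼ်ႈ<n>$ ^မႆႉ/*မႆႉ$ ^./.<sent>$"
example_analyses, example_tokens = count_analyses(example)

# The figures reported for the large file.
initial_ratio = ambiguity(2052616, 1743796)
```

Under these assumptions, only the word with both a `<clf>` and an `<n>` reading pushes the example sentence above one reading per token.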
Example
ၵဝ် ၽုၵ်ႇ ၵုၺ်တူၼ်ႈၼိုင်ႈ
I plant one cotton plant
^ၵဝ်/ၵဝ်<prn><person><p1><sg>$ ^ၽ/ၽ<qst>$ ^ုၵ်ႇ/*ုၵ်ႇ$ ^ၵုၺ်/*ၵုၺ်$ ^တူၼ်ႈ/တူၼ်ႈ<clf>/တူၼ်ႈ<n>$ ^ၼိုင်ႈ/ၼိုင်ႈ<num>$ ^./.<sent>$
ၶုၼ်ၽီတူၼ်ႈမႆႉ
Gods of trees
^ၶုၼ်/*ၶုၼ်$ ^ၽ/ၽ<qst>$ ^ီ/*ီ$ ^တူၼ်ႈ/တူၼ်ႈ<clf>/တူၼ်ႈ<n>$ ^မႆႉ/*မႆႉ$ ^./.<sent>$
The word တူၼ်ႈ means "tree" or "trunk". It can be either a noun or a classifier used for trees and plants. To reduce the ambiguity in situations like this, we created a select rule: if a numeral <num> appears immediately before or after a word that is ambiguous between noun and classifier, the classifier reading is selected and the noun reading is discarded.
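The effect of this rule can be sketched in Python. This is a simulation for illustration only, not the rule as written in the disambiguation grammar; `has_tag` and `select_classifier` are hypothetical names, and the sketch assumes a neighbouring token counts as a numeral if any of its readings carries `<num>`.

```python
def has_tag(reading, tag):
    """True if a reading such as 'တူၼ်ႈ<clf>' carries the given tag."""
    return f"<{tag}>" in reading


def select_classifier(cohorts):
    """Keep only <clf> readings of noun/classifier-ambiguous tokens next to a numeral.

    cohorts is a list of (surface, [readings]) pairs in sentence order.
    """
    result = []
    for i, (surface, readings) in enumerate(cohorts):
        ambiguous = (any(has_tag(r, "clf") for r in readings)
                     and any(has_tag(r, "n") for r in readings))
        # Collect the readings of the immediately preceding and following tokens.
        neighbours = []
        if i > 0:
            neighbours.extend(cohorts[i - 1][1])
        if i + 1 < len(cohorts):
            neighbours.extend(cohorts[i + 1][1])
        near_numeral = any(has_tag(r, "num") for r in neighbours)
        if ambiguous and near_numeral:
            readings = [r for r in readings if has_tag(r, "clf")]
        result.append((surface, readings))
    return result


# From the first example: the ambiguous word is followed by a numeral.
with_numeral = [("တူၼ်ႈ", ["တူၼ်ႈ<clf>", "တူၼ်ႈ<n>"]),
                ("ၼိုင်ႈ", ["ၼိုင်ႈ<num>"])]
# From the second example: no numeral next to the ambiguous word.
without_numeral = [("တူၼ်ႈ", ["တူၼ်ႈ<clf>", "တူၼ်ႈ<n>"]),
                   ("မႆႉ", ["*မႆႉ"])]

resolved = select_classifier(with_numeral)
unresolved = select_classifier(without_numeral)
```

In the first case only the classifier reading survives. In the second the token keeps both readings, because the rule has no numeral to trigger on.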
Final Evaluation
The final ambiguity on the large corpus was 1924881 / 1743796 (~1.104).