Shan/Disambiguation

From LING073

Initial Evaluation

In the last coverage test on a large file, the ambiguity rate (total analyses / total tokens) was 2052616 / 1743796 (~1.177 analyses per token).

So there is still some ambiguity in the transducer's output.
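Assuming the ambiguity figure is total analyses divided by total tokens (the token count, 1743796, is the same in both evaluations, which suggests this), it can be computed directly from an Apertium stream. The Python sketch below assumes the simple `^surface/reading1/reading2$` format shown in the examples on this page (no escaped characters or superblanks) and counts an unknown word (`*`) as one analysis. The function name `ambiguity` is hypothetical and not part of Apertium's own tooling.

```python
import re

# One lexical unit in a simplified Apertium stream: ^surface/analysis1/analysis2...$
# Assumption: no escaped ^, / or $ characters and no superblanks in the input.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def ambiguity(stream: str) -> tuple[int, int, float]:
    """Return (total analyses, total tokens, analyses per token) for a stream."""
    analyses = 0
    tokens = 0
    for lu in LU_PATTERN.findall(stream):
        # The first field is the surface form; the rest are its analyses.
        # An unknown word like ^x/*x$ therefore counts as one analysis.
        _surface, *readings = lu.split("/")
        tokens += 1
        analyses += len(readings)
    rate = analyses / tokens if tokens else 0.0
    return analyses, tokens, rate
```

On the first example sentence below, this counts 8 analyses over 7 tokens, since only တူၼ်ႈ has two readings.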

Example

ၵဝ် ၽုၵ်ႇ ၵုၺ်တူၼ်ႈၼိုင်ႈ

I plant one cotton plant

^ၵဝ်/ၵဝ်<prn><person><p1><sg>$ ^ၽ/ၽ<qst>$ ^ုၵ်ႇ/*ုၵ်ႇ$ ^ၵုၺ်/*ၵုၺ်$ ^တူၼ်ႈ/တူၼ်ႈ<clf>/တူၼ်ႈ<n>$ ^ၼိုင်ႈ/ၼိုင်ႈ<num>$ ^./.<sent>$

ၶုၼ်ၽီတူၼ်ႈမႆႉ

Gods of trees

^ၶုၼ်/*ၶုၼ်$ ^ၽ/ၽ<qst>$ ^ီ/*ီ$ ^တူၼ်ႈ/တူၼ်ႈ<clf>/တူၼ်ႈ<n>$ ^မႆႉ/*မႆႉ$ ^./.<sent>$

The word တူၼ်ႈ means "tree" or "trunk". It can be a noun meaning a tree, or a classifier used when counting trees and plants. To reduce ambiguity in cases like this, we created a SELECT rule: if a numeral (<num>) appears immediately before or after a word that is ambiguous between a noun and a classifier, the classifier reading is selected and the noun reading is discarded.
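The rule itself presumably lives in Apertium's Constraint Grammar (CG-3) file; in CG-3 syntax a comparable pair of rules might look like `SELECT (clf) IF (-1 (num)) ;` and `SELECT (clf) IF (1 (num)) ;`, though the actual rule may be written differently. The Python sketch below only simulates the rule's effect on a parsed stream, to make "select" concrete: in a cohort with both <n> and <clf> readings, everything except the <clf> readings is dropped when the cohort directly before or after has a <num> reading. The helper names (`parse`, `select_clf_near_num`, `to_stream`) are hypothetical.

```python
import re

# Simplified Apertium lexical unit: ^surface/reading1/reading2...$
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def parse(stream):
    """Split a stream into a list of (surface, [readings]) cohorts."""
    cohorts = []
    for lu in LU_PATTERN.findall(stream):
        surface, *readings = lu.split("/")
        cohorts.append((surface, readings))
    return cohorts


def has_tag(reading, tag):
    # "<n>" does not match "<num>" because the closing ">" is required.
    return f"<{tag}>" in reading


def select_clf_near_num(cohorts):
    """Keep only <clf> readings of noun/classifier-ambiguous cohorts
    that have a <num> reading at position -1 or +1."""
    result = []
    for i, (surface, readings) in enumerate(cohorts):
        ambiguous = any(has_tag(r, "clf") for r in readings) and any(
            has_tag(r, "n") for r in readings
        )
        neighbours = [cohorts[j] for j in (i - 1, i + 1) if 0 <= j < len(cohorts)]
        num_adjacent = any(has_tag(r, "num") for _, rs in neighbours for r in rs)
        if ambiguous and num_adjacent:
            readings = [r for r in readings if has_tag(r, "clf")]
        result.append((surface, readings))
    return result


def to_stream(cohorts):
    """Rejoin cohorts with single spaces (formatting blanks are not kept)."""
    return " ".join("^" + "/".join([s, *rs]) + "$" for s, rs in cohorts)
```

Applied to the first example, the <n> reading of တူၼ်ႈ is discarded because ၼိုင်ႈ<num> follows it. In ၶုၼ်ၽီတူၼ်ႈမႆႉ there is no adjacent numeral, so both readings survive, which is why this rule alone cannot remove all of the ambiguity.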

Final Evaluation

The final ambiguity rate on the large corpus was 1924881 / 1743796 (~1.104 analyses per token), down from ~1.177.
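To put the two evaluations side by side, the arithmetic below uses only the numbers reported above. It treats every reading beyond one per token as "surplus" ambiguity, on the assumption that a fully disambiguated text would have exactly one reading per token (counting unknown words as one); that framing is an illustrative assumption, not something the coverage test reports.

```python
TOKENS = 1743796  # tokens in the large corpus (same in both runs)
BEFORE = 2052616  # analyses before disambiguation
AFTER = 1924881   # analyses after disambiguation


def rate(analyses, tokens):
    """Average number of analyses per token."""
    return analyses / tokens


removed = BEFORE - AFTER            # readings discarded by the rules
surplus_before = BEFORE - TOKENS    # readings beyond one per token (assumption)
share_resolved = removed / surplus_before

print(f"{rate(BEFORE, TOKENS):.3f} -> {rate(AFTER, TOKENS):.3f}")   # 1.177 -> 1.104
print(f"{removed} readings removed, {share_resolved:.1%} of the surplus")
# 127735 readings removed, 41.4% of the surplus
```

Under this framing, the rules removed roughly two fifths of the extra readings.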