Jeju and Ainu

From LING073

Resources for machine translation between Jeju and Ainu

ain → jje evaluation

tests.txt evaluation

  • WER: 100%
  • PER: 100%
  • Proportion of stems translated correctly: 11/11

sentences.txt evaluation

  • WER: 100%
  • PER: 100%
  • Proportion of stems translated correctly: 73/296

jje → ain evaluation

tests.txt evaluation

  • WER: 100%
  • PER: 100%
  • Proportion of stems translated correctly: 8/14

sentences.txt evaluation

  • WER: 177.98% (error rates above 100% are possible because inserted words count as errors, so output much longer than the reference can need more edits than the reference has words)
  • PER: 177.98%
  • Proportion of stems translated correctly: 150/502
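For readers unfamiliar with the two metrics, here is a minimal sketch, assuming the common textbook definitions. The course's evaluation script may tokenise or normalise text differently. WER divides the word-level edit distance by the number of reference words. PER ignores word order but still counts surplus output words as errors.

```python
from collections import Counter


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance counted in words."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute (free if the words match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length; insertions can push it above 1.0."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(ref, hyp) / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Like WER but ignoring word order; surplus output words still count."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = len(ref) - matches + max(0, len(hyp) - len(ref))
    return errors / len(ref)
```

For example, wer("a b", "x y z w") is 2.0 (200%). That result comes from two substitutions plus two insertions against a two-word reference, which is one way figures like 177.98% can arise.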

Lexical Selection

jje -> ain

  • (jje)눈 -> (ain) ウパㇱ (snow)
  • (jje)눈 -> (ain) シㇰ (eye)
  • (jje)배 -> (ain) チㇷ゚ (ship)
  • (jje)배 -> (ain) ト゚ィ (stomach)
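Apertium expresses choices like these as rules in a lexical-selection (.lrx) file. As a language-neutral illustration only, here is a minimal Python sketch of the idea. It picks the translation of an ambiguous lemma by looking for a cue lemma within a small window. The cue lemmas (오다, 감다, 타다, 고프다), the window size and the defaults are hypothetical. They are based on standard Korean collocations, not on this project's rules.

```python
from typing import Optional

# Hypothetical rules: ambiguous lemma -> (default translation, {cue lemma: translation}).
RULES: dict[str, tuple[str, dict[str, str]]] = {
    "눈": ("ウパㇱ", {"오다": "ウパㇱ", "감다": "シㇰ"}),  # snow / eye
    "배": ("チㇷ゚", {"타다": "チㇷ゚", "고프다": "ト゚ィ"}),  # ship / stomach
}


def select(lemmas: list[str], i: int, window: int = 2) -> Optional[str]:
    """Choose an Ainu translation for lemmas[i] from nearby cue lemmas."""
    entry = RULES.get(lemmas[i])
    if entry is None:
        return None  # not ambiguous here; the bilingual dictionary decides
    default, cues = entry
    start, end = max(0, i - window), min(len(lemmas), i + window + 1)
    for j in range(start, end):
        if j != i and lemmas[j] in cues:
            return cues[lemmas[j]]  # first cue found, scanning left to right
    return default  # hypothetical fallback when no cue is present
```

With analysed lemmas such as ["배", "가", "고프다"], the cue 고프다 selects ト゚ィ (stomach). A sentence with ["배", "를", "타다"] gets チㇷ゚ (ship) instead.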

ain -> jje

  • (ain) サポ (older sister) -> 누나 (older sister, speaker:male)
  • (ain) サポ (older sister) -> 언니 (older sister, speaker:female)
  • (ain) チュㇷ゚ (sun/moon/month) -> 달 (moon/month)
  • (ain) チュㇷ゚ (sun/moon/month) -> 해 (sun)
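Both ain → jje pairs above depend on information the Ainu word itself does not carry. Here is a minimal sketch built on two hypothetical assumptions. First, the speaker's gender is supplied from outside the sentence. Second, チュㇷ゚ is disambiguated by the preceding word, as in the Ainu compounds トカㇷ゚ チュㇷ゚ (tokap cup, "sun") and クンネ チュㇷ゚ (kunne cup, "moon"). The function names and defaults are illustrative only.

```python
from typing import Optional


def translate_sapo(speaker: Optional[str]) -> str:
    """サポ 'older sister': the Korean word depends on the speaker's gender."""
    if speaker == "female":
        return "언니"
    return "누나"  # male speaker; also the hypothetical default when unknown


def translate_cup(previous: Optional[str]) -> str:
    """チュㇷ゚ 'sun/moon/month': choose by the preceding word, if any."""
    if previous == "トカㇷ゚":  # tokap 'daytime' + cup = sun
        return "해"
    return "달"  # クンネ 'dark' + cup = moon; bare チュㇷ゚ defaults to moon/month
```

The speaker feature cannot be recovered from a single sentence. A real pipeline would therefore need it as metadata or would have to fall back to a default.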

Developed resources

Parallel Corpus Repository

Contrastive grammar

Contrastive grammar wiki page

Structural transfer

Structural transfer wiki page

Final evaluation

ain->jje

Transducer

  • Precision and Recall against basic corpus:
  • Coverage: ~52%
  • Number of words in the large corpus: 2035
  • Number of stems in the transducer: 382
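Coverage here is the share of corpus tokens that receive at least one analysis. Here is a minimal sketch, assuming the corpus has already been run through the analyser (for instance with lt-proc). Each token is then in Apertium stream format, ^surface/analysis$, with unknown tokens written ^surface/*surface$. The sketch ignores escaped characters and other stream details, and the sample tags in the test are placeholders.

```python
import re

# One lexical unit in Apertium stream format: ^...$ (escaping is ignored here).
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    # Unknown tokens carry a '*' in front of the copied surface form.
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)
```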

Machine Translation

  • WER and PER over large corpus: WER 100%, PER 100%
  • Proportion of stems translated correctly in large corpus: 2036 total, 6 unknown. (This seems very unlikely next to the ~52% transducer coverage reported above.)
  • Trimmed coverage over large corpus
  • Number of stems in large corpus
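One way to double-check a figure like "2036 total, 6 unknown" is to count the marks Apertium leaves on problem words in the translated text. Here is a minimal sketch, assuming the output was produced with unknown-word marking left on. In that output, '*' flags words the analyser did not recognise, '@' flags words missing from the bilingual dictionary and '#' flags words the generator could not produce. Whitespace tokenisation is a simplification, so punctuation attached to words is not split off.

```python
from collections import Counter

# Apertium's marks on translated words (when marking is not disabled).
MARKS = {
    "*": "unknown to the analyser",
    "@": "missing from the bilingual dictionary",
    "#": "not generated",
}


def mark_counts(translation: str) -> Counter:
    """Count marked and unmarked words in whitespace-tokenised MT output."""
    counts: Counter = Counter()
    for word in translation.split():
        counts[word[0] if word[0] in MARKS else "ok"] += 1
    return counts
```

Dividing counts["*"] by sum(counts.values()) gives the unknown-word share. That share can then be compared with the transducer coverage.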

jje->ain

Implemented

  • Added 200 more stems to the bilingual dictionary
  • Expanded morphology
    • Verb endings for different politeness levels added
    • Various endings used in Jeju added (disambiguation not yet implemented)
    • Verb stem + -게 => adverb form added
    • Noun stem + -이다 => verb form added
    • Adverbs with a negative subcategory added
    • Various determiners added
  • Lexical selection rules added
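The two form-changing additions in the morphology list can be pictured as string operations on Hangul. Here is a minimal sketch in plain Python rather than lexc/twol, built on two assumptions. First, verbs arrive in dictionary form ending in 다. Second, -이다 often shortens to -다 after a vowel-final noun, as it does in standard Korean. Whether Jeju follows the same pattern is an assumption, and the function names are hypothetical. The final-consonant test uses the Unicode layout of precomposed Hangul syllables, where each initial+vowel block has 28 final-consonant slots and slot 0 means "no final consonant".

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllables


def has_final_consonant(syllable: str) -> bool:
    """True if the syllable ends in a final consonant (batchim)."""
    code = ord(syllable)
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return (code - HANGUL_FIRST) % 28 != 0


def verb_to_adverb(verb: str) -> str:
    """Verb stem + -게, e.g. 빠르다 -> 빠르게."""
    if not verb.endswith("다"):
        raise ValueError("expected a dictionary form ending in 다")
    return verb[:-1] + "게"


def noun_to_predicate(noun: str) -> str:
    """Noun stem + -이다, shortened to -다 after a vowel (assumed rule)."""
    return noun + ("이다" if has_final_consonant(noun[-1]) else "다")
```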

Transducer

  • Precision and Recall against basic corpus: Precision 29.44785% / Recall 41.37931%
  • Coverage over the large corpus: 2608/6325 ≈ 0.4123 (41.2%)
  • Number of words in the large corpus: 6325
  • Number of stems in the transducer: lexccounter reports 85, but counting by hand gives 385.
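Precision and recall here compare the analyses the transducer produces with hand-annotated ones. Here is a minimal sketch, assuming both sides are given as one set of analysis strings per token (the tag strings in the test are placeholders). The course's evaluation script may count differently, for example per token rather than per analysis.

```python
def precision_recall(
    produced: list[set[str]], gold: list[set[str]]
) -> tuple[float, float]:
    """Per-analysis precision and recall over aligned tokens."""
    if len(produced) != len(gold):
        raise ValueError("token counts differ")
    correct = sum(len(p & g) for p, g in zip(produced, gold))
    n_produced = sum(len(p) for p in produced)  # everything the transducer output
    n_gold = sum(len(g) for g in gold)          # everything it should have output
    precision = correct / n_produced if n_produced else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall
```

Overgeneration (extra wrong analyses) lowers precision only, while missing analyses lower recall only. This is why the two figures can differ as much as 29% and 41%.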

Machine Translation

  • WER and PER over large corpus: both 177.98% (edit distance 493).
  • Proportion of stems translated correctly in large corpus: 5551 total words, 3296 unknown.
  • WER: 176%, PER: 176% (Our Ainu-side corpus is not complete yet.)
  • Trimmed coverage over large corpus: 0%
  • Number of stems in large corpus: 0%

For the 0% coverages, I am still working out whether they reflect a failure of machine translation or a flaw in the pipeline and compilation. I will update as soon as the matter is determined.

For the following statistics, running aq-covtest corpus jje-ain.automorf.bin fails with a TransducerHasWrongType error; I will update them as soon as this bug is fixed.

  • Trimmed coverage over large corpus: TransducerHasWrongType error encountered, fixing.
  • Number of stems in large corpus: TransducerHasWrongType error encountered, fixing.