Okinawan and Miyako

From LING073
Revision as of 15:37, 9 May 2017 by Doldham1 (Transfer Rules)

Resources for machine translation between Okinawan and Miyako

Lexical Selection

  • In Miyako, はい can mean "field", "needle", or "south", and it is also a causative auxiliary verb. If it is followed by the word for "string", it probably means "needle".
  • "Hand" and "arm" are the same word in Okinawan (てぃい) but different words in Miyako. If てぃい is followed by an instrumental, we assume it means "hand".
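Both heuristics above are context rules of the form "word X followed by context Y selects sense Z". The Python sketch below shows only that idea; the context labels "string" and "<ins>" are hypothetical stand-ins for whatever lexical unit and tag the project's rule files actually match, and a real Apertium pipeline would express these rules in its own rule format rather than in Python.

```python
from typing import Optional

# Each rule: (source word, feature of the following token, chosen sense).
# "string" and "<ins>" are hypothetical labels for "the word for string"
# and "an instrumental"; the project's actual tags are not shown here.
# The はい rule applies to Miyako input, the てぃい rule to Okinawan input.
RULES = [
    ("はい", "string", "needle"),
    ("てぃい", "<ins>", "hand"),
]


def select_sense(word: str, next_feature: Optional[str]) -> Optional[str]:
    """Return the sense picked by the first matching rule, or None if none fires."""
    for source, context, sense in RULES:
        if word == source and next_feature == context:
            return sense
    # No rule fired: the word stays ambiguous and a default is chosen elsewhere.
    return None


print(select_sense("はい", "string"))
print(select_sense("てぃい", None))
```

Returning None instead of a guessed default keeps the sketch honest: the notes give no rule for the other senses of はい or for "arm".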

Evaluation

Evaluation after implementing lexical selection

ryu → mvi Evaluation

Evaluation of tests
  • WER: 100%
  • PER: 100%
  • Coverage: 89%
Evaluation of sentences
  • WER: 90.32%
  • PER: 90.32%
  • Coverage: 62.1%

mvi → ryu Evaluation

Evaluation of tests
  • WER: 88.89%
  • PER: 77.78%
  • Coverage: 100%
Evaluation of sentences
  • WER: 100%
  • PER: 100%
  • Coverage: 80.65%
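WER counts word insertions, deletions, and substitutions needed to turn the system output into a reference translation, divided by the reference length; PER counts the same errors while ignoring word order, so PER can never exceed WER (consistent with the figures above). The script that produced these numbers is not named on this page, so the Python sketch below is one common formulation, not necessarily the exact definitions used.

```python
from __future__ import annotations

from collections import Counter


def wer(hyp: list[str], ref: list[str]) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    # d[i][j] = minimum edits to turn hyp[:i] into ref[:j]
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete a hypothesis word
                          d[i][j - 1] + 1,        # insert a reference word
                          d[i - 1][j - 1] + sub)  # match or substitute
    return d[len(hyp)][len(ref)] / len(ref)


def per(hyp: list[str], ref: list[str]) -> float:
    """Position-independent error rate: like WER but ignores word order."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)


hyp = "b a c".split()
ref = "a b c".split()
print(wer(hyp, ref), per(hyp, ref))
```

In this toy example the two words are merely swapped, so WER penalizes the reordering while PER reports no error at all.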

Final Evaluation

ryu Evaluation

  • Precision:
  • Recall:
  • Coverage of large:
  • Number of words in large:
  • Number of stems in the transducer:

mvi Evaluation

  • Precision: 97.4%
  • Recall: 64.9%
  • Coverage of large: 55.67%
  • Number of words in large: 650
  • Number of stems in the transducer: 116
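Coverage is usually the share of tokens that the analyzer recognizes. The Python sketch below assumes output in the Apertium stream format used elsewhere on this page, where an unknown word appears as ^surface/*surface$. It counts tokens rather than distinct words, ignores escaped characters, and is not the script behind the reported numbers.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^みんぬ/みん<n><acc>$.
# Simplification: escaped characters such as \^ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units with at least one analysis (unknowns start with *)."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # Everything after the first "/" is the list of analyses.
        analyses = unit.partition("/")[2]
        if analyses and not analyses.startswith("*"):
            known += 1
    return known / len(units)


sample = "^みんぬ/みん<n><acc>/みん<n><gen>$ ^あいば/*あいば$"
print(coverage(sample))
```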

ryu → mvi Evaluation

Evaluation of longer
  • WER:
  • PER:
  • Coverage:
  • Trimmed coverage:
  • Number of tokens:
Evaluation of large
  • Trimmed coverage:
  • Number of tokens:

mvi → ryu Evaluation

Evaluation of longer
  • WER:
  • PER:
  • Coverage:
  • Trimmed coverage:
  • Number of tokens:
Evaluation of large
  • Trimmed coverage:
  • Number of tokens:

Expansion of the Miyako transducer and ryu → mvi

Morphology
  • I added more numerals. 6 is not included because I am unsure what its form is.
  • I expanded the classifiers, adding ones for days, portions, groups, and people.
  • I changed the focus marker so that it is implemented as a lexicon rather than hard-coded. As a result, verbs that take focus markers can now receive them.
    Old output: ^ぼーしなてぃどぅ/*ぼーしなてぃどぅ$
    New output: ^ぼーしなてぃどぅ/ぼーし<n><abs>+な<mod><quot><foc>$
    Old output: ^そぅだてぃどぅ/*そぅだてぃどぅ$
    New output: ^そぅだてぃどぅ/そぅだてぃ<v><cvb_abs><foc>$
  • I implemented the resultative and causal forms.
    Old output: ^あいば/*あいば$
    New output: ^あいば/あ<vaux>+ば<vaux>$
  • I hard-coded the forms of "do" that I have.
  • I added a lexicon that certain verb forms pass through so that "but" can attach to their end.
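Making the focus marker its own lexicon means that any entry whose continuation points to it gets both a bare and a focus-marked form, which is why そぅだてぃどぅ now analyzes. The Python sketch below mimics lexc-style continuation classes; it is not the project's lexc source, and the lexicon names Root, Verbs, and Focus are hypothetical (the stem and tags come from the example output above).

```python
# Each lexicon maps to (analysis, surface, next lexicon) entries; "#" ends
# the word, as in lexc. Lexicon names are hypothetical; the stem and tags
# come from the example output above.
LEXICONS = {
    "Root": [("", "", "Verbs")],
    "Verbs": [("そぅだてぃ<v><cvb_abs>", "そぅだてぃ", "Focus")],
    # Focus is optional: one path adds nothing, the other adds どぅ.
    "Focus": [("", "", "#"), ("<foc>", "どぅ", "#")],
}


def expand(lexicon="Root", analysis="", surface=""):
    """Yield every (analysis, surface) pair reachable from the given lexicon."""
    if lexicon == "#":
        yield analysis, surface
        return
    for a, s, next_lexicon in LEXICONS[lexicon]:
        yield from expand(next_lexicon, analysis + a, surface + s)


for pair in expand():
    print(pair)
```

Adding another stem to Verbs with Focus as its continuation would give it a どぅ form with no further changes, which is the practical gain over hard-coding.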
twol
  • I added a rule so that the accusative marker becomes う after a word ending in a or u.
  • I added a rule so that the accusative marker takes the form appropriate to the preceding sound, e.g. ぬ after a word ending in ん.
    Old output: ^みんぬ/みん<n><gen>$
    New output: ^みんぬ/みん<n><acc>/みん<n><gen>$
  • I added a rule so that {っ} becomes じ after a word ending in ず.
  • I added a rule so that ず becomes っ before the topic and accusative markers.
  • I added a rule so that す becomes っ before the accusative marker.
  • I added a rule so that {っ} becomes そ after a word ending in す and before an accusative marker.
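twol rules are two-level constraints that hold in parallel between lexical and surface forms. The Python sketch below swaps in a simpler technique, ordered regular-expression rewrites, to illustrate only the two accusative alternations whose effects are visible on this page (ぬ after ん, as in みんぬ, and う after a or u, as in ひんじゃう). The {ACC} symbol and the kana list are hypothetical, and the ず/す/{っ} rules are not modeled.

```python
import re

# Partial, illustrative list of word-final kana whose vowel is a or u.
# す and ず are left out because the rules above treat them separately.
A_OR_U = "あかがさざただなはばぱまやゃらわうくぐつづぬふぶぷむゆゅる"

# (pattern, replacement) pairs applied in order. "+{ACC}" is a hypothetical
# underlying accusative marker after a morpheme boundary.
RULES = [
    (r"(?<=ん)\+\{ACC\}", "ぬ"),                   # みん+{ACC} -> みんぬ
    (r"(?<=[" + A_OR_U + r"])\+\{ACC\}", "う"),  # ひんじゃ+{ACC} -> ひんじゃう
]


def realize(underlying: str) -> str:
    """Rewrite the accusative marker by context, then drop morpheme boundaries."""
    for pattern, replacement in RULES:
        underlying = re.sub(pattern, replacement, underlying)
    # Contexts no rule covers keep their {ACC} placeholder.
    return underlying.replace("+", "")


print(realize("みん+{ACC}"))
print(realize("ひんじゃ+{ACC}"))
```

Unlike real twol, the order of RULES matters here; a two-level grammar would instead state each alternation as a constraint and let the compiler resolve their interaction.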
Transfer Rules
  • I added a rule that attaches "but" to the end of the verb in Miyako, since it is sometimes a separate word in Okinawan but never in Miyako.
    Old output: すぐいん やいが/#どぅみ そぅが#
    New output: どぅみそぅが/どぅみーそぅが
  • I added a rule to the ryu disambiguation that adds the tag @acc to a noun when no other noun intervenes between it and a verb. I then added a transfer rule to convert that into an <acc> tag in Miyako.
    Old output: ぴいじゃあや んんじゃん/#ひんじゃ みーん#
    New output: ひんじゃう みーん
  • I added a rule that converts verbalised adjectives to nominalised adjectives.
    Old output: うりや たかはん/#うら #たかだい (possibly?)
    New output:
  • I added a rule that converts the tag <top> to <top1> (in theory)
    Old output: うりや/#うら
    New output:
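The @acc heuristic and the tag conversions described above can be pictured as two passes over a list of (lemma, tags) tokens. This Python sketch is only an approximation: the actual rules live in the project's disambiguation and transfer files, which are not shown here, and the placeholder lemmas noun1, noun2, and verb1 are hypothetical.

```python
# A token is (lemma, list of tags). Placeholder lemmas are hypothetical.
TAG_MAP = {"@acc": "<acc>", "<top>": "<top1>"}


def mark_acc(tokens):
    """Disambiguation pass: tag a noun @acc if a verb follows with no noun in between."""
    out = []
    for i, (lemma, tags) in enumerate(tokens):
        tags = list(tags)
        if "<n>" in tags:
            for _, later in tokens[i + 1:]:
                if "<n>" in later:
                    break  # another noun intervenes, so no @acc
                if "<v>" in later:
                    tags.append("@acc")
                    break
        out.append((lemma, tags))
    return out


def transfer(tokens):
    """Transfer pass: turn @acc into <acc> and rename <top> to <top1>."""
    return [(lemma, [TAG_MAP.get(t, t) for t in tags]) for lemma, tags in tokens]


sentence = [("noun1", ["<n>"]), ("noun2", ["<n>"]), ("verb1", ["<v>"])]
print(transfer(mark_acc(sentence)))
```

Only noun2 is marked here, because noun2 sits between noun1 and the verb.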

Expansion of the Okinawan transducer and mvi → ryu