Okinawan and Miyako

From LING073

Latest revision as of 17:15, 14 May 2017

Resources for machine translation between Okinawan and Miyako

Lexical Selection

  • In Miyako, はい can mean "field", "needle", or "south", and it is also a causative auxiliary verb. If it is followed by "string", it probably means "needle".
  • "Hand" and "arm" are the same word in Okinawan (てぃい) but different words in Miyako. If てぃい is followed by an instrumental, we assume it means "hand".
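The two context rules above can be sketched in Python as follows. This is a simplified illustration of the logic, not a real lexical-selection rule file. If the pipeline is Apertium, such rules would normally live in an `.lrx` file. The sketch makes several assumptions: each token carries a set of features (English glosses or tags), only the immediately following token is inspected, the tag name `<ins>` for the instrumental is hypothetical, and so are the default senses.

```python
from typing import Dict, List, Set, Tuple

# A token is (word, features); features may hold English glosses and/or
# tags such as "<ins>". This representation is a simplifying assumption.
Token = Tuple[str, Set[str]]

# The two documented rules: (word, feature of the FOLLOWING token) -> sense.
# "<ins>" is a hypothetical tag name for the instrumental.
CONTEXT_RULES: Dict[Tuple[str, str], str] = {
    ("はい", "string"): "needle",
    ("てぃい", "<ins>"): "hand",
}

# Fallback senses are assumptions; the page does not state a default.
DEFAULT_SENSE: Dict[str, str] = {"はい": "field", "てぃい": "arm"}


def select_senses(tokens: List[Token]) -> List[str]:
    """Choose one sense per token, looking only at the next token."""
    senses = []
    for i, (word, _features) in enumerate(tokens):
        sense = DEFAULT_SENSE.get(word, word)  # words without rules pass through
        if i + 1 < len(tokens):
            for feature in sorted(tokens[i + 1][1]):  # sorted for determinism
                if (word, feature) in CONTEXT_RULES:
                    sense = CONTEXT_RULES[(word, feature)]
                    break
        senses.append(sense)
    return senses
```

A rule table keyed on (word, context feature) keeps each documented heuristic as one line of data. Adding a new sense distinction therefore does not change the selection loop.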

Evaluation

Evaluation as of lexical selection

ryu → mvi Evaluation

Evaluation of tests
  • WER: 100%
  • PER: 100%
  • Coverage: 89%
Evaluation of sentences
  • WER: 90.32%
  • PER: 90.32%
  • Coverage: 62.1%

mvi → ryu Evaluation

Evaluation of tests
  • WER: 88.89%
  • PER: 77.78%
  • Coverage: 100%
Evaluation of sentences
  • WER: 100%
  • PER: 100%
  • Coverage: 80.65%
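The sketch below shows what the WER and PER figures measure, using the usual definitions:

- WER is the word-level edit distance divided by the reference length.
- PER is a bag-of-words error rate that ignores word order.

The tool that actually produced the numbers on this page may tokenize or normalize differently. Treat this as an illustration, not the scorer that was used.

```python
from collections import Counter
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER: word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    m = len(hypothesis)
    prev = list(range(m + 1))  # distances from the empty reference prefix
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * m
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[m] / len(reference)


def position_independent_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """PER: like WER, but words may match regardless of their position."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    matches = sum((Counter(reference) & Counter(hypothesis)).values())
    surplus = max(0, len(hypothesis) - len(reference))  # extra hypothesis words
    return (len(reference) - matches + surplus) / len(reference)
```

Under these definitions PER can never exceed WER for the same sentence pair. This is consistent with every WER/PER pair reported on this page.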

Final Evaluation

ryu Evaluation

  • Precision: 91.43%
  • Recall: 48.85%
  • Coverage of large: 24.64%
  • Number of words in large: 1055
  • Number of stems in the transducer: 115
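The page does not say exactly how precision and recall were computed. One common setup for a morphological analyser is to compare, word by word, the set of analyses it produces against a hand-checked gold set. The sketch below assumes that setup; the `predicted` and `gold` dictionaries are hypothetical inputs.

```python
from typing import Dict, Set, Tuple


def analyser_precision_recall(
    predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall of analyses, over the gold words.

    Words that appear only in `predicted` are ignored.
    """
    true_pos = produced = expected = 0
    for word, gold_analyses in gold.items():
        got = predicted.get(word, set())
        true_pos += len(got & gold_analyses)   # analyses that are correct
        produced += len(got)                   # analyses the transducer gave
        expected += len(gold_analyses)         # analyses it should have given
    precision = true_pos / produced if produced else 0.0
    recall = true_pos / expected if expected else 0.0
    return precision, recall
```

Consider an analyser that returns only the imperative reading of かき when the gold set also contains the medial reading. It scores perfect precision but only half recall on that word. High precision with lower recall, as in the figures above, is the typical profile of a small but careful lexicon.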

mvi Evaluation

  • Precision: 97.4%
  • Recall: 64.9%
  • Coverage of large: 55.67%
  • Number of words in large: 650
  • Number of stems in the transducer: 116
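Coverage of a corpus is usually the share of tokens that receive at least one analysis. Assume the analyser output uses the stream format shown in the examples on this page, where an unknown word appears as `^surface/*surface$`. A naive count then looks like the sketch below. It treats punctuation units such as `^./.<sent>$` as tokens and ignores escaped characters, so its numbers may differ slightly from those reported above.

```python
import re

# One lexical unit: ^surface/reading1/reading2...$ ; unknown words look
# like ^surface/*surface$ . Escaped "$" characters are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Share of lexical units whose first reading is not marked unknown (*)."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        readings = unit.split("/")[1:]  # the first field is the surface form
        if readings and not readings[0].startswith("*"):
            known += 1
    return known / len(units)
```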

ryu → mvi Evaluation

Evaluation of longer
  • WER: 87.50%
  • PER: 85.00%
  • Proportion of stems translated correctly: 71.8%
  • Trimmed coverage: 70%
  • Number of tokens: 39
Evaluation of large
  • Trimmed coverage: 11.29%
  • Number of tokens: 1014

mvi → ryu Evaluation

Evaluation of longer
  • WER: 97.3%
  • PER: 97.3%
  • Coverage: 76.2%
  • Trimmed coverage: 22.37%
  • Number of tokens: 39
Evaluation of large
  • Trimmed coverage: 15.85%
  • Number of tokens: 665

Expansion of Miyako transducer and ryu → mvi

Morphology
  • I added more numbers. 6 is not included because I am not sure what its form is.
  • I expanded classifiers, adding ones for days, portions, groups, and people.
  • I changed the focus marker from being hard-coded to being its own lexicon. As a result, the verbs that take focus markers can now get them.
    Old output: ^ぼーしなてぃどぅ/*ぼーしなてぃどぅ$
    New output: ^ぼーしなてぃどぅ/ぼーし<n><abs>+な<mod><quot><foc>$
    Old output: ^そぅだてぃどぅ/*そぅだてぃどぅ$
    New output: ^そぅだてぃどぅ/そぅだてぃ<v><cvb_abs><foc>$
  • I implemented the resultative and causal.
    Old output: ^あいば/*あいば$
    New output: ^あいば/あ<vaux>+ば<vaux>$
  • I hard-coded the forms I have for "do".
  • I added a lexicon that certain verb forms pass through so that "but" can be attached to their end.
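Making the focus marker a lexicon means every entry that continues to that lexicon gets the marker automatically, instead of each form being listed by hand. The toy Python below mimics lexc-style continuation classes to show the idea. The lexicon names (`Root`, `Focus`) and the single stem entry are hypothetical. The tags and surface forms come from the そぅだてぃどぅ example above, and `#` marks end of word as in lexc. The "but" lexicon would presumably work the same way.

```python
from typing import Dict, List, Tuple

# lexicon name -> entries of (analysis part, surface part, next lexicon).
# "#" ends the word, as in lexc. Names and the stem entry are hypothetical.
LEXICONS: Dict[str, List[Tuple[str, str, str]]] = {
    "Root": [("そぅだてぃ<v><cvb_abs>", "そぅだてぃ", "Focus")],
    "Focus": [
        ("", "", "#"),          # no focus marker
        ("<foc>", "どぅ", "#"),  # focus marker どぅ
    ],
}


def expand(lexicon: str = "Root", analysis: str = "", surface: str = "") -> List[Tuple[str, str]]:
    """Enumerate every (analysis, surface) pair reachable from `lexicon`."""
    if lexicon == "#":
        return [(analysis, surface)]
    pairs = []
    for ana_part, surf_part, next_lexicon in LEXICONS[lexicon]:
        pairs.extend(expand(next_lexicon, analysis + ana_part, surface + surf_part))
    return pairs
```

Any other entry pointing at `Focus` would get both a bare form and a focus-marked form, which is the benefit of moving the marker out of hard-coded forms.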
twol
  • I added a rule such that the accusative marker goes to う after a stem ending in a or u.
  • I added a rule such that the accusative marker changes to the appropriate form, for example ぬ after a stem ending in ん.
    Old output: ^みんぬ/みん<n><gen>$
    New output: ^みんぬ/みん<n><acc>/みん<n><gen>$
  • I added a rule such that {っ} goes to じ after a word ending in ず.
  • I added a rule such that ず goes to っ before the topic and accusative markers.
  • I added a rule such that す goes to っ before the accusative marker.
  • I added a rule such that {っ} goes to そ after a word ending in す and before an accusative marker.
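The two accusative rules can be approximated with ordered string rewrites. The sketch reads "ending in a or u" as "ending in a kana whose vowel is a or u". Real twol rules apply in parallel over the whole word, so this is a simplification. The archiphoneme `{A}` for the accusative and the `+` boundary are hypothetical notation. す and ず are left out of the context set because other rules in the list above handle them.

```python
import re

# Hypothetical underlying form: stem + "+" + "{A}" (accusative archiphoneme).
ACC_AFTER_BOUNDARY = re.escape("+{A}")

# Kana whose vowel is a or u (small kana included). す and ず are omitted
# because separate rules turn them into っ before the accusative.
A_OR_U_KANA = "あかさたなはまやらわがざだばぱゃぁうくつぬふむゆるぐづぶぷゅぅ"

RULES = [
    # {A} -> う after a kana whose vowel is a or u
    (re.compile("([" + A_OR_U_KANA + "])" + ACC_AFTER_BOUNDARY), r"\1う"),
    # {A} -> ぬ after ん
    (re.compile("(ん)" + ACC_AFTER_BOUNDARY), r"\1ぬ"),
]


def to_surface(underlying: str) -> str:
    """Apply the rewrite rules in order, then delete remaining boundaries.

    Contexts with no matching rule keep "{A}" so gaps are easy to spot.
    """
    for pattern, replacement in RULES:
        underlying = pattern.sub(replacement, underlying)
    return underlying.replace("+", "")
```

With these rules, ひんじゃ+{A} surfaces as ひんじゃう and みん+{A} as みんぬ, matching the forms quoted on this page.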
Transfer Rules
  • I added a rule that attaches "but" to the end of the verb in Miyako (since it is sometimes a separate word in Okinawan but never in Miyako).
    Old output: すぐいん やいが/#どぅみ そぅが#
    New output: どぅみそぅが/どぅみーそぅが
  • I added a rule in the ryu disambiguation that puts the tag @acc after a noun when there is no intervening noun between it and a verb. I then added a transfer rule to convert that into an acc tag in Miyako.
    Old output: ぴいじゃあや んんじゃん/#ひんじゃ みーん#
    New output: ひんじゃう みーん
  • I added a rule that converts the tag <top> to <top1>.
    Old output: うりや/#うら
    New output: うりや/うらあ
  • I added a rule that converts verbalised adjectives to nominalised adjectives. The tag transfer itself works, but the output does not generate correctly because the nominalised adjective lacks an <abs> tag.
  • I (Andrew) added a rule that funnels all <loc1>/<loc2>/<loc3> tags to <dat>.
    Old output: ぴいじゃあんか/#ひんじゃ
    New output: ぴいじゃあんか/ひんじゃん
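The @acc disambiguation rule and the tag transfers above can be sketched together on simplified tokens. The representation is a (lemma, tags) pair with tags written without angle brackets. Lemma translation through the bilingual dictionary is skipped. The case-tag inventory is an assumption, and so is the choice that acc replaces the noun's existing case tag. That choice is suggested by the example above, where ぴいじゃあや becomes ひんじゃう.

```python
from typing import List, Tuple

# (lemma, tags) with tags written without angle brackets, e.g. ["n", "top"].
Token = Tuple[str, List[str]]

# Tag rewrites for ryu -> mvi described above.
TAG_MAP = {"top": "top1", "loc1": "dat", "loc2": "dat", "loc3": "dat"}

# Assumed case-tag inventory that an @acc mark overrides.
CASE_TAGS = {"top", "nom", "gen", "dat", "loc1", "loc2", "loc3"}


def add_acc_marks(tokens: List[Token]) -> List[Token]:
    """Disambiguation step: mark a noun @acc if a verb follows it
    before any other noun does."""
    marked = []
    for i, (lemma, tags) in enumerate(tokens):
        new_tags = list(tags)
        if "n" in tags:
            for _later_lemma, later_tags in tokens[i + 1:]:
                if "n" in later_tags:
                    break  # an intervening noun blocks the mark
                if "v" in later_tags:
                    new_tags.append("@acc")
                    break
        marked.append((lemma, new_tags))
    return marked


def transfer_tags(tokens: List[Token]) -> List[Token]:
    """Transfer step: turn @acc into acc and apply the other tag rewrites."""
    result = []
    for lemma, tags in tokens:
        if "@acc" in tags:
            # Assumption: acc replaces whatever case tag the noun had.
            kept = [t for t in tags if t not in CASE_TAGS and t != "@acc"]
            result.append((lemma, kept + ["acc"]))
        else:
            result.append((lemma, [TAG_MAP.get(t, t) for t in tags]))
    return result
```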

Expansion of Okinawan transducer and mvi → ryu

Morphology x2
  • Now supports adjectives.
    Old output: ^たかさん/*たかさん$^./.<sent>$
    New output: ^たかさん/たかさん<adj><abs>$^./.<sent>$
  • Adjectives can be adverbialized.
    Old output: ^たかく/*たかく$^./.<sent>$
    New output: ^たかく/たかさん<adj><avz>$^./.<sent>$
  • Adjectives can be verbalized.
    Old output: ^たかはん/*たかはん$^./.<sent>$
    New output: ^たかはん/たかさん<adj><vz>$^./.<sent>$
  • Adverbs are now supported.
    Old output: ^じこう/*じこう$^./.<sent>$
    New output: ^じこう/じこう<adv>$^./.<sent>$
  • Past tense for group 1 verbs is now supported.
    Old output: ^ぷみたん/*ぷみたん$^./.<sent>$
    New output: ^ぷみたん/ぷみいん<v><past>$^./.<sent>$
  • Verbs can now be nominalized.
    Old output: ^ぷみやあ/*ぷみやあ$^./.<sent>$
    New output: ^ぷみやあ/ぷみいん<v><nz><abs>$^./.<sent>$
  • Nominals can be emphatic.
    Old output: ^わらびる/*わらびる$^./.<sent>$
    New output: ^わらびる/わらび<n><emp>$^./.<sent>$
  • Verbs can now be expressed as hearsay.
    Old output: ^あっちゅんち/*あっちゅんち$^./.<sent>$
    New output: ^あっちゅんち/あっちゅん<v><npst><hs>$^./.<sent>$
  • Many filler/discourse marker words were added.
  • Nominals can now be ambiguous.
    Old output: ^わらびなかんて/*わらびなかんて$^./.<sent>$
    New output: ^わらびなかんて/わらび<n><ambg>$^./.<sent>$
  • The medial form of group 2 verbs was added; it then continues to NominalInfl.
    Old output: ^かき/かくん<v><imp>$^./.<sent>$
    New output: ^かき/かくん<v><imp>/かくん<v><med><abs>$^./.<sent>$
  • Added support for Japonic period.
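Every old/new pair in this section uses Apertium's stream format, `^surface/reading1/reading2$`. In this format, `*` marks an unknown word and each reading is a lemma followed by `<tag>`s. A small parser makes it easier to check changes like these in bulk. The sketch below is simplified: it ignores escaping, and it does not split compound readings joined with `+` (their tags are simply collected in order).

```python
import re
from typing import List, NamedTuple


class Reading(NamedTuple):
    lemma: str
    tags: List[str]


class LexicalUnit(NamedTuple):
    surface: str
    readings: List[Reading]  # empty when the word is unknown (*)


UNIT = re.compile(r"\^([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_stream(stream: str) -> List[LexicalUnit]:
    """Split stream output into surface forms and (lemma, tags) readings."""
    units = []
    for body in UNIT.findall(stream):
        surface, *raw_readings = body.split("/")
        readings = []
        for raw in raw_readings:
            if raw.startswith("*"):
                continue  # unknown word: no analysis
            lemma = raw.split("<", 1)[0]
            readings.append(Reading(lemma, TAG.findall(raw)))
        units.append(LexicalUnit(surface, readings))
    return units
```

For example, parsing the new output for かき yields one unit with two readings, which makes the imperative/medial ambiguity explicit.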
Disambiguation
  • あんち is now disambiguated between its usage as a discourse marker and adverb.
  • すぐ is disambiguated in the same way.
  • むん is disambiguated between a discourse marker and a nominal meaning "thing".
  • The homophonous <gen> and <nom> tags have been disambiguated.
Transfer rules

(See the other direction for Andrew's third rule.)

  • The accusative tag is now dropped.
    Old output: ひんじゃう/#ぴいじゃあ
    New output: ひんじゃう/ぴいじゃあ
  • Words tagged with top1/top2 are now transferred to plain top.
    Old output: ひんじゃあ/#ぴいじゃあ
    New output: ひんじゃあ/ぴいじゃあや