Tiwi and English

From LING073

Latest revision as of 13:37, 8 May 2021

Note: Resources for machine translation between Tiwi and English

External Resources

GitHub Repo for Language Pair

Tiwi Transducer

English Transducer

Developed Resources

Bilingual Corpus

Contrastive Grammar

Lexical selection

Structural Transfer

Initial Tiw → Eng Evaluation

Coverage of our monolingual transducer: 388 / 1133 (~0.342)

Coverage of our bilingual transducer: 20 / 30 (~0.667)
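The coverage figures above are plain ratios of analysed tokens to total tokens. The following is a minimal sketch of that calculation, assuming the analyser output is available as a list of lexical units in Apertium stream format, where an unknown token is marked with `*`. The function name `coverage` and the token `foo` are illustrative and not part of the language pair's tooling.

```python
def coverage(analyses):
    """Return the fraction of lexical units that received a known analysis.

    Each item is one lexical unit in Apertium stream format, for example
    '^naki/naki<prn><dem><sg>$'. An unknown token carries a '*' after the
    slash, for example '^foo/*foo$'.
    """
    if not analyses:
        return 0.0
    known = sum(1 for unit in analyses if "/*" not in unit)
    return known / len(analyses)


# The two ratios reported above, rounded to two places.
print(round(388 / 1133, 2))  # 0.34
print(round(20 / 30, 2))     # 0.67

# A two-token toy input: one known token, one unknown token.
print(coverage(["^naki/naki<prn><dem><sg>$", "^foo/*foo$"]))  # 0.5
```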

Sentence Analysis

Sentence 1

  • pirripakijiti: They floated
    • ^akijiti<vblex><iv><past><s_pl3>/float<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$
    • "#float"

Sentence 2

  • pirripakirlumurri: They were tired
    • ^akirlumurri<vblex><iv><past><s_pl3>/be tired<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$
    • "#be tired"

Sentence 3

  • pirriwapa: They ate
    • ^wapa<vblex><iv><past><s_pl3>/eat<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$
    • "#eat"

Sentence 4

  • kamini naki?: what is this?
    • ^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ ^naki<prn><dem><sg>/this<prn><dem><sg>$^?<sent>/?<sent>$^.<sent>/.<sent>$
    • "#what #this?"

Sentence 5

  • ngarra minimarti: he is generous
    • ^ngarra<prn><m><p3><sg>/he<prn><m><p3><sg>$ ^minimarti<adj><m>/generous<adj>$^.<sent>/.<sent>$
    • "#generous #he"

Sentence 6

  • jupijupi awi kiyija mirrawu: soup and a little bit of tobacco
    • ^jupijupi<n>/soup<n>$ ^awi<cnjcoo>/and<cnjcoo>$ ^kiyija<n><prn><qnt>/little<n><prn><qnt>$ ^mirrawu<n>/tobacco<n>$^.<sent>/.<sent>$
    • "#soup and #little #tobacco"

Sentence 7

  • ngarra kijinga: he is small
    • ^ngarra<prn><m><p3><sg>/he<n><prn><m><p3><sg>$ ^kiji<adj><f>/small<n><f>$^.<sent>/.<sent>$
    • "#he #small"

Sentence 8

  • yirrikipayi ngarra tuwara: the crocodile's tail
    • ^yirrikipayi<n><m>/crocodile<n><m>$ ^ngarra<prn><m><p3><sg>/he<prn><m><p3><sg>$ ^tuwara<n>/tail<n>$^.<sent>/.<sent>$
    • "#crocodile #he #tail"

Sentence 9

  • awurra wawurruwi: those are men
    • ^awurra<prn><dem><pl>/those<prn><dem><pl>$ ^wawurru<n><pl>/man<n><pl>$^.<sent>/.<sent>$
    • "#those men"

Sentence 10

  • ngiya paruwani: I am hungry
    • ^ngiya<prn><p1><sg>/i<prn><p1><sg>$ ^paruwani<adj>/hungry<adj>$^.<sent>/.<sent>$
    • "#hungry #i"
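Each bilingual-lookup line above is a sequence of lexical units of the form `^source/target1/target2$`, with tags in angle brackets. The sketch below splits such a line into lemmas and tag lists so that the analyses can be inspected programmatically. It is a simplified reader, not the Apertium parser: it assumes no escaped characters inside a unit, and the helper names `parse_reading` and `parse_biltrans` are made up for this example.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")   # one lexical unit: ^ ... $
TAG = re.compile(r"<([^>]+)>")    # one tag: <...>


def parse_reading(reading):
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG.findall(reading)


def parse_biltrans(line):
    """Return a list of (source_reading, [target_readings]) pairs."""
    units = []
    for body in UNIT.findall(line):
        source, *targets = body.split("/")
        units.append((parse_reading(source), [parse_reading(t) for t in targets]))
    return units


line = "^ngiya<prn><p1><sg>/i<prn><p1><sg>$ ^paruwani<adj>/hungry<adj>$^.<sent>/.<sent>$"
for (lemma, tags), targets in parse_biltrans(line):
    print(lemma, tags, "->", [t[0] for t in targets])
```

Applied to Sentence 4, the same function returns two target readings (what, which) for kami, which is the ambiguity that lexical selection has to resolve.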

Additions

Expanded Morphological coverage

  • Added temporal prefixes:
    • watu-
      • awatupirni: He fights in the morning
    • ki-
      • akipirni: He fights in the evening
  • Added reciprocal and reflexive suffixes:
    • -ajirri
      • ngaripirnajirri: They hit each other
    • -amiya
      • ngaripirnamiya: I hit myself

Developed Lexical Selection Rules

Case 1

moyila → unlucky, non-pay week

Example sentences:

Tiwi Sentence | English Translation
Ngiya moyila naki awarra jurra api ngiya karrikamini kunawini | This is my non-pay week so I have no money.
Ngiya moyila ngirimi kapi jupuluwu japini. | I had bad luck at cards last night.

Case 2

pijara → eye, bullet

Example sentences:

Tiwi Sentence | English Translation
Ngiya ngurru-wuriyi kularlaga, api ngi-ri-marruriyi jurruwarli yukurri pijara | I went hunting and took a gun and four bullets.
Ngarra yi-pirraya pijara pili jan | He washed his eyes because they were sore.
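The two cases above amount to choosing a translation from the words that surround the ambiguous one. The following is a minimal Python sketch of that idea, not the rules used in the language pair. The cue words are simply words that co-occur with pijara in the example sentences, and both the cue table and the default translation are assumptions made for illustration.

```python
# Hypothetical cue words, copied from the example sentences above. They are
# an assumption for illustration, not the pair's actual selection rules.
CUES = {
    "pijara": [("jurruwarli", "bullet"), ("yi-pirraya", "eye")],
}

# Assumed fallback when no cue is present: the first translation listed.
DEFAULT = {"pijara": "eye"}


def select(word, sentence):
    """Pick a translation for `word` based on cue words in `sentence`."""
    tokens = [t.strip(",.").lower() for t in sentence.split()]
    for cue, translation in CUES.get(word, []):
        if cue in tokens:
            return translation
    return DEFAULT.get(word)


print(select("pijara", "Ngarra yi-pirraya pijara pili jan"))  # eye
print(select(
    "pijara",
    "Ngiya ngurru-wuriyi kularlaga, api ngi-ri-marruriyi jurruwarli yukurri pijara",
))  # bullet
```

A cue table like this only checks whether a word is present anywhere in the sentence. Rules that constrain position or tags would be more precise, at the cost of needing more example sentences to justify them.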

Additional Structural Transfer Rules

Insert "Be" Verbs

Tiw-Eng

(tiw) kamini naki? → (eng) what is this?

Tagger

(tiw) kami<prn><itg><m> naki<prn><dem><sg> → (eng) what<prn><itg><mf><sp> be<vbser><pres><p3><sg> this<prn><dem><mf><sg>

Biltrans

^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ ^naki<prn><dem><sg>/this<prn><dem><sg>$^.<sent>/.<sent>$

Transfer

^what<prn><itg><m>$ ^be<vbser><pres><p3><sg>$ ^this<prn><dem><sg>$^.<sent>$
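The effect of this rule can be approximated in a few lines of Python: when an interrogative pronoun is immediately followed by a demonstrative pronoun, a present-tense third-person singular "be" is inserted between them. This is only a sketch of the rule's behaviour on stream-format text, not the transfer rule itself. The function names `has_tags` and `insert_be` are illustrative, and the sketch assumes each unit holds a single reading with no escaped characters.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")  # one lexical unit: ^ ... $


def has_tags(unit, *tags):
    """True if every tag in `tags` appears in the unit."""
    return all(f"<{t}>" in unit for t in tags)


def insert_be(line):
    """Insert 'be' between an interrogative and a demonstrative pronoun."""
    units = UNIT.findall(line)
    out = []
    for i, unit in enumerate(units):
        out.append(unit)
        nxt = units[i + 1] if i + 1 < len(units) else ""
        if has_tags(unit, "prn", "itg") and has_tags(nxt, "prn", "dem"):
            out.append("be<vbser><pres><p3><sg>")
    # Units are re-joined with single spaces, so spacing may differ from input.
    return " ".join(f"^{u}$" for u in out)


print(insert_be("^what<prn><itg><m>$ ^this<prn><dem><sg>$^.<sent>$"))
```

The inserted verb is hard-coded as third-person singular here. Agreement with a plural demonstrative would need the number tag copied from the following unit.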

Word to Word Phrase

Tiw-Eng

(tiw) jupijupi awi kiyija mirrawu → (eng) soup and a little bit of tobacco

Tagger

(tiw) jupijupi<n> awi<cnjcoo> kiyija<prn><qnt> mirrawu<n> → (eng) soup<n><sg> and<cnjcoo> a little bit<adv> of<pr> tobacco<n><sg>

Biltrans

^jupijupi<n>/soup<n>$ ^awi<cnjcoo>/and<cnjcoo>$ ^kiyija<prn><qnt>/a little bit<adv>$ ^mirrawu<n>/tobacco<n>$^.<sent>/.<sent>$

Transfer

^soup<n><sg>$ ^and<cnjcoo>$ ^a little bit<adv>$ ^of<pr>$ ^tobacco<n><sg>$^.<sent>$

Final Tiw → Eng Evaluation

Transducer Coverage

  • Precision and recall against the annotated.basic corpus: could not be computed (the evaluation fails with a division-by-zero error)
  • Number of words in large text: 3754
  • Coverage over large text: 0.39
  • Total number of stems: 118

Translator Coverage

  • Word error rate (WER): 94.64 %
  • Position-independent word error rate (PER): 93.45 %
  • Trimmed coverage
    • longer corpus: 0.44
    • large corpus: 0.30
  • Number of tokens:
    • longer: 154
    • large: 3501
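For readers unfamiliar with the two error rates, the sketch below shows one common way to compute them. WER is the word-level edit distance divided by the reference length. PER ignores word order and compares the two sentences as bags of words. These definitions are an assumption about the metrics, and the evaluation tool that produced the figures above may differ in details such as tokenisation or casing.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a hypothesis word
                           prev[j - 1] + (r != h)))  # match or substitute
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: bag-of-words mismatches."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


ref = "what is this ?".split()
hyp = "what this ?".split()
print(wer(ref, hyp), per(ref, hyp))  # 0.25 0.25
```

With the reference "he is generous" and the word-for-word output "generous he", WER is 1.0 while PER is about 0.33, which shows how much of the error in that sentence comes from word order rather than word choice.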