Difference between revisions of "Tiwi and English"
(→Sentence Analysis) |
(→Final Tiw → Eng Evaluation) |
||
(30 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | <i> Note: Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Tiwi Tiwi] and [https://wikis.swarthmore.edu/ling073/English English] | + | <i> Note: Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Tiwi Tiwi] and [https://wikis.swarthmore.edu/ling073/English English] </i> |
== External Resources == | == External Resources == | ||
Line 13: | Line 13: | ||
[https://github.swarthmore.edu/Ling073-sp21/ling073-tiw-eng-corpus Bilingual Corpus] | [https://github.swarthmore.edu/Ling073-sp21/ling073-tiw-eng-corpus Bilingual Corpus] | ||
− | == | + | [https://wikis.swarthmore.edu/ling073/Tiwi_and_English/Contrastive_Grammar Contrastive Grammar] |
+ | |||
+ | [https://wikis.swarthmore.edu/ling073/Tiwi_and_English/Lexical_selection Lexical selection] | ||
+ | |||
+ | [https://wikis.swarthmore.edu/ling073/Tiwi_and_English/Structural_transfer Structural Transfer] | ||
+ | |||
+ | == Initial Tiw → Eng Evaluation == | ||
The coverage of our monolingual transducer: | The coverage of our monolingual transducer: | ||
coverage: 388 / 1133 (~0.34245366284201235658) | coverage: 388 / 1133 (~0.34245366284201235658) | ||
The coverage of our bilingual transducer: | The coverage of our bilingual transducer: | ||
− | + | coverage: 20 / 30 (~0.66666666666666666667) | |
− | |||
=== Sentence Analysis === | === Sentence Analysis === | ||
==== Sentence 1 ==== | ==== Sentence 1 ==== | ||
− | pirripakijiti: They floated | + | * pirripakijiti: They floated |
− | #float | + | ** ^akijiti<vblex><iv><past><s_pl3>/float<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$ |
+ | ** "#float" | ||
==== Sentence 2 ==== | ==== Sentence 2 ==== | ||
* pirripakirlumurri: They were tired | * pirripakirlumurri: They were tired | ||
− | + | ** ^akirlumurri<vblex><iv><past><s_pl3>/be tired<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$ | |
** "#be tired" | ** "#be tired" | ||
==== Sentence 3 ==== | ==== Sentence 3 ==== | ||
*pirriwapa: They ate | *pirriwapa: They ate | ||
− | + | ** ^wapa<vblex><iv><past><s_pl3>/eat<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$ | |
** "#eat" | ** "#eat" | ||
==== Sentence 4 ==== | ==== Sentence 4 ==== | ||
* kamini naki?: what is this? | * kamini naki?: what is this? | ||
+ | ** ^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ ^naki<prn><dem><sg>/this<prn><dem><sg>$^?<sent>/?<sent>$^.<sent>/.<sent>$ | ||
** "#what #this?" | ** "#what #this?" | ||
==== Sentence 5 ==== | ==== Sentence 5 ==== | ||
* ngarra minimarti: he is generous | * ngarra minimarti: he is generous | ||
+ | ** ^ngarra<prn><m><p3><sg>/he<prn><m><p3><sg>$ ^minimarti<adj><m>/generous<adj>$^.<sent>/.<sent>$ | ||
** "#generous #he" | ** "#generous #he" | ||
==== Sentence 6 ==== | ==== Sentence 6 ==== | ||
* jupijupi awi kiyija mirrawu: soup and a little bit of tobacco | * jupijupi awi kiyija mirrawu: soup and a little bit of tobacco | ||
+ | ** ^jupijupi<n>/soup<n>$ ^awi<cnjcoo>/and<cnjcoo>$ ^kiyija<n><prn><qnt>/little<n><prn><qnt>$ ^mirrawu<n>/tobacco<n>$^.<sent>/.<sent>$ | ||
** "#soup and #little #tobacco" | ** "#soup and #little #tobacco" | ||
==== Sentence 7 ==== | ==== Sentence 7 ==== | ||
* ngarra kijinga: he is small | * ngarra kijinga: he is small | ||
+ | ** ^ngarra<prn><m><p3><sg>/he<n><prn><m><p3><sg>$ ^kiji<adj><f>/small<n><f>$^.<sent>/.<sent>$ | ||
** "#he #small" | ** "#he #small" | ||
==== Sentence 8 ==== | ==== Sentence 8 ==== | ||
* yirrikipayi ngarra tuwara: the crocodile's tail | * yirrikipayi ngarra tuwara: the crocodile's tail | ||
+ | ** ^yirrikipayi<n><m>/crocodile<n><m>$ ^ngarra<prn><m><p3><sg>/he<prn><m><p3><sg>$ ^tuwara<n>/tail<n>$^.<sent>/.<sent>$ | ||
** "#crocodile #he #tail" | ** "#crocodile #he #tail" | ||
==== Sentence 9 ==== | ==== Sentence 9 ==== | ||
− | * awurra wawurruwi | + | * awurra wawurruwi: those are men |
+ | ** ^awurra<prn><dem><pl>/those<prn><dem><pl>$ ^wawurru<n><pl>/man<n><pl>$^.<sent>/.<sent>$ | ||
** "#those men" | ** "#those men" | ||
==== Sentence 10 ==== | ==== Sentence 10 ==== | ||
− | * ngiya | + | * ngiya paruwani: i am hungry |
+ | ** ^ngiya<prn><p1><sg>/i<prn><p1><sg>$ ^paruwani<adj>/hungry<adj>$^.<sent>/.<sent>$ | ||
** "#hungry #i" | ** "#hungry #i" | ||
+ | |||
+ | == Additions == | ||
+ | === Expanded Morphological coverage === | ||
+ | * Added temporal prefixes: | ||
+ | ** watu- | ||
+ | *** awatupirni: He fights in the morning | ||
+ | ** ki- | ||
+ | *** akipirni: He fights in the evening | ||
+ | * Added reciprocal and reflexive suffixes: | ||
+ | ** -ajirri | ||
+ | *** ngaripirnajirri: They hit each other | ||
+ | ** -amiya | ||
+ | ***ngaripirnamiya: I hit myself | ||
+ | |||
+ | === Developed Lexical Selection Rules=== | ||
+ | ==== Case 1 ==== | ||
+ | moyila → unlucky, non-pay week | ||
+ | |||
+ | Example sentences: | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Tiwi Sentence !! English Translation | ||
+ | |- | ||
+ | | Ngiya '''moyila''' naki awarra jurra api ngiya karrikamini kunawini || This is my '''non-pay week''' so I have no money. | ||
+ | |- | ||
+ | | Ngiya '''moyila''' ngirimi kapi jupuluwu japini. || I had '''bad luck''' at cards last night. | ||
+ | |} | ||
+ | |||
+ | ==== Case 2 ==== | ||
+ | pijara → eye, bullet | ||
+ | |||
+ | Example sentences: | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Tiwi Sentence !! English Translation | ||
+ | |- | ||
+ | | Ngiya ngurru-wuriyi kularlaga, api ngi-ri-marruriyi jurruwarli yukurri '''pijara''' || I went hunting and took a gun and four '''bullets'''. | ||
+ | |- | ||
+ | | Ngarra yi-pirraya '''pijara''' pili jan || He washed his '''eyes''' because they were sore. | ||
+ | |} | ||
+ | === Additional Structural Transfer Rules === | ||
+ | |||
+ | ==== Insert "Be" Verbs ==== | ||
+ | ===== Tiw-Eng ===== | ||
+ | (tiw) kamini naki? → (eng) what is this? | ||
+ | ===== Tagger ===== | ||
+ | {{transferMorphTest|tiw|eng|kami{{tag|prn}}{{tag|itg}}{{tag|m}} naki{{tag|prn}}{{tag|dem}}{{tag|sg}} | what {{tag|prn}}{{tag|itg}}{{tag|mf}}{{tag|sp}} be {{tag|vbser}}{{tag|pres}}{{tag|p3}}{{tag|sg}} this{{tag|prn}}{{tag|dem}}{{tag|mf}}{{tag|sg}} }} | ||
+ | |||
+ | ===== Biltrans ===== | ||
+ | ^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ ^naki<prn><dem><sg>/this<prn><dem><sg>$^.<sent>/.<sent>$ | ||
+ | ===== Transfer ===== | ||
+ | ^what<prn><itg><m>$ ^be<vbser><pres><p3><sg>$ ^this<prn><dem><sg>$^.<sent>$ | ||
+ | |||
+ | ==== Word to Word Phrase ==== | ||
+ | ===== Tiw-Eng ===== | ||
+ | {{transferTest|tiw|eng|jupijupi awi kiyija mirrawu | soup and a little bit of tobacco}} | ||
+ | ===== Tagger ===== | ||
+ | {{transferMorphTest|tiw|eng|jupijupi{{tag|n}} awi {{tag|cnjcoo}} kiyija{{tag|prn}}{{tag|qnt}} mirrawu {{tag|n}} | soup{{tag|n}}{{tag|sg}} and {{tag|cnjcoo}} a little bit {{tag|adv}} of {{tag|pr}} tobacco {{tag|n}}{{tag|sg}} }} | ||
+ | ===== Biltrans ===== | ||
+ | ^jupijupi<n>/soup<n>$ ^awi<cnjcoo>/and<cnjcoo>$ ^kiyija<prn><qnt>/a little bit<adv>$ ^mirrawu<n>/tobacco<n>$^.<sent>/.<sent>$ | ||
+ | ===== Transfer ===== | ||
+ | ^soup<n><sg>$ ^and<cnjcoo>$ ^a little bit<adv>$ ^of<pr>$ ^tobacco<n><sg>$^.<sent>$ | ||
+ | |||
+ | == Final Tiw → Eng Evaluation == | ||
+ | === Transducer Coverage === | ||
+ | * Precision and recall against the annotated.basic corpus: not available (the evaluation fails with a division-by-zero error) | ||
+ | * Number of words in large text: 3754 | ||
+ | * Coverage over large text: 0.39 | ||
+ | * Total number of stems: 118 | ||
+ | |||
+ | === Translator Coverage === | ||
+ | * Word error rate (WER): 94.64 % | ||
+ | * Position-independent word error rate (PER): 93.45 % | ||
+ | * Trimmed coverage | ||
+ | ** longer corpora: 0.44 | ||
+ | ** large corpora: 0.30 | ||
+ | * Number of tokens: | ||
+ | ** longer: 154 | ||
+ | ** large: 3501 | ||
[[Category: Tiwi]][[Category:English]][[Category:Sp21_TranslationPairs]] | [[Category: Tiwi]][[Category:English]][[Category:Sp21_TranslationPairs]] |
Latest revision as of 14:37, 8 May 2021
Note: Resources for machine translation between Tiwi and English
External Resources
Developed Resources
Initial Tiw → Eng Evaluation
The coverage of our monolingual transducer: coverage: 388 / 1133 (~0.34245366284201235658)
The coverage of our bilingual transducer: coverage: 20 / 30 (~0.66666666666666666667)
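The coverage figures above are simply the number of analysed tokens divided by the total number of tokens. A minimal sketch of that arithmetic, using the counts reported here (the function name `coverage` is only illustrative and is not part of any Apertium tool):

```python
def coverage(known: int, total: int) -> float:
    """Return the fraction of tokens that received an analysis."""
    if total <= 0:
        raise ValueError("total must be a positive token count")
    return known / total


# Counts taken from the two coverage lines above.
mono = coverage(388, 1133)
bil = coverage(20, 30)

print(f"monolingual transducer: {mono:.4f}")  # prints 0.3425
print(f"bilingual transducer:   {bil:.4f}")   # prints 0.6667
```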
Sentence Analysis
Sentence 1
- pirripakijiti: They floated
- ^akijiti<vblex><iv><past><s_pl3>/float<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$
- "#float"
Sentence 2
- pirripakirlumurri: They were tired
- ^akirlumurri<vblex><iv><past><s_pl3>/be tired<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$
- "#be tired"
Sentence 3
- pirriwapa: They ate
- ^wapa<vblex><iv><past><s_pl3>/eat<vblex><iv><past><s_pl3>$^.<sent>/.<sent>$
- "#eat"
Sentence 4
- kamini naki?: what is this?
- ^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ ^naki<prn><dem><sg>/this<prn><dem><sg>$^?<sent>/?<sent>$^.<sent>/.<sent>$
- "#what #this?"
Sentence 5
- ngarra minimarti: he is generous
- ^ngarra<prn><m><p3><sg>/he<prn><m><p3><sg>$ ^minimarti<adj><m>/generous<adj>$^.<sent>/.<sent>$
- "#generous #he"
Sentence 6
- jupijupi awi kiyija mirrawu: soup and a little bit of tobacco
- ^jupijupi<n>/soup<n>$ ^awi<cnjcoo>/and<cnjcoo>$ ^kiyija<n><prn><qnt>/little<n><prn><qnt>$ ^mirrawu<n>/tobacco<n>$^.<sent>/.<sent>$
- "#soup and #little #tobacco"
Sentence 7
- ngarra kijinga: he is small
- ^ngarra<prn><m><p3><sg>/he<n><prn><m><p3><sg>$ ^kiji<adj><f>/small<n><f>$^.<sent>/.<sent>$
- "#he #small"
Sentence 8
- yirrikipayi ngarra tuwara: the crocodile's tail
- ^yirrikipayi<n><m>/crocodile<n><m>$ ^ngarra<prn><m><p3><sg>/he<prn><m><p3><sg>$ ^tuwara<n>/tail<n>$^.<sent>/.<sent>$
- "#crocodile #he #tail"
Sentence 9
- awurra wawurruwi: those are men
- ^awurra<prn><dem><pl>/those<prn><dem><pl>$ ^wawurru<n><pl>/man<n><pl>$^.<sent>/.<sent>$
- "#those men"
Sentence 10
- ngiya paruwani: i am hungry
- ^ngiya<prn><p1><sg>/i<prn><p1><sg>$ ^paruwani<adj>/hungry<adj>$^.<sent>/.<sent>$
- "#hungry #i"
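In each analysis above, the second line is bilingual-transfer output in the Apertium stream format (`^source<tags>/target<tags>$`), and a `#` in the third line marks a lemma that the English generator could not inflect. To make the stream lines easier to inspect, here is a small sketch of a parser for them. It is an assumption-laden helper written for this page, not an Apertium API, and it ignores escaped characters such as `\/` or `\^`.

```python
import re

# One lexical unit is everything between ^ and $.
LU_RE = re.compile(r"\^(.*?)\$")
# One tag is the text between < and >.
TAG_RE = re.compile(r"<([^<>]+)>")


def split_reading(reading):
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG_RE.findall(reading)


def parse_biltrans(line):
    """Return a list of dicts with the source reading and its target readings."""
    units = []
    for body in LU_RE.findall(line):
        source, *targets = body.split("/")
        units.append({
            "source": split_reading(source),
            "targets": [split_reading(t) for t in targets],
        })
    return units


line = ("^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ "
        "^naki<prn><dem><sg>/this<prn><dem><sg>$^?<sent>/?<sent>$^.<sent>/.<sent>$")
units = parse_biltrans(line)
for unit in units:
    lemma, tags = unit["source"]
    print(lemma, tags, "->", [t[0] for t in unit["targets"]])
```

Run on Sentence 4, this shows that `kami` has two candidate translations (`what` and `which`), which is the kind of ambiguity that lexical selection has to resolve.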
Additions
Expanded Morphological coverage
- Added temporal prefixes:
  - watu-
    - awatupirni: He fights in the morning
  - ki-
    - akipirni: He fights in the evening
- Added reciprocal and reflexive suffixes:
  - -ajirri
    - ngaripirnajirri: They hit each other
  - -amiya
    - ngaripirnamiya: I hit myself
Developed Lexical Selection Rules
Case 1
moyila → unlucky, non-pay week
Example sentences:
Tiwi Sentence | English Translation |
---|---|
Ngiya moyila naki awarra jurra api ngiya karrikamini kunawini | This is my non-pay week so I have no money. |
Ngiya moyila ngirimi kapi jupuluwu japini. | I had bad luck at cards last night. |
Case 2
pijara → eye, bullet
Example sentences:
Tiwi Sentence | English Translation |
---|---|
Ngiya ngurru-wuriyi kularlaga, api ngi-ri-marruriyi jurruwarli yukurri pijara | I went hunting and took a gun and four bullets. |
Ngarra yi-pirraya pijara pili jan | He washed his eyes because they were sore. |
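The idea behind both cases is that a default translation is used unless a word in the surrounding sentence points to the other sense. The sketch below illustrates only that mechanism. The cue words (`jurra`, `jurruwarli`) are simply tokens that co-occur with the non-default sense in the example sentences above; treating them as triggers is an assumption made for illustration and is not a description of the rules in the actual lexical selection file.

```python
import string

# word -> (default translation, {hypothetical cue word: alternative translation})
RULES = {
    "moyila": ("unlucky", {"jurra": "non-pay week"}),
    "pijara": ("eye", {"jurruwarli": "bullet"}),
}


def tokenize(sentence):
    """Lower-case the sentence and strip punctuation from the edges of each token."""
    return [w.strip(string.punctuation).lower() for w in sentence.split()]


def select(word, sentence):
    """Pick a translation for `word` based on cue words found in `sentence`."""
    default, cues = RULES[word]
    context = set(tokenize(sentence))
    for cue, translation in cues.items():
        if cue in context:
            return translation
    return default


print(select("moyila", "Ngiya moyila naki awarra jurra api ngiya karrikamini kunawini"))
print(select("moyila", "Ngiya moyila ngirimi kapi jupuluwu japini."))
print(select("pijara", "Ngiya ngurru-wuriyi kularlaga, api ngi-ri-marruriyi jurruwarli yukurri pijara"))
print(select("pijara", "Ngarra yi-pirraya pijara pili jan"))
```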
Additional Structural Transfer Rules
Insert "Be" Verbs
Tiw-Eng
(tiw) kamini naki? → (eng) what is this?
Tagger
(tiw) kami<prn><itg><m> naki<prn><dem><sg> → (eng) what <prn><itg><mf><sp> be <vbser><pres><p3><sg> this<prn><dem><mf><sg>
Biltrans
^kami<prn><itg><m>/what<prn><itg><m>/which<prn><itg><m>$ ^naki<prn><dem><sg>/this<prn><dem><sg>$^.<sent>/.<sent>$
Transfer
^what<prn><itg><m>$ ^be<vbser><pres><p3><sg>$ ^this<prn><dem><sg>$^.<sent>$
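The effect of this rule can be sketched outside Apertium as a pass over the target-side lexical units: whenever an interrogative pronoun is directly followed by a demonstrative pronoun, a `be<vbser><pres><p3><sg>` unit is inserted between them. The real rule lives in the pair's transfer file; the Python below is only an illustration of the pattern, and `insert_be` and `render` are hypothetical helper names.

```python
# The copula unit that the rule inserts, with the tags shown in the Transfer output.
BE = ("be", ["vbser", "pres", "p3", "sg"])


def insert_be(units):
    """Insert BE between an interrogative pronoun and a following demonstrative pronoun.

    `units` is a list of (lemma, tags) pairs for the target-language side.
    """
    out = []
    for i, (lemma, tags) in enumerate(units):
        out.append((lemma, tags))
        if i + 1 < len(units):
            next_tags = units[i + 1][1]
            if {"prn", "itg"} <= set(tags) and {"prn", "dem"} <= set(next_tags):
                out.append(BE)
    return out


def render(units):
    """Write (lemma, tags) pairs back out in Apertium stream notation."""
    return " ".join(
        "^" + lemma + "".join(f"<{t}>" for t in tags) + "$" for lemma, tags in units
    )


sentence = [("what", ["prn", "itg", "m"]), ("this", ["prn", "dem", "sg"])]
result = render(insert_be(sentence))
print(result)
```

For the two-unit input above, the output is `^what<prn><itg><m>$ ^be<vbser><pres><p3><sg>$ ^this<prn><dem><sg>$`.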
Word to Word Phrase
Tiw-Eng
(tiw) jupijupi awi kiyija mirrawu → (eng) soup and a little bit of tobacco
Tagger
(tiw) jupijupi<n> awi <cnjcoo> kiyija<prn><qnt> mirrawu <n> → (eng) soup<n><sg> and <cnjcoo> a little bit <adv> of <pr> tobacco <n><sg>
Biltrans
^jupijupi<n>/soup<n>$ ^awi<cnjcoo>/and<cnjcoo>$ ^kiyija<prn><qnt>/a little bit<adv>$ ^mirrawu<n>/tobacco<n>$^.<sent>/.<sent>$
Transfer
^soup<n><sg>$ ^and<cnjcoo>$ ^a little bit<adv>$ ^of<pr>$ ^tobacco<n><sg>$^.<sent>$
Final Tiw → Eng Evaluation
Transducer Coverage
- Precision and recall against the annotated.basic corpus: not available (the evaluation fails with a division-by-zero error)
- Number of words in large text: 3754
- Coverage over large text: 0.39
- Total number of stems: 118
Translator Coverage
- Word error rate (WER): 94.64 %
- Position-independent word error rate (PER): 93.45 %
- Trimmed coverage
- longer corpora: 0.44
- large corpora: 0.30
- Number of tokens:
- longer: 154
- large: 3501
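For readers unfamiliar with the two error rates: WER is the word-level edit distance between the system output and the reference, divided by the reference length, while PER ignores word order and only compares the two bags of words. The sketch below uses one common formulation of each. It is meant to show why PER can be lower than WER, and it is not necessarily the exact computation behind the figures reported above.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: unmatched words, ignoring order."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 5 from the initial evaluation, with the '#' marks removed.
reference = "he is generous"
hypothesis = "generous he"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"PER = {per(reference, hypothesis):.2%}")
```

Here both output words are correct but misordered and the verb is missing, so WER counts three errors over three reference words while PER counts only the missing word.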