==Developed Resources==
*[https://wikis.swarthmore.edu/ling073/Biak Transducer Resources]
*[https://wikis.swarthmore.edu/ling073/Biak_and_English/Lexical_selection Lexical Selection]
*[https://wikis.swarthmore.edu/ling073/Biak_and_English/Contrastive_grammar Contrastive Grammar]
*[https://wikis.swarthmore.edu/ling073/Biak_and_English/Structural_transfer Structural Transfer]
*[https://wikis.swarthmore.edu/ling073/Biak/Final_project Final Project]

==External Resources==
*[https://github.swarthmore.edu/Ling073-sp21/ling073-bhw-eng Language Pair Repo]

==BHW -> ENG Evaluation==
===Coverage Analysis===
* Monolingual transducer coverage of small corpus: 141 / 220 (~0.64091)
* Bilingual transducer coverage of small corpus: 133 / 221 (~0.60181)

===Sentence Evaluation===
In the translation outputs below, "#" marks a form that the English generator could not produce, and "@" marks a lemma with no entry in the bilingual dictionary.

==== 1. ====
'''Original sentence:''' Rusa anine dores.
'''Intended Translation:''' This deer stood.
'''Biltrans Output:''' ^Rusa<n>/Deer<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^ne<det><dem>/@ne<det><dem>$ ^ores<v><iv><p3><sg>/stand<vblex><iv><p3><sg>
'''Translation Output:''' #Deer #the @ne #stand.

==== 2. ====
'''Original sentence:''' Ras anya dares ya isam kaku inja sumbrow.
'''Intended Translation:''' The sun was very hot on the day (and) so they were thirsty.
'''Biltrans Output:''' ^Ras<n>/Day<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^dares<n>/sun<n>$ ^i<prn><pers><p3><sg><spc>/the<det><def><sp>$ ^sam<v><iv><p3><sg>/hot<vblex><iv><p3><sg>$ ^kaku<adv>/very<adv>$ ^inja<cnjcoo>/so<cnjadv>$ ^mbrow<v><iv><p3><du>/thirsty<vblex><iv><p3><du>
'''Translation Output:''' #Day #the #sun the #hot very so #thirsty.

==== 3. ====
'''Original sentence:''' Ikak aniyasne nyas i.
'''Intended Translation:''' This snake up here smelled it.
'''Biltrans Output:''' ^Ikak<n>/Snake<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^yasne<det><dem>/@yasne<det><dem>$ ^nas<v><tv><p3><sg>/smell<vblex><tv><p3><sg>$ ^i<prn><pers><p3><sg>/prpers<prn><subj><p3><m><sg>
'''Translation Output:''' #Snake #the @yasne #smell he.

==== 4. ====
'''Original sentence:''' Skovark ro mnu ine.
'''Intended Translation:''' The three live in this village.
'''Biltrans Output:''' ^Vark<v><iv><p3><pc>/Lie<vblex><iv><p3><pc>/Live<vblex><iv><p3><pc>$ ^ro<pr>/at<pr>$ ^mnu<n>/village<n>$ ^i<prn><pers><p3><sg><spc>/the<det><def><sp>$ ^ne<det><dem>/@ne<det><dem>$^.<sent>/.<sent>$
'''Translation Output:''' #Lie at #village the @ne.

==== 5. ====
'''Original sentence:''' Imnai kwar?
'''Intended Translation:''' Has it stopped yet?
'''Biltrans Output:''' ^Mnai<v><iv><p3><sg>/Stop<vblex><iv><p3><sg>$ ^kwar<adv>/already<adv>$^?<sent>/?<sent>$^.<sent>/.<sent>$
'''Translation Output:''' #Stop already?

==== 6. ====
'''Original sentence:''' Sampe nkofur rum nane ra nabro romawa sya sifarkor.
'''Intended Translation:''' After we built the houses, then the children began to study.
'''Biltrans Output:''' ^Sampe<adv>/Then<adv>$ ^fur<v><tv><p1><pl><ex>/build<vblex><tv><p1><pl><ex>$ ^rum<n>/house<n>$ ^na<prn><pers><p3><pl><inan><spc>/the<det><def><sp>$ ^ne<det><dem>/@ne<det><dem>$ ^ra<pr>/until<pr>$ ^bro<v><iv><p3><pl><inan>/empty<adj><p3><pl><inan>$ ^romawa<n>/boy<n>$ ^si<prn><pers><p3><pl><an><spc>/the<det><def><sp>$ ^farkor<v><iv><p3><pl><an>/study<vblex><iv><p3><pl><an>$^.<sent>/.<sent>$^.<sent>/.<sent>$
'''Translation Output:''' Then #build #house the @ne until #empty #boy the #study.

==== 7. ====
'''Original sentence:''' Nggokain do Sepse fa nggofafyar.
'''Intended Translation:''' We are having a conversation in Sepse.
'''Biltrans Output:''' ^Kain<v><iv><p1><pl><ex>/Sit<vblex><iv><p1><pl><ex>$ ^do<pr>/at<pr>$ ^Sepse<n>/Sepse<n>$ ^fa<pr>/to<pr>$ ^fafyar<v><iv><p1><pl><ex>/tell<vblex><iv><p1><pl><ex>$^.<sent>/.<sent>$^.<sent>/.<sent>$
'''Translation Output:''' #Sit at #Sepse to #tell.

==== 8. ====
'''Original sentence:''' Snewar vyedya iba.
'''Intended Translation:''' Her belly was big.
'''Biltrans Output:''' ^Snewar<n>/Belly<n>$ ^det<det><pos><px3sg><sg><spc>/prpers<det><pos><px3sg><sg><spc>$ ^ba<v><iv><p3><sg>/big<adj><p3><sg>$^.<sent>/.<sent>$^.<sent>/.
'''Translation Output:''' #Belly #prpers #big.

==== 9. ====
'''Original sentence:''' Isnai aya ro marandan yedi.
'''Intended Translation:''' It enlightens my life.
'''Biltrans Output:''' ^Snai<v><tv><p3><sg>/Enlighten<vblex><tv><p3><sg>$ ^aya<prn><pers><p1><sg>/prpers<prn><subj><p1><mf><sg>$ ^ro<pr>/at<pr>$ ^marandan<n>/trip<n>$ ^det<det><pos><px1sg><sg><spc>/prpers<det><pos><px1sg><sg><spc>$^.<sent>/.<sent>$^.<sent>/.<sent>
'''Translation Output:''' #Enlighten I at #trip #prpers.

==== 10. ====
'''Original sentence:''' Ikak anine snonsnon vyedya Kormsamba.
'''Intended Translation:''' This snake's name was Kormsamba.
'''Biltrans Output:''' ^Ikak<n>/Snake<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$ ^ne<det><dem>/@ne<det><dem>$ ^snonsnon<n>/name<n>$ ^det<det><pos><px3sg><sg><spc>/prpers<det><pos><px3sg><sg><spc>$ ^Kormsamba<n>/Kormsamba<n>$^.<sent>/.<sent>$^.<sent>/.<sent>$
'''Translation Output:''' #Snake #the @ne #name #prpers #Kormsamba.

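Each Biltrans line above is raw Apertium stream format: lexical units are wrapped in ^...$, with the Biak analysis and its bilingual-dictionary translation(s) separated by slashes. As an illustrative sketch only (not part of the pair itself), the source/target pairs can be pulled out like this:

<pre>
import re

def parse_biltrans(line):
    """Split an Apertium biltrans line into (source unit, translations) pairs.
    Each lexical unit looks like ^src<tags>/tgt<tags>$, possibly with more
    than one translation separated by further slashes."""
    pairs = []
    for unit in re.findall(r'\^(.*?)\$', line):
        parts = unit.split('/')
        pairs.append((parts[0], parts[1:]))
    return pairs

line = "^Ikak<n>/Snake<n>$ ^i<prn><pers><p3><sg><spc><giv>/the<det><def><sp><giv>$"
for src, tgts in parse_biltrans(line):
    print(src, '->', ' | '.join(tgts))
# Ikak<n> -> Snake<n>
# i<prn><pers><p3><sg><spc><giv> -> the<det><def><sp><giv>
</pre>
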
==Additions==
===Disambiguation===
* Added a disambiguation rule that selects the article reading of the word "na" over the pronoun reading when it is preceded by a noun (a sketch of this logic follows this list).
** This brought the average ambiguity of the corpus from ~1.04 to ~1.03 analyses per word. Note that this is an increase over our original ambiguity score, since more words had been added to the transducer in the meantime.
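The rule itself lives in the pair's disambiguation grammar; the following is only a minimal Python sketch of the selection logic, using an invented token representation and illustrative tag lists rather than the transducer's actual output:

<pre>
def disambiguate_na(sentence):
    """sentence: list of (surface, readings) pairs; each reading is a list of
    tags, e.g. ['na', 'det', 'sp'] or ['na', 'prn', 'pers', 'p3', 'pl', 'inan']."""
    out = []
    for surface, readings in sentence:
        prev_is_noun = bool(out) and any('n' in r[1:] for r in out[-1][1])
        if surface.lower() == 'na' and prev_is_noun:
            # Keep only the article (det) reading when a noun precedes "na".
            det_readings = [r for r in readings if 'det' in r[1:]]
            if det_readings:
                readings = det_readings
        out.append((surface, readings))
    return out

# Hypothetical input: after the noun 'rum', only the det reading of 'na' survives.
tokens = [('rum', [['rum', 'n']]),
          ('na', [['na', 'det', 'sp'], ['na', 'prn', 'pers', 'p3', 'pl', 'inan']])]
print(disambiguate_na(tokens))
</pre>
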
===Structural Transfer===
* Added a rule that inserts an implicit subject when the sentence has no explicit one.
* Changed a rule to specify the type (subject or object) of each pronoun.
* Added a rule that handles definite/demonstrative determiner phrases.
* Added a rule that correctly translates the adverbial endings on determiners.
* These changes brought our WER from 72.22% to 27.78% and our PER from 63.89% to 22.22% (a sketch of how the two metrics are computed follows this list).
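As a simplified, self-contained sketch of what the two metrics measure (WER is word-level edit distance over the reference length, while PER ignores word order), the following is illustrative only and is not the exact script that produced the numbers above:

<pre>
from collections import Counter

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

def per(reference, hypothesis):
    """Position-independent error rate (simplified): ignore word order and
    count how many reference words the hypothesis fails to cover."""
    r, h = Counter(reference.split()), Counter(hypothesis.split())
    matched = sum((r & h).values())
    return 1 - matched / sum(r.values())

print(wer("this deer stood", "deer stood"))       # ~0.33: one deletion
print(per("this deer stood", "stood deer this"))  # 0.0: order is ignored
</pre>
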
===Adding Stems===
* Added ~100 new stems.

==Polished RBMT System==
* Precision: 87.77293%
* Recall: 94.81132%
* Coverage over the large corpus: 7072 / 14287 (~0.495)
* Stems in transducer: 382
* Over bhw.longer.txt:
** Word Error Rate (WER): 80.28%
** Position-independent word error rate (PER): 71.83%
** Percentage of unknown words: 15.54%
** Number of position-independent correct words: 81/284
** Coverage: 211 / 250 (0.844)
* Over bhw.corpus.large.txt:
** Coverage: 5817 / 13768 (~0.4225)
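The coverage figures above are the fraction of corpus tokens for which the morphological analyser returns at least one analysis. The following is a minimal sketch of that computation over Apertium stream-format analyser output, where unknown tokens come back marked with "*"; the invocation and file names are illustrative, not the exact commands we used:

<pre>
import re
import sys

def coverage(analysed_text):
    """Count analysed vs. total tokens in Apertium stream-format output.
    Unknown tokens look like ^surface/*surface$, so any unit containing
    "/*" is treated as unanalysed."""
    units = re.findall(r'\^(.*?)\$', analysed_text, re.DOTALL)
    total = len(units)
    known = sum(1 for u in units if '/*' not in u)
    return known, total

if __name__ == '__main__':
    # e.g.: python3 coverage.py < analysed_corpus.txt
    known, total = coverage(sys.stdin.read())
    print(f'{known} / {total} ({known / total:.5f})')
</pre>
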
[[Category:Sp21_TranslationPairs]][[Category:English]][[Category:Biak]]