Magahi to English Evaluation

From LING073

Revision as of 20:44, 11 April 2021

Coverage of monolingual transducer on mag.sentences.txt

coverage: 229 / 305 (~0.75081967213114754098)

Coverage of bilingual transducer on the same file

There doesn't appear to be a mag-eng.automorf.hfst, and running aq-covtest ../ling073-mag-eng-corpus/mag.sentences.txt ./mag-eng.automorf.bin just gives an error. Instead, dividing the number of translated words by the total number of words in mag.sentences.txt yields a coverage, with the punctuation removed, of 161 / 260, or about 61.92%.
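The manual count described above can be sketched in Python. This is only a rough illustration, not the procedure that was actually run: it assumes the translated output is available one sentence per line, that unknown words are the tokens starting with an asterisk (as in the output lines under Sentences), and that punctuation appears as separate tokens. The function name coverage is made up for this example.

```python
import string


def coverage(translated_lines):
    """Count (translated, total) word tokens, skipping pure-punctuation tokens.

    Assumption: a token that starts with '*' was not translated; tokens
    marked with '#' still count as translated.
    """
    translated = 0
    total = 0
    for line in translated_lines:
        for token in line.split():
            # Drop tokens made up only of ASCII punctuation (e.g. "." or ",").
            if all(ch in string.punctuation for ch in token):
                continue
            total += 1
            if not token.startswith("*"):
                translated += 1
    return translated, total


if __name__ == "__main__":
    # Output line of sentence 1 from this page, with a punctuation token added.
    sample = ["certain #forest in #one #saint #live *हलन ."]
    known, total = coverage(sample)
    print(f"{known} / {total} = {100 * known / total:.2f}%")
```

Applied to the whole translated file, the same ratio is what the 161 / 260 figure reports.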

Sentences

1. कोई जङगल में एगो साधू रह हलन.

  • In a certain forest there dwelt a saint.
  • ^koī<adj>/certain<adj>$ ^jaṅgal<n><obl>/forest<n><obl>$ ^meṃ<post>/in<pr>$ ^ego<adj>/one<adj>$ ^sādhū<n>/saint<n>$ ^raHa<v><pres>/live<v><pres>$ ^*हलन/*हलन$
  • certain #forest in #one #saint #live *हलन

2. उनक भीरी एगो राजा भुलाते जा पहुंचलन आउ साधू के देख के पाओं लाग के बैठ गेलन.

  • One day a king lost his way and approached him. When the king saw him he paid him reverence and sat down.
  • ^*उनक/*उनक$ ^bhīrī<adj>/near<adj>$ ^ego<adj>/one<adj>$ ^rājā<n>/king<n>$ ^bhulāte<n>/way<n>$ ^jā<v><pres>/go<v><pres>$ ^*पहुंचलन/*पहुंचलन$ ^āu<cnj>/and<cnj>$ ^sādhū<n><obl>/saint<n><obl>$ ^ke<post>/of<pr>$ ^dekh<v><pres>/see<v><pres>$ ^ke<post>/of<pr>$ ^pāoṃ<n>/feet<n>$ ^lāg<v><pres>/touch<v><pres>$ ^ke<post>/of<pr>$ ^baiṭh<v><pres>/sat<v><pres>$ ^ge<v><past><o_p3><low>/went<v><past><o_p3><low>$
  • *उनक #near #one #king #way #go *पहुंचलन #and #saint of #see of #feet #touch of #sat #went

3. साधू उनका पिआसल जान के थोडाऐसन जङगल के फर खाए ला देलथीन, आउ पानी पीला देलथीन.

  • The saint seeing that he was thirsty gave him some wild fruit to eat and some water to drink.
  • ^sādhū<n>/saint<n>$ ^*उनका/*उनका$ ^piāsal<adj>/thirsty<adj>$ ^jān<v><pres>/know<v><pres>$ ^ke<post>/of<pr>$ ^*थोडाऐसन/*थोडाऐसन$ ^jaṅgal<n><obl>/forest<n><obl>$ ^ke<post>/of<pr>$ ^phar<n>/fruit<n>$ ^khā<v><inf>/eat<v><inf>$ ^*ला/*ला$ ^de<v><past><o_p3><hi>/give<v><past><o_p3><hi>$ ^āu<cnj>/and<cnj>$ ^pānī<n>/water<n>$ ^pī<v><perf><obl>/drink<v><perf><obl>$ ^de<v><past><o_p3><hi>/give<v><past><o_p3><hi>$
  • #saint *उनका #thirsty #know of *थोडाऐसन #forest of #fruit #eat *ला #give #and #water #drink #give

4. राजा खा के आउ पानी पी के बहुत खुस भेलन, आउ ठनढा हवा में थोडे बेर बैठला से थकैनी निकल गेलैन.

  • When he ate the fruit and drank the water, the king became glad in heart, and, after sitting for a short time in the cool air, his weariness left him.
  • ^rājā<n>/king<n>$ ^khā<v><pres>/eat<v><pres>$ ^ke<post>/of<pr>$ ^āu<cnj>/and<cnj>$ ^pānī<n>/water<n>$ ^pī<v><pres>/drink<v><pres>$ ^ke<post>/of<pr>$ ^baHut<adv>/very$ ^khus<adj>/glad<adj>$ ^bhe<v><past><o_p3><low>/become<v><past><o_p3><low>$ ^āu<cnj>/and<cnj>$ ^ṭhanḍhā<adj>/cool<adj>$ ^*हवा/*हवा$ ^meṃ<post>/in<pr>$ ^thoṛe<adj>/some<adj>$ ^ber<n>/time<n>$ ^baiṭh<v><perf><obl>/sat<v><perf><obl>$ ^se<post>/with<pr>$ ^thakainī<n>/weariness<n>$ ^*निकल/*निकल$ ^ge<v><past><o_p3><hi>/went<v><past><o_p3><hi>$
  • #king #eat of #and #water #drink of #very #glad #become #and #cool *हवा in some #time #sat with #weariness *निकल #went

5. तब राजा साधू जी से हाथ जोर के पुछलन के, "महाराज! हमरा कुछ सिखावन के बात कहीं, के जेकरा से हमर कलेआन होय."

  • Then reverently clasping his hands before the holy man he said to him, "Reverend Sir, deign to tell me some words of advice, by which my welfare may be assured."
  • ^tab<adv>/then<adv>$ ^rājā<n>/king<n>$ ^sādhū<n><obl>/saint<n><obl>$ ^jī<post>/the<pr>$ ^se<post>/with<pr>$ ^Hāth<n>/hand<n>$ ^jor<v><pres>/clasp<v><pres>$ ^ke<post>/of<pr>$ ^puch<v><past><o_p3><low>/ask<v><past><o_p3><low>$ ^ke<post>/of<pr>$ ^maHārāj<n>/great-king<n>$ ^*हमरा/*हमरा$ ^kuch<det>/some<det>$ ^*सिखावन/*सिखावन$ ^ke<post>/of<pr>$ ^bāt<n>/thing<n>$ ^kaH<v><pres><s_p1><o_p2><hi>/say<v><pres><s_p1><o_p2><hi>$ ^ke<post>/of<pr>$ ^*जेकरा/*जेकरा$ ^se<post>/with<pr>$ ^*हमर/*हमर$ ^kaleān<n>/welfare<n>$ ^*होय/*होय$
  • then #king #saint #the with #hand #clasp of #ask of #great-king *हमरा #some *सिखावन of #thing #say of *जेकरा with *हमर #welfare *होय

6. साधू जी बोललन के, "ई चारो बात के इआद रख.

  • The saint replied, "Keep in thy remembrance these four things:
  • ^sādhū<n><obl>/saint<n><obl>$ ^jī<post>/the<pr>$ ^bol<v><past><o_p3><low>/speak<v><past><o_p3><low>$ ^ke<post>/of<pr>$ ^*ई/*ई$ ^cāro<adj>/four<adj>$ ^bāt<n><obl>/thing<n><obl>$ ^ke<post>/of<pr>$ ^iād<n>/memory<n>$ ^*रख/*रख$
  • #saint #the #speak of *ई #four #thing of #memory *रख

7. पहिला ई के, नरायन सामी के नाम हर दम जपना.

  • First, to ever keep repeating the name of God;
  • ^paHilā<n>/first<n>$ ^*ई/*ई$ ^ke<post>/of<pr>$ ^*नरायन/*नरायन$ ^sāmī<n><obl>/lord<n><obl>$ ^ke<post>/of<pr>$ ^nām<n>/name<n>$ ^Har<det>/every<det>$ ^dam<n>/moment<n>$ ^*जपना/*जपना$
  • #first *ई of *नरायन #lord of #name #every #moment *जपना

8. दूसर ई के, सब जीउ पर दया रखना.

  • Second, to show compassion to all living creatures;
  • ^dūsar<n>/second<n>$ ^*ई/*ई$ ^ke<post>/of<pr>$ ^*सब/*सब$ ^*जीउ/*जीउ$ ^par<post>/on<pr>$ ^*दया/*दया$ ^*रखना/*रखना$
  • #second *ई of *सब *जीउ on *दया *रखना

9. तीसर ई के, अन कर चूक के छमा करना.

  • Third, to be tolerant to the errors of others;
  • ^tīsar<n>/third<n>$ ^*ई/*ई$ ^ke<post>/of<pr>$ ^*अन/*अन$ ^*कर/*कर$ ^*चूक/*चूक$ ^ke<post>/of<pr>$ ^*छमा/*छमा$ ^*करना/*करना$
  • #third *ई of *अन *कर *चूक of *छमा *करना

10. आउ चाउठा ई के, कभी कोई बात के घमण्ड ना करन.

  • and Fourthly, never to be vain-glorious for any cause.
  • ^āu<cnj>/and<cnj>$ ^*चाउठा/*चाउठा$ ^*ई/*ई$ ^ke<post>/of<pr>$ ^*कभी/*कभी$ ^koī<adj>/certain<adj>$ ^bāt<n><obl>/thing<n><obl>$ ^ke<post>/of<pr>$ ^*घमण/*घमण$^*ड/*ड$ ^*ना/*ना$ ^*करन/*करन$
  • #and *चाउठा *ई of *कभी certain #thing of *घमण*ड *ना *करन

Analysis

Most of the asterisks come from pronouns and the auxiliary, which do not have straightforward English translations. For the postpositions, we gave the most common translation, but most of them have very broad meanings.
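To see which surface forms account for the asterisks, the unknown lexical units can be tallied straight from the bilingual output. The sketch below assumes the format shown under Sentences, where an unanalysed word appears as ^*form/*form$. The names UNKNOWN_LU and unknown_forms are made up for this example.

```python
import re
from collections import Counter

# Matches the source side of an unknown lexical unit such as ^*ई/*ई$
# and captures the surface form between "^*" and "/".
UNKNOWN_LU = re.compile(r"\^\*([^/$]+)/")


def unknown_forms(biltrans_lines):
    """Return a Counter of surface forms that were left unanalysed."""
    counts = Counter()
    for line in biltrans_lines:
        counts.update(UNKNOWN_LU.findall(line))
    return counts


if __name__ == "__main__":
    # Bilingual output of sentences 6 and 8 from this page.
    lines = [
        "^sādhū<n><obl>/saint<n><obl>$ ^jī<post>/the<pr>$ "
        "^bol<v><past><o_p3><low>/speak<v><past><o_p3><low>$ ^ke<post>/of<pr>$ "
        "^*ई/*ई$ ^cāro<adj>/four<adj>$ ^bāt<n><obl>/thing<n><obl>$ "
        "^ke<post>/of<pr>$ ^iād<n>/memory<n>$ ^*रख/*रख$",
        "^dūsar<n>/second<n>$ ^*ई/*ई$ ^ke<post>/of<pr>$ ^*सब/*सब$ "
        "^*जीउ/*जीउ$ ^par<post>/on<pr>$ ^*दया/*दया$ ^*रखना/*रखना$",
    ]
    for form, n in unknown_forms(lines).most_common():
        print(n, form)
```

Sorting by frequency makes it easy to check whether a handful of pronouns and auxiliary forms really dominate the list.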

Also, lexd seems to have trouble with words that end in 'w' or 'an'. I think this might be caused by ambiguity with some endings. For example, सिखावन (sikhāwan) is in NounRoot, so it should be analyzed as sikhāwan<n>, but -an is also the plural ending, so I think it tries to match the plural ending and fails because sikhāw isn't a NounRoot. Alternatively, there may be a bug in the transliterator, which is definitely possible, since seemingly benign words like ला (lā) also completely fail to be analyzed for no apparent reason.
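The suspected ambiguity can be made concrete with a toy root-plus-suffix matcher. This is not lexd and says nothing certain about how lexd behaves. The lexicon, the `<pl>` tag name, and the function analyses are all assumptions made for illustration. The matcher simply enumerates every split of a form into a known root and a known ending.

```python
def analyses(form, roots, suffixes):
    """Return every root+suffix split of `form` licensed by the toy lexicon."""
    results = []
    for suffix, tag in suffixes.items():
        if not form.endswith(suffix):
            continue
        root = form[: len(form) - len(suffix)]
        if root in roots:
            results.append(f"{root}<n>{tag}")
    return results


# Hypothetical toy lexicon: one noun root and two endings
# (bare form and the -an plural mentioned above).
NOUN_ROOTS = {"sikhāwan"}
NOUN_SUFFIXES = {"": "", "an": "<pl>"}

if __name__ == "__main__":
    # The split sikhāw + an is rejected because sikhāw is not a root,
    # but the bare-root reading is still found.
    print(analyses("sikhāwan", NOUN_ROOTS, NOUN_SUFFIXES))
```

In this toy model the failed plural split does not block the bare-root analysis, so if the real transducer enumerates paths in a similar way, the transliterator explanation may be worth checking first.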

The final trouble point is that this story comes from the much older source, not the modern one, so many of the endings are different and verbs work completely differently. It took a lot of care to figure out what was ending and what was root, since the forms didn't line up with the modern source most of the time. We added most of the inflections as options in our lexd file as best we could, but some we weren't able to figure out. For example, I think "ना" (nā) is some kind of "should be X'd" suffix, so जप (jap: mutter) becomes जपना (japnā: should be muttered), but it only shows up twice in the story and isn't in either of the sources, so we weren't sure what to do.