Ladino and English

From LING073

Note: Resources for machine translation between Ladino and English

External Resources

Github Repo for Language Pair

Ladino Transducer

English Transducer

Developed Resources

Bilingual Corpus

Contrastive Grammar

Structural Transfer

Lad → Eng Evaluation

Sentence Analysis

Sentence 1

El mirava en el cielo y en la estrellería: He was looking at heaven and at the stars

^El<prn><pers><p3><m><sg><nom>/Prpers<prn><subj><p3><m><sg>$ ^mirar<v><iv><pii><p1><sg>/look<vblex><pii><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^cielo<n><m><sg>/heavens<n><sg>$ ^y<cnjcoo>/and<cnjcoo>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><f><sg>/the<det><def><sp>$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$

He #look at the #heavens and at the #stars
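Each analysis line is a sequence of lexical units delimited by `^` and `$`, with the source reading first and the candidate target readings after each `/`. As a rough sketch for reading that output, the following Python splits a line into units and separates lemmas from tags. The helper names (`parse_units`, `lemma_and_tags`, `is_unknown`) are hypothetical, and the parser assumes no escaped `^`, `$`, or `/` characters inside a unit.

```python
import re

# One lexical unit is everything between ^ and $ (escaped characters are not handled).
UNIT = re.compile(r"\^(.*?)\$")
TAG = re.compile(r"<([^>]+)>")

def parse_units(line):
    """Split a line into (source_reading, [target_readings]) pairs."""
    units = []
    for body in UNIT.findall(line):
        source, *targets = body.split("/")
        units.append((source, targets))
    return units

def lemma_and_tags(reading):
    """Separate a reading such as 'cielo<n><m><sg>' into its lemma and tag list."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG.findall(reading)

def is_unknown(reading):
    """Unknown forms carry a leading asterisk, as in '^*la/*la$'."""
    return reading.startswith("*")

line = "^en<pr>/at<pr>/on<pr>/in<pr>$ ^cielo<n><m><sg>/heavens<n><sg>$ ^*la/*la$"
for source, targets in parse_units(line):
    lemma, tags = lemma_and_tags(source)
    print(lemma, tags, [lemma_and_tags(t)[0] for t in targets], is_unknown(source))
```

Run on the fragment above, this shows that `en` has three candidate translations and that `la` was not recognised in that context.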

Sentence 2

No me mires: Don't look at me

^No<adv>/Not<adv>$ ^me<prn><pers><p1><sg><pro>/prpers<prn><obj><p1><mf><sg>$ ^mirar<v><iv><prs><p2><sg>/look<vblex><prs><p2><sg>$^.<sent>/.<sent>$

Not me #look

Sentence 3

Yo mirí en el korason de la estrellería: I looked at the heart of the stars

^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ ^mirar<v><iv><pret><p1><sg>/look<vblex><pret><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^korason<n><m><sg>/heart<n><sg>$ ^de<pr>/of<pr>/from<pr>$ ^*la/*la$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$

I #look at the heart of the #stars

Sentence 4

Eyas no somportaría la dolor: They (feminine) do not bear the pain

^Eyas<prn><pers><p3><f><pl><nom>/Prpers<prn><subj><p3><f><pl>$ ^no<adv>/not<adv>$ ^somportar<v><tv><cni><p1><sg>/bear<vblex><cni><p1><sg>$ ^*la/*la$ ^dolor<n><f><sg>/pain<n><sg>$^.<sent>/.<sent>$

They not #bear the pain

Sentence 5

Eyos kantan: They (masculine) sing

^Eyos<prn><pers><p3><m><pl><nom>/Prpers<prn><subj><p3><m><pl>$ ^kantar<v><iv><pres><p3><pl>/sing<vblex><pres><p3><pl>$^.<sent>/.<sent>$

They #sing

Sentence 6

Eya kantó: She sang

^Eya<prn><pers><p3><f><sg><nom>/Prpers<prn><subj><p3><f><sg>$ ^kantar<v><iv><pret><p3><sg>/sing<vblex><pret><p3><sg>$^.<sent>/.<sent>$

She #sing

Sentence 7

Yo bivire en Yisrael: I will live in Israel

^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ ^bivir<v><iv><fut><p1><sg>/live<vblex><fut><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^Yisrael<np>/Israel<np>$^.<sent>/.<sent>$

I #live at #Israel

Sentence 8

Nozotros komeriamos en la kavané: We would eat in the coffeehouse

^Nozotros<prn><pers><p1><m><pl><nom>/Prpers<prn><subj><p1><mf><pl>$ ^komer<v><tv><cni><p1><pl>/eat<vblex><cni><p1><pl>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><f><sg>/the<det><def><sp>$ ^kavané<n><f><sg>/coffeehouse<n><sg>$^.<sent>/.<sent>$

We #eat at the #coffeehouse

Sentence 9

Tu biviras kuatro mezes: You will live four months

^Tu<prn><pers><p2><sg><nom>/Prpers<prn><subj><p2><mf><sg>$ ^bivir<v><iv><fut><p2><sg>/live<vblex><fut><p2><sg>$ ^kuatro<num>/four<num><pl>$ ^mes<n><m><pl>/month<n><pl>$^.<sent>/.<sent>$

You #live four months

Sentence 10

Eyos no komieron el limón: They did not eat the lemon

^Eyos<prn><pers><p3><m><pl><nom>/Prpers<prn><subj><p3><m><pl>$ ^no<adv>/not<adv>$ ^komer<v><tv><pret><p3><pl>/eat<vblex><pret><p3><pl>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^limón<n><m><sg>/lemon<n><sg>$^.<sent>/.<sent>$

They not #eat the lemon.

Initial Overall Analysis

The coverage of the monolingual transducer on the lad.sentences.txt file (which contains more than the 10 sentences listed above) is ~0.39583. The coverage of the bilingual transducer on the same file is ~0.33568. Results after further development are reported under Final Evaluation.

Final Evaluation

Additions

I added four rules to verb morphology and added adjective inflection. I added four rules to the morphological disambiguator: three differentiate the verb readings from the preposition readings of komo, de, and para, and one differentiates the adjective reading of querido from its verb and noun readings. I also added two new transfer rules to the lad-eng.rtx file, so that 'te dio' translates as 'gave you' and 'la Espanya' translates as 'Spain'.

Precision and Recall

Totals: 162 forms, 182 tp, 9 fp, 0 tn, 162 fn

Precision: 95.28796%

Recall: 52.90698%
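The two percentages follow directly from the totals line. A minimal sketch of the arithmetic, assuming the usual definitions (precision = tp / (tp + fp), recall = tp / (tp + fn)); the function name is hypothetical:

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Counts taken from the totals line above.
p, r = precision_recall(tp=182, fp=9, fn=162)
print(f"Precision: {p:.5%}")  # Precision: 95.28796%
print(f"Recall: {r:.5%}")     # Recall: 52.90698%
```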

Monolingual Transducer Coverage

Coverage over lad.corpus.large.txt: 289488 / 649936 (~0.44541)

remaining unknown forms: 360448

649936 words in the corpus
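Coverage here is the share of corpus tokens that receive at least one analysis. The sketch below checks the figures above and shows one way the counts could be obtained from analyser-style output. It assumes unknown forms are marked with `*` after the slash (as in `^*la/*la$`); the function names are hypothetical and this is not the script that produced the numbers.

```python
import re

def coverage(known, total):
    """Fraction of tokens that received an analysis."""
    return known / total if total else 0.0

def count_known(analysed):
    """Return (known, total) lexical units; a unit is unknown if a reading starts with '*'."""
    units = re.findall(r"\^(.*?)\$", analysed)
    known = sum(1 for unit in units if "/*" not in unit)
    return known, len(units)

# Toy example: two analysed forms and one unknown form.
print(count_known("^Eyos/Eyos<prn>$ ^kantan/kantar<v>$ ^xyz/*xyz$"))  # (2, 3)

# Figures reported above for lad.corpus.large.txt.
known, total = 289488, 649936
print(total - known)                      # 360448
print(round(coverage(known, total), 5))   # 0.44541
```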

393 stems, including 22 punctuation marks.

MT Coverage

Word error rate (WER) on lad.longer.txt: 86.55 %

Position-independent word error rate (PER) on lad.longer.txt: 76.82 %

Number of position-independent correct words: 150
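For readers unfamiliar with these metrics, here is a small sketch of how they can be computed for one sentence pair. WER is taken as word-level edit distance divided by reference length. PER is taken as the share of reference words with no match in the hypothesis when word order is ignored; this is one common definition and is an assumption, since the evaluation tool used for the figures above may compute PER differently. The function names are hypothetical.

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate relative to the reference length."""
    return edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent error rate (assumed definition: unmatched reference words / reference length)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (len(ref) - matches) / len(ref)

# Sentence 10: reference translation vs. system output, final punctuation dropped.
ref = "They did not eat the lemon".split()
hyp = "They not #eat the lemon".split()
print(edit_distance(ref, hyp))   # 2
print(round(wer(ref, hyp), 4))   # 0.3333
print(round(per(ref, hyp), 4))   # 0.3333
```

In this example the two errors are the missing "did" and the unresolved "#eat", so both rates come out at 2/6.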

Coverage over lad.longer.txt: 312 / 655 (~0.47634)

Coverage over lad.corpus.large.txt: 254170 / 626118 (~0.40595)