Latest revision as of 20:23, 19 May 2021
Note: Resources for machine translation between Ladino and English
External Resources
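Developed Resources
Bilingual Corpus: https://github.swarthmore.edu/Ling073-sp21/ling073-lad-eng-corpus
Contrastive Grammar: https://wikis.swarthmore.edu/ling073/Ladino_and_English/Contrastive_Grammar
Structural Transfer: https://wikis.swarthmore.edu/ling073/Ladino_and_English/Structural_Transfer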
Lad → Eng Evaluation
Sentence Analysis
Sentence 1
El mirava en el cielo y en la estrellería: He was looking at heaven and at the stars
^El<prn><pers><p3><m><sg><nom>/Prpers<prn><subj><p3><m><sg>$ ^mirar<v><iv><pii><p1><sg>/look<vblex><pii><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^cielo<n><m><sg>/heavens<n><sg>$ ^y<cnjcoo>/and<cnjcoo>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><f><sg>/the<det><def><sp>$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$
He #look at the #heavens and at the #stars
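Each analysis line above is in Apertium's stream format. Every lexical unit is wrapped in ^...$, the source analysis (a lemma followed by <tags>) comes first, each bilingual-dictionary translation follows after a '/', and a leading '*' marks a form the analyser did not recognise, as with ^*la/*la$ in Sentences 3 and 4. The short Python sketch below splits such a line into units, lemmas, and tags. It is only an illustration: the class and function names are made up and are not part of any Apertium tool, and the sketch ignores the backslash escapes that real streams may contain.

```python
import re
from dataclasses import dataclass


@dataclass
class LexicalUnit:
    """One ^source/target1/target2$ unit from a biltrans-style line."""
    source: str
    targets: list[str]

    @property
    def unknown(self) -> bool:
        # A leading '*' marks a form the morphological analyser did not know.
        return self.source.startswith("*")


def split_analysis(analysis: str) -> tuple[str, list[str]]:
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", analysis)
    return lemma, tags


def parse_stream(line: str) -> list[LexicalUnit]:
    """Return the lexical units of one line (backslash escapes are ignored)."""
    units = []
    for body in re.findall(r"\^(.*?)\$", line):
        source, *targets = body.split("/")
        units.append(LexicalUnit(source, targets))
    return units


# Analysis of Sentence 3, copied from this page.
line = ("^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ "
        "^mirar<v><iv><pret><p1><sg>/look<vblex><pret><p1><sg>$ "
        "^en<pr>/at<pr>/on<pr>/in<pr>$ "
        "^el<det><def><m><sg>/the<det><def><sp>$ "
        "^korason<n><m><sg>/heart<n><sg>$ "
        "^de<pr>/of<pr>/from<pr>$ ^*la/*la$ "
        "^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$")

units = parse_stream(line)
print(split_analysis(units[1].source))      # ('mirar', ['v', 'iv', 'pret', 'p1', 'sg'])
print(split_analysis(units[1].targets[0]))  # ('look', ['vblex', 'pret', 'p1', 'sg'])
print([u.source.lstrip("*") for u in units if u.unknown])  # ['la']
```

Units with several translations, such as ^en<pr>/at<pr>/on<pr>/in<pr>$, keep all of them in the targets list. Lexical selection later has to choose among these.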
Sentence 2
No me mires: Don't look at me
^No<adv>/Not<adv>$ ^me<prn><pers><p1><sg><pro>/prpers<prn><obj><p1><mf><sg>$ ^mirar<v><iv><prs><p2><sg>/look<vblex><prs><p2><sg>$^.<sent>/.<sent>$
Not me #look
Sentence 3
Yo mirí en el korason de la estrellería: I looked at the heart of the stars
^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ ^mirar<v><iv><pret><p1><sg>/look<vblex><pret><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^korason<n><m><sg>/heart<n><sg>$ ^de<pr>/of<pr>/from<pr>$ ^*la/*la$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$
I #look at the heart of the #stars
Sentence 4
Eyas no somportaría la dolor: They (feminine) would not bear the pain
^Eyas<prn><pers><p3><f><pl><nom>/Prpers<prn><subj><p3><f><pl>$ ^no<adv>/not<adv>$ ^somportar<v><tv><cni><p1><sg>/bear<vblex><cni><p1><sg>$ ^*la/*la$ ^dolor<n><f><sg>/pain<n><sg>$^.<sent>/.<sent>$
They not #bear the pain
Sentence 5
Eyos kantan: They (masculine) sing
^Eyos<prn><pers><p3><m><pl><nom>/Prpers<prn><subj><p3><m><pl>$ ^kantar<v><iv><pres><p3><pl>/sing<vblex><pres><p3><pl>$^.<sent>/.<sent>$
They #sing
Sentence 6
Eya kantó: She sang
^Eya<prn><pers><p3><f><sg><nom>/Prpers<prn><subj><p3><f><sg>$ ^kantar<v><iv><pret><p3><sg>/sing<vblex><pret><p3><sg>$^.<sent>/.<sent>$
She #sing
Sentence 7
Yo bivire en Yisrael: I will live in Israel
^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ ^bivir<v><iv><fut><p1><sg>/live<vblex><fut><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^Yisrael<np>/Israel<np>$^.<sent>/.<sent>$
I #live at #Israel
Sentence 8
Nozotros komeriamos en la kavané: We would eat in the coffeehouse
^Nozotros<prn><pers><p1><m><pl><nom>/Prpers<prn><subj><p1><mf><pl>$ ^komer<v><tv><cni><p1><pl>/eat<vblex><cni><p1><pl>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><f><sg>/the<det><def><sp>$ ^kavané<n><f><sg>/coffeehouse<n><sg>$^.<sent>/.<sent>$
We #eat at the #coffeehouse
Sentence 9
Tu biviras kuatro mezes: You will live four months
^Tu<prn><pers><p2><sg><nom>/Prpers<prn><subj><p2><mf><sg>$ ^bivir<v><iv><fut><p2><sg>/live<vblex><fut><p2><sg>$ ^kuatro<num>/four<num><pl>$ ^mes<n><m><pl>/month<n><pl>$^.<sent>/.<sent>$
You #live four months
Sentence 10
Eyos no komieron el limón: They did not eat the lemon
^Eyos<prn><pers><p3><m><pl><nom>/Prpers<prn><subj><p3><m><pl>$ ^no<adv>/not<adv>$ ^komer<v><tv><pret><p3><pl>/eat<vblex><pret><p3><pl>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^limón<n><m><sg>/lemon<n><sg>$^.<sent>/.<sent>$
They not #eat the lemon.
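In Apertium output, a '#' before a word means the target-language generator could not produce a surface form for that lemma and tag combination. By the same conventions, a '*' marks an unknown word and an '@' marks a word missing from the bilingual dictionary. The quick Python tally below runs over the ten outputs above (the helper name is hypothetical). It shows that every sentence has at least one generation failure and that every verb in the set is affected.

```python
from collections import Counter

# System outputs copied from the ten sentences above.
outputs = [
    "He #look at the #heavens and at the #stars",
    "Not me #look",
    "I #look at the heart of the #stars",
    "They not #bear the pain",
    "They #sing",
    "She #sing",
    "I #live at #Israel",
    "We #eat at the #coffeehouse",
    "You #live four months",
    "They not #eat the lemon.",
]


def generation_failures(sentence: str) -> list[str]:
    """Return the words the generator marked with '#', without the mark."""
    return [w[1:] for w in sentence.split() if w.startswith("#")]


failures = Counter(w for s in outputs for w in generation_failures(s))
total_words = sum(len(s.split()) for s in outputs)

print(sum(failures.values()), "of", total_words, "words marked with #")
# 15 of 47 words marked with #
print(sorted(failures))
# ['Israel', 'bear', 'coffeehouse', 'eat', 'heavens', 'live', 'look', 'sing', 'stars']
```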
Initial Overall Analysis
The coverage of the monolingual transducer on lad.sentences.txt (which contains more sentences than the 10 listed above) is ~0.39583, and the coverage of the bilingual transducer on the same file is ~0.33568. Further adaptation is described on the Lexical selection page: https://wikis.swarthmore.edu/ling073/Ladino_and_English/Lexical_selection#Case_1
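Coverage is the share of tokens that the transducer recognises. The minimal Python sketch below computes it over a biltrans-style stream. It assumes Apertium's conventions, in which a leading '*' marks a form unknown to the analyser and a leading '@' on a translation marks a form missing from the bilingual dictionary. The scripts that produced the figures above may tokenise or count differently, for example in how they treat punctuation.

```python
import re


def coverage(stream: str) -> tuple[float, float]:
    """Return (monolingual, bilingual) coverage of a biltrans-style stream."""
    units = [body.split("/") for body in re.findall(r"\^(.*?)\$", stream)]
    if not units:
        return 0.0, 0.0
    # Known to the analyser: the source side carries no '*'.
    known = [u for u in units if not u[0].startswith("*")]
    # Also translated: no target analysis carries the '@' mark.
    translated = [u for u in known
                  if not any(t.startswith("@") for t in u[1:])]
    return len(known) / len(units), len(translated) / len(units)


# Analysis of Sentence 4, copied from this page: 6 units, one unknown ('la').
sentence4 = ("^Eyas<prn><pers><p3><f><pl><nom>/Prpers<prn><subj><p3><f><pl>$ "
             "^no<adv>/not<adv>$ "
             "^somportar<v><tv><cni><p1><sg>/bear<vblex><cni><p1><sg>$ "
             "^*la/*la$ ^dolor<n><f><sg>/pain<n><sg>$^.<sent>/.<sent>$")
print(coverage(sentence4))  # 5 of 6 units known on both sides

# A made-up unit that is missing from the bilingual dictionary.
made_up = "^kaza<n><f><sg>/@kaza<n><f><sg>$ ^y<cnjcoo>/and<cnjcoo>$"
print(coverage(made_up))  # (1.0, 0.5)
```

Bilingual coverage can never exceed monolingual coverage in this sketch, because a unit must first be analysed before its translation is checked. This is consistent with the bilingual figure (~0.33568) being lower than the monolingual one (~0.39583).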
Final Evaluation
Additions
I added four rules to the verb morphology and added adjective inflection. I also added four rules to the morphological disambiguator, distinguishing the verb readings of komo, de, and para from their preposition readings, and the adjective reading of querido from its verb and noun readings. Finally, I added two new transfer rules to lad-eng.rtx: one makes 'te dio' translate as 'gave you', and the other makes 'la Espanya' translate as 'Spain'.
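The 'te dio' rule involves reordering: Ladino places the object clitic before a finite verb, while English places the object after it. The rule in lad-eng.rtx is not shown on this page. The Python below is therefore only a toy illustration of the idea, operating on hypothetical (lemma, tags) pairs with tag names borrowed from the analyses above. It is not rtx syntax and not the rule as written.

```python
# Hypothetical representation: each unit is (lemma, tags).
Unit = tuple[str, tuple[str, ...]]


def move_object_clitic(units: list[Unit]) -> list[Unit]:
    """Swap an object pronoun with the verb that immediately follows it."""
    out: list[Unit] = []
    i = 0
    while i < len(units):
        _, tags = units[i]
        next_is_verb = i + 1 < len(units) and "vblex" in units[i + 1][1]
        if "prn" in tags and "obj" in tags and next_is_verb:
            # Emit the verb first, then the pronoun (English verb-object order).
            out.extend([units[i + 1], units[i]])
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out


# 'te dio' after lexical transfer (tags are illustrative, not real output).
te_dio = [("prpers", ("prn", "obj", "p2", "mf", "sg")),
          ("give", ("vblex", "pret", "p3", "sg"))]
print([lemma for lemma, _ in move_object_clitic(te_dio)])  # ['give', 'prpers']
```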
Precision and Recall
Totals: 162 forms, 182 tp, 9 fp, 0 tn, 162 fn
Precision: 95.28796%
Recall: 52.90698%
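The two percentages follow from the totals using the standard definitions: precision = tp / (tp + fp) and recall = tp / (tp + fn). The 162 forms and the 0 true negatives do not enter either formula. A quick Python check:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    return tp / (tp + fp), tp / (tp + fn)


p, r = precision_recall(tp=182, fp=9, fn=162)
print(f"Precision: {p:.5%}")  # Precision: 95.28796%
print(f"Recall: {r:.5%}")     # Recall: 52.90698%
```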
Monolingual Transducer Coverage
Coverage over lad.corpus.large.txt: 289488 / 649936 (~0.44541)
Remaining unknown forms: 360448
Words in the corpus: 649936
393 stems, including 22 punctuation marks.
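The monolingual figures are internally consistent: the recognised and unknown counts add up to the corpus size, and coverage is the recognised share. A small Python check of that arithmetic:

```python
# Counts reported above for lad.corpus.large.txt.
total_words = 649936
known_words = 289488
unknown_words = 360448

# Every word is either recognised or unknown.
assert known_words + unknown_words == total_words

coverage = known_words / total_words
print(f"{coverage:.5f}")  # 0.44541
```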
MT Coverage
Word error rate (WER) on lad.longer.txt: 86.55%
Position-independent word error rate (PER) on lad.longer.txt: 76.82%
Number of position-independent correct words: 150
Coverage over lad.longer.txt: 312 / 655 (~0.47634)
Coverage over lad.corpus.large.txt: 254170 / 626118 (~0.40595)
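The tool that produced the WER and PER figures is not named on this page, so the Python below is only a sketch of the usual definitions. WER is the word-level edit distance (substitutions, insertions, and deletions) divided by the reference length. For PER, the sketch assumes a common position-independent formulation: (max(reference length, hypothesis length) minus the number of matching words, counted as a bag of words) divided by the reference length. The matching count plays the role of the "position-independent correct words" figure above. Real evaluation tools may also normalise case or punctuation. The example uses Sentence 2's reference and its output with the '#' marks removed, and shows how a word-order error is penalised by WER but not by PER.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: token-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling row of the edit-distance table: prev[j] is the distance between
    # the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: compares bags of words, ignoring order."""
    ref, hyp = reference.split(), hypothesis.split()
    correct = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - correct) / len(ref)


# Sentence 2: reference vs. system output with the '#' marks removed.
print(wer("Don't look at me", "Not me look"))  # 1.0
print(per("Don't look at me", "Not me look"))  # 0.5
```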