Difference between revisions of "Ladino and English"
[[Category:Ladino]] [[Category:English]] [[Category:Sp21_TranslationPairs]]
Latest revision as of 20:23, 19 May 2021
Note: Resources for machine translation between Ladino and English
Contents
External Resources
Developed Resources
Lad → Eng Evaluation
Sentence Analysis
Sentence 1
El mirava en el cielo y en la estrellería: He was looking at heaven and at the stars
^El<prn><pers><p3><m><sg><nom>/Prpers<prn><subj><p3><m><sg>$ ^mirar<v><iv><pii><p1><sg>/look<vblex><pii><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^cielo<n><m><sg>/heavens<n><sg>$ ^y<cnjcoo>/and<cnjcoo>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><f><sg>/the<det><def><sp>$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$
He #look at the #heavens and at the #stars
Sentence 2
No me mires: Don't look at me
^No<adv>/Not<adv>$ ^me<prn><pers><p1><sg><pro>/prpers<prn><obj><p1><mf><sg>$ ^mirar<v><iv><prs><p2><sg>/look<vblex><prs><p2><sg>$^.<sent>/.<sent>$
Not me #look
Sentence 3
Yo mirí en el korason de la estrellería: I looked at the heart of the stars
^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ ^mirar<v><iv><pret><p1><sg>/look<vblex><pret><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^korason<n><m><sg>/heart<n><sg>$ ^de<pr>/of<pr>/from<pr>$ ^*la/*la$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$
I #look at the heart of the #stars
Sentence 4
Eyas no somportaría la dolor: They (feminine) would not bear the pain
^Eyas<prn><pers><p3><f><pl><nom>/Prpers<prn><subj><p3><f><pl>$ ^no<adv>/not<adv>$ ^somportar<v><tv><cni><p1><sg>/bear<vblex><cni><p1><sg>$ ^*la/*la$ ^dolor<n><f><sg>/pain<n><sg>$^.<sent>/.<sent>$
They not #bear the pain
Sentence 5
Eyos kantan: They (masculine) sing
^Eyos<prn><pers><p3><m><pl><nom>/Prpers<prn><subj><p3><m><pl>$ ^kantar<v><iv><pres><p3><pl>/sing<vblex><pres><p3><pl>$^.<sent>/.<sent>$
They #sing
Sentence 6
Eya kantó: She sang
^Eya<prn><pers><p3><f><sg><nom>/Prpers<prn><subj><p3><f><sg>$ ^kantar<v><iv><pret><p3><sg>/sing<vblex><pret><p3><sg>$^.<sent>/.<sent>$
She #sing
Sentence 7
Yo bivire en Yisrael: I will live in Israel
^Yo<prn><pers><p1><sg><nom>/Prpers<prn><subj><p1><mf><sg>$ ^bivir<v><iv><fut><p1><sg>/live<vblex><fut><p1><sg>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^Yisrael<np>/Israel<np>$^.<sent>/.<sent>$
I #live at #Israel
Sentence 8
Nozotros komeriamos en la kavané: We would eat in the coffeehouse
^Nozotros<prn><pers><p1><m><pl><nom>/Prpers<prn><subj><p1><mf><pl>$ ^komer<v><tv><cni><p1><pl>/eat<vblex><cni><p1><pl>$ ^en<pr>/at<pr>/on<pr>/in<pr>$ ^el<det><def><f><sg>/the<det><def><sp>$ ^kavané<n><f><sg>/coffeehouse<n><sg>$^.<sent>/.<sent>$
We #eat at the #coffeehouse
Sentence 9
Tu biviras kuatro mezes: You will live four months
^Tu<prn><pers><p2><sg><nom>/Prpers<prn><subj><p2><mf><sg>$ ^bivir<v><iv><fut><p2><sg>/live<vblex><fut><p2><sg>$ ^kuatro<num>/four<num><pl>$ ^mes<n><m><pl>/month<n><pl>$^.<sent>/.<sent>$
You #live four months
Sentence 10
Eyos no komieron el limón: They did not eat the lemon
^Eyos<prn><pers><p3><m><pl><nom>/Prpers<prn><subj><p3><m><pl>$ ^no<adv>/not<adv>$ ^komer<v><tv><pret><p3><pl>/eat<vblex><pret><p3><pl>$ ^el<det><def><m><sg>/the<det><def><sp>$ ^limón<n><m><sg>/lemon<n><sg>$^.<sent>/.<sent>$
They not #eat the lemon.
Initial Overall Analysis
The coverage of the monolingual transducer on the lad.sentences.txt file (which has more than the 10 sentences listed) is ~0.39583. The coverage of the bilingual transducer on the same file is ~0.33568. Further adaptation is documented at https://wikis.swarthmore.edu/ling073/Ladino_and_English/Lexical_selection#Case_1.
Final Evaluation
Additions
I added four rules to verb morphology and added adjective inflection. I added four rules to the morphological disambiguator: three differentiate the verb meanings from the preposition meanings for komo, de, and para, and one differentiates the adjective meaning of querido from the verb and noun meanings. I also added two new transfer rules to the lad-eng.rtx file, making 'te dio' analyze as 'gave you' and 'la Espanya' analyze as 'Spain'.
Precision and Recall
Totals: 162 forms, 182 tp, 9 fp, 0 tn, 162 fn
Precision: 95.28796%
Recall: 52.90698%
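The percentages above follow from the totals line. As a minimal check, assuming the standard definitions precision = tp / (tp + fp) and recall = tp / (tp + fn) (the function name is illustrative, not part of any evaluation script used here):

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts taken from the totals line: 182 tp, 9 fp, 162 fn.
p, r = precision_recall(tp=182, fp=9, fn=162)
print(f"Precision: {p:.5%}")
print(f"Recall: {r:.5%}")
```

With these counts the sketch reproduces the reported 95.28796% and 52.90698%.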
Monolingual Transducer Coverage
Coverage over lad.corpus.large.txt: 289488 / 649936 (~0.44541)
Remaining unknown forms: 360448
649936 words in the corpus
393 stems, including 22 punctuation.
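Coverage here is the number of known forms divided by the number of forms in the corpus. The sketch below shows one way to count this from an Apertium-style stream, assuming that unknown forms carry a `*` marker as in the `^*la/*la$` units of the sentence analysis. The function name and the regular expression are illustrative assumptions, not the script actually used for the figures above.

```python
import re


def coverage_counts(stream):
    """Return (known, total) lexical units in an Apertium-style stream.

    Assumption: a unit is the text between '^' and '$', and it is unknown
    if it starts with '*' or contains '/*'.
    """
    units = re.findall(r"\^(.*?)\$", stream)
    known = [u for u in units if not (u.startswith("*") or "/*" in u)]
    return len(known), len(units)


# Excerpt from Sentence 3, where 'la' is not analysed.
excerpt = "^de<pr>/of<pr>/from<pr>$ ^*la/*la$ ^estrellería<n><f><sg>/stars<n><sg>$^.<sent>/.<sent>$"
known, total = coverage_counts(excerpt)
print(known, total, known / total)

# The same ratio computed from the corpus totals reported above.
corpus_coverage = 289488 / 649936
unknown_forms = 649936 - 289488
print(round(corpus_coverage, 5), unknown_forms)
```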
MT Coverage
Word error rate (WER) on lad.longer.txt: 86.55 %
Position-independent word error rate (PER) on lad.longer.txt: 76.82 %
Number of position-independent correct words: 150
Coverage over lad.longer.txt: 312/655 (~0.47634)
Coverage over lad.corpus.large.txt: 254170/626118 (~0.40595)
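To illustrate how WER and PER differ, here is a minimal sketch. It assumes WER is the word-level Levenshtein distance divided by the reference length, and uses one common definition of PER that compares the two sentences as bags of words. The evaluation tool behind the figures above may tokenize or normalize differently, so this is not a reimplementation of it.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance, computed one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate (assumed definition): word order is ignored."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 2: reference translation vs. system output.
reference = "Don't look at me".split()
hypothesis = "Not me #look".split()
print(wer(reference, hypothesis), per(reference, hypothesis))
```

On this sentence PER is lower than WER because 'me' counts as correct once position is ignored, which matches the pattern in the scores above (PER 76.82 % against WER 86.55 %).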