Central Kurdish and English
From LING073
Latest revision as of 21:19, 15 May 2021
Resources for machine translation between Sorani Kurdish (https://wikis.swarthmore.edu/ling073/Central_Kurdish) and English (https://wikis.swarthmore.edu/ling073/English).
External Resources
- Translation Pair: https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-eng
- Sorani Transducer: https://github.swarthmore.edu/Ling073-sp21/ling073-ckb/blob/master/apertium-ckb.ckb.lexd
- English Transducer: https://github.com/apertium/apertium-eng/blob/master/apertium-eng.eng.dix
Developed Resources
- Parallel Corpus: https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-eng-corpus
- Contrastive Grammar: https://wikis.swarthmore.edu/ling073/Central_Kurdish_and_English/Contrastive_Grammar
ckb --> eng Evaluation
- Coverage of monolingual transducer: 39.01%
- Coverage of bilingual transducer: 17.45%
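As a rough illustration of what a coverage figure like the ones above measures, here is a minimal sketch that computes naive coverage from analyser output in the Apertium stream format, where an unknown surface form is marked with `*`. The function name `naive_coverage` and the sample tokens are made up for the example, and the script actually used for the figures above may count tokens differently.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters inside a unit are not handled)
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown form appears as ^form/*form$, so "/*" signals a missing analysis.
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


# Hypothetical analyser output: two known tokens, one unknown token.
sample = "^foo/foo<n><sg>$ ^bar/*bar$ ^baz/baz<v><past>$"
print(f"{naive_coverage(sample):.2%}")  # 66.67%
```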
Sentence 1: پیاوەکە هات.
- Intended translation: The man came.
- Lexical transfer: ^پیاو<n><def><sg>/man<n><def><sg>$ ^هاتن<v><iv><past>/come<vblex><past>$
- Full translation: #man came
Sentence 2: ئەوان سەگیان هێنا
- Intended translation: They brought dogs.
- Lexical transfer: ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^سەگ<n>/dog<n>$ ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^هێنان<v><tv><past
- Full translation: #them #dog #them brought
Sentence 3: من نانم خوارد.
- Intended translation: I ate bread.
- Lexical transfer: ^من<prn><pers><p1><sg>/I<prn><subj><p1><mf><sg>/me<prn><obj><p1><mf><sg>$ ^نان<n>/bread<n>$ ^من<prn><pers><p1><sg>/I<prn><subj><p1><mf><sg>/me<prn><obj><p1><mf><sg>$ ^خواردن<v><tv><past>/eat<vblex><past>$
- Full translation: I #bread I ate
Sentence 4: گەورەترین سەگ هات.
- Intended translation: The biggest dog came.
- Lexical transfer: ^گەورە<adj><sup>/big<adj><sint><sup>$ ^سەگ<n><sg>/dog<n><sg>$ ^هاتن<v><iv><past>/come<vblex><past>$
- Full translation: biggest dog came
Sentence 5: ئێمە ناچین.
- Intended translation: We are not going.
- Lexical transfer: ^ئێمە<prn><pers><p1><pl>/we<prn><subj><p1><mf><pl>/us<prn><subj><p1><mf><pl>$ ^چوون<v><iv><npast><neg><p1><pl>/go<vblex><npast><neg><p1><pl>$
- Full translation: we #go
Sentence 6: ئەوان نەچوون.
- Intended translation: They did not go.
- Lexical transfer: ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^چوون<v><iv><past><neg><p2><pl>/go<vblex><past><neg><p2><pl>$
- Full translation: #them #go
Sentence 7: مەچۆ.
- Intended translation: Don't go.
- Lexical transfer: ^چوون<v><iv><imp><neg><p2><sg>/go<vblex><imp><neg><p2><sg>$
- Full translation: #go
Sentence 8: فیلمەکان خۆش بوون.
- Intended translation: The films were funny.
- Lexical transfer: ^فیلم<n><def><pl>/film<n><def><pl>$ ^خۆش<adj>/good<adj><sint>$ ^بوون<v><tv><past><p2><pl>/be<vblex><past><p2><pl>$
- Full translation: #film #funny were
Sentence 9: نان بخۆ.
- Intended translation: Eat bread.
- Lexical transfer: ^نان<n><sg>/bread<n><sg>$ ^خواردن<v><tv><imp><p2><sg>/eat<vblex><imp><p2><sg>$
- Full translation: bread #eat
Sentence 10: ئاژەڵەکە مرد.
- Intended translation: The animal died.
- Lexical transfer: ^ئاژەڵ<n><def><sg>/animal<n><def><sg>$ ^مردن<v><iv><past>/die<vblex><past>$
- Full translation: #animal died
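The `^source/target$` lines in the sentences above are in the Apertium stream format: each unit holds a source-language reading followed by one or more target-language readings, separated by `/`. The sketch below shows one way to split such a line into lemmas and tags for inspection. The names `Reading`, `parse_reading` and `parse_biltrans` are hypothetical helpers, not part of Apertium, and escaped characters inside a unit are not handled.

```python
import re
from typing import List, NamedTuple, Tuple


class Reading(NamedTuple):
    lemma: str
    tags: Tuple[str, ...]


UNIT = re.compile(r"\^(.*?)\$")   # one ^...$ lexical unit
TAG = re.compile(r"<([^<>]+)>")   # one <tag>


def parse_reading(text: str) -> Reading:
    """Split 'lemma<tag1><tag2>' into a lemma and a tuple of tags."""
    lemma = text.split("<", 1)[0]
    return Reading(lemma, tuple(TAG.findall(text)))


def parse_biltrans(line: str) -> List[Tuple[Reading, List[Reading]]]:
    """Return (source reading, list of target readings) for each unit."""
    parsed = []
    for unit in UNIT.findall(line):
        source, *targets = unit.split("/")
        parsed.append((parse_reading(source), [parse_reading(t) for t in targets]))
    return parsed


# The tagged line from Sentence 10 above.
line = "^ئاژەڵ<n><def><sg>/animal<n><def><sg>$ ^مردن<v><iv><past>/die<vblex><past>$"
for source, targets in parse_biltrans(line):
    print(source.tags, "->", [(t.lemma, t.tags) for t in targets])
```

Units with several target readings, such as the pronoun in Sentence 2 (them/they), simply yield a longer list of targets.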
Additions
For the Polished RBMT lab, I:
- Added 1000+ stems (mostly nouns) to the bilingual transducer by copying stems from the Kurdish Apertium transducer.
- Added 5 patterns for grammar points such as the subjunctive mood and the present perfect tense. Also added further detail to existing paradigms.
- Added 3 disambiguation rules and modified some others.
The new (improved?) metrics for the monolingual transducer are:
- Precision against the annotated corpus: 100.0%
- Recall against the annotated corpus: 86.3%
- Coverage over the large corpus: ~52.5%
- Number of words in the large corpus: 48,562
- Number of stems in the transducer: 1,529
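To show how 100% precision can sit alongside lower recall, here is a minimal sketch of one common way to score an analyser against an annotated corpus. It assumes each surface form maps to a set of analyses; the function name `precision_recall` and the toy data are invented for the example, and the evaluation script used for the figures above may count things differently.

```python
from typing import Dict, Set, Tuple


def precision_recall(gold: Dict[str, Set[str]],
                     predicted: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Score predicted analyses against gold analyses, form by form."""
    # Analyses that the transducer produced and the annotation also lists.
    correct = sum(len(gold[form] & predicted.get(form, set())) for form in gold)
    n_predicted = sum(len(predicted.get(form, set())) for form in gold)
    n_gold = sum(len(analyses) for analyses in gold.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


# Toy data: everything the analyser outputs is right, but one gold analysis is missing.
gold = {"a": {"a<n><sg>"}, "b": {"b<v><past>", "b<n><sg>"}}
predicted = {"a": {"a<n><sg>"}, "b": {"b<v><past>"}}
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.1%} recall={recall:.1%}")  # precision=100.0% recall=66.7%
```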
The new metrics for the translation pair are:
- WER over the longer corpus: 94.4%
- PER over the longer corpus: 91.5%
- Stems translated correctly in the longer corpus: 46%
- Trimmed coverage over the longer corpus: 47.7%
- Trimmed coverage over the large corpus: 35.6%
- Number of tokens in the longer corpus: 384
- Number of tokens in the large corpus: 48,562
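For readers unfamiliar with the two error rates, the sketch below computes word error rate (WER) and position-independent error rate (PER) for a single sentence pair. It assumes the textbook definitions: WER is the word-level edit distance divided by the reference length, and PER ignores word order. The evaluation tool behind the figures above may tokenise or normalise differently, and the function names are illustrative.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Like WER, but words are compared as bags, so order does not matter."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 3 from the evaluation above, with punctuation left out.
reference = "I ate bread"
hypothesis = "I #bread I ate"
print(word_error_rate(reference, hypothesis))
print(position_independent_error_rate(reference, hypothesis))
```

Because the hypothesis has the right words "I" and "ate" in the wrong positions, PER comes out lower than WER for this pair, which matches the direction of the gap between the two corpus-level figures.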