Central Kurdish and English

From LING073
 
Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Central_Kurdish Sorani Kurdish] and [https://wikis.swarthmore.edu/ling073/English English].
 
== External Resources ==

* [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-eng Translation Pair]
* [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb/blob/master/apertium-ckb.ckb.lexd Sorani Transducer]
* [https://github.com/apertium/apertium-eng/blob/master/apertium-eng.eng.dix English Transducer]
 
== Developed Resources ==

* [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-eng-corpus Parallel Corpus]
* [https://wikis.swarthmore.edu/ling073/Central_Kurdish_and_English/Contrastive_Grammar Contrastive Grammar]
 
== ckb --> eng Evaluation ==

* Coverage of monolingual transducer: 39.01%
* Coverage of bilingual transducer: 17.45%
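A minimal sketch of how a coverage figure like the ones above could be computed, assuming the analyser output is in Apertium stream format (<code>^surface/analysis$</code>), where an unknown token is echoed back with a leading <code>*</code>. The function name <code>coverage</code> and the sample string are illustrative only; the script actually used for the numbers on this page may differ.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the share of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown tokens come back as ^form/*form$, so their analysis starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: one known token and one unknown token.
sample = "^پیاوەکە/پیاو<n><def><sg>$ ^xyz/*xyz$"
print(f"{coverage(sample):.2%}")
```

The same counting idea applies to the bilingual transducer, with the caveat that untranslated lemmas are marked differently in that stage of the pipeline.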
 
Sentence 1: پیاوەکە هات.
* Intended translation: The man came.
* Lexical transfer: #man came
* Full translation: ^پیاو<n><def><sg>/man<n><def><sg>$ ^هاتن<v><iv><past>/come<vblex><past>$
 
Sentence 2: ئەوان سەگیان هێنا
* Intended translation: They brought dogs.
* Lexical transfer: #them #dog #them brought
* Full translation: ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^سەگ<n>/dog<n>$ ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^هێنان<v><tv><past
 
Sentence 3: من نانم خوارد.
* Intended translation: I ate bread.
* Lexical transfer: I #bread I ate
* Full translation: ^من<prn><pers><p1><sg>/I<prn><subj><p1><mf><sg>/me<prn><obj><p1><mf><sg>$ ^نان<n>/bread<n>$ ^من<prn><pers><p1><sg>/I<prn><subj><p1><mf><sg>/me<prn><obj><p1><mf><sg>$ ^خواردن<v><tv><past>/eat<vblex><past>$
 
Sentence 4: گەورەترین سەگ هات.
* Intended translation: The biggest dog came.
* Lexical transfer: biggest dog came
* Full translation: ^گەورە<adj><sup>/big<adj><sint><sup>$ ^سەگ<n><sg>/dog<n><sg>$ ^هاتن<v><iv><past>/come<vblex><past>$
 
Sentence 5: ئێمە ناچین.
* Intended translation: We are not going.
* Lexical transfer: we #go
* Full translation: ^ئێمە<prn><pers><p1><pl>/we<prn><subj><p1><mf><pl>/us<prn><subj><p1><mf><pl>$ ^چوون<v><iv><npast><neg><p1><pl>/go<vblex><npast><neg><p1><pl>$
 
Sentence 6: ئەوان نەچوون.
* Intended translation: They did not go.
* Lexical transfer: #them #go
* Full translation: ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^چوون<v><iv><past><neg><p2><pl>/go<vblex><past><neg><p2><pl>$
 
Sentence 7: مەچۆ.
* Intended translation: Don't go.
* Lexical transfer: #go
* Full translation: ^چوون<v><iv><imp><neg><p2><sg>/go<vblex><imp><neg><p2><sg>$
 
Sentence 8: فیلمەکان خۆش بوون.
* Intended translation: The films were funny.
* Lexical transfer: #film #funny were
* Full translation: ^فیلم<n><def><pl>/film<n><def><pl>$ ^خۆش<adj>/good<adj><sint>$ ^بوون<v><tv><past><p2><pl>/be<vblex><past><p2><pl>$
 
Sentence 9: نان بخۆ.
* Intended translation: Eat bread.
* Lexical transfer: bread #eat
* Full translation: ^نان<n><sg>/bread<n><sg>$ ^خواردن<v><tv><imp><p2><sg>/eat<vblex><imp><p2><sg>$
 
Sentence 10: ئاژەڵەکە مرد.
* Intended translation: The animal died.
* Lexical transfer: #animal died
* Full translation: ^ئاژەڵ<n><def><sg>/animal<n><def><sg>$ ^مردن<v><iv><past>/die<vblex><past>$
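The <code>^source/target$</code> lines in the sentences above can be split into source and target readings with a few lines of Python. This is a rough sketch: the names <code>LexicalUnit</code>, <code>parse_biltrans</code> and <code>lemma</code> are hypothetical helpers, not part of Apertium, and the parser assumes that no reading contains an escaped <code>/</code> or <code>$</code>.

```python
import re
from typing import NamedTuple


class LexicalUnit(NamedTuple):
    source: str          # source-language reading, e.g. "مردن<v><iv><past>"
    targets: list[str]   # one or more target-language readings


# Everything between '^' and the next '$' is one lexical unit.
UNIT = re.compile(r"\^([^$]*)\$")


def parse_biltrans(line: str) -> list[LexicalUnit]:
    """Split a line of ^source/target1/target2$ units into structured records."""
    units = []
    for body in UNIT.findall(line):
        source, *targets = body.split("/")
        units.append(LexicalUnit(source, targets))
    return units


def lemma(reading: str) -> str:
    """Return the part of a reading that precedes its first <tag>."""
    return reading.split("<", 1)[0]


# Sentence 10 from the list above.
line = "^ئاژەڵ<n><def><sg>/animal<n><def><sg>$ ^مردن<v><iv><past>/die<vblex><past>$"
for unit in parse_biltrans(line):
    print(lemma(unit.source), "->", [lemma(t) for t in unit.targets])
```

Units with more than one target reading, such as the pronouns in Sentences 2 and 3, come back with several entries in <code>targets</code>, which makes it easy to spot where lexical selection still has to choose.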
 
== Additions ==

For the Polished RBMT lab, I:

* Added 1000+ stems (mostly nouns) to the bilingual transducer by copying stems from the Kurdish Apertium transducer.
* Added 5 patterns for grammar points such as the subjunctive mood and the present perfect tense, and refined some existing paradigms.
* Added 3 disambiguation rules and modified some others.
 
The new (improved?) metrics for the monolingual transducer are:

* Precision against the annotated corpus: 100.0%
* Recall against the annotated corpus: 86.3%
* Coverage over the large corpus: ~52.5%
* Number of words in the large corpus: 48,562
* Number of stems in the transducer: 1,529
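Precision of 100% alongside lower recall means that every analysis the transducer produced was in the annotated corpus, while some annotated analyses were never produced. A small sketch of that calculation follows; the function <code>precision_recall</code> and the toy data are assumptions for illustration, and the course's evaluation script may count things differently.

```python
def precision_recall(
    gold: dict[str, set[str]], predicted: dict[str, set[str]]
) -> tuple[float, float]:
    """Compare predicted analyses with gold analyses, form by form."""
    tp = fp = fn = 0
    for form, gold_analyses in gold.items():
        pred = predicted.get(form, set())
        tp += len(gold_analyses & pred)   # analyses found in both
        fp += len(pred - gold_analyses)   # produced but not annotated
        fn += len(gold_analyses - pred)   # annotated but not produced
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): the transducer misses one annotated form entirely.
gold = {
    "هات": {"هاتن<v><iv><past>"},
    "مرد": {"مردن<v><iv><past>"},
    "سەگ": {"سەگ<n><sg>"},
}
predicted = {
    "هات": {"هاتن<v><iv><past>"},
    "مرد": {"مردن<v><iv><past>"},
}
p, r = precision_recall(gold, predicted)
print(f"precision={p:.1%} recall={r:.1%}")
```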
 
The new metrics for the translation pair are:

* WER over the longer corpus: 94.4%
* PER over the longer corpus: 91.5%
* Stems translated correctly in the longer corpus: 46%
* Trimmed coverage over the longer corpus: 47.7%
* Trimmed coverage over the large corpus: 35.6%
* Number of tokens in the longer corpus: 384
* Number of tokens in the large corpus: 48,562
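For reference, word error rate (WER) is the word-level edit distance between the system output and the reference, divided by the reference length; position-independent error rate (PER) ignores word order. The sketch below uses one common formulation of each. The function names are my own, tokenization is plain whitespace splitting, and the evaluation tool used for the figures above may normalize text differently.

```python
from collections import Counter


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over word lists (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Count words shared by both sides regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 1: intended translation vs. the output shown above (punctuation dropped).
print(wer("The man came", "#man came"), per("The man came", "#man came"))
```

Because a form marked with <code>#</code> never matches its reference word, every ungenerated word counts as an error in both metrics, which is one reason the scores above are so high.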
 
[[Category:Sp21_TranslationPairs]] [[Category: English]] [[Category: Central Kurdish]]

Latest revision as of 20:19, 15 May 2021
