Central Kurdish and English

From LING073

Resources for machine translation between Sorani Kurdish and English.

External Resources

Developed Resources

ckb --> eng Evaluation

  • Coverage of monolingual transducer: 39.01%
  • Coverage of bilingual transducer: 17.45%
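
Coverage figures like these can be reproduced from analyser output. The following is a minimal sketch, assuming the text has already been run through the analyser and is in Apertium stream format, where an unknown word is marked with a leading `*` (for example `^xyz/*xyz$`). The function name `coverage` and the sample string are illustrative, not part of this project; `xyz` stands in for any unanalysed token.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (Simplification: escaped '/' or '$' inside a unit is not handled.)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units with at least one known analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        analyses = [a for a in match.group(2).split("/") if a]
        # An analysis starting with '*' means the analyser did not know the word.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0


# Hypothetical sample: two analysed tokens and one unknown token.
sample = "^پیاوەکە/پیاو<n><def><sg>$ ^هات/هاتن<v><iv><past>$ ^xyz/*xyz$"
print(f"{coverage(sample):.2%}")  # 66.67%
```

Counting per token, as here, matches the usual "naive coverage" definition; counting per unique word form would give a different number.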

Sentence 1: پیاوەکە هات.

  • Intended translation: The man came.
  • Lexical transfer: #man came
  • Full translation: ^پیاو<n><def><sg>/man<n><def><sg>$ ^هاتن<v><iv><past>/come<vblex><past>$

Sentence 2: ئەوان سەگیان هێنا

  • Intended translation: They brought dogs.
  • Lexical transfer: #them #dog #them brought
  • Full translation: ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^سەگ<n>/dog<n>$ ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^هێنان<v><tv><past

Sentence 3: من نانم خوارد.

  • Intended translation: I ate bread.
  • Lexical transfer: I #bread I ate
  • Full Translation: ^من<prn><pers><p1><sg>/I<prn><subj><p1><mf><sg>/me<prn><obj><p1><mf><sg>$ ^نان<n>/bread<n>$ ^من<prn><pers><p1><sg>/I<prn><subj><p1><mf><sg>/me<prn><obj><p1><mf><sg>$ ^خواردن<v><tv><past>/eat<vblex><past>$

Sentence 4: گەورەترین سەگ هات.

  • Intended translation: The biggest dog came.
  • Lexical transfer: biggest dog came
  • Full Translation: ^گەورە<adj>/big<adj><sint>$ ^سەگ<n><sg>/dog<n><sg>$ ^هاتن<v><iv><past>/come<vblex><past>$

Sentence 5: ئێمە ناچین.

  • Intended translation: We are not going.
  • Lexical transfer: we #go
  • Full Translation: ^ئێمە<prn><pers><p1><pl>/we<prn><subj><p1><mf><pl>/us<prn><subj><p1><mf><pl>$ ^چوون<v><iv><npast><neg><p1><pl>/go<vblex><npast><neg><p1><pl>$

Sentence 6: ئەوان نەچوون.

  • Intended translation: They did not go.
  • Lexical transfer: #them #go
  • Full Translation: ^ئەوان<prn><pers><p3><pl>/them<prn><obj><p3><mf><pl>/they<prn><subj><p3><mf><pl>$ ^چوون<v><iv><past><neg><p2><pl>/go<vblex><past><neg><p2><pl>$

Sentence 7: مەچۆ.

  • Intended translation: Don't go.
  • Lexical transfer: #go
  • Full Translation: ^چوون<v><iv><imp><neg><p2><sg>/go<vblex><imp><neg><p2><sg>$

Sentence 8: فیلمەکان خۆش بوون.

  • Intended translation: The films were funny.
  • Lexical transfer: #film #funny were
  • Full Translation: ^فیلم<n><def><pl>/film<n><def><pl>$ ^خۆش<adj>/good<adj><sint>$ ^بوون<v><tv><past><p2><pl>/be<vblex><past><p2><pl>$

Sentence 9: نان بخۆ.

  • Intended translation: Eat bread.
  • Lexical transfer: bread #eat
  • Full Translation: ^نان<n><sg>/bread<n><sg>$ ^خواردن<v><tv><imp><p2><sg>/eat<vblex><imp><p2><sg>$

Sentence 10: ئاژەڵەکە مرد.

  • Intended translation: The animal died.
  • Lexical transfer: #animal died
  • Full Translation: ^ئاژەڵ<n><def><sg>/animal<n><def><sg>$ ^مردن<v><iv><past>/die<vblex><past>$
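
The bracketed strings above are in Apertium stream format: each unit runs from `^` to `$`, the source-language reading comes first, and each `/` introduces a target-language reading. As a rough sketch of how such lines could be inspected programmatically, the code below splits a line into units and separates lemmas from tags. The names `Reading`, `BilingualUnit`, and `parse_biltrans` are made up for illustration, and escaped characters in the stream are not handled.

```python
import re
from typing import List, NamedTuple, Tuple


class Reading(NamedTuple):
    lemma: str
    tags: Tuple[str, ...]


class BilingualUnit(NamedTuple):
    source: Reading
    targets: List[Reading]


UNIT = re.compile(r"\^(.*?)\$")  # everything between ^ and $
TAG = re.compile(r"<([^>]+)>")   # one <tag>


def parse_reading(text: str) -> Reading:
    """Split 'lemma<tag1><tag2>' into a lemma and a tuple of tag names."""
    lemma = text.split("<", 1)[0]
    return Reading(lemma, tuple(TAG.findall(text)))


def parse_biltrans(line: str) -> List[BilingualUnit]:
    """Parse a line of ^source/target1/target2$ units."""
    units = []
    for body in UNIT.findall(line):
        source, *targets = body.split("/")
        units.append(
            BilingualUnit(parse_reading(source), [parse_reading(t) for t in targets])
        )
    return units


# Sentence 10 from the list above.
line = "^ئاژەڵ<n><def><sg>/animal<n><def><sg>$ ^مردن<v><iv><past>/die<vblex><past>$"
for unit in parse_biltrans(line):
    print(unit.source.tags, "->", [t.lemma for t in unit.targets])
```

A unit with more than one target reading, such as the pronouns in Sentences 2 and 3, shows up as a `targets` list with several entries, which makes remaining ambiguity easy to spot.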


For the Polished RBMT lab, I:

  • Added over 1,000 stems (mostly nouns) to the bilingual transducer by copying stems from the Kurdish Apertium transducer.
  • Added 5 patterns for grammar points such as the subjunctive mood and the present perfect tense, and added finer detail to existing paradigms.
  • Added 3 disambiguation rules and modified some others.

The new (improved?) metrics for the monolingual transducer are:

  • Precision against the annotated corpus: 100.0%
  • Recall against the annotated corpus: 86.3%
  • Coverage over the large corpus: ~52.5%
  • Number of words in the large corpus: 48,562
  • Number of stems in the transducer: 1,529
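
For readers unfamiliar with how precision and recall apply to a morphological analyser, here is a minimal sketch. It assumes, without confirmation from the course scripts, that both scores are computed over sets of analyses per word form: precision is the share of the transducer's analyses that appear in the annotated corpus, and recall is the share of annotated analyses that the transducer produces. The function name and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Compare predicted analyses with gold analyses, form by form.

    Only forms present in the gold annotation are scored.
    """
    true_pos = pred_total = gold_total = 0
    for form, gold_analyses in gold.items():
        pred = predicted.get(form, set())
        true_pos += len(gold_analyses & pred)
        pred_total += len(pred)
        gold_total += len(gold_analyses)
    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / gold_total if gold_total else 0.0
    return precision, recall


# Toy data (not from the real corpus): one analysis is missing, none is wrong.
gold = {"form1": {"lemma1<n><sg>", "lemma1<n><pl>"}, "form2": {"lemma2<v><past>"}}
predicted = {"form1": {"lemma1<n><sg>"}, "form2": {"lemma2<v><past>"}}
p, r = precision_recall(gold, predicted)
print(f"precision={p:.1%} recall={r:.1%}")  # precision=100.0% recall=66.7%
```

Under this definition, 100% precision with lower recall means the transducer never produces an analysis that the annotation rejects, but misses some annotated analyses.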

The new metrics for the translation pair are:

  • WER over longer corpus: 94.4%
  • PER over longer corpus: 91.5%
  • Percentage of stems translated correctly in the longer corpus: 46%
  • Trimmed coverage over the longer corpus: 47.7%
  • Trimmed coverage over the large corpus: 35.6%
  • Number of tokens in the longer corpus: 384
  • Number of tokens in the large corpus: 48,562
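
WER and PER can be illustrated on a single sentence. The sketch below uses common textbook definitions, which may differ in detail from the evaluation script used for the figures above: WER is the word-level edit distance divided by the reference length, and PER is the same idea ignoring word order. The function names are illustrative. The example compares the intended translation of Sentence 3 with its listed system output.

```python
from collections import Counter
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(ref: List[str], hyp: List[str]) -> float:
    """Word error rate: edit distance relative to the reference length."""
    return word_edit_distance(ref, hyp) / len(ref)


def per(ref: List[str], hyp: List[str]) -> float:
    """Position-independent error rate: like WER, but word order is ignored."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


reference = "I ate bread".split()
hypothesis = "I #bread I ate".split()  # system output for Sentence 3
print(f"WER={wer(reference, hypothesis):.1%}")  # WER=100.0%
print(f"PER={per(reference, hypothesis):.1%}")  # PER=66.7%
```

Because `#bread` does not match `bread`, every ungenerated form counts as an error, which is one reason both rates stay high while so many words still carry the `#` mark.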