Difference between revisions of "Central Kurdish"

From LING073
(Developed Resources)
 
(16 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
For the [https://wikis.swarthmore.edu/ling073/Initial_corpus_assembly corpus assembly] lab, a [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-corpus repository]  of plain text files has been created using excerpts from some of the resources listed below.
 
  
- == Computational Resources ==
+ == External Resources ==
+
+ === Computational Resources ===
+
+ * [https://www.branah.com/kurdish Keyboard Layout] (seems to be quite common)
  
 
* .txt lists of words, word-level digrams, and character-level trigrams
 
 
- * [https://www.branah.com/kurdish Keyboard Layout]: seems to be the most common layout
 
  
 
* Latin-based phonetic keyboard layout that I own on my computer
 
 
{{comment|: does this mean that it came with your OS or that it's a custom one you (or someone else?) created? -Jonathan}}
 
- {{comment|: Someone else created it, I think Unikurd is its name, I cannot find a link to it but it is on my computer.}}
 
  
 
* [https://arxiv.org/pdf/1811.10278.pdf Rule-based Kurdish Text Transliteration System]: Latin-Arabic orthography conversion
 
Line 21: Line 22:
 
* [https://arxiv.org/abs/2010.06041 Towards Machine Translation for the Kurdish Language]: Sorani machine translation model
 
  
- == Dictionaries ==
+ === Dictionaries ===
  
 
* [https://sites.fas.harvard.edu/~iranian/Sorani/sorani_3_vocabulary.pdf Sorani Vocabulary]: vocab list with Latin script transliterations, by Harvard
 
Line 31: Line 32:
 
* ''Diccionaire Fondamental Kurde-Français-Sorani'': French-Sorani dict with phrases & alphabet
 
  
- == Grammatical Descriptions ==
+ === Grammatical Descriptions ===
  
 
* [https://en.wikipedia.org/wiki/Sorani_grammar Sorani Grammar]: high-level description of important grammatical properties of Sorani
 
Line 39: Line 40:
 
* [https://sites.fas.harvard.edu/~iranian/Sorani/sorani_1_grammar.pdf A Reference Grammar with Selected Readings]: extensive descriptions of Sorani grammar
 
  
- == Scientific Works ==
+ === Scientific Works ===
  
 
* [https://arxiv.org/pdf/1809.10763.pdf Building a Lemmatizer and a Spell-Checker for Sorani Kurdish]: includes background on Sorani morphology
 
Line 49: Line 50:
 
* [https://www.researchgate.net/publication/261379031_Building_a_Test_Collection_for_Sorani_Kurdish Building a Test Collection for Sorani Kurdish]: outlines a Test Collection project + list of affixes
 
  
- == Corpora ==
+ === Corpora ===
  
 
* [http://www.language-archives.org/language/ckb OLAC Resources] (how do I access these?)
 
 
{{comment|: Try clicking the links on that page and then looking for "Identifier (URI)" - Daniel}}
 
 
- === Books & Encyclopediae ===
 
  
 
* [https://ckb.wikipedia.org/wiki/%D8%AF%DB%95%D8%B3%D8%AA%D9%BE%DB%8E%DA%A9 Sorani Wikipedia]: many articles averaging a few paragraphs in length
 
Line 66: Line 65:
 
* ''شازاده چکۆله'': Sorani version of ''The Little Prince'', by Aso Abdullah
 
  
- === News Sites ===
- This is an aggregation of news sites, all written in Sorani, that contain written media on the order of ~300 words per article, though long-form works of journalism can also be found.
- * [https://www.awene.com/detail?article=44481 Awene]
- * [https://www.knnc.net/Details.aspx?jimare=35292 KNN]
- * [https://nrttv.com/News.aspx?id=40606&MapID=3 NRT]
- * [https://www.rudaw.net/ Rudaw]
- * [https://www.xendan.org/?__cf_chl_jschl_tk__=40b26372944f5b2f5030a42ce2e242824802bf7f-1613677479-0-AdIOUD4PpqZJRxo5UoABrk12U7vL-b-MImh9McpzkptoSp9XEfaSUWZ8CMCghXsnze0cZ_uNou_nz_-dlrtYGSZedfD70albLiukFs_f9mQpfQ4eZtm6GftmTY4oPlVII7xBMhGVVlZdd-y4EqXa0jcXmmvumK-YPYb-72mOZe6Qmfr-Jz_fHbzlznT-kWYV93t-VPAMTzKTp_HzaVf5G3b7dkBX2EpBV302Z28AIubfw293-cWCuVQUOn8cewTE01fknrRdM3p8-m32G7wy1YleNKq-6OR8CuMBYzJ2jxvBmniY9NYxlQ-AUlApfeHigQ Xendan]
+ * News sites: [https://www.awene.com/detail?article=44481 Awene], [https://www.knnc.net/Details.aspx?jimare=35292 KNN], [https://nrttv.com/News.aspx?id=40606&MapID=3 NRT], [https://www.rudaw.net/ Rudaw], [https://www.xendan.org/?__cf_chl_jschl_tk__=40b26372944f5b2f5030a42ce2e242824802bf7f-1613677479-0-AdIOUD4PpqZJRxo5UoABrk12U7vL-b-MImh9McpzkptoSp9XEfaSUWZ8CMCghXsnze0cZ_uNou_nz_-dlrtYGSZedfD70albLiukFs_f9mQpfQ4eZtm6GftmTY4oPlVII7xBMhGVVlZdd-y4EqXa0jcXmmvumK-YPYb-72mOZe6Qmfr-Jz_fHbzlznT-kWYV93t-VPAMTzKTp_HzaVf5G3b7dkBX2EpBV302Z28AIubfw293-cWCuVQUOn8cewTE01fknrRdM3p8-m32G7wy1YleNKq-6OR8CuMBYzJ2jxvBmniY9NYxlQ-AUlApfeHigQ Xendan]: aggregation of sites with ~300 words/article
 
  
- === Blog Pages ===
- Long-form posts that exceed ~1000 words can be found here. Each link goes to a specific person's blog archive, containing anywhere between 10-100 articles.
- * [https://sites.google.com/site/abasshiwan2/ Abas Shiwan] (with Latin script transliterations)
- * [http://ala-hooshiar.blogspot.com/ Ala Hooshiar]
- * [http://amjad-shakely.blogspot.com/ Amjad Shakely]
+ * [https://sites.google.com/site/abasshiwan2/ Abas Shiwan's blog] (with Latin script transliterations)
+ * [http://ala-hooshiar.blogspot.com/ Ala Hooshiar's blog]
+ * [http://amjad-shakely.blogspot.com/ Amjad Shakely's blog]
+ == Developed Resources ==
+ * [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-keyboard Keyboard] (see [https://wikis.swarthmore.edu/ling073/Central_Kurdish/Keyboard Wiki page])
+ * [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb Transducer] (see [https://wikis.swarthmore.edu/ling073/Central_Kurdish/Transducer Wiki page])
+ * [https://github.swarthmore.edu/Ling073-sp21/ling073-ckb-eng Resources for Kurdish-English Machine Translation] (see [https://wikis.swarthmore.edu/ling073/Central_Kurdish_and_English Wiki page])

Latest revision as of 09:56, 13 April 2021


Below is a list of resources relevant to the Sorani Kurdish language. Resources are categorized according to type of content. I own italicized resources in PDF format. I have also flagged resources I have not yet obtained.

For the corpus assembly lab, a repository of plain text files has been created using excerpts from some of the resources listed below.

External Resources

Computational Resources

  • .txt lists of words, word-level digrams, and character-level trigrams
  • Latin-based phonetic keyboard layout that I own on my computer
does this mean that it came with your OS or that it's a custom one you (or someone else?) created? -Jonathan

Dictionaries

  • Dictionary of Scientific Terms: includes Sorani definitions of terms like "atom"
  • Diccionaire Fondamental Kurde-Français-Sorani: French-Sorani dict with phrases & alphabet

Grammatical Descriptions

  • Sorani Grammar: high-level description of important grammatical properties of Sorani

Scientific Works

Corpora

  • OLAC Resources (how do I access these?)
Try clicking the links on that page and then looking for "Identifier (URI)" - Daniel
  • JW Website: few dozen entries accompanied by voice narration
  • Hawler Gov: various entries on the Kurdish capital
  • شازاده چکۆله: Sorani version of The Little Prince, by Aso Abdullah

Developed Resources