Difference between revisions of "Tibetan"

From LING073
Tibetan Language Page
=Existing Tools=

In this section, we list existing dictionaries, translations and translation tools. There appear to be plenty of dictionaries to work with, as well as several useful-looking translation tools that should help us in this course.

==Tibetan-English Dictionaries==

*The [https://archive.org/details/essaytowardsadi01tshgoog first Tibetan–European language dictionary] was written in 1834 by Hungarian Sándor Kőrösi Csoma (1784–1842).<ref name="First Tibetan Dictionary" /> 

*Linguist Heinrich August Jäschke formed the Moravian mission in Ladakh in 1857. During this time, he wrote [https://archive.org/details/TibetanGrammarByH.A.Jschke Tibetan Grammar] and [https://archive.org/details/tibetanenglishdi00jsuoft A Tibetan–English Dictionary].<ref name="Tibetan Grammer" /> A [https://play.google.com/books/reader?id=_RQTAAAAYAAJ&printsec=frontcover&output=reader&hl=en&pg=GBS.PR9 free eBook version] of this dictionary is also available.

*There are a number of [http://www.mongols.eu/tibetan-language/tibetan-english-online-dictionary/ Tibetan online dictionaries].

*There is also a [http://www.thlib.org/tools/#wiki=/access/wiki/site/c06fa8cf-c49c-4ebc-007f-482de5382105/tibetan%20translation%20tool.html Tibetan translation tool] and a [http://www.thlib.org/tools/wiki/Dictionaries%20Available.html downloadable dictionary].

*Here is more information about [https://www.omniglot.com/writing/tibetan.htm the Tibetan alphabet].

*This Tripod search returns a number of results for [https://catalog.tricolib.brynmawr.edu/find/Combined/Results?lookfor=tibetan+language&type=&limit=20&sort=relevance Tibetan dictionaries and usage references].

==Tibetan spellchecker==

*A [http://www.columbia.edu/~ph2046/RnD/Hackett/Tibetan_Spellchecker.zip Microsoft Visual Basic macro spellchecker] for Tibetan in Microsoft Word, released under the GNU General Public License (GPL).

*Here's [https://github.com/tibetan-nlp/tibetan-spellchecker/tree/master/syllables another spellchecker] with [https://github.com/tibetan-nlp/tibetan-spellchecker/blob/master/doc/standard-syllable-structure.md documentation]. The project's [https://github.com/eroux/tibetan-spellchecker/blob/master/doc/bibliography.md bibliography] includes additional sources that we could use.

*A [https://github.com/eroux/hunspell-bo Classical Tibetan syllable spellchecker for Hunspell], released under Creative Commons Zero v1.0 Universal.
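
The spellcheckers above work at the level of the syllable. As a rough illustration of what a syllable-level check involves, the Python sketch below tests whether a Wylie-transliterated syllable fits the standard slot order (prefix, superscript, root, subscript, vowel, suffix, post-suffix). The letter sets per slot and the function names `fits_syllable_template` and `flag_suspicious` are our own assumptions and are not taken from either project. The check is deliberately coarse: it ignores which prefixes and superscripts may combine with which roots.

```python
import re

# Assumed slot inventory in Extended Wylie (lowercase, one syllable, no Sanskrit stacks).
PREFIX = r"(?:[gdbm'])"
SUPERSCRIPT = r"(?:[rls])"
# Multi-letter roots come first so that "tsh" is tried before "ts" and "t".
ROOT = r"(?:tsh|ts|dz|kh|ng|ch|ny|th|ph|zh|sh|[kgcjtdnpbmwz'yrlsh])"
SUBSCRIPT = r"(?:[yrlw])"
VOWEL = r"[aiueo]"
SUFFIX = r"(?:ng|[gdnbm'rls])"
POST_SUFFIX = r"(?:[sd])"

SYLLABLE_RE = re.compile(
    f"{PREFIX}?{SUPERSCRIPT}?{ROOT}{SUBSCRIPT}?{VOWEL}{SUFFIX}?{POST_SUFFIX}?"
)


def fits_syllable_template(syllable: str) -> bool:
    """True if the Wylie syllable fits the slot template (not a full validity check)."""
    return SYLLABLE_RE.fullmatch(syllable) is not None


def flag_suspicious(text: str) -> list[str]:
    """Return the space- or hyphen-separated syllables that do not fit the template."""
    syllables = [s for s in re.split(r"[\s-]+", text.strip()) if s]
    return [s for s in syllables if not fits_syllable_template(s)]
```

For example, ''bsgrubs'' (prefix b, superscript s, root g, subscript r, vowel u, suffix b, post-suffix s) passes, while ''pagss'' is flagged. Because the pattern has no combination rules, `fits_syllable_template("gka")` also returns True even though the prefix g does not occur before k; catching that would need an explicit list of valid syllables.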
==Tibetan keyboard layout==

*An [https://www.branah.com/tibetan online Tibetan keyboard].

*A Wikipedia section on the [https://en.wikipedia.org/wiki/Tibetan_alphabet#Input_method_and_keyboard_layout Tibetan keyboard layout]. The main [https://en.wikipedia.org/wiki/Tibetan_alphabet Tibetan alphabet] Wikipedia page is useful in general, since it describes the alphabet and the functions of the different symbols.

*An [http://www.lexilogos.com/keyboard/tibetan.htm interactive Tibetan keyboard] and [https://www.tavultesoft.com/tibetan/ three more].

==Orthography/Grammar Tools==

*A description of THL's [https://collab.its.virginia.edu/wiki/tibetan-script/THL%27s%20Online%20Tibetan%20Transliteration%20Converter.html Tibetan transliteration converter]; the converter itself is [http://www.thlib.org/reference/transliteration/wyconverter.php here]. [http://www.thlib.org/reference/transliteration/#!essay=/thl/ewts Another page] lists its collaborators, versions, etc. There are also [https://collab.its.virginia.edu/wiki/tibetan-script/Wylie%20to%20Tibetan%20Machine%20Unicode.html instructions] for converting Wylie to Tibetan Machine Unicode.

*A wiki page on [http://www.rigpawiki.org/index.php?title=Wylie Wylie], a method for transliterating Tibetan text into Roman script.

*A wiki page on [http://www.rigpawiki.org/index.php?title=Tibetan_Grammar_-_Formation_of_the_Tibetan_Syllable Tibetan grammar], with a lot of information about pronunciation, tone, prefixes, etc. [http://www.rigpawiki.org/index.php?title=Rigpa_Phonetic_Guidelines This page] is similar.

*The main Wikipedia site has a very useful [https://en.wikipedia.org/wiki/Modern_Standard_Tibetan_grammar page on Tibetan grammar]. The sources cited on that page are also useful, in particular:

**A free eBook version of [https://play.google.com/books/reader?id=KAmi4M8_-9oC&printsec=frontcover&output=reader&hl=en&pg=GBS.PP1 Hand-book of Colloquial Tibetan: A Practical Guide to the Language of Central Tibet]

**The same author also wrote [https://play.google.com/books/reader?id=h65FAAAAcAAJ&printsec=frontcover&output=reader&hl=en&pg=GBS.PA1 A short practical grammar of the Tibetan language, with special reference to the spoken dialects]

**[https://www.amazon.com/Introduction-Classical-Tibetan-Stephen-Hodge/dp/9745240397 An Introduction to Classical Tibetan]

*An article on Tibetan IPA, with an [http://stedt.berkeley.edu/pubs_and_prods/STEDT_Monograph3_Phonological-Inv-TB.pdf IPA chart] on page xix.

*A resource for [http://pratyeka.org/tibetan/Tibetan_language.pdf learning Tibetan with examples].

*A more modern [https://catalog.tricolib.brynmawr.edu/find/Record/.b4901628 Tibetan grammar book].
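
The converters above map Wylie transliteration to Tibetan Unicode. The Python sketch below illustrates the basic idea for the simplest case only: a root consonant, a vowel and optional suffix letters, with syllables joined by the tsek (U+0F0B). It is not based on the THL converter's code; the function names are hypothetical, and syllables with prefixes, stacked consonants or Sanskrit letters are rejected rather than converted.

```python
import re

# Simplified Extended Wylie tables. Assumption for illustration only:
# no prefixes, superscripts, subscripts, stacks or Sanskrit letters.
CONSONANTS = {
    "k": "\u0f40", "kh": "\u0f41", "g": "\u0f42", "ng": "\u0f44",
    "c": "\u0f45", "ch": "\u0f46", "j": "\u0f47", "ny": "\u0f49",
    "t": "\u0f4f", "th": "\u0f50", "d": "\u0f51", "n": "\u0f53",
    "p": "\u0f54", "ph": "\u0f55", "b": "\u0f56", "m": "\u0f58",
    "ts": "\u0f59", "tsh": "\u0f5a", "dz": "\u0f5b", "w": "\u0f5d",
    "zh": "\u0f5e", "z": "\u0f5f", "'": "\u0f60", "y": "\u0f61",
    "r": "\u0f62", "l": "\u0f63", "sh": "\u0f64", "s": "\u0f66",
    "h": "\u0f67",
}
# The inherent vowel "a" is not written; the others are combining vowel signs.
VOWELS = {"a": "", "i": "\u0f72", "u": "\u0f74", "e": "\u0f7a", "o": "\u0f7c"}
TSEK = "\u0f0b"  # syllable separator


def tokenize(syllable: str) -> list[str]:
    """Split a Wylie syllable into letters, trying the longest match first."""
    tokens = []
    i = 0
    while i < len(syllable):
        for size in (3, 2, 1):  # "tsh" before "ts" before "t"
            piece = syllable[i:i + size]
            if piece in CONSONANTS or piece in VOWELS:
                tokens.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"unknown letter in {syllable!r}")
    return tokens


def syllable_to_tibetan(syllable: str) -> str:
    """Convert one root + vowel + suffix syllable to Tibetan script."""
    tokens = tokenize(syllable)
    root, rest = tokens[0], tokens[1:]
    if root not in CONSONANTS or not rest or rest[0] not in VOWELS:
        raise ValueError(f"unsupported syllable shape (prefix or stack?): {syllable!r}")
    vowel, suffixes = rest[0], rest[1:]
    if any(s not in CONSONANTS for s in suffixes):
        raise ValueError(f"more than one vowel in {syllable!r}")
    return CONSONANTS[root] + VOWELS[vowel] + "".join(CONSONANTS[s] for s in suffixes)


def wylie_to_tibetan(text: str) -> str:
    """Convert space- or hyphen-separated Wylie syllables, joining them with tsek."""
    syllables = [s for s in re.split(r"[\s-]+", text.strip()) if s]
    return TSEK.join(syllable_to_tibetan(s) for s in syllables)
```

With these tables, `wylie_to_tibetan("lug-gi pags-pa")` gives ལུག་གི་པགས་པ, the "skin of sheep" example from the Orthography Documentation section. A syllable such as ''bsgrubs'' raises a ValueError, since converting it would require the subjoined consonant forms (U+0F90 onward) and rules for prefixes.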
=Translations and Texts=

*This [http://www.gsungrab.org/en/home.php Bible translation] may be useful. Wikipedia also has a page devoted to [https://en.wikipedia.org/wiki/Bible_translations_into_Tibetan Tibetan Bible translations].

*Furthermore, there are [https://www.penguin.co.uk/books/35600/the-tibetan-book-of-the-dead/ full English translations] of the Bardo Thödol, the Tibetan Book of the Dead. I'm having trouble finding a Tibetan version, but one likely exists somewhere online.

*[http://www.tibetanlanguage.org/bookstore/EngTib_Texts.html This website] has Tibetan and English versions of various books. We will try to locate some of these sources.

*[https://blogs.wsj.com/chinarealtime/2011/07/21/harry-potter-podder-and-the-tibetan-translator/ Harry Potter and the Sorcerer's Stone] was apparently also translated into Tibetan, though I'm having trouble finding the actual text.

*[http://www.lotsawahouse.org/bo/free-translations-tibetan-buddhist-texts This website] has Tibetan/English text translations and may be the most useful source we've found so far. Each line of Tibetan has the corresponding English translation below it, and there is enough text here to sustain us for quite a while.

*This page links to dozens of [http://www.tsadra.org/tools.html#tibetancomputerreference references, dictionaries, texts and translations].

=Academic Papers=

There are a number of academic papers relating to this subject. In this section, we list a few of them. There seem to be quite a lot of papers on Tibetan machine translation and speech recognition. This is by no means a complete summary.

==Tibetan Speech Recognition==
Research papers on speech recognition for Tibetan
  
*[http://ieeexplore.ieee.org/document/5577887/ Tibetan Language Speech Recognition Model Based on Active Learning and Semi-Supervised Learning]
*[http://journals.sagepub.com/doi/full/10.5772/54000 Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction]
  
==Tibetan Machine Translation==
*[https://academiccommons.columbia.edu/catalog/ac:133018 Automatic Segmentation and Part-Of-Speech Tagging For Tibetan: A First Step Towards Machine Translation] ([https://academiccommons.columbia.edu/download/fedora_content/download/ac:133019/content/IATS-IX_Hackett_paper.pdf PDF download])

*[https://link.springer.com/chapter/10.1007/978-3-319-11104-9_59 A Method for the Chinese-Tibetan Machine Translation System’s Syntactic Analysis]. An [https://books.google.com/books?id=le0XBgAAQBAJ&pg=PA508&lpg=PA508&dq=tibetan+speech+machine+translation&source=bl&ots=nsqYmGfl-Z&sig=mNu395OEhBsDSlrZFiAXJsjQF3w&hl=en&sa=X&ved=0ahUKEwjfofqv9PvYAhUHvVMKHYmDAX4Q6AEISzAI#v=onepage&q=tibetan%20speech%20machine%20translation&f=false eBook] version is also available.

*[http://download.atlantis-press.com/php/download_paper.php?id=25876023 Analysis of Tibetan-language Speech Technology]

*[http://ieeexplore.ieee.org/document/5512462/ A Research on Text analysis in Tibetan speech synthesis]
  
=Developed Resources=

This section describes the tools I developed as part of LING073. Because three standard Tibetan keyboards (ewts, tcrc and wylie) already exist for Linux, I chose to take this task further by developing a transcription keyboard. My research indicates that the Wylie keyboard is the most standard, so it is the keyboard I will use for the remainder of this class.

==Transcription Keyboard==
Because a Tibetan keyboard already exists in IBus, I created a transcription keyboard, which lets users type the pronunciations of Tibetan words as they might appear in Tibetan dictionaries. The [https://github.swarthmore.edu/arobey1/ling073-tib-keyboard GitHub repository] I created contains the standard Wylie Tibetan keyboard (bo-wylie.mim) and the transcription keyboard I created (bo-transcription.mim), along with AUTHORS, LICENSE and INSTALL files. I also wrote a [https://wikis.swarthmore.edu/ling073/Tibetan/Keyboard wiki page] describing how I implemented this keyboard.
  
==Corpus==

I have assembled a corpus of Tibetan texts. The full corpus and the scripts I used to parse the data are in [https://github.swarthmore.edu/arobey1/ling073-tib-corpus this GitHub repository]. It includes the entire Tibetan Wikipedia, several books of the Bible with the corresponding English translation beneath each line, several biographies, and 600 web pages from the Dalai Lama's website. The scraping tools I developed are fairly general and can be extended to many other kinds of sites. In particular, the Parser() class in parse.py is very useful for formatting glosses.
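
Several of the sources above (the Bible books in this corpus and the Lotsawa House translations) print each Tibetan line with its English translation directly beneath it. The Python sketch below shows one way such interlinear text could be turned into (Tibetan, English) pairs, by checking whether most characters of a line fall in the Tibetan Unicode block (U+0F00 to U+0FFF). It is a hypothetical helper, not the Parser() class from parse.py, and the majority threshold is an assumption.

```python
# Tibetan Unicode block, U+0F00..U+0FFF.
TIBETAN_BLOCK = range(0x0F00, 0x1000)


def is_tibetan(line: str) -> bool:
    """Assumed heuristic: a line is Tibetan if most non-space characters are in the block."""
    chars = [ch for ch in line if not ch.isspace()]
    if not chars:
        return False
    tibetan = sum(1 for ch in chars if ord(ch) in TIBETAN_BLOCK)
    return tibetan * 2 > len(chars)


def pair_interlinear(lines: list[str]) -> list[tuple[str, str]]:
    """Pair each Tibetan line with the first non-Tibetan line that follows it."""
    pairs = []
    pending = None  # most recent Tibetan line still waiting for its translation
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if is_tibetan(line):
            pending = line  # an earlier unpaired Tibetan line is discarded
        elif pending is not None:
            pairs.append((pending, line))
            pending = None
        # English lines with no Tibetan line above them are skipped
    return pairs
```

Lines that mix scripts (for example, Tibetan with Latin verse numbers) are classified by whichever script is in the majority; that rule is the first thing to revisit when applying this to a real source.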
  
==Orthography Documentation==

* Following [https://en.wikipedia.org/wiki/Modern_Standard_Tibetan_grammar these examples], put spaces between words (in the transliteration, the syllables of a word are joined with hyphens):
  
    Skin of sheep
    ལུག་གི་པགས་པ
    <lug-gi pags-pa>

    Husband <-> wife
    Khyo-po <-> Khyo-mo

    Boy <-> girl
    Pu-tsa <-> Pu-mo

* See the information on the small dot called [https://en.wikipedia.org/wiki/Tibetan_alphabet#Basic_alphabet tsek]: it separates syllables, but the script does not mark word boundaries. Overall, it is worth reading through the whole [https://en.wikipedia.org/wiki/Tibetan_alphabet Tibetan alphabet Wikipedia page].
* Also see grammar resources above
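
Under this convention, spaces mark words and the tsek (U+0F0B) marks syllables inside a word. The short Python sketch below, a hypothetical helper rather than part of the course tools, splits text written this way into words and their syllables. It also drops the shad (U+0F0D), the Tibetan clause-ending mark, and tolerates a tsek left in front of a space.

```python
TSEK = "\u0f0b"  # syllable separator
SHAD = "\u0f0d"  # clause-ending mark


def segment(text: str) -> list[list[str]]:
    """Split space-separated Tibetan words into their tsek-separated syllables."""
    words = []
    for chunk in text.split():  # spaces mark word boundaries under this convention
        # Remove shad, then split on tsek; empty pieces come from a trailing tsek.
        syllables = [s for s in chunk.replace(SHAD, "").split(TSEK) if s]
        if syllables:
            words.append(syllables)
    return words
```

For example, `segment("ལུག་གི པགས་པ།")` returns `[["ལུག", "གི"], ["པགས", "པ"]]`, the same grouping as the transliteration <lug-gi pags-pa>.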
  
=References=
 
<references>
 
<ref name="First Tibetan Dictionary">[https://archive.org/details/essaytowardsadi01tshgoog Tibetan-European Language Dictionary]. [https://archive.org/web/ Wayback Machine: Internet Archive]. Retrieved January 25, 2018.</ref>
 
  
 
[[Category:Tibetan]]
 
[[Category:sp18_ResourceDocumentation]]

Latest revision as of 11:49, 22 February 2018
