This page lists Harrison Saunders’s top three language interests.

I would like to work with Daniel Khazanov and/or someone with computing experience.

1. Manx

Already has Apertium resources, so technically off limits for this semester, but I'm open to discussing whether an exception might be possible. -JNW

I am interested in Manx for two reasons: first, it is a small language with few resources online, and second, it belongs to the Celtic language family, in which I have a particular interest. I have some experience with Breton, which is in the same family but in a different branch. The morphology of Manx, however, is a little more complex (by which I mean less analytic) than that of Breton, and would be interesting to investigate.

ISO 639-3: glv

Language Background

Manx is a revived language. The last L1 speaker, Ned Maddrell, died in 1974, but the language has undergone some revitalization efforts, including the introduction of the Manx language into secondary schools and into primary schools in 2001. Manx currently has about 1,660 speakers. Some children are currently being raised with Manx as their primary language. The language is a marker of Manx cultural identity. Manx uses a Latin script orthography based on English.

Language Typology

Manx belongs to the Goidelic branch (controversial) of the Celtic language family, itself a branch of the larger Indo-European family. It is a partially inflected fusional language with a number of analytic features. Verbs, nouns, and prepositions can be inflected. The word order, typical for Celtic languages, is usually VSO.

Corpus collection

There is a Manx edition of Wikipedia, and the bottom of the above Wikipedia page lists a number of resources where pages in Manx can be found, including a Bible in Manx. I am also fairly confident that I can find at least one Manx Twitter user.

2. Kabyle

Unlike Manx, Kabyle, also called Amazigh, has millions of speakers. It seems alarming to me that languages like Kabyle are under-resourced in this way despite being major languages. This is one reason why I am interested in Kabyle. Another is that I have some experience with Hebrew, which is distantly related to Kabyle, and I would like to see what similarities the two languages share (if any).

ISO 639-3: kab

Language Background

Kabyle is a thriving language in Algeria, Morocco, and Tunisia. It is spoken by about 6,819,200 people worldwide, the vast majority of them in Algeria, where it is an official national language. In Algeria, it is used by people in all walks of life, alongside Arabic and French. The main writing system is a Latin script-based orthography, but a native writing system called Tifinagh is also sometimes used.

Language Typology

Kabyle belongs to the Northern Berber languages, in the Berber branch of the Afroasiatic family. The Semitic languages, such as Arabic, belong to this family, as do a number of others, like Beja. Kabyle is a fusional inflecting language with markers for gender on nouns and for tense, aspect, person, gender, and number on verbs. Prefixes and suffixes are used, sometimes in the form of circumfixes, as in the name of the language (Taqbaylit).

Corpus collection

It will be easy to find examples of Kabyle, since the language is so widely spoken and is considered an official language in Algeria. There is also a Kabyle edition of Wikipedia.


3. Lakota

Lastly, I would be interested in working on Lakota. Since Lakota is an Indigenous language of the Americas, helping create new resources for it seems like a good way to further the cause of the language and the people who speak it. Of course, allyship on this issue is deceptively subtle, as seen in the article about Mapuche. As such, building a Lakota program should be done sensitively and would ideally go along with an understanding of the Lakota discourse surrounding the language.

ISO 639-3: lkt

Language Background

Like many Indigenous languages of the United States, Lakota has seen a number of suppression efforts over its history, since the arrival of Europeans. These efforts have taken their toll on the language: out of more than 100,000 Lakota, only 2,200 speak Lakota; moreover, only 100 of these are L2 speakers, meaning that the future of the language could be bleak. To counter this, there is a revitalization movement among Lakota people that has resulted in a growing number of young L2 speakers. The orthography is Latin script-based, with a number of special characters.

Language Typology

Lakota is an inflecting language with SOV word order. It is also largely postpositional. Lakota is considered by some to be a variety of the Sioux language, itself a member of the Western Siouan branch of the Siouan-Catawban language family. Prefixes and suffixes are used, marking person, number, and categories like causative and comparative.

Corpus collection

The Wikipedia page for Lakota, listed above, has numerous resources where large amounts of text in Lakota can be found, including the website of the Lakota reclamation project, as well as a translated version of the Book of Common Prayer.