User:WDENGLE1/Language selection
From LING073
My experience tilts more "computational" than "linguistics". In particular, my knowledge of phonetics and phonology is weak. Therefore, I'd prefer to work with someone with a strong linguistics background and perhaps less computational experience so that we could complement each other!
Ngaanyatjarra
- ISO 639-3: ntj.
- An indigenous language of Western Australia, spoken particularly around Warburton.
- About 1,100 native speakers (as of the 2016 census).
- Morphological typology: synthetic? (The Wikipedia article refers to affixation.)
- Written exclusively in the Latin script without diacritics, so text encoding and a keyboard layout should be trivial (an English layout suffices).
Sources
- Wikipedia article (contains some general info).
- Ngaanyatjarra's entry in the Australian Indigenous Languages Database refers to some grammatical descriptions and sources of text, but I'm not sure how to access/find them.
- Ngaanyatjarra–English dictionary available at UPenn; can we obtain this via the Quaker Consortium? (Paper copy or probably DRM-protected, so probably inaccessible to me; machine-readability highly unlikely.)
- A Ngaanyatjarra–English wordlist presented as an HTML table; accessible, and should be machine-parsable.
- A small English dictionary of Western Desert terms (Ngaanyatjarra is part of the Western Desert language family). PDF with a text layer, but probably harder to machine-parse.
- Bible translations in various formats (EPUB and HTML appear to be the easiest for both machine processing and my own reading).
- A pre-reading booklet for children. The PDF appears to have an (at least somewhat complete) text layer, but this should be visually verified.
- Some books in Ngaanyatjarra collected by the Goldfields Aboriginal Language Centre: website is tricky accessibility-wise, might be worth contacting them for more resources if I work on this language.
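Since the wordlist above is an HTML table, extracting it could look roughly like the following. This is a minimal sketch using only Python's standard library; the class name `TableRowParser`, the helper `parse_wordlist`, and the two-column sample markup with placeholder entries are all assumptions for illustration, since I haven't inspected the page's real structure.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_wordlist(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical markup with placeholder entries, not real dictionary data.
SAMPLE = """
<table>
  <tr><th>Ngaanyatjarra</th><th>English</th></tr>
  <tr><td>headword-1</td><td>gloss 1</td></tr>
  <tr><td>headword-2</td><td>gloss 2</td></tr>
</table>
"""

rows = parse_wordlist(SAMPLE)
header, entries = rows[0], rows[1:]
```

If the real page nests tables or spreads one entry over several rows, the row grouping would need adjusting.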
Low German
- ISO 639-3: nds.
- West Germanic language variety spoken mainly in Northern Germany, with closely related dialects spoken in the northeastern part of the Netherlands.
- Speakers refer to the language as Plattdütsch, among other names.
- Estimated 4.35–7.15 million native speakers.
- Morphological classification: synthetic (declension of adjectives/nouns and conjugation of verbs described on WP).
- Latin script, with competing orthographic standards based on Dutch and German orthography, so the Dutch and German keyboard layouts, respectively, should suffice.
Sources
Balinese
- ISO 639-3: ban.
- Malayo-Polynesian language spoken on the Indonesian island of Bali as well as Northern Nusa Penida, Western Lombok, Eastern Java, Southern Sumatra, and Sulawesi.
- Native speakers: 3.3 million (as of the 2000 census).
- Morphological typology: synthetic.
- Written in both the Latin script and its own script, though the latter is used primarily for older texts. This could pose challenges (my screen reader reads Balinese characters as hex codes).
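One possible workaround for the screen reader issue is to convert Balinese-script characters into their Unicode character names, which are plain ASCII and so can be read aloud. The sketch below assumes only that the characters fall in the Unicode Balinese block (U+1B00 to U+1B7F); the function names and the sample string are my own inventions for illustration.

```python
import unicodedata

# Unicode "Balinese" block: U+1B00 through U+1B7F.
BALINESE_BLOCK = range(0x1B00, 0x1B80)


def is_balinese(ch):
    """True if the character's code point lies in the Balinese block."""
    return ord(ch) in BALINESE_BLOCK


def describe_balinese(text):
    """Return (code point, Unicode name) pairs for each Balinese character."""
    described = []
    for ch in text:
        if is_balinese(ch):
            # Unassigned code points in the block have no name.
            name = unicodedata.name(ch, "UNNAMED")
            described.append((f"U+{ord(ch):04X}", name))
    return described


def balinese_ratio(text):
    """Fraction of non-whitespace characters that are in Balinese script."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(is_balinese(ch) for ch in visible) / len(visible)


# Hypothetical mixed-script sample: three Latin letters, two Balinese letters.
sample = "abc \u1b05\u1b13"
described = describe_balinese(sample)
ratio = balinese_ratio(sample)
```

`balinese_ratio` could also be used to sort collected texts into Latin-script and Balinese-script piles before deciding which ones are practical to work with.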