Language selection

From LING073

Revision as of 22:28, 19 February 2020

In Ling 073, everyone will be applying the topics of the class to an under-resourced language of their choice throughout the semester. Students will [for the most part] work in pairs on a single language, but no two pairs will work on the same language.

Note: If you have a strong desire to work on a language that is normally regarded as entirely "isolating", some accommodations may be made, but you should talk with the professor about it immediately.

Considerations for language selection

  • Ideally, you should choose a language with at least some interesting morphological processes.
  • You'll need some authentic text (i.e., text produced by native speakers, even if not standardised) in this language, whether from documents found online, an excerpt of published text that you type up, someone's Twitter account, or sample sentences from a grammar. See Places to look for corpora for more info.
  • You need to choose a language that doesn't have [many] existing computational resources; specific exclusions listed below:

Languages you may not choose

Note: If you really want, you may select an Apertium language listed as "incubator", but you will be expected to start from scratch for each assignment and to ignore what's available from Apertium, except to augment your resources later.
  • No languages supported by Giellatekno.
  • No historical languages unless with special permission; there should be some current speech community—ideally L1—even if small
  • No conlangs unless with special permission
  • No languages chosen in a previous semester (see below)

Languages chosen in previous semesters

Languages in italics were not implemented in translation pipelines.

Random list of languages that might work

  • Western Abenaki
  • Kabardian
  • Lakota
  • Shor
  • Ndebele
  • Arrernte
  • Iatmul
  • Tiwi
  • Beja
  • Garifuna
  • Arhuaco/Ikʉ
  • Mapudungun
  • Maithili
  • Santali
  • Waray
  • Kikamba
  • Biak
  • Konkani or Rohingya
  • Platduuts (nds-nl) or Plattdüütsch (nds)
  • Alemannisch (any southern German)
  • Kinyarwanda
  • Lepcha
  • Pontic Greek
  • Somali
  • Tigre
  • Evenki
  • Pʼurhépecha
  • Kabyle
  • Mandinka
  • Lezgian
  • Denaʼina

The assignment

By the beginning of the Thursday class during the first week of classes (this semester: 08:30 on 24 January 2019), turn in the following:

  1. Make a page on the wiki:
    • Create a "Language selection" page under your userpage (wikis.swarthmore.edu/ling073/User:student1/Language_selection, replacing student1 with your username).
    • At the very top, mention who you might like to work with in a pair. This could be anything from "someone who knows linguistics really well" or "someone who is good with computers" to a specific person (in which case, link to their language selection page!) or a note that you're not sure or don't care.
    • List in order of preference three languages you might like to work on this semester. There are some examples given above, but don't limit yourself to those. There are thousands of languages to choose from!
  2. Document some things for each language:
    • For each language, determine as best you can with the resources available a morphological typology of the language. E.g., is it primarily isolating, agglutinative, etc., and how do you know? Are there patterns in that language that reflect more than one morphological type?
    • Determine basic information about each language. How many speakers are there, and where do they live? What other languages might they know? What is the status of the language in terms of its transmission to current and future generations? Is there a normative orthography of some sort, and what is it like (what script / any interesting features / multiple official/historical orthographies / etc.)? Provide ISO codes used for the language, especially three-letter ones. Basically all of this information should be findable on Ethnologue (ethnologue.com) and Wikipedia (in one language or other), but feel free to use any source that seems reliable (academic papers, census data, etc.). Cite the sources you use.
    • Give some estimation of how likely it is that you will find at least a few pages' worth of text in this language. In other words, see if you can find something online quickly: websites in the language, a translation of the Bible or the Universal Declaration of Human Rights, a blog, a grammar book with lots of examples, etc. Don't limit yourself to online resources; if library resources exist (even if not available at Swarthmore), that can also work! (If it's not at all likely that you can have some amount of text in the language on your screen or in your hand within a week or two, you probably should find some other language to work on!)
  3. Clean up the page:
    • Include a category tag for sp19_LanguageSelection and one for the name of each language. You should have four category tags on your page, e.g. [[Category:sp19_LanguageSelection]], [[Category:Abkhaz]], and one each for the other two languages.
    • Make use of MediaWiki formatting markup. E.g., each language can be a section, data can be formatted as bullet points or in tables, citations should make use of proper macros, etc. You can see how MW markup works simply by going to edit an existing page and examining the source used to produce various elements.
  • NOTES
    • Note that conflicts of first choice will be resolved in class on Thursday, but in cases of an impasse, the first person to post their interest in the language to the wiki will get their earlier choice, and the other party will get a subsequent choice.
    • Feel free to examine language selection pages from previous years (e.g., sp17_LanguageSelection and sp18_LanguageSelection), but don't copy stuff wholesale—and note that a number of those languages have already been done so you can't choose them anyway :)
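
As a rough illustration of steps 1 to 3 above, a page skeleton might look something like the following. This is only a sketch, not a required template: the section names and all placeholder text are made up for illustration, "SecondLanguage" and "ThirdLanguage" are hypothetical stand-ins for your other two choices, and the <ref> / <references /> tags assume the wiki has MediaWiki's Cite extension enabled. Abkhaz is used only because it appears in the category example above.

```mediawiki
<!-- Illustrative skeleton only; every value below is a placeholder. -->
Possible partner: (a specific person, a kind of person, or "not sure")

== Abkhaz ==
=== Morphological typology ===
* (primarily isolating, agglutinative, etc., and how you know)

=== Basic information ===
* Speakers: (number and location)<ref>(source for this figure)</ref>
* Transmission status: (placeholder)
* Orthography: (script, notable features)
* ISO codes: (placeholder)

=== Availability of text ===
* (what you found, and where)

== SecondLanguage ==
<!-- same subsections as above -->

== ThirdLanguage ==
<!-- same subsections as above -->

== Sources ==
<!-- Assumes the Cite extension; lists every <ref> used above. -->
<references />

[[Category:sp19_LanguageSelection]]
[[Category:Abkhaz]]
[[Category:SecondLanguage]]
[[Category:ThirdLanguage]]
```

Giving each language its own section keeps the three candidates easy to compare, and putting the four category tags at the bottom is a common MediaWiki habit rather than a requirement.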