Documenting resources

From LING073

The assignment

The point of this exercise is to scour library and online resources for everything you can find about a language. This assignment is due at the beginning of the Tuesday class during the second week of classes (this semester: 11:20 on 24 January 2017).

Generally, there are five main categories of resources that we're interested in for a given language:

  • computational resources (spell checkers, orthography converters, speech recognition software, keyboard layouts, machine translators),
  • dictionaries/phrasebooks/glossaries (multilingual and monolingual, online and paper),
  • grammatical descriptions (theoretical and pædagogical),
  • scientific works (papers, books, websites), and
  • corpora (any collection of authentic text, linguistically annotated or not).

Your task is to find as many resources as possible, categorise them appropriately, provide a short description of each, list their licenses, and work to obtain any resources that look useful but are not immediately available. You will submit your work by creating a page on the wiki. Don't worry about putting it under your user page as you did for language selection; just make a page with the same name as the language.

To find resources, check the hints under Places to look for corpora. In short: use general search engines, library search engines (like Tripod and WorldCat), scholarly article databases, and references mentioned in less useful materials (like Wikipedia articles). You can even email linguists who may have worked on the language, or people who speak it (whether they grew up speaking it or learned it, for example as anthropologists). There is also a Google Drive folder full of grammars of quite a few languages; the folder itself can be found through most search engines, but its contents do not show up in search results.

For categorisation, your top-level headings can be the five categories listed above, or whatever else makes sense for your language. For ideas, look at some resource categorisation pages that already exist on the Apertium wiki: Kazakh, Farsi, Aromanian.

Describe each resource in a single informative phrase. For example: "dictionary of technical terms; includes lots of computer terminology" or "purportedly a transducer of Chukchi, but no source code seems to be available".

Specify licenses for anything you think it could be useful to take text from. For example, if there's an electronic dictionary, it may be useful to extract all the words from it, but the license may not allow that. Other examples include example sentences in grammar books and academic publications, as well as authentic texts in the language. Licenses typically range from all rights reserved (most physically published material) to public domain (almost anything on Project Gutenberg), with a lot in between (anything on Wikipedia, Wiktionary, etc.). Copyrighted materials can be used for evaluation, but may not be released or distributed. The copyright holder may, however, be reachable and may be willing to let you use the material for this class or even release annotated forms of it (contact early, contact often!).

See which of the resources you can obtain, starting with the most useful-looking ones. If some of them must be borrowed from another library, start an interlibrary loan (ILL) request as soon as possible. If a resource appears to exist but is not easily found, contact people who cite or mention it.

Add your page to the category named after the language (the same one you used before), and to the category sp17_ResourceDocumentation. Also, make sure you've added yourself and your language to the table at Sp17_LanguageSelection.

Submit the assignment as a page on the wiki named after the language. Every time you update the wiki page you are effectively "resubmitting" it; just make sure the whole thing is done by the due date.