Documenting resources

From LING073
Revision as of 16:37, 16 February 2021 by Jwashin1 (talk | contribs) (The assignment)

The assignment

The point of this exercise is to scour library and online resources for everything you can find about a language. This assignment is due by the beginning of the Tuesday class during the second week of classes (this semester: 14:00 on 23 February 2021).

Generally, there are five main categories of resources that we're interested in for a given language:

  • computational resources (spell checkers, orthography converters, speech recognition software, keyboard layouts, machine translators),
  • dictionaries/phrasebooks/glossaries (multilingual and monolingual, online and paper),
  • grammatical descriptions (theoretical and pædagogical),
  • scientific works (papers, books, websites), and
  • corpora (any collection of authentic text, linguistically annotated or not, including Wikipedia in the language, news websites, Twitter feeds, etc.)

Your task is to find any resources that are out there, categorise them appropriately, provide a short description, list licenses, and work to obtain any resources that are not immediately available that would appear to be useful. You will submit your work by creating a page on the wiki. Don't worry about putting it under your user page as with language selection—just make a page named the same as the language name.

  • To find resources, check the hints under Places to look for corpora. Basically, use general search engines, library search engines (like Tripod and WorldCat), and scholarly article databases; follow up references mentioned in less useful materials (like Wikipedia articles); and even email linguists who may have done some work on the language or people identifying as speakers of the language (people who grew up speaking the language, anthropologists, etc.). There is also a Google Drive folder full of grammars of quite a few languages; the folder itself is findable through most search engines, but its contents do not come up in a search.
    To find text in the language, you may also try searching for words you've identified to be valid, authentic orthographic forms in the language. When you find something, make sure that it's actually in your language and not some other one. For example, "chala" is a word in Uzbek, but it's also a word in a bunch of other languages, not to mention a common transliteration of a word in some Indic languages. So if you're looking for Uzbek text using the word "chala", you need to be sure that you haven't found Chichewa or transcribed Hindi instead!
  • In terms of categorisation, your top-level headings can be the five categories listed above, or whatever makes sense. You can look at some resource categorisation pages that already exist on the Apertium wiki for ideas: Kazakh, Farsi, Aromanian.
  • Describe the resource in a single informative phrase. E.g., "dictionary of technical terms; includes lots of computer terminology" or "purportedly a transducer of Chukchi, but no source code seems to be available" or "paper that talks about non-finite verb phrases in Turkic and has a few examples of Tatar".
  • Specify licenses for anything you think could be useful to take text from. For example, if there's an electronic dictionary, it may be useful to extract all the words from it, but the license may not allow for that. Other examples include example sentences in grammar books and academic publications and authentic texts in the language. Licenses will typically range from all rights reserved (most physically published material) to public domain (almost anything on Project Gutenberg), and there's a lot in between (anything on Wikipedia, Wiktionary, etc.). Copyrighted materials can be used for evaluation, but may not be released or distributed. That said, the copyright holder may be contactable and may be amenable to your using the material for this class or even releasing annotated forms of it (contact early, contact often!).
  • See which of the resources you can obtain. Start with the most useful-looking ones. If some of them require borrowing from another library, start the request through ILL as soon as possible. If a resource appears to exist, but is not easily found, contact people who cite or mention it. You can also always contact me or the linguistics librarian for help obtaining sources.
  • Submit the assignment as a page to the wiki with the name of the language. Basically, every time you update the wiki page you're "resubmitting" it—just make sure you have the whole thing done by the submission date.
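
The "chala" check described in the list above can be partly automated. What follows is only a minimal sketch, not a required part of the assignment: it assumes you already have a small set of word forms you have verified as authentic in your language, and it reports what fraction of the words in a candidate text appear in that set. The function names, the 0.5 threshold, and the tiny word set are all hypothetical; a low score suggests the text may be in some other language that happens to share the search word, but a high score is not proof either, so still read the text yourself.

```python
import re


def known_word_ratio(text, known_words):
    """Return the fraction of word tokens in `text` that occur in `known_words`."""
    # [^\W\d_]+ matches runs of letters in any script; note that it splits
    # words at apostrophes and hyphens, which may matter for your orthography.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in known_words)
    return hits / len(tokens)


def looks_like_target(text, known_words, threshold=0.5):
    """Rough guess: True if at least `threshold` of the tokens are known words.

    The threshold is an arbitrary assumption; tune it against texts whose
    language you already know.
    """
    return known_word_ratio(text, known_words) >= threshold


if __name__ == "__main__":
    # Placeholder word set: replace it with forms you have verified yourself.
    known = {"chala", "va", "bu", "bilan", "uchun"}
    # Toy token strings for demonstration only, not authentic sentences.
    print(known_word_ratio("bu chala va bilan", known))
    print(looks_like_target("chala xyz foo bar", known))
```

The bigger and more representative your verified word set is, the more meaningful the ratio becomes; with only a handful of words, treat the result as a hint for which search hits to inspect first.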