Grammar documentation

From LING073
Revision as of 22:26, 7 February 2018 by Jwashin1 (Implementation)

Now that we have text in the language and are getting comfortable typing in it, it's time to explore its grammar. This will put you in a good position to start implementing its morphology computationally, which is the next unit.

Morphology

Morphology is concerned with how words are formed in a language.

For the purposes of grammar documentation, we should concern ourselves only with functional morphology - that is, alternations between forms that change the function (as opposed to the meaning) of a word. The alternation should also be productive (i.e., it should hold for any word of the same word class), even if it isn't always formed in the same way. A straightforward example is English plurals - basically every noun in English has a plural form, even though it may be formed irregularly (sometimes even with no phonological alternation, like "sheep/sheep").

An example of a derivational (i.e., not functional) non-productive morphological alternation might be the adjectival "un" prefix in English (in words like "unhappy", "unusual", "uncaring"). You can make new word forms with it, but it changes the meaning, doesn't affect how the word is used in a sentence, and can't, in fact, form new words with just any adjective (*"unexpensive", *"unfast", *"unpeculiar").

We need to keep in mind both the distinctions being made in a language and the language-specific strategies for the distinction. If a language doesn't distinguish number on nouns, then we don't need to try to impose a singular and plural distinction. In more analytical terms, we don't want to say that any noun is both singular and plural, like the form "sheep" is in English - we simply leave analyses of number out of the picture. If the primary strategy in a language is that the plural receives an additional morphological element, then we can think of the singular as "unmarked", and our analyses of the singular nouns should probably just be "noun" (not "noun.singular"). However, if the primary strategy of the language is to change a morphological element, then we probably want to analyse the singular noun forms as explicitly singular ("noun.singular").
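The choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `noun_analysis` and the strategy labels "none", "additive", and "replacive" are made up for this sketch, and the dotted labels follow the "noun" / "noun.singular" notation used in the paragraph above ("noun.plural" is assumed by analogy).

```python
def noun_analysis(number, strategy):
    """Return a dotted analysis label for a noun.

    number:   "singular" or "plural"
    strategy: hypothetical label for how the language marks number
        "none"      - number is not distinguished on nouns
        "additive"  - the plural adds a morphological element
        "replacive" - singular and plural each have their own element
    """
    if number not in ("singular", "plural"):
        raise ValueError(f"unknown number: {number!r}")

    if strategy == "none":
        # No number distinction: leave number out of the analysis entirely.
        parts = ["noun"]
    elif strategy == "additive":
        # The singular is unmarked, so only the plural gets a number label.
        parts = ["noun", "plural"] if number == "plural" else ["noun"]
    elif strategy == "replacive":
        # Both values are marked, so both are labelled explicitly.
        parts = ["noun", number]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return ".".join(parts)
```

For example, `noun_analysis("singular", "additive")` gives "noun", while `noun_analysis("singular", "replacive")` gives "noun.singular".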

The sorts of things we're looking for:

  • Any alternation of forms of a given word based on their syntactic or phonological environment
    • Phonological and [functional] morphological alternations
  • Any categorisation schema relevant to the lexicon
    • noun/verb classes, pronoun features, etc.
    • can have bearing on what morphology is taken, or what syntactic arguments are allowed

To consider

  • Are there any irregular forms in the language? What about the pronouns: are their alternations identical to those of nouns?
  • Is there any sort of agreement morphology for person/number/etc on verbs or nouns?
  • How are tense/mood/aspect/evidentiality marked in the language? Do the verbs change form? Are there auxiliary verbs or other particle-like words that might be analysed as part of the morphology instead of as separate words? Do transitive and intransitive verbs take different morphology, or different syntactic arguments?
  • Do nouns change form for different numbers (e.g., singular/plural)? Do they change form based on how they are used in the sentence? Are they lexically specified for class (masculine/feminine, or more)? Do all nouns take the same set of forms?
  • Do adjectives behave like nouns in terms of number/case/etc.? How are comparatives formed?
  • What properties of personal pronouns are distinguished? Most languages have at least 3 person distinctions (1st, 2nd, 3rd) and many have number distinctions as well (singular, plural). Are other things distinguished, like an additional person or number, or the relative social class of speaker and hearer (for all pronouns or just 2nd person)?
  • What about demonstrative and interrogative pronouns?
  • Are there any phonologically productive alternations in the language?
  • Make sure you state what each piece of morphology is used for.

Implementation

There are standard abbreviations used for word categories and common morphological alternations, called tags. We'll try to stick to the Apertium tagset.

We need to decide on a tag for each word category ("part of speech"), for each subcategory, and for each functional morphological alternation. We also need to decide on the order the tags occur in. The standard order is "category - subcategory - [functional morphology]", but it isn't always clear what order the functional morphology tags should occur in. By default, they should follow the order in which the morphemes appear, but this may not be stable within a language's grammar. When in doubt, you can work down from "major distinction" to "minor distinction".

A stem followed by a set of tags constitutes an analysis.
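As a rough illustration of what "a stem followed by a set of tags" looks like in practice, here is a minimal Python sketch that builds and takes apart an analysis written with angle-bracket tags (the notation used elsewhere on this page, e.g. <v>, <iv>, <past>). The helper names `make_analysis` and `parse_analysis` are hypothetical and not part of any Apertium tool.

```python
import re

# One tag is any run of characters between "<" and ">".
TAG_RE = re.compile(r"<([^<>]+)>")


def make_analysis(stem, tags):
    """Join a stem and an ordered list of tags into one analysis string."""
    return stem + "".join(f"<{tag}>" for tag in tags)


def parse_analysis(analysis):
    """Split an analysis string into (stem, [tags])."""
    first_tag = analysis.find("<")
    if first_tag == -1:
        # No tags at all: the whole string is the stem.
        return analysis, []
    stem = analysis[:first_tag]
    tags = TAG_RE.findall(analysis[first_tag:])
    return stem, tags
```

With the order "category - subcategory - [functional morphology]", `make_analysis("walk", ["v", "iv", "past"])` yields `walk<v><iv><past>`, and `parse_analysis` recovers the stem and the tag list from that string.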

At this stage, we will be working through the grammar of a language by trying to map specific forms to specific analyses.
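To make "mapping specific forms to specific analyses" concrete, the following Python sketch pairs analyses with forms for a small slice of English plurals: a regular form, two predictable alternations, and a short list of irregulars. It is a deliberately incomplete toy, not a description of English. The function names and the three-word irregular list are assumptions made for illustration, and the singular is left unmarked (`<n>` only) because English adds material in the plural.

```python
# Hypothetical, far-from-complete list of irregular plurals.
IRREGULAR_PLURALS = {
    "sheep": "sheep",
    "child": "children",
    "mouse": "mice",
}


def plural_form(stem):
    """Return the plural form of an English noun stem (toy rules only)."""
    if stem in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[stem]
    # Predictable alternation: -es after sibilant-final spellings.
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"
    # Predictable alternation: consonant + y becomes -ies.
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"
    # Regular form.
    return stem + "s"


def noun_pairs(stem):
    """Return (analysis, form) pairs for the singular and the plural."""
    return [
        (f"{stem}<n>", stem),
        (f"{stem}<n><pl>", plural_form(stem)),
    ]
```

Each pair has the analysis on the left and the form on the right, for example `("child<n><pl>", "children")`.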

Examples

See Grammar documentation/Examples for examples of how to do this. Following are some ideas for the types of things to look at:

  • You could document something similar to the plural pattern(s) of English. List the regular form, predictable alternations, and a list of irregular forms.
  • You could document something like a single tense conjugation of Spanish. Mention that the theme vowel determines what the set of endings is, and list the endings for each person/number.
  • spellrelax. If there is a list of common spelling alternatives that you want to interpret as a given standard, listing those (with some explanation) can count as one grammar point. For example, if certain accent marks are considered proper but most people don't use them, then you'll want to interpret characters without these accent marks as their accented equivalents.
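The spellrelax idea in the last point can be sketched as a lookup from relaxed spellings to standard ones. The Python below is only an illustration of the concept and is not the format Apertium uses for spellrelax rules; the function names are hypothetical, and it makes the strong assumption that every combining accent mark may be omitted by writers.

```python
import unicodedata


def strip_accents(text):
    """Remove all combining marks (accents) from a string."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def build_relax_table(standard_forms):
    """Map each accent-free spelling to the standard spellings it may stand for."""
    table = {}
    for form in standard_forms:
        standard = unicodedata.normalize("NFC", form)
        table.setdefault(strip_accents(standard), set()).add(standard)
    return table


def standard_candidates(word, table):
    """Return the standard spellings that a (possibly unaccented) word may mean."""
    return sorted(table.get(strip_accents(word), set()))
```

Note that stripping every mark is usually too aggressive: it would also merge letters that writers do keep distinct (Spanish "ñ" and "n", for instance). A real list of mappings should cover only the alternatives people actually use, which is why each one deserves a short explanation.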

The assignment

This assignment is due before lab on Thursday of the 4th week of class (this semester, 11:20 on Thursday, February 15th, 2018).

  1. Determine what the main parts of speech are in your language. There are going to be some open classes, like nouns, verbs, and adjectives, and some [relatively more] closed classes like prepositions or pronouns. Create one section on a Language/Grammar page on the wiki outlining the main parts of speech and any subcategories, providing computational POS symbols (or tags) for each one that are compatible with the ones used by the Apertium project. Give an example or two of each class and subclass using the {{morphTest}} template as it's used on the examples page.
  2. Find any set of alternations (as described above) in your language. For each one, write one new section describing this grammar point. Provide some examples. You'll have to make preliminary decisions about what the base form is, what tags you should be using, etc. Some examples are available. You should have at least ten grammar points in all.
    • If you're working on a polysynthetic language, you may have a lot of options to sort out in order to choose discrete grammar points, and if you're working on a more isolating language, there may not be a lot of morphology points to choose from. If you need easier grammar points or just more grammar points, then feel free to create sections for some of the examples listed above that aren't on the examples page. Include these in an "Other" section, since they won't be relevant for your language. If you're working on a polysynthetic language, though, please limit the number of easier grammar points from other languages you choose to two only.
    • As mentioned above, you can list spellrelax mappings and count that as one grammar point.
    • If you identify a dominant pattern (like x when A and y when B), and are also able to document a number of exceptions, this can count as a second grammar point—but even if there are four dominant patterns, if it's the same process it can only count as two grammar points.
    • Each grammar point should have at least three examples using the {{morphTest}} template.
  3. Add the page to the category Grammar documentation and also a category for your language. Add a link to this new page to the main language page, under the section for resources developed in this class.

Sanity checks

  • There should be on the order of 50 morphTests, 30 at an absolute minimum.
    • You can have examples of each part of speech tag in the initial section.
  • Each morphTest should have an analysis on the left and a form on the right.
    • The analysis should have a stem (or "lemma"), a main categorisation tag (e.g., <v>), any sub-categorisation tags (e.g., <iv>), and any morphology tags (e.g., <past>).
    • The morphological forms should be proper orthographic forms of the language (i.e., native orthography, not grammar book orthography). There should be no dashes in the forms, no extraneous quotation marks, no English glosses, etc. inside the morphTest template. You can have these things in notes outside the template.
  • Make sure you use the same tag throughout the page consistently—e.g., you don't want <v>, <vb>, and <vblex> all used for verbs—choose one and be consistent.
  • There should be minimal use of non-productive morphology, such as derivations. An example of this might be infinite<adj><→n> ↔ infinity, since this same process can't be applied freely to any adjective. In some languages, derivation of this sort is entirely or almost entirely productive, in which case this is fine.
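Some of the checks above are mechanical enough to automate. The Python sketch below works on plain (analysis, form) pairs rather than on wiki template text, and its function names, the list of forbidden characters, and the tag variant group are all assumptions made for illustration. It cannot tell whether a form is in the proper orthography or whether an alternation is productive; those still need a human.

```python
import re

# A stem (no angle brackets) followed by one or more <tags>.
ANALYSIS_RE = re.compile(r"^[^<>]+(?:<[^<>]+>)+$")
TAG_RE = re.compile(r"<([^<>]+)>")

# Characters that should not appear inside a form (assumed list).
FORBIDDEN_IN_FORM = ("-", '"')

# Hypothetical group of tags that mean the same thing; use only one of them.
TAG_VARIANT_GROUPS = [{"v", "vb", "vblex"}]


def check_morphtest(analysis, form):
    """Return a list of problems found in one (analysis, form) pair."""
    problems = []
    if ANALYSIS_RE.match(analysis) is None:
        problems.append("analysis is not a stem followed by one or more <tags>")
    if not form.strip():
        problems.append("form is empty")
    for bad in FORBIDDEN_IN_FORM:
        if bad in form:
            problems.append(f"form contains {bad!r}")
    return problems


def mixed_tag_groups(analyses, groups=TAG_VARIANT_GROUPS):
    """Return the variant groups from which more than one tag is in use."""
    used = set()
    for analysis in analyses:
        used.update(TAG_RE.findall(analysis))
    return [sorted(group & used) for group in groups if len(group & used) > 1]
```

A pair such as `("walk<v><iv><past>", "walked")` passes with no problems, while using both `<v>` and `<vblex>` on the same page is reported by `mixed_tag_groups`.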