Grammar documentation

From LING073
Revision as of 17:57, 6 February 2019 by Jwashin1 (talk | contribs) (Morphological analyses)


Now that we have text in the language and are getting comfortable typing in the language, it's time to explore the grammar of the language. This will put you in a good spot to start implementing its morphology computationally - the next unit.

Morphology

Morphology is concerned with how words are formed in a language.

Common morphological strategies

Functional vs derivational morphology

For the purposes of grammar documentation, we should concern ourselves only with functional morphology - that is, alternations between forms that change the function (as opposed to the meaning) of a word. It should also be a productive alternation (i.e., an alternation that holds for any word of the same word class) even if it isn't always formed in the same way. A straightforward example is English plurals - basically every noun in English has a plural form, even though it could be formed irregularly (sometimes even with no phonological alternation, like "sheep/sheep").

An example of a derivational (as opposed to functional) morphological alternation that is not fully productive is the adjectival "un" prefix in English (in words like "unhappy", "unusual", "uncaring"). You can make new word forms with it, which means it's at least semi-productive, but you can't form new words from just any adjective using "un": *"unexpensive", *"unfast", *"unpeculiar" might be expected, but are impossible. Furthermore, the prefix changes the meaning and doesn't affect how the word is used in a sentence: "They are happy" and "They are unhappy" are both okay. Compare "They are cats" and *"They are cat", where the meaning would be basically the same, but the particular grammatical form of the word is no longer compatible with the sentence.

We need to keep in mind both the distinctions being made in a language and the language-specific strategies for the distinction. If a language doesn't distinguish number on nouns, then we don't need to try to impose a singular and plural distinction. In more analytical terms, we don't want to say that any noun is both singular and plural, like the form "sheep" is in English - we simply leave analyses of number out of the picture. If the primary strategy in a language is that the plural receives an additional morphological element, then we can think of the singular as "unmarked", and our analyses of the singular nouns should probably just be "noun" (not "noun.singular"). However, if the primary strategy of the language is to change a morphological element, then we probably want to analyse the singular noun forms as explicitly singular ("noun.singular").
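The marked/unmarked choice described above can be sketched in a few lines of Python. This is only an illustration, not part of any course tooling: the function name `noun_analysis` and its parameters are hypothetical, and the tags <n>, <sg>, and <pl> are assumed to match the Apertium tagset, so check the tag list for your language before relying on them.

```python
def noun_analysis(lemma, number=None, singular_is_unmarked=True):
    """Build an analysis string such as 'cat<n><pl>' (hypothetical helper).

    number: "sg", "pl", or None if the language does not mark number on nouns.
    singular_is_unmarked: True if the plural adds a morphological element
    to an otherwise bare singular; False if singular and plural each have
    their own marking.
    """
    tags = ["n"]
    if number == "pl":
        tags.append("pl")
    elif number == "sg" and not singular_is_unmarked:
        # Only tag the singular explicitly when it is itself marked.
        tags.append("sg")
    # With number=None, no number tag is added at all.
    return lemma + "".join(f"<{tag}>" for tag in tags)


print(noun_analysis("cat", "sg"))
print(noun_analysis("cat", "pl"))
print(noun_analysis("cat", "sg", singular_is_unmarked=False))
```

In the first call the singular comes out as a plain "noun" analysis, while the third call produces the explicit "noun.singular" style of analysis.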

The sorts of things we're looking for:

  • Any alternation of forms of a given word based on their syntactic or phonological environment
    • Phonological and [functional] morphological alternations
  • Any categorisation schema relevant to the lexicon
    • noun/verb classes, pronoun features, etc.
    • can have bearing on what morphology is taken, or what syntactic arguments are allowed

Morphological analyses

There are standardised formal representations of the mapping between morphological form and morphological analysis. A morphological form is just any word form of a language, whereas a morphological analysis tells you further information about that form. (Soon we will be developing a transducer to map between these two representations.) For our purposes, a morphological analysis will consist of a stem followed by a set of tags, or more specifically:

  • A lemma, or the stem or "base form" of the word. In English, a noun or verb lemma will just be the bare form of the word: the lemma of "cats" is "cat" and the lemma of "running" is "run". A more complicated example is the lemma of "is", which would be "be", which isn't morphologically a root, but is still the lemma.
  • A series of tags, or abbreviations used to both categorise words and provide information about common morphological properties of the words. We'll try to stick to the Apertium tagset.
    • The first tag is usually the word category or part of speech (often POS), for example <n> for "noun", <v> for "verb", <adj> for "adjective".
    • A subcategory tag may follow if relevant for the particular language and part of speech, for example gender for nouns, transitivity for verbs, etc.
    • Optionally, these are followed by tags for the relevant morphological distinctions, often referred to as grammatical tags. This can include tags for things like person agreement (for verbs, possessed nouns, etc.), number (e.g., for nouns), number agreement (for determiners, adjectives, etc.), polarity (e.g., <neg> for negative), and lots of other things.
      • Note that the order of functional morphology tags isn't always clear. By default, the tags should occur in the order in which the morphemes appear in the language, but this may not be stable within a single language's grammar. When in doubt, you can work down from "major distinction" to "minor distinction", whatever those terms might mean for your particular language.

An example is provided in the following diagram:

[Figure: Structure of morphological analyses.png]
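To make the lemma-plus-tags structure concrete, here is a minimal Python sketch that splits an analysis string into its parts. It is not the transducer we will build later, just a way to see the structure. The name `parse_analysis` is hypothetical, and the example analyses are illustrative rather than taken from any existing Apertium language module.

```python
import re

# A lemma (anything without angle brackets) followed by one or more <tag>s.
ANALYSIS_RE = re.compile(r"^([^<>]+)((?:<[^<>]+>)+)$")


def parse_analysis(analysis):
    """Split 'cat<n><pl>' into ('cat', ['n', 'pl']) (hypothetical helper)."""
    match = ANALYSIS_RE.match(analysis)
    if match is None:
        raise ValueError(f"not a lemma followed by tags: {analysis!r}")
    lemma = match.group(1)
    tags = re.findall(r"<([^<>]+)>", match.group(2))
    return lemma, tags


lemma, tags = parse_analysis("run<v><iv><past>")
print(lemma)     # the lemma
print(tags[0])   # the part-of-speech tag
print(tags[1:])  # subcategory and grammatical tags, in order
```

The first element of the tag list is the part of speech, and the remaining elements are the subcategory and grammatical tags in the order they were written.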

More specifically

At this point in the course, we will be working through the grammar of a language by trying to understand the mapping of specific forms to specific analyses.

To consider

  • Are there any irregular forms in the language? What about the pronouns: are their alternations identical to those of nouns?
  • Is there any sort of agreement morphology for person/number/etc on verbs or nouns?
  • How are tense/mood/aspect/evidentiality marked in the language? Do the verbs change form? Are there auxiliary verbs or other particle-like words that might be analysed as part of the morphology instead of as separate words? Do transitive and intransitive verbs take different morphology, or different syntactic arguments?
  • Do nouns change form for different numbers (e.g., singular/plural)? Do they change form based on how they are used in the sentence? Are they lexically specified for class (masculine/feminine, or more)? Do all nouns take the same set of forms?
  • Do adjectives behave like nouns in terms of number/case/etc.? How are comparatives formed?
  • What properties of personal pronouns are distinguished? Most languages have at least 3 person distinctions (1st, 2nd, 3rd) and many have number distinctions as well (singular, plural). Are other things distinguished, like an additional person or number, relative social class encoding of speaker and hearer (for all pronouns or just 2nd person?), etc.?
  • What about demonstrative and interrogative pronouns?
  • Are there any phonologically productive alternations in the language?
  • Make sure you say what the use of the morphology is.

Examples

See Grammar documentation/Examples for examples of how to do the assignment. Following are some ideas for the types of things to look at:

  • You could document something similar to the plural pattern(s) of English. List the regular form, predictable alternations, and a list of irregular forms.
  • You could document something like a single tense conjugation of Spanish. Mention that the theme vowel determines what the set of endings is, and list the endings for each person/number.
  • You could document spellrelax. If there is a list of common spelling alternatives that you want to interpret as a given standard, listing those (with some explanation) can count as one grammar point. For example, if certain accent marks are considered proper, but most people don't use them, then you'll want to interpret characters without these accent marks as their accented equivalents.
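The spellrelax idea in the last point can be illustrated with a small Python sketch. This is not how Apertium implements spellrelax; it only shows the kind of mapping you would be documenting, namely that an unaccented spelling should be interpreted as one or more standard accented forms. The function names and the word list are made up for the example.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, e.g. turning 'café' into 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_relaxed_lookup(standard_forms):
    """Map each accent-less spelling to the standard forms it may stand for."""
    lookup = {}
    for form in standard_forms:
        lookup.setdefault(strip_accents(form), set()).add(form)
    return lookup


# Hypothetical list of standard (properly accented) forms.
lookup = build_relaxed_lookup(["café", "está", "esta"])
print(sorted(lookup["esta"]))
print(sorted(lookup["cafe"]))
```

Note that one relaxed spelling can correspond to more than one standard form, which is worth mentioning in your explanation when it happens in your language.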

The assignment

This assignment is due before lab on Thursday of the 4th week of class (this semester, 8:30 on Thursday, February 14th, 2019).

  1. Determine what the main parts of speech are in your language. There are going to be some open classes, like nouns, verbs, and adjectives, and some [relatively more] closed classes like prepositions or pronouns. Create one section on a Language/Grammar page on the wiki outlining the main parts of speech and any subcategories, providing computational POS symbols (or tags) for each one that are compatible with the ones used by the Apertium project. Give an example or two of each class and subclass using the {{morphTest}} template as it's used on the examples page.
  2. Find any set of alternations (as described above) in your language. For each one, write one new section describing this grammar point. Provide some examples. You'll have to make preliminary decisions about what the base form is, what tags you should be using, etc. Some examples are available. You should have at least ten grammar points in all.
    • If you're working on a polysynthetic language, you may have a lot of options to sort out in order to choose discrete grammar points, and if you're working on a more isolating language, there may not be a lot of morphology points to choose from. If you need easier grammar points or just more grammar points, then feel free to create sections for some of the examples listed above that aren't on the examples page. Include these in an "Other" section, since they won't be relevant for your language. If you're working on a polysynthetic language, though, please limit the number of easier grammar points from other languages you choose to two only.
    • As mentioned above, you can list spellrelax mappings and count that as one grammar point.
    • If you identify a dominant pattern (like x when A and y when B), and are also able to document a number of exceptions, the exceptions can count as a second grammar point. But even if there are four dominant patterns, if it's the same process it can only count as two grammar points.
    • Each grammar point should have at least three examples using the {{morphTest}} template.
  3. Add the page to the category Grammar documentation and also a category for your language. Add a link to this new page to the main language page, under the section for resources developed in this class.

Sanity checks

  • There should be on the order of 50 morphTests, 30 at an absolute minimum.
    • You can have examples of each part of speech tag in the initial section.
  • Each morphTest should have an analysis on the left and a form on the right.
    • The analysis should have a stem (or "lemma"), a main categorisation tag (e.g., <v>), any sub-categorisation tags (e.g., <iv>), and any morphology tags (e.g., <past>).
    • The morphological forms should be proper orthographic forms of the language (i.e., native orthography, not grammar book orthography). There should be no dashes in the forms, no extraneous quotation marks, no English glosses, etc. inside the morphTest template. You can have these things in notes outside the template.
  • Make sure you use the same tag consistently throughout the page: e.g., you don't want <v>, <vb>, and <vblex> all used for verbs. Choose one and stick with it.
  • There should be minimal use of non-productive morphology, such as derivations. An example of this might be infinite<adj><→n> ↔ infinity, since this same process can't be applied freely to any adjective. In some languages, derivation of this sort is entirely or almost entirely productive, in which case this is fine.
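If you would like to run some of these checks mechanically, here is a rough Python sketch. It assumes you have already copied your tests out of the wiki page into a list of (analysis, form) pairs; it does not parse the {{morphTest}} template itself. The function names are hypothetical, and the checks cover only the count, the shape of the analysis, dashes and quotation marks in forms, and which part-of-speech tags are in use.

```python
import re
from collections import Counter

# A lemma followed by at least one <tag>.
ANALYSIS_RE = re.compile(r"^[^<>]+(?:<[^<>]+>)+$")


def check_morph_tests(pairs, minimum=30):
    """Return a list of human-readable problems found in (analysis, form) pairs."""
    problems = []
    if len(pairs) < minimum:
        problems.append(f"found {len(pairs)} test(s); expected at least {minimum}")
    for analysis, form in pairs:
        if not ANALYSIS_RE.match(analysis):
            problems.append(f"{analysis!r}: expected a lemma followed by tags")
        if "-" in form or '"' in form:
            problems.append(f"{form!r}: form contains a dash or quotation mark")
    return problems


def first_tag_counts(pairs):
    """Count part-of-speech tags so that inconsistent ones (<v> vs <vblex>) stand out."""
    counts = Counter()
    for analysis, _form in pairs:
        match = re.search(r"<([^<>]+)>", analysis)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up examples: the second form has a dash, the third analysis has no tags.
tests = [("cat<n><pl>", "cats"), ("run<v>", "run-s"), ("walk", "walked")]
for problem in check_morph_tests(tests, minimum=1):
    print(problem)
print(first_tag_counts(tests))
```

A check like this will not tell you whether your analyses are linguistically sensible, so it is a supplement to reading over the page, not a replacement for it.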