User:Jmalin1/Final Project

From LING073

Transducer for Innu

For my final project, I began building a transducer for Innu (also known as Montagnais). It is available on Swarthmore's GitHub as well as git.

Implementation

The transducer handles the following major features of Innu morphology:

Noun Possession

Possessed nouns in Innu receive a circumfix indicating the number and person of the possessor, and sometimes also a suffix -(i)m indicating possession. The transducer currently applies both pieces of morphology, but it does not yet condition the possession suffix correctly: whether the suffix appears depends on a number of factors, including phonology, the gender of the noun, and whether or not the possession is alienable. For example, "my head" would be nustikuan in reference to one's own head and nustikuanim in reference to the detached head of an animal that the speaker owns. Ensuring that this suffix is conditioned correctly in the transducer will require further study of Innu texts or consultation with a speaker.
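The alienability condition described above can be sketched in a few lines of Python. This is only an illustration, not how the transducer is written: the function name `add_possession_suffix`, the `alienable` flag, and the `VOWELS` set are all hypothetical, and the rule that the (i) of -(i)m surfaces after a consonant is an assumption. Phonology and gender, which also affect the suffix, are not modelled.

```python
# Hypothetical vowel inventory, used only to decide whether the "(i)" surfaces.
VOWELS = set("aeiou")


def add_possession_suffix(form: str, alienable: bool) -> str:
    """Attach -(i)m to a form that already carries the possessor marking.

    Assumption: the suffix appears only for alienable possession, and the
    linking "i" is inserted only after a consonant-final form.
    """
    if not form:
        raise ValueError("form must be a non-empty string")
    if not alienable:
        return form
    linker = "" if form[-1] in VOWELS else "i"
    return form + linker + "m"


# The two forms of "my head" given in the text.
own_head = add_possession_suffix("nustikuan", alienable=False)
owned_animal_head = add_possession_suffix("nustikuan", alienable=True)
```

With the forms from the text, the first call leaves nustikuan unchanged and the second yields nustikuanim.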

Verb Subject and Object Agreement

Innu transitive verbs are inflected for both subject and object. The verb marks as its "subject" whichever participant is higher in the person hierarchy: 2 > 1 > 3 > 3 (obviative). If the participant performing the action is lower in the hierarchy, then the verb is placed in an "inverted" form with a different suffix and sometimes a stem change. For example, using the verb petu (hear):

  • tshipetun ("You hear me") is marked with the second person prefix and first person suffix.
  • tshipetatin ("I hear you") is still marked with the second person prefix, but with a different first person suffix indicating that the first person is the subject.
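The selection logic described above can be sketched in Python. This is a simplified illustration of the hierarchy only, not the transducer's implementation: the labels `"3obv"`, `"direct"`, and `"inverse"` and the function name `agreement` are assumptions, and no actual affixes or stem changes are generated.

```python
# Person hierarchy as stated in the text: 2 > 1 > 3 > 3(obv).
# Earlier position in the list means higher rank.
HIERARCHY = ["2", "1", "3", "3obv"]


def agreement(actor: str, patient: str) -> tuple[str, str]:
    """Return (person marked by the prefix, verb direction).

    The prefix always marks the higher-ranked participant. The verb is
    "direct" when the actor outranks the patient and "inverse" otherwise.
    """
    if actor == patient:
        raise ValueError("actor and patient must differ in this sketch")
    if HIERARCHY.index(actor) < HIERARCHY.index(patient):
        return actor, "direct"
    return patient, "inverse"


you_hear_me = agreement(actor="2", patient="1")   # cf. tshipetun
i_hear_you = agreement(actor="1", patient="2")    # cf. tshipetatin
```

Both calls select the second person for the prefix, which matches the shared tshi- in the two examples; only the direction differs.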

Evaluation

Coverage was tested over a preliminary corpus (8476 characters without spaces) composed of 4 myths taken from the Innu-Aimun website. The corpus is available in a repository on Swarthmore's github, which is private due to copyright.

Current corpus coverage is 54.58% (although approximately 22% of analyzed tokens are punctuation).
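Because punctuation inflates the figure, it can be useful to estimate coverage over non-punctuation tokens only. The sketch below does this from the two reported numbers alone. It rests on an assumption not stated in the evaluation: that every punctuation token in the corpus is analyzed, so punctuation can be removed from both the analyzed count and the total. The function name `content_coverage` is hypothetical, and the result is a rough estimate, not a measured value.

```python
def content_coverage(coverage: float, punct_share: float) -> float:
    """Estimate coverage over non-punctuation tokens.

    coverage    -- analyzed tokens / all tokens
    punct_share -- punctuation tokens / analyzed tokens
    Assumes all punctuation tokens are analyzed.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= punct_share <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    analyzed_punct = coverage * punct_share      # as a fraction of all tokens
    analyzed_words = coverage - analyzed_punct
    total_words = 1.0 - analyzed_punct           # all tokens minus punctuation
    if total_words == 0.0:
        raise ValueError("corpus would contain only punctuation")
    return analyzed_words / total_words


# Reported figures: 54.58% coverage, about 22% of analyzed tokens are punctuation.
estimate = content_coverage(0.5458, 0.22)
```

Under that assumption, the estimate comes out at roughly 48% coverage of non-punctuation tokens.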

Next Steps

Double Obviation

Suffixes for obviation may change when an obviative noun is possessed by another obviative noun. Currently, this is implemented in the transducer, but messily, with some overgeneration of forms for possessed obviative nouns. It should be updated once the morphological processes involved are better understood.

More Verb Paradigms

While all major verb paradigms have been implemented in the present indicative, other tenses and moods have yet to be implemented. The future tense in particular has the issue that morphemes generally analyzed as verb prefixes are instead written as a separate word, meaning some reformatting of the other verb paradigms may be required to accommodate this.

Disambiguation

There are several ambiguous stems in the transducer; for example, many Innu adverbs are also interjections or conjunctions with different meanings. Variable word order makes precise disambiguation rules difficult to write.

Noun and Verb Formation

As a polysynthetic language, Innu has several productive derivational processes. Currently, these are handled with different stems in the transducer, but future versions of the transducer could perhaps handle them automatically.

Dialectal Variation

Innu has several different dialects. The transducer functions in a modern, standardized orthography but could be modified to analyze text written in variant orthographies as well.