Difference between revisions of "Wôpanâak/Transducer"

From LING073
Latest revision as of 01:50, 23 February 2017

An in-progress transducer is available here

Evaluation

Analyzer Evaluation

The transducer contains 23 distinct, correctly inflecting noun roots and one incompletely analyzed verb root. The current corpus coverage is 15.44% (but see below).

The most frequent corpus words not yet in the lexicon were weeyâws, nupȣnum, wutche, puhpeeq, nuhtatôm, nâwâw, nunâw, môcheenat, and nuwachônuqun. (Some short words are excluded from this list because they are likely function words taken from older texts with nonstandard orthography.)
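A list like the one above can be produced by counting the corpus tokens that receive no analysis. The following is only a minimal sketch: `is_analyzed` is a hypothetical stand-in for a call to the transducer, the token list is a toy example rather than the real corpus, and the `min_length` cutoff is an assumed way of skipping short words, not necessarily how the list above was filtered.

```python
from collections import Counter


def top_unanalyzed(tokens, is_analyzed, n=10, min_length=1):
    """Return the n most frequent tokens that get no analysis.

    is_analyzed is a hypothetical callable standing in for the transducer;
    min_length is an assumed filter for skipping very short words.
    """
    counts = Counter(
        token
        for token in tokens
        if len(token) >= min_length and not is_analyzed(token)
    )
    return [word for word, _ in counts.most_common(n)]


# Toy data (not the real corpus): one word is "known", the rest are not.
toy_tokens = ["weeyâws", "weeyâws", "wutche", "mahkus", "weeyâws", "wutche", "nâwâw"]
toy_lexicon = {"mahkus"}

ranked = top_unanalyzed(toy_tokens, lambda t: t in toy_lexicon, n=2)
print(ranked)  # ['weeyâws', 'wutche']
```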

In the current yaml file, 55 tests pass and 22 fail.

Generator Evaluation

Previous Analyzer Evaluation

At the start of this assignment, 55 out of 88 tests passed. Some of the failures were due to nouns that analyzed correctly but generated incorrectly, while many were due to verbs not being implemented yet.

Current Evaluation

In the current yaml file, 68 tests pass the generation test and 9 fail. The failing tests all involve pronouns and other categories that have not been implemented yet; the tests for nouns and intransitive verbs all pass. However, the transducer cannot yet handle many forms that are not covered by the tests.
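The pass and fail counts reported on this page can be turned into pass rates for easier comparison. This is a small illustrative calculation using the counts quoted for the current yaml file (55 passing and 22 failing for analysis, 68 passing and 9 failing for generation); the function name is my own.

```python
def pass_rate(passed, failed):
    """Percentage of tests that pass, given pass and fail counts."""
    total = passed + failed
    if total == 0:
        return 0.0
    return 100.0 * passed / total


# Counts quoted on this page for the current yaml file.
analysis_rate = pass_rate(55, 22)
generation_rate = pass_rate(68, 9)

print(round(analysis_rate, 1))    # 71.4
print(round(generation_rate, 1))  # 88.3
```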

I added 9 twol rules. Six of these are rules for noun generation, which was implemented before this assignment, and three are for verbs, which I only began to implement this week.

Corpus coverage is currently 20.1%.

Notes

This analyzer currently analyzes nouns. So far, it can properly analyze nouns for possession, number, animacy, and whether the noun is absentative, obviative, or locative. (All these analyses are based on Baird's grammar.) Few nouns have been entered into the lexicon yet, and in particular few nouns with truncating stems. At the moment, some of the truncating stems overgenerate due to the presence of an archiphoneme, while others are sent through special lexicons to get the material before any suffixes.

Corpus Coverage

The corpus has many issues, the largest of which is that the orthography of much of the text is not standardized. The successful analyses all come from sentences taken from the grammar.

Initial corpus coverage was 10.59%. Adding "waskeetôp<n><aa><sg>" and "seepȣ<n><nn><sg>", and fixing "mahkus" so that its stem inflects as attested in the corpus, improved coverage to 15.44%.
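To make the coverage figures concrete, here is a rough sketch of how such a number might be computed, assuming coverage means the share of corpus tokens that receive at least one analysis. The lexicon sets are hypothetical stand-ins for the transducer, and the token list is a toy example, so the toy percentages are not the real figures; only the last line uses the numbers reported above.

```python
def coverage(tokens, is_analyzed):
    """Percentage of tokens that receive at least one analysis.

    is_analyzed is a hypothetical callable standing in for the transducer.
    """
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if is_analyzed(token))
    return 100.0 * hits / len(tokens)


# Toy corpus and toy lexicons (assumptions for illustration only).
toy_corpus = ["waskeetôp", "seepȣ", "mahkus", "wutche", "nâwâw"]
lexicon_before = {"mahkus"}
lexicon_after = lexicon_before | {"waskeetôp", "seepȣ"}

before = coverage(toy_corpus, lambda t: t in lexicon_before)
after = coverage(toy_corpus, lambda t: t in lexicon_after)
print(before, after)  # 20.0 60.0

# Gain in percentage points between the two figures reported on this page.
reported_gain = round(15.44 - 10.59, 2)
print(reported_gain)  # 4.85
```

Because the measure counts tokens rather than distinct words, adding a few frequent words can move the figure noticeably, which is consistent with three lexicon changes accounting for the reported improvement.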