Guarani/Transducer

From LING073

Transducer for Guarani (Work in progress).

Analyzer

Evaluation

The current corpus coverage is 2.85% (out of 1016 tokenized words in the corpus).
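
As a rough check on this figure, the sketch below assumes that coverage is the share of corpus tokens that receive at least one analysis, and converts between a percentage and a token count. The function names are hypothetical and are not part of the transducer's tooling.

```python
def coverage(analysed: int, total: int) -> float:
    """Percentage of corpus tokens that receive at least one analysis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * analysed / total


def tokens_for(percent: float, total: int) -> int:
    """Nearest whole number of tokens matching a reported percentage."""
    return round(percent * total / 100.0)


TOTAL_TOKENS = 1016  # corpus size reported above

print(tokens_for(2.85, TOTAL_TOKENS))          # → 29
print(f"{coverage(29, TOTAL_TOKENS):.2f}%")    # → 2.85%
```

Under that assumption, 2.85% of 1016 tokens corresponds to about 29 analysed tokens.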

The most missed words were:

Class tags
Word Occurrences
ha 44
a 43
va 41
pe 27
i 25
e 20
m 18
ra 17
ta 10
o 10
guarani 9
up 9
ñe 9
yvy 9
y 9
icha 9
me 8
7
Tupã 7
t 7
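
A list like this can be produced by counting the tokens that the analyzer fails to recognize. The sketch below is only an illustration: it assumes the analyzer output is in Apertium stream format, where an unknown token appears as ^form/*form$, and the sample string is made up rather than taken from the corpus. The name most_missed is hypothetical.

```python
import re
from collections import Counter

# Assumption: unknown tokens look like ^form/*form$ (Apertium stream format).
UNKNOWN = re.compile(r"\^([^/$]+)/\*")


def most_missed(analysed_text: str, n: int = 20) -> list[tuple[str, int]]:
    """Return the n most frequent unanalysed surface forms with their counts."""
    return Counter(UNKNOWN.findall(analysed_text)).most_common(n)


# Made-up sample output, not real corpus data.
sample = "^ha/*ha$ ^Tupã/Tupã<n><sg>$ ^ha/*ha$ ^pe/*pe$"
for word, count in most_missed(sample):
    print(word, count)
# prints:
# ha 2
# pe 1
```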

I added tags and analyses for Tupã (God) and for ha (and), so that the tests for both words pass. With these additions, coverage rose to 8.46%.

  • Tupã<n><sg> ↔ Tupã
  • ha<conj> ↔ ha

I also added tags for guarani (Guarani), but did not add the stem to the lexc file.

The total number of stems in the transducer so far is 30.

Tests

For the main yaml file, 43/82 tests are passing.

For the common words yaml file, 2/20 are passing.


Generator

Initial Evaluation

For the analyzer there are 43 tests passing and 39 tests failing (82 total). The current corpus coverage is 8.46%. For the generator there are initially 37 tests passing and 73 tests failing (116 total).


Final Evaluation

Currently, 49 out of 86 morph tests pass. I added two twol rules that work together to handle nasal harmony for the plural marker. The first rule realizes the archiphoneme {n} as either n or nothing, depending on whether the preceding root is nasal. The second rule realizes {k} as either k or g, again depending on whether the preceding root is nasal.
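
The combined effect of the two rules can be sketched in Python. This is not the twol code itself. It assumes the plural suffix is stored as {n}{k}uéra, that a nasal root selects n and g (giving -nguéra) while an oral root selects nothing and k (giving -kuéra), and that a root counts as nasal if it contains a nasal consonant or nasalized vowel. All three points are assumptions for illustration, and the names are hypothetical.

```python
import unicodedata

# Assumption: any of these segments makes the whole root count as nasal.
NASAL_SEGMENTS = set(unicodedata.normalize("NFC", "mnñãẽĩõũỹ"))


def is_nasal_root(root: str) -> bool:
    """Crude nasality test: does the root contain any nasal segment?"""
    normalized = unicodedata.normalize("NFC", root.lower())
    return any(ch in NASAL_SEGMENTS for ch in normalized)


def realize_plural(root: str, suffix: str = "{n}{k}uéra") -> str:
    """Resolve the {n} and {k} archiphonemes according to root nasality."""
    nasal = is_nasal_root(root)
    n = "n" if nasal else ""    # first rule: {n} becomes n or nothing
    k = "g" if nasal else "k"   # second rule: {k} becomes g or k
    surface = root + suffix.replace("{n}", n).replace("{k}", k)
    return unicodedata.normalize("NFC", surface)


print(realize_plural("mitã"))   # nasal root → mitãnguéra
print(realize_plural("jagua"))  # oral root  → jaguakuéra
```

Normalizing to NFC keeps a nasalized vowel written as a base letter plus combining tilde from slipping past the nasality test.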

However, because of the large number of morph tests for negation (which I have not yet been able to work out because it is a circumfix), I was unable to get the pass rate up to two thirds. In a modified yaml test file that leaves out the negation morph tests (but still includes more than 50 tests), the transducer passes more than two thirds of the tests (49 out of 65). I added support for nominal tense, which caused about 5 more tests to pass. However, I was still getting unexpected twol behavior with the nasalized (it didn't seem to be detected as a nasal).
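
The two-thirds comparison for both test files can be checked with exact fractions. The sketch below uses only the counts quoted above. The helper names are hypothetical.

```python
from fractions import Fraction
from math import ceil

TWO_THIRDS = Fraction(2, 3)


def meets_threshold(passed: int, total: int, threshold: Fraction = TWO_THIRDS) -> bool:
    """True if passed/total is at least the threshold (exact arithmetic)."""
    return Fraction(passed, total) >= threshold


def passes_needed(total: int, threshold: Fraction = TWO_THIRDS) -> int:
    """Smallest number of passing tests that reaches the threshold."""
    return ceil(total * threshold)


print(meets_threshold(49, 86))  # full test file          → False
print(meets_threshold(49, 65))  # without negation tests  → True
print(passes_needed(86))        # → 58
```

On the full file, 58 passing tests would be needed to reach two thirds, which is 9 more than the current 49.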

Running the coverage test now yields a total coverage of 9.25%.

Links

The transducer is available on GitHub.