Guarani/Transducer
Transducer for Guarani (Work in progress).
Analyzer
Evaluation
The current corpus coverage is 2.85% (out of 1016 tokenized words in the corpus).
The most missed words were:
Word | Occurrences |
---|---|
ha | 44 |
a | 43 |
va | 41 |
pe | 27 |
i | 25 |
e | 20 |
m | 18 |
ra | 17 |
ta | 10 |
o | 10 |
guarani | 9 |
up | 9 |
ñe | 9 |
yvy | 9 |
y | 9 |
icha | 9 |
me | 8 |
rã | 7 |
Tupã | 7 |
t | 7 |
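As a rough illustration of how a coverage figure and a most-missed list like the one above can be computed, here is a minimal Python sketch. It is not the script used for this page: `coverage_report`, the toy tokens and the toy lexicon are all hypothetical, and a real run would ask the compiled analyzer whether each token is known rather than consult a Python set.

```python
from collections import Counter


def coverage_report(tokens, is_known, top=20):
    """Return (coverage in percent, most frequent unanalysed tokens).

    `is_known` is any callable that says whether a token gets an analysis.
    """
    missed = Counter(tok for tok in tokens if not is_known(tok))
    known = len(tokens) - sum(missed.values())
    percent = 100.0 * known / len(tokens) if tokens else 0.0
    return percent, missed.most_common(top)


# Hypothetical toy data, not the real 1016-token corpus.
toy_tokens = ["ha", "yvy", "ha", "Tupã", "pe", "ha"]
toy_lexicon = {"Tupã"}

percent, missed = coverage_report(toy_tokens, toy_lexicon.__contains__)
print(f"coverage: {percent:.2f}%")
for word, count in missed:
    print(word, count)
```

Counting missed tokens by frequency is what makes short, frequent items such as ha float to the top of the list, which is why adding a handful of function words can move coverage noticeably.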
I added tags and analyses for Tupã (God) and for ha (and), so the tests now pass for both of them. After this change, coverage went up to 8.46%.
- Tupã<n><sg> ↔ Tupã
- ha<conj> ↔ ha
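The analysis strings above follow the usual lemma-plus-tags shape. The following Python sketch shows one way to split such a string into its lemma and tag list, for example when comparing expected and actual analyses in a test script. The helper name `split_analysis` is hypothetical and not part of the transducer or its tooling.

```python
import re

# A lemma followed by zero or more <tag> groups, e.g. "Tupã<n><sg>".
ANALYSIS_RE = re.compile(r"^(?P<lemma>[^<>]+)(?P<tags>(?:<[^<>]+>)*)$")


def split_analysis(analysis):
    """Split an analysis string into (lemma, [tag, ...])."""
    match = ANALYSIS_RE.match(analysis)
    if match is None:
        raise ValueError(f"not a lemma<tag> analysis: {analysis!r}")
    tags = re.findall(r"<([^<>]+)>", match.group("tags"))
    return match.group("lemma"), tags


for example in ["Tupã<n><sg>", "ha<conj>"]:
    print(split_analysis(example))
```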
I also added tags for guarani (Guarani), but did not add the word itself to the lexc file.
The total number of stems in the transducer so far is 30.
Tests
For the main yaml file, 43/82 tests are passing.
For the common words yaml file, 2/20 are passing.
Generator
Initial Evaluation
For the analyzer there are 43 tests passing and 39 tests failing (82 total). The current corpus coverage is 8.46%. For the generator there are initially 37 tests passing and 73 tests failing (116 total).
Final Evaluation
Currently, 49 out of 86 morph tests pass. I added two twol rules that work in conjunction to handle nasal harmony for the plural marker. The first rule realizes the archiphoneme {n} as either n or nothing, depending on whether the preceding root is nasal. The second rule realizes {k} as either k or g, again depending on whether the preceding root is nasal.
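The effect of the two rules can be imitated outside twol with a small Python sketch. This is only an illustration under stated assumptions, not the rules themselves: it assumes the underlying plural marker is {n}{k}uéra, that a nasal root yields n and g while an oral root yields nothing and k, and that a root counts as nasal if it contains a tilde-marked letter or a plain m or n that is not part of a prenasalized cluster. The names `is_nasal_root` and `realize_plural` are hypothetical.

```python
import re
import unicodedata

COMBINING_TILDE = "\u0303"
# Assumption: these clusters are treated as prenasalized oral consonants.
PRENASALIZED = re.compile(r"mb|nd|ng|nt")


def is_nasal_root(root):
    """Crude nasality check on the orthographic form of a root."""
    # NFD splits letters such as ã, ẽ and ñ into base letter + U+0303, so
    # precomposed and combining spellings are treated the same way.
    decomposed = unicodedata.normalize("NFD", root.lower())
    if COMBINING_TILDE in decomposed:
        return True
    stripped = PRENASALIZED.sub("", decomposed)
    return any(ch in "mn" for ch in stripped)


def realize_plural(root):
    """Realize {n}{k}uéra after `root` (assumed rule behavior)."""
    nasal = is_nasal_root(root)
    n = "n" if nasal else ""    # first rule: {n} -> n or nothing
    k = "g" if nasal else "k"   # second rule: {k} -> g or k
    return f"{root}{n}{k}uéra"


for root in ["mitã", "jagua"]:
    print(root, "->", realize_plural(root))
```

Normalizing before the check is a deliberate choice in this sketch: a letter like ẽ can be stored either as one code point or as e plus a combining tilde, and a check written against only one of the two spellings would miss the other.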
However, because of the large number of morph tests for negation (which I have not yet been able to implement, because it is a circumfix), I am unable to get the pass rate up to two thirds. In a modified yaml test file that leaves out the negation morph tests (but still includes more than 50 tests), the transducer passes more than two thirds of the tests (49 out of 65). I added support for nominal tense, which caused about 5 more tests to pass. However, I was still getting unexpected twol behavior with the nasalized ẽ: the rules did not seem to detect it as a nasal.
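The two pass rates quoted above can be checked against the two-thirds target with a few lines of Python. This is just the arithmetic, written with exact fractions to avoid rounding at the boundary. The function name `meets_threshold` is hypothetical.

```python
from fractions import Fraction


def meets_threshold(passed, total, threshold=Fraction(2, 3)):
    """True if passed/total is at least the threshold (exact arithmetic)."""
    return Fraction(passed, total) >= threshold


results = [
    ("full test file", 49, 86),
    ("without negation tests", 49, 65),
]
for label, passed, total in results:
    rate = 100 * passed / total
    verdict = "meets" if meets_threshold(passed, total) else "below"
    print(f"{label}: {passed}/{total} = {rate:.1f}% ({verdict} two thirds)")
```

With these figures the full file comes out at roughly 57% and the reduced file at roughly 75%, which matches the statement that only the reduced file clears two thirds.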
Running the coverage test now yields a total coverage of 9.25%.
Links
The transducer is available on GitHub.