Purépecha/Transducer
From LING073
Analyser Evaluation
Code
GitHub Repo[1]
Tests
- Our transducer currently passes 110 of the 197 tests generated from our Wikipedia page.
Lexical Info
- Lexicons: 10
- Lexicon entries: 80
- Patterns: 2
- Pattern entries: 5
Counts for individual lexicons:
- NounRoot: 3
- RegNounInfl: 2
- ObjectRoot: 19
- Object: 1
- Punctuation: 22
- V-Stem: 13
- AspectTime: 10
- ModeInterrogative: 9
- All anonymous lexicons: 1
Coverage
- Initial coverage: 14.9% (16590/111120). This figure was measured on an incorrectly scraped corpus; the current coverage is measured on the new, corrected corpus.
- Current coverage: 29% (49371/168970)
- Adding "ka<det> ↔ and", "Jose<n><sg> ↔ Joseph", "ma<num> ↔ one", "Mariani<n><sg> ↔ Maria", "Babilonia<n><sg> ↔ Babylon", and "jimbo<det> ↔ for" to the transducer took coverage from 14.9% to 29%.
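The coverage figures above are ratios of analysed tokens to total tokens. As a minimal sketch of how such a ratio can be computed, the Python below counts known and unknown tokens in analyser output. It assumes the output is in Apertium stream format, where an unknown token is echoed back with a leading asterisk. The function name `naive_coverage` and the token `xyz` are hypothetical; this is not necessarily the script that produced the numbers above.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: the analyser output uses this format).
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> tuple[int, int]:
    """Return (known, total) token counts for a string of analyser output."""
    known = 0
    total = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # An unknown token has a single "analysis" that starts with '*'.
        if not analyses.startswith("*"):
            known += 1
    return known, total


# "ka" and "ma" are entries mentioned above; "xyz" is a made-up unknown token.
sample = "^ka/ka<det>$ ^ma/ma<num>$ ^xyz/*xyz$"
known, total = naive_coverage(sample)
print(f"{known}/{total} = {known / total:.1%}")

# The reported counts, expressed the same way.
print(f"initial: {16590 / 111120:.1%}")
print(f"current: {49371 / 168970:.1%}")
```

Run on the reported counts, the same division gives 14.9% for the initial corpus and 29.2% for the current one.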
Notes
- We are not yet sure how to encode some of the more complex grammatical forms.
- Our corpus was originally drawn primarily from tweets by a native Purépecha speaker; we later found a Bible in Purépecha and added it to the corpus.
- We initially scraped the Bible incorrectly, so the corpus repeated the same chapter over and over. This is fixed in the latest version.
Generator Evaluation
Initial evaluation of morphological generation
- Our transducer passes 110/197 tests generated from our Wikipedia page
- Current corpus coverage: 29% (49371/168970)
- Morphological generation passes 55/102 tests
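To make the two test results easier to compare, the counts above can be turned into pass rates. The snippet below is a small sketch of that arithmetic; the helper `pass_rate` and the labels `analysis` and `generation` are names chosen for this illustration only.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the percentage of tests passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * passed / total


# Counts taken from the evaluation above.
results = {"analysis": (110, 197), "generation": (55, 102)}

for name, (passed, total) in results.items():
    print(f"{name}: {passed}/{total} = {pass_rate(passed, total):.1f}%")
```

This gives roughly 55.8% for analysis and 53.9% for generation.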
Final evaluation of morphological generation
Notes
- Added the transitive, took out the extra transitive, and changed <p3> to <p3><sg>.
- To do: add a rule where ï is accepted as i.
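The outstanding ï/i rule would most naturally live in the transducer itself, and the source does not say how it will be written. As a stand-in, the Python sketch below shows the same idea as a pre-processing step that folds ï to i before lookup. The direction of the mapping is an assumption, the function name `fold_i_diaeresis` is hypothetical, and the word `tsïtsïki` is used only as an illustrative token.

```python
import unicodedata


def fold_i_diaeresis(text: str) -> str:
    """Replace ï with i (and Ï with I) so both spellings look up the same."""
    # Normalise first so that "i" + combining diaeresis (U+0308)
    # becomes the single precomposed character ï (U+00EF).
    composed = unicodedata.normalize("NFC", text)
    return composed.replace("ï", "i").replace("Ï", "I")


# Precomposed and decomposed spellings fold to the same string.
print(fold_i_diaeresis("tsïtsïki"))
print(fold_i_diaeresis("tsi\u0308tsi\u0308ki"))
```

Normalising to NFC first matters because corpus text from different sources (tweets, a scraped Bible) may encode ï in either form. Note that folding in this direction discards the distinction between ï and i, so a rule inside the transducer that accepts both spellings on the surface side only may be preferable.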