Difference between revisions of "Purépecha/Transducer"
From LING073
Revision as of 14:13, 15 April 2021
Code
GitHub repo [1]
Tests
- Our transducer currently passes 110 of the 197 tests (about 56%) generated from our Wikipedia page.
Lexical Info
- Lexicons: 10
- Lexicon entries: 80
- Patterns: 2
- Pattern entries: 5
Counts for individual lexicons:
- NounRoot: 3
- RegNounInfl: 2
- ObjectRoot: 19
- Object: 1
- Punctuation: 22
- V-Stem: 13
- AspectTime: 10
- ModeInterrogative: 9
- All anonymous lexicons: 1
Coverage
- Initial coverage: 14.9% (16590/111120). This figure was measured on an incorrectly scraped corpus.
- Current coverage: 29.2% (49371/168970), measured on the new, corrected corpus.
- Coverage went from 14.9% to 29.2% after we added "ka<det> ↔ and", "Jose<n><sg> ↔ Joseph", "ma<num> ↔ one", "Mariani<n><sg> ↔ Maria", "Babilonia<n><sg> ↔ Babylon", and "jimbo<det> ↔ for" to the transducer. Note that the two figures were measured on different corpora.
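The coverage percentages above are simply the number of analysed tokens divided by the total number of tokens. The following is a minimal sketch of that calculation, not the script we actually ran. It assumes the analyser output is in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token is echoed back with a leading asterisk. The function name `coverage` and the sample string are hypothetical.

```python
import re

# Matches one lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return (known, total, percent) for analyser output in stream format."""
    known = 0
    total = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # Assumption: an unanalysed token is echoed back as "*surface".
        if not analyses.startswith("*"):
            known += 1
    percent = 100.0 * known / total if total else 0.0
    return known, total, percent


# Hypothetical sample: three analysed tokens and one unknown token ("xyz").
sample = "^ka/ka<det>$ ^Jose/Jose<n><sg>$ ^xyz/*xyz$ ^ma/ma<num>$"
known, total, percent = coverage(sample)
print(f"{known}/{total} = {percent:.1f}%")

# The same arithmetic applied to the counts reported on this page.
print(f"{100 * 16590 / 111120:.1f}%")
print(f"{100 * 49371 / 168970:.1f}%")
```

Counting tokens rather than unique word forms means that frequent words such as "ka" contribute heavily to the percentage, which is one reason a handful of new entries can change coverage so much.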
Notes
- There are some more complex grammar forms that we aren't sure how to code yet.
- Originally, our corpus consisted mainly of tweets by a native Purépecha speaker, but we later found a Bible in Purépecha and added it to the corpus.
- We initially scraped the Bible incorrectly, so the corpus repeated the same chapter over and over. This is fixed in the latest version.
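A scraping error like the repeated chapter described above can be caught with a quick duplicate check before measuring coverage. The sketch below is one possible way to do that, not the fix we applied. It assumes the scraped corpus has already been split into a list of chapter strings; the function name `repeated_chunks` and the placeholder texts are hypothetical.

```python
from collections import Counter


def repeated_chunks(chunks):
    """Return {normalized text: count} for every chunk that occurs more than once."""
    # Collapse runs of whitespace so that formatting differences do not hide duplicates.
    normalized = [" ".join(chunk.split()) for chunk in chunks]
    counts = Counter(normalized)
    return {text: n for text, n in counts.items() if n > 1}


# Placeholder chapter texts; the third repeats the first with extra whitespace.
chapters = ["first chapter text", "second chapter text", "first  chapter text"]
print(repeated_chunks(chapters))
```

An empty result means no chapter appears twice. Any non-empty result is worth inspecting by hand, since some repetition in a real text can be legitimate.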