Difference between revisions of "Purépecha/Transducer"

From LING073
(corpus coverage update)
Line 41: Line 41:
  == Coverage ==

− * Current coverage: 14.9%, (16590/111120)
+ * Initial coverage: 14.9% (16590/111120), this was initially run on an incorrect corpus, but the current coverage reflects coverage on the new and correct corpus
+ * Current coverage: 29% (49371/168970)

− «by adding "{{morphTest|ka{{tag|det}}|and}}", "{{morphTest|Jose{{tag|n}}{{tag|sg}}|Joseph}}", "{{morphTest|ma{{tag|num}}|one}}", "{{morphTest|Mariani{{tag|n}}{{tag|sg}}|Maria}}", "{{morphTest|Babilonia{{tag|n}}{{tag|sg}}|Babylon}}", "{{morphTest|jimbo{{tag|det}}|for}}" to the transducer, coverage went from 14.9% to 33.7
+ «by adding "{{morphTest|ka{{tag|det}}|and}}", "{{morphTest|Jose{{tag|n}}{{tag|sg}}|Joseph}}", "{{morphTest|ma{{tag|num}}|one}}", "{{morphTest|Mariani{{tag|n}}{{tag|sg}}|Maria}}", "{{morphTest|Babilonia{{tag|n}}{{tag|sg}}|Babylon}}", "{{morphTest|jimbo{{tag|det}}|for}}" to the transducer, coverage went from 14.9% to 29

  == Notes ==
Line 50: Line 52:
  * Originally, our corpus was primarily taken from tweets by a native Purepechan, but we were able to find a Bible in Purepechan that we added to our corpus.
+ * We incorrectly scraped the Bible for our corpus, so originally, it was repeating the same chapter over and over again. We fixed this in the most updated version.

  [[Category:Purépecha]] [[Category:Sp21 Transducers]]

Revision as of 14:13, 15 April 2021

Code

GitHub repo[1]

Tests

  • Our transducer currently passes 110 of the 197 tests generated from our Wikipedia page

Lexical Info

  • Lexicons: 10
  • Lexicon entries: 80
  • Patterns: 2
  • Pattern entries: 5


Counts for individual lexicons:

  • NounRoot: 3
  • RegNounInfl: 2
  • ObjectRoot: 19
  • Object: 1
  • Punctuation: 22
  • V-Stem: 13
  • AspectTime: 10
  • ModeInterrogative: 9
  • All anonymous lexicons: 1
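
As a quick consistency check, the per-lexicon counts above can be summed and compared with the reported total of 80 lexicon entries. The following is only an illustrative Python sketch; the variable names are ours and are not part of the transducer.

```python
# Per-lexicon entry counts, copied from the list above.
lexicon_counts = {
    "NounRoot": 3,
    "RegNounInfl": 2,
    "ObjectRoot": 19,
    "Object": 1,
    "Punctuation": 22,
    "V-Stem": 13,
    "AspectTime": 10,
    "ModeInterrogative": 9,
    "All anonymous lexicons": 1,
}

# Sum the individual counts to compare against "Lexicon entries: 80".
total_entries = sum(lexicon_counts.values())
print(total_entries)
```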

Coverage

  • Initial coverage: 14.9% (16590/111120). This figure was measured on an incorrect corpus; the current coverage below was measured on the new, corrected corpus.
  • Current coverage: 29% (49371/168970)

«By adding "ka<det> ↔ and", "Jose<n><sg> ↔ Joseph", "ma<num> ↔ one", "Mariani<n><sg> ↔ Maria", "Babilonia<n><sg> ↔ Babylon", "jimbo<det> ↔ for" to the transducer, coverage went from 14.9% to 29%.»
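
The percentages in this section can be reproduced from the raw counts. The sketch below assumes that each pair in parentheses is (tokens that received an analysis / total tokens in the corpus); the function name `coverage_percent` is hypothetical and is not taken from our repository.

```python
def coverage_percent(known_tokens: int, total_tokens: int) -> float:
    """Return coverage as a percentage of tokens that received an analysis."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * known_tokens / total_tokens


# Counts copied from the coverage figures above.
initial = coverage_percent(16590, 111120)   # first (incorrect) corpus
current = coverage_percent(49371, 168970)   # corrected corpus

print(f"Initial coverage: {initial:.1f}%")
print(f"Current coverage: {current:.1f}%")
```

Because the two figures come from different corpora, the totals differ, so the percentages are not a like-for-like comparison.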

Notes

  • There are some more complex grammar forms that we aren't sure how to code yet.
  • Originally, our corpus was taken primarily from tweets by a native Purépecha speaker, but we later found a Purépecha Bible and added it to our corpus.
  • We initially scraped the Bible incorrectly, so the corpus repeated the same chapter over and over. This is fixed in the latest version.
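
A scraping problem like the repeated chapter described above can be caught with a simple duplicate check before the corpus is used. This is a minimal sketch, not the code we actually ran: it assumes the scraped text is available as a list of chapter strings, and `repeated_chapters` and the sample data are hypothetical placeholders.

```python
from collections import Counter


def repeated_chapters(chapters):
    """Return a mapping of chapter text -> count for chapters seen more than once.

    Whitespace is normalized so that trivially different copies still match.
    """
    normalized = [" ".join(chapter.split()) for chapter in chapters]
    counts = Counter(normalized)
    return {text: n for text, n in counts.items() if n > 1}


# Placeholder data standing in for scraped chapters (not real corpus text).
sample = ["chapter one text", "chapter two text", "chapter one  text"]
duplicates = repeated_chapters(sample)
print(duplicates)
```

An empty result means every chapter is distinct; any non-empty result suggests the scraper fetched the same page more than once.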