Purépecha/Transducer

From LING073

Latest revision as of 15:14, 20 April 2021

Analyser Evaluation

Code

GitHub repo: https://github.swarthmore.edu/Ling073-sp21/ling073-tsz

Tests

  • As of now, our Transducer passes 110/197 tests generated from our Wikipedia page
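A test of this kind can be sketched as follows. Each test pairs a surface form with an expected analysis, and it passes if the analyser returns that analysis. This is a minimal sketch, not our actual test runner. The `analyse` function is a hypothetical dictionary stand-in for the compiled transducer, and the sample pairs reuse entries from this page for illustration only.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for the real analyser: maps a surface form to the
# set of analyses it produces. A real run would query the compiled transducer.
TOY_ANALYSES: dict[str, set[str]] = {
    "ka": {"ka<det>"},
    "Jose": {"Jose<n><sg>"},
    "ma": {"ma<num>"},
}


def analyse(surface: str) -> set[str]:
    """Return all analyses for a surface form (empty set if unknown)."""
    return TOY_ANALYSES.get(surface, set())


def run_analysis_tests(
    tests: Iterable[tuple[str, str]],
    analyser: Callable[[str], set[str]],
) -> tuple[int, int]:
    """Count tests whose expected analysis appears in the analyser output.

    Each test is a (surface, expected_analysis) pair; returns (passed, total).
    """
    passed = total = 0
    for surface, expected in tests:
        total += 1
        if expected in analyser(surface):
            passed += 1
    return passed, total


tests = [
    ("ka", "ka<det>"),
    ("Jose", "Jose<n><sg>"),
    ("jimbo", "jimbo<det>"),  # not in the toy lexicon, so this test fails
]
passed, total = run_analysis_tests(tests, analyse)
print(f"{passed}/{total} tests pass")  # 2/3 tests pass
```

Reporting results as passed/total, as this page does, keeps the denominator visible when the test set grows.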

Lexical Info

  • Lexicons: 10
  • Lexicon entries: 80
  • Patterns: 2
  • Pattern entries: 5


Counts for individual lexicons:

  • NounRoot: 3
  • RegNounInfl: 2
  • ObjectRoot: 19
  • Object: 1
  • Punctuation: 22
  • V-Stem: 13
  • AspectTime: 10
  • ModeInterrogative: 9
  • All anonymous lexicons: 1
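Per-lexicon counts like these can be approximated with a short script over the lexc source. The sketch below assumes a simplified layout: one `LEXICON Name` header per block, one entry per line ending in `;`, and `!` starting a comment unless escaped as `%!`. It ignores pattern entries and is not the tool that produced the numbers above. The sample lexc fragment is invented for illustration.

```python
import re
from collections import Counter

# Matches a lexicon header such as "LEXICON NounRoot".
LEXICON_HEADER = re.compile(r"^LEXICON\s+(\S+)")
# An unescaped '!' starts a comment in lexc ('%!' is a literal '!').
# Simplification: '%%!' (escaped '%' followed by a comment) is not handled.
COMMENT = re.compile(r"(?<!%)!.*")


def count_lexicon_entries(lexc_source: str) -> Counter:
    """Count entries per LEXICON block in a simplified lexc source.

    Lines before the first LEXICON header (e.g. Multichar_Symbols)
    are skipped, and only lines ending in ';' count as entries.
    """
    counts: Counter = Counter()
    current = None
    for raw_line in lexc_source.splitlines():
        line = COMMENT.sub("", raw_line).strip()
        if not line:
            continue
        header = LEXICON_HEADER.match(line)
        if header:
            current = header.group(1)
            counts[current] += 0  # keep lexicons that turn out to be empty
        elif current is not None and line.endswith(";"):
            counts[current] += 1
    return counts


SAMPLE = """\
Multichar_Symbols
%<n%> %<sg%> %<punct%>

LEXICON Root
NounRoot ;
Punctuation ;

LEXICON NounRoot
Jose%<n%>%<sg%>:Jose # ;   ! proper noun
Babilonia%<n%>%<sg%>:Babilonia # ;

LEXICON Punctuation
%!%<punct%>:%! # ;
.%<punct%>:. # ;
"""

counts = count_lexicon_entries(SAMPLE)
for name, n in counts.items():
    print(f"{name}: {n}")
print("Lexicons:", len(counts), "Entries:", sum(counts.values()))
```

As a sanity check on the figures above, the individual lexicon counts (3 + 2 + 19 + 1 + 22 + 13 + 10 + 9 + 1) sum to the reported 80 lexicon entries.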

Coverage

  • Initial coverage: 14.9% (16590/111120). This figure was measured on an incorrect corpus; the current coverage figure reflects the new, corrected corpus.
  • Current coverage: 29% (49371/168970)

By adding the entries "ka<det> ↔ and", "Jose<n><sg> ↔ Joseph", "ma<num> ↔ one", "Mariani<n><sg> ↔ Maria", "Babilonia<n><sg> ↔ Babylon", and "jimbo<det> ↔ for" to the transducer, we raised coverage from 14.9% to 29% (the corpus was also corrected between these two measurements).
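Coverage here is the share of corpus tokens that receive at least one analysis. A minimal sketch, assuming the analyser output is in Apertium stream format (`^surface/analysis$`), where unknown words are marked with a leading `*` (e.g. `^xyz/*xyz$`). The sample stream is made up for illustration; the last three lines apply the same ratio to the raw counts reported on this page.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analyses$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage_from_stream(stream: str) -> tuple[int, int]:
    """Return (known, total) token counts from analyser output.

    A token counts as unknown when its analysis starts with '*',
    which is how unanalysed words are marked in this format.
    """
    known = total = 0
    for _surface, analyses in LEXICAL_UNIT.findall(stream):
        total += 1
        if not analyses.startswith("*"):
            known += 1
    return known, total


def coverage_percent(known: int, total: int) -> float:
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100 * known / total, 1)


sample = "^ka/ka<det>$ ^Jose/Jose<n><sg>$ ^xyz/*xyz$ ^./.<sent>$"
print(coverage_percent(*coverage_from_stream(sample)))  # 75.0

# Counts reported on this page:
print(coverage_percent(16590, 111120))  # 14.9 (initial, incorrect corpus)
print(coverage_percent(49371, 168970))  # 29.2, reported above as 29%
print(coverage_percent(49473, 168985))  # 29.3
```

Because the two measurements use different corpora (111120 vs. 168970 tokens), comparing percentages is more meaningful than comparing the raw known-token counts.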

Notes

  • Some of the more complex grammatical forms are not implemented yet, because we are not sure how to encode them.
  • Originally, our corpus came primarily from tweets by a native Purépecha speaker; we later found a Purépecha translation of the Bible and added it to the corpus.
  • Our first scrape of the Bible was faulty and repeated the same chapter over and over. This is fixed in the current version of the corpus.

Generator Evaluation

Initial evaluation of morphological generation

  • Our Transducer passes 110/197 tests generated from our Wikipedia page
  • Current corpus coverage: 29% (49371/168970)
  • Our morphological generation test passes 55/102

Final evaluation of morphological generation

  • Our Transducer passes 142/216 tests generated from our Wikipedia page
  • Current corpus coverage: 29.3% (49473/168985)
  • Our morphological generation test passes 71/111
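Because the number of tests grew between the two evaluations, the change is easier to read as pass rates than as raw fractions. A small sketch of that arithmetic, using only the counts reported above:

```python
# (passed, total) pairs copied from the initial and final evaluations above.
RESULTS = {
    "wiki page tests": ((110, 197), (142, 216)),
    "generation tests": ((55, 102), (71, 111)),
}


def pass_rate(passed: int, total: int) -> float:
    """Percentage of tests passed, rounded to one decimal place."""
    return round(100 * passed / total, 1)


for name, (initial, final) in RESULTS.items():
    before, after = pass_rate(*initial), pass_rate(*final)
    # Change in percentage points between the two evaluations.
    print(f"{name}: {before}% -> {after}% ({after - before:+.1f} points)")
```

Both suites improved by roughly ten percentage points (about 55.8% to 65.7%, and 53.9% to 64.0%), even though each gained new tests.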

Notes

Added a transitive tag, removed an extra transitive tag, and changed <p3> to <p3><sg>.