Purepecha and Spanish


Revision as of 15:33, 22 May 2021

Resources


TSZ -> SPA Evaluation

Coverage Analysis

  • Monolingual transducer coverage of corpus: 50772/168988 (~0.300)
  • Bilingual transducer coverage of corpus: 26198/159706 (~0.164)
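The two ratios above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the first number in each bullet is the count of tokens that received an analysis and the second is the total token count, and the function name `coverage` is made up for illustration.

```python
def coverage(known_tokens: int, total_tokens: int) -> float:
    """Return the fraction of corpus tokens that were recognised."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return known_tokens / total_tokens


# Counts copied from the two coverage bullets above.
monolingual = coverage(50772, 168988)
bilingual = coverage(26198, 159706)

print(f"monolingual: {monolingual:.3f}")
print(f"bilingual:   {bilingual:.3f}")
```

Rounded to three decimals, the results agree with the figures reported in the bullets.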

Sentence Evaluation

1.

 Original sentence: Xí kwánitaska wátsïni Maríani.
 Intended Translation: Yo presté Maria mi hija (I lent Maria my daughter).
 Biltrans Output: ^xí<prn><sg>/yo<prn><tn><p1><mf><sg><sg>$ ^kwánita<v><iv><pres><perf><p1>/prestar<vblex><pres><perf><p1>$ ^wátsï<n><sg><obj>/hijo<n><m><sg><obj>$ ^María<np><sg><obj>/Maria<np><ant><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #yo #prestar #hijo #Maria

2.

 Original sentence: Xí íntskuska itsîni maríkwani.
 Intended Translation: Yo di la niña agua (I gave the girl some water).
 Biltrans Output: ^xí<prn><sg>/Yo<prn><tn><p1><mf><sg><sg>$ ^íntsku<v><tv><pres><perf><p1>/dar<vblex><pres><perf><p1>$ ^itsi<n><sg><obj>/agua<n><f><sg><obj>$ ^maríkwa<n><sg><obj>/niña<n><f><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #Yo #dar #agua #niña

3.

 Original sentence: Xí tumpíni íntskuska maríkwani.
 Intended Translation: Yo di el niño la niña (I gave the girl to the boy).
 Biltrans Output: ^Xí<prn><sg>/Yo<prn><tn><p1><mf><sg><sg>$ ^tumpí<n><sg><obj>/niño<n><m><sg><obj>$ ^íntsku<v><tv><pres><perf><p1>/dar<vblex><pres><perf><p1>$ ^maríkwa<n><sg><obj>/niña<n><f><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #Yo #niño #dar #niña

4.

 Original sentence: K’amárasti tsírini.
 Intended Translation: Acabe maiz (I ran out of corn).
 Biltrans Output: ^K’amára<v><tv><pres><perf><p3><sg>/Acabar<vblex><pres><perf><p3><sg>$ ^tsíri<n><sg><obj>/maiz<n><m><sg><obj>/pulga<n><f><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #Acabar #maiz

5.

 Original sentence: Xwánu xwásti tsírini.
 Intended Translation: Juan trajo maiz (Juan brought corn).
 Biltrans Output: ^Xwánu<np><sg>/Juan<np><ant><sg>$ ^xwá<v><tv><pres><perf><p3><sg>/traer<vblex><pres><perf><p3><sg>$ ^tsíri<n><sg><obj>/maiz<n><m><sg><obj>/pulga<n><f><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #Juan #traer #Maiz

6.

 Original sentence: Pyásti tsírini Maríani.
 Intended Translation: El compro Maria maiz (He bought Maria some corn).
 Biltrans Output: ^pyá<v><tv><pres><perf><p3><sg>/comprar<vblex><pres><perf><p3><sg>$ ^tsíri<n><sg><obj>/maiz<n><m><sg><obj>/pulga<n><f><sg><obj>$ ^María<np><sg><obj>/Maria<np><ant><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #comprar #maiz #Maria

7.

 Original sentence: María tumínani k’wanírasti tumpíni.
 Intended Translation: Maria tiró dinero a el niño (Maria threw the money to the boy).
 Biltrans Output: ^María<np><sg>/Maria<np><ant><sg>$ ^tumína<n><sg><obj>/dinero<n><m><sg><obj>$ ^k’waníra<v><tv><pres><perf><p3><sg>/tirar<vblex><pres><perf><p3><sg>$ ^tumpí<n><sg><obj>/niño<n><m><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #Maria #dinero #tirar #niño

8.

 Original sentence: Ewáskani acháatini warini.
 Intended Translation: Robé el hombre de la mujer (I stole the woman's husband).
 Biltrans Output: ^Ewá<v><tv><pres><perf><p1><sg>/Robar<vblex><pres><perf><p1><sg>$ ^acháati<n><sg><obj>/hombre<n><m><sg><obj>$ ^wari<n><sg><obj>/mujer<n><f><sg><obj>$^.<sent>/.<sent>$
 Translation Output: #Robar #hombre #mujer

9.

 Original sentence: T’u intsîkurhiaka tsúntsuni imáni.
 Intended Translation: Tu le daras la olla (You will give the pot away to him/her).
 Biltrans Output: ^*t/*t$’^*u/*u$ ^intsikurhi<v><tv><fut><p1>/dar<vblex><fut><p1>$ ^tsúntsu<n><sg><obj>/olla<n><f><sg><obj>$ ^imá<det><obj>/esa<det><dem><f><obj>$^.<sent>/.<sent>$
 Translation Output: *t’*u #dar #olla #esa

10.

 Original sentence: Xí piréskani para María.
 Intended Translation: Yo cante para Maria (I sang for Maria).
 Biltrans Output: ^Xí<prn><sg>/Yo<prn><tn><p1><mf><sg><sg>$ ^piré<v><tv><pres><perf><p1><sg>/cantar<vblex><pres><perf><p1><sg>$ ^para<det>/para<pr>$ ^María<np><sg>/Maria<np><ant><sg>$^.<sent>/.<sent>$
 Translation Output: #Yo #cantar para #Maria

11.

 Original sentence: Acháati wántikusti tsírini.
 Intended Translation: El hombre mató la pulga (The man killed the flea).
 Biltrans Output: ^acháati<n><sg>/hombre<n><m><sg>$ ^wántiku<v><tv><pres><perf><p3><sg>/matar<vblex><pres><perf><p3><sg>$ ^tsíri<n><sg><obj>/maiz<n><m><sg><obj>/pulga<n><f><sg><obj>$^.<sent>/.<sent>$
 Translation Output: hombre #matar #pulga
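Most words in the translation outputs above carry a leading mark. In Apertium output, `*` marks a word the analyser did not recognise, `@` marks a lemma with no bilingual dictionary entry, and `#` marks a form the generator could not produce. A minimal sketch for tallying these marks per sentence follows; the function name and the category labels are hypothetical, chosen only for this example.

```python
from collections import Counter

# Leading marks and the (illustrative) label used for each in the tally.
MARKS = {
    "*": "unknown",       # not recognised by the analyser
    "@": "untranslated",  # no bilingual dictionary entry
    "#": "ungenerated",   # generator could not produce a surface form
}


def classify_tokens(output: str) -> Counter:
    """Count whitespace-separated tokens by their leading mark."""
    counts = Counter()
    for token in output.split():
        counts[MARKS.get(token[0], "clean")] += 1
    return counts


# Translation output of sentence 10 above.
print(classify_tokens("#Yo #cantar para #Maria"))
```

For sentence 10 this reports three ungenerated tokens and one clean token (`para`), which suggests that most of the remaining errors lie on the generation side rather than in analysis.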

Additions

I added 100 stems from the bilingual dictionary that we acquired from Professor Washington. The stems were mainly nouns and verbs, since those were the easiest forms to interpret, along with a few adjectives. I also added about 4 disambiguation rules and 2 lexical selection rules.
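Lexical selection matters for entries such as tsíri, which the biltrans output in the sentence evaluation maps to both maiz and pulga. The sketch below is one way to list such ambiguous entries from a line of biltrans output. It assumes the simple `^source/target1/target2$` layout seen in the examples, with no escaped `/`, `^` or `$` characters, and the function name `ambiguous_units` is made up for illustration.

```python
import re

LU_PATTERN = re.compile(r"\^(.*?)\$")   # one lexical unit: ^ ... $
TAG_PATTERN = re.compile(r"<[^>]*>")    # a morphological tag such as <n>


def ambiguous_units(biltrans: str) -> dict[str, list[str]]:
    """Map each source lemma with more than one translation to its target lemmas."""
    result = {}
    for unit in LU_PATTERN.findall(biltrans):
        source, *targets = unit.split("/")
        if len(targets) > 1:
            lemma = TAG_PATTERN.sub("", source)
            result[lemma] = [TAG_PATTERN.sub("", t) for t in targets]
    return result


# Biltrans output of sentence 11 from the sentence evaluation.
line = (
    "^acháati<n><sg>/hombre<n><m><sg>$ "
    "^wántiku<v><tv><pres><perf><p3><sg>/matar<vblex><pres><perf><p3><sg>$ "
    "^tsíri<n><sg><obj>/maiz<n><m><sg><obj>/pulga<n><f><sg><obj>$"
    "^.<sent>/.<sent>$"
)
print(ambiguous_units(line))
```

Running this over a whole corpus would give a list of lemmas that are candidates for further lexical selection rules.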


  • Precision and Recall
    • Annotated the tsz.sentences.txt files and included them in the ling073-tsz-eng repo, but the precision/recall test is not working.
  • Coverage over large corpus
    • Coverage: ~0.324
    • Tokens in corpus: 129184
    • Stems in transducer: 210

MT tsz → spa

  • Word error rate (WER): 99.07 %
  • Position-independent word error rate (PER): 98.31 %
  • Percentage of unknown words: 0%

MT spa → tsz

  • Word error rate (WER): 118.50 %
  • Position-independent word error rate (PER): 117.59 %
  • Percentage of unknown words: 0%
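A WER above 100% is possible because the error count is divided by the length of the reference, so a hypothesis with extra words can accumulate more errors than the reference has words. The sketch below uses one common definition of WER (word-level edit distance over reference length) and of PER (bag-of-words mismatch over reference length). It is not the script that produced the figures above, whose exact formulas may differ, and the second hypothesis string is invented for illustration.

```python
from collections import Counter


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, relative to the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate in percent (word order ignored)."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


# Sentence 5: every output token carries a '#', so nothing matches the reference.
print(wer("Juan trajo maiz", "#Juan #traer #Maiz"))
print(per("Juan trajo maiz", "#Juan #traer #Maiz"))

# Invented hypothesis with one extra word: four errors against three reference words.
print(round(wer("Juan trajo maiz", "el #Juan #traer #Maiz"), 2))
```

Under these definitions the first pair scores 100% on both measures, and the longer invented hypothesis scores above 100%, the same pattern as in the spa → tsz figures.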

Trimmed coverage

  • Coverage: ~0.324