Kaingang and Portuguese/Structural Transfer

From LING073
Revision as of 19:53, 16 April 2019 by Drosset1


Pre-evaluation

Statistics about input files
-------------------------------------------------------
Number of words in reference: 63
Number of words in test: 63 
Number of unknown words (marked with a star) in test: 26
Percentage of unknown words: 41.27 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 38
Word error rate (WER): 60.32 %
Number of position-independent correct words: 25
Position-independent word error rate (PER): 60.32 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 63
Word Error Rate (WER): 100.00 %
Number of position-independent correct words: 0
Position-independent word error rate (PER): 100.00 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 25
Percentage of unknown words that were free rides: 96.15 %
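The percentages above can be re-derived from the raw counts. The sketch below assumes the usual definitions that Apertium's evaluation script appears to use: WER is the word-level edit distance divided by the number of reference words, and PER ignores word order, counting the words shared by test and reference as a multiset. The function names (`edit_distance`, `wer`, `per`, `strip_stars`) and the toy Portuguese sentences are hypothetical illustrations, not the script's actual code.

```python
from collections import Counter


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (each insertion, deletion, substitution costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete a reference word
                         cur[j - 1] + 1,          # insert a test word
                         prev[j - 1] + (r != h))  # substitute (cost 0 if equal)
        prev = cur
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate: edit distance over the number of reference words."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref: list[str], hyp: list[str]) -> float:
    """Position-independent error rate (assumed definition):
    1 - (matches - max(0, |hyp| - |ref|)) / |ref|, where matches ignore order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref)


def strip_stars(words: list[str]) -> list[str]:
    """Drop the leading '*' that marks a word the analyser did not recognise."""
    return [w.lstrip("*") for w in words]


# Re-deriving the reported percentages from the reported counts (63 words).
print(f"unknown:    {26 / 63:.2%}")      # 41.27%
print(f"WER:        {38 / 63:.2%}")      # 60.32%
print(f"PER:        {1 - 25 / 63:.2%}")  # 60.32%
print(f"free rides: {25 / 26:.2%}")      # 96.15%
```

With the hypothetical pair `o menino come pão` (reference) and `o come menino pão` (test), `wer` gives 0.5 because two positions must be substituted, while `per` gives 0.0 because every word is present, just misplaced.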

  • WER: 60.32%
  • PER: 60.32%
  • Coverage: 64.63%
$ aq-covtest ling073-kgp-por-corpus/kgp.tests.txt ling073-kgp-por/kgp-por.automorf.bin
Number of tokenised words in the corpus: 82
Coverage: 64.63%
Top unknown words in the corpus:
3	 vỹ
2	 fi
1	 kafã
1	 Nũgnũj
1	 kur
1	 ẽgno
1	 tũg
1	 São
1	 Pau
1	 o
1	 rã
1	 jur
1	 tá
1	 Téj
1	 ki
1	 panh
1	 kãfór
1	 kyrũ
1	 jãmré
1	 ũn
Translation time: 0.0028295516967773438 seconds
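Naive coverage is the share of corpus tokens that the analyser recognises; 64.63% of 82 tokens corresponds to 53 analysed tokens (53/82 ≈ 0.6463). The sketch below assumes the analyser's output is in Apertium stream format (as produced by, e.g., `lt-proc kgp-por.automorf.bin`), where an unknown word looks like `^vỹ/*vỹ$`. It is a simplified illustration, not how `aq-covtest` is actually implemented; it ignores escaped `^`, `/` and `$` characters, and the analysed token `^word/word<n>$` in the example is hypothetical.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed: str) -> tuple[float, Counter]:
    """Return (share of tokens with an analysis, counts of unknown surface forms).

    A token is treated as unknown when its analysis starts with '*'."""
    total = 0
    unknown: Counter = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed):
        total += 1
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    return known / total, unknown


sample = "^vỹ/*vỹ$ ^fi/*fi$ ^vỹ/*vỹ$ ^word/word<n>$"
cov, unk = coverage(sample)
print(f"Coverage: {cov:.2%}")         # Coverage: 25.00%
for word, count in unk.most_common():
    print(f"{count}\t {word}")        # same layout as the "Top unknown words" list
```

Sorting the unknown forms with `Counter.most_common` gives a frequency-ranked list like the one above, which is a convenient order for deciding which stems to add to the dictionary first.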

Examples for implementation

Sentence:

Tagger output:

Biltrans output:

Chunker output:

Interchunk output:

Postchunk output:

kgp-por output:

Post-evaluation

  • WER:
  • PER:
  • Coverage: