Danish and English/Lexical Transfer
Lexical Transfer
Contrastive Grammar
We ran apertium-eval-translator on our contrastive grammar. The results are as follows:
Test file: 'dan-eng.tests.txt'
Reference file '../ling073-dan-eng-corpus/eng.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 58
Number of words in test: 58
Number of unknown words (marked with a star) in test: 58
Percentage of unknown words: 100.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 0
Word error rate (WER): 0.00 %
Number of position-independent correct words: 58
Position-independent word error rate (PER): 0.00 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 58
Word Error Rate (WER): 100.00 %
Number of position-independent correct words: 0
Position-independent word error rate (PER): 100.00 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 58
Percentage of unknown words that were free rides: 100.00 %
This very high WER comes from the fact that every word in our contrastive grammar translates with a leading pound sign (#). When we scrape these pound signs out of the file, our WER drops to a more reasonable 32.76%. We're still not sure why it's this high, but it is certainly better than 100%.
Test file: 'dan-eng.reformatted.tests.txt'
Reference file '../ling073-dan-eng-corpus/eng.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 58
Number of words in test: 54
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 19
Word error rate (WER): 32.76 %
Number of position-independent correct words: 40
Position-independent word error rate (PER): 31.03 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 19
Word Error Rate (WER): 32.76 %
Number of position-independent correct words: 40
Position-independent word error rate (PER): 31.03 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%
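The pound-sign cleanup described above could be done along these lines. This is only a minimal sketch, not the script we actually used: the function name strip_hash_marks is hypothetical, and it assumes the # is always the first character of an affected word.

```python
import re

# Matches a '#' that begins a token: at the start of the text or right
# after whitespace. Assumption: the mark is always the first character of
# the affected word, so a '#' inside a word is left untouched.
LEADING_HASH = re.compile(r"(?<!\S)#")


def strip_hash_marks(text: str) -> str:
    """Remove one leading '#' from every whitespace-separated token."""
    return LEADING_HASH.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from our test file.
    print(strip_hash_marks("#the #house is #big"))
```

Running each line of the translator output through a function like this before evaluation would produce a file comparable to dan-eng.reformatted.tests.txt.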
Mini-Corpus
We also ran apertium-eval-translator on a mini-corpus of Cinderella translated into Danish. For this test, we got a WER of about 95% (94.73% with unknown-word marks removed, 94.95% with them kept), meaning that at least a few words are translating completely correctly.
Test file: 'dan-eng.cinderella.txt'
Reference file '../ling073-dan-eng-corpus/grimm/Cinderella/eng_Cinderella.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 2695
Number of words in test: 2031
Number of unknown words (marked with a star) in test: 1279
Percentage of unknown words: 62.97 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 2553
Word error rate (WER): 94.73 %
Number of position-independent correct words: 282
Position-independent word error rate (PER): 89.54 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 2559
Word Error Rate (WER): 94.95 %
Number of position-independent correct words: 234
Position-independent word error rate (PER): 91.32 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 6
Percentage of unknown words that were free rides: 0.47 %
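For reference, the WER figures on this page are consistent with the usual definition: word-level edit distance divided by the number of words in the reference. The sketch below illustrates that definition and rechecks the reported percentages. It is not the code of apertium-eval-translator itself, and the helper names (edit_distance, wer, percent) are our own.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete a reference word
                cur[j - 1] + 1,          # insert a hypothesis word
                prev[j - 1] + (r != h),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def percent(numerator, denominator):
    """Ratio expressed as a percentage."""
    return 100.0 * numerator / denominator


def wer(reference: str, hypothesis: str) -> float:
    """WER in percent, assuming whitespace tokenization."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return percent(edit_distance(ref, hypothesis.split()), len(ref))


if __name__ == "__main__":
    # Recheck the reported figures from edit distance and reference length.
    print(round(percent(19, 58), 2))      # contrastive grammar, cleaned
    print(round(percent(2553, 2695), 2))  # Cinderella, stars removed
    print(round(percent(2559, 2695), 2))  # Cinderella, stars kept
```

The reported PER values also match (words in reference minus position-independent correct words) divided by words in reference, for example (58 - 40) / 58 = 31.03%, though we have not checked that this is the tool's exact formula.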