Difference between revisions of "Berik and English"
(Created page with "Resources for machine translation between Berik and English. Category:Sp17_TranslationPairs") |
|||
(10 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
Resources for machine translation between [[Berik]] and English. | Resources for machine translation between [[Berik]] and English. | ||
− | [[Category: | + | = bkl -> eng evaluation = |
+ | |||
+ | Statistics about input files | ||
+ | ------------------------------------------------------- | ||
+ | Number of words in reference: 55 | ||
+ | |||
+ | Number of words in test: 40 | ||
+ | |||
+ | Number of unknown words (marked with a star) in test: 15 | ||
+ | |||
+ | Percentage of unknown words: 37.50 % | ||
+ | |||
+ | ------------------------------------------------------- | ||
+ | Edit distance: 54 | ||
+ | |||
+ | Word error rate (WER): 98.18 % | ||
+ | |||
+ | Number of position-independent correct words: 1 | ||
+ | |||
+ | Position-independent word error rate (PER): 98.18 % | ||
+ | |||
+ | Results when unknown-word marks (stars) are not removed | ||
+ | ------------------------------------------------------- | ||
+ | Edit distance: 55 | ||
+ | |||
+ | Word Error Rate (WER): 100.00 % | ||
+ | |||
+ | Number of position-independent correct words: 0 | ||
+ | |||
+ | Position-independent word error rate (PER): 100.00 % | ||
+ | |||
+ | Statistics about the translation of unknown words | ||
+ | ------------------------------------------------------- | ||
+ | Number of unknown words which were free rides: 1 | ||
+ | |||
+ | Percentage of unknown words that were free rides: 6.67 % | ||
+ | |||
+ | = Final Evaluation = | ||
+ | === Initial Precision & Recall === | ||
+ | * Precision: 100.00000% | ||
+ | * Recall: 78.03738% | ||
+ | |||
+ | === Adding Stems === | ||
+ | |||
+ | 136 stems were added. | ||
+ | |||
+ | === Disambiguation === | ||
+ | |||
+ | Four new disambiguation rules were added to handle "ane", which can mean "and" or "many". | ||
+ | |||
+ | === Structural Transfer === | ||
+ | * Added articles in copular phrases. | ||
+ | ** "Ai taneyan." | ||
+ | ** "I am not child." -> "I am not a child." | ||
+ | * Added rules for positive copular phrases. | ||
+ | ** "Je bwernabar namwer." | ||
+ | ** "He sick now." -> "He is sick now." | ||
+ | * Added verbs and tense marking. | ||
+ | ** "Gwirmir wini as damtafa." | ||
+ | ** "Tomorrow woman #prpers #see." -> "Tomorrow woman #prpers will see." | ||
+ | * Added prepositions for instrumental case. | ||
+ | ** "Je twena ginem tana." | ||
+ | ** "He pig #arrow #kill." -> "He pig killed with an arrow." | ||
+ | * Corrected word order of intransitive clauses with instrumentals. | ||
+ | ** "Korano atem difnant." | ||
+ | ** "chief #canoe #come." -> "chief came with a canoe." | ||
+ | |||
+ | === Final Numbers === | ||
+ | * Precision: 65.88785% | ||
+ | * Recall: 88.67925% | ||
+ | * Large Corpus | ||
+ | ** Word count: 365010 | ||
+ | ** Coverage: 55.34% | ||
+ | * Stems in transducer: 400 | ||
+ | |||
+ | |||
+ | [[Category:Sp18_TranslationPairs]] | ||
+ | [[Category:Berik]] |
Latest revision as of 12:46, 8 March 2019
Resources for machine translation between Berik and English.
bkl -> eng evaluation
Statistics about input files
Number of words in reference: 55
Number of words in test: 40
Number of unknown words (marked with a star) in test: 15
Percentage of unknown words: 37.50 %
Edit distance: 54
Word error rate (WER): 98.18 %
Number of position-independent correct words: 1
Position-independent word error rate (PER): 98.18 %
Results when unknown-word marks (stars) are not removed
Edit distance: 55
Word Error Rate (WER): 100.00 %
Number of position-independent correct words: 0
Position-independent word error rate (PER): 100.00 %
Statistics about the translation of unknown words
Number of unknown words which were free rides: 1
Percentage of unknown words that were free rides: 6.67 %
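The percentages above follow from the raw counts by simple ratios. The sketch below is a minimal illustration, not the evaluation tool's own code: the function names are hypothetical, and the PER formula is an assumption, chosen because it reproduces the figures reported here (the page does not state the tool's exact definition).

```python
def word_edit_distance(reference: str, test: str) -> int:
    """Word-level Levenshtein distance between two whitespace-tokenised strings."""
    ref, hyp = reference.split(), test.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def percentage(part: float, whole: float) -> float:
    return 100.0 * part / whole


def wer(edit_distance: int, n_reference: int) -> float:
    """Word error rate: edit distance normalised by reference length."""
    return percentage(edit_distance, n_reference)


def per(n_correct: int, n_reference: int, n_test: int) -> float:
    """Position-independent error rate (assumed formula, see lead-in)."""
    return percentage(max(n_reference, n_test) - n_correct, n_reference)


# Counts taken from the statistics above (55 reference words, 40 test words).
print(f"Unknown words: {percentage(15, 40):.2f} %")
print(f"WER:           {wer(54, 55):.2f} %")
print(f"PER:           {per(1, 55, 40):.2f} %")
print(f"Free rides:    {percentage(1, 15):.2f} %")
```

With unknown-word marks left in place, the same functions give `wer(55, 55)` and `per(0, 55, 40)`, both 100 %.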
Final Evaluation
Initial Precision & Recall
- Precision: 100.00000%
- Recall: 78.03738%
Adding Stems
136 stems were added.
Disambiguation
Four new disambiguation rules were added to handle "ane", which can mean "and" or "many".
Structural Transfer
- Added articles in copular phrases.
  - "Ai taneyan."
  - "I am not child." -> "I am not a child."
- Added rules for positive copular phrases.
  - "Je bwernabar namwer."
  - "He sick now." -> "He is sick now."
- Added verbs and tense marking.
  - "Gwirmir wini as damtafa."
  - "Tomorrow woman #prpers #see." -> "Tomorrow woman #prpers will see."
- Added prepositions for instrumental case.
  - "Je twena ginem tana."
  - "He pig #arrow #kill." -> "He pig killed with an arrow."
- Corrected word order of intransitive clauses with instrumentals.
  - "Korano atem difnant."
  - "chief #canoe #come." -> "chief came with a canoe."
Final Numbers
- Precision: 65.88785%
- Recall: 88.67925%
- Large Corpus
  - Word count: 365010
  - Coverage: 55.34%
- Stems in transducer: 400
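For reference, precision and recall are conventionally defined from counts of true positives, false positives and false negatives. The sketch below uses those standard definitions; the page does not say which items were counted or give the underlying counts, so the function name and the numbers in the example are purely illustrative assumptions.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) as percentages, using the standard definitions."""
    precision = 100.0 * true_pos / (true_pos + false_pos)
    recall = 100.0 * true_pos / (true_pos + false_neg)
    return precision, recall


# Toy counts, NOT taken from this evaluation: 3 correct outputs,
# 1 spurious output, 1 expected output that was missed.
p, r = precision_recall(true_pos=3, false_pos=1, false_neg=1)
print(f"Precision: {p:.5f}%")
print(f"Recall:    {r:.5f}%")
```

Under these definitions, a precision of 100% alongside lower recall (as in the initial numbers) means every produced output was correct but some expected outputs were missing.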