Difference between revisions of "Berik and English"

From LING073
Latest revision as of 12:46, 8 March 2019

Resources for machine translation between Berik and English.

bkl -> eng evaluation

Statistics about input files


Number of words in reference: 55

Number of words in test: 40

Number of unknown words (marked with a star) in test: 15

Percentage of unknown words: 37.50 %


Edit distance: 54

Word error rate (WER): 98.18 %

Number of position-independent correct words: 1

Position-independent word error rate (PER): 98.18 %

Results when unknown-word marks (stars) are not removed


Edit distance: 55

Word Error Rate (WER): 100.00 %

Number of position-independent correct words: 0

Position-independent word error rate (PER): 100.00 %
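The error rates above follow from the counts reported alongside them. The sketch below is a minimal illustration, not the evaluation script itself: `edit_distance`, `wer` and `per` are names chosen here, and the PER formula is an assumption that happens to reproduce the figures on this page.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(distance, n_ref):
    """Word error rate: edit distance as a percentage of reference length."""
    return 100.0 * distance / n_ref


def per(n_ref, n_test, n_correct):
    """Position-independent error rate (assumed formula, see lead-in)."""
    return 100.0 * (max(n_ref, n_test) - n_correct) / n_ref


# Counts taken from the statistics above (55 reference words, 40 test words).
print(f"WER, stars removed: {wer(54, 55):.2f} %")
print(f"PER, stars removed: {per(55, 40, 1):.2f} %")
print(f"WER, stars kept:    {wer(55, 55):.2f} %")
print(f"PER, stars kept:    {per(55, 40, 0):.2f} %")
```

With an edit distance of 54 against 55 reference words, nearly every reference word needs an edit, which is why WER and PER coincide here.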

Statistics about the translation of unknown words


Number of unknown words which were free rides: 1

Percentage of unknown words that were free rides: 6.67 %
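The unknown-word figures can be reproduced in the same way: 15 starred words out of 40 test words gives 37.50 %, and 1 free ride out of 15 unknown words gives 6.67 %. The sketch below assumes that unknown words are prefixed with `*` and treats a free ride as an unknown word whose unstarred form also appears in the reference; that definition and the function name are simplifying assumptions, and the sample tokens are made up.

```python
def unknown_word_stats(test_tokens, reference_tokens):
    """Return (n_unknown, pct_unknown, n_free_rides, pct_free_rides)."""
    unknown = [t for t in test_tokens if t.startswith("*")]
    ref_set = set(reference_tokens)
    # Assumed definition: the unstarred word occurs somewhere in the reference.
    free_rides = [t for t in unknown if t[1:] in ref_set]
    pct_unknown = 100.0 * len(unknown) / len(test_tokens) if test_tokens else 0.0
    pct_free = 100.0 * len(free_rides) / len(unknown) if unknown else 0.0
    return len(unknown), pct_unknown, len(free_rides), pct_free


# Made-up tokens, only to show the call.
test = ["*Jakarta", "is", "*fobar"]
reference = ["Jakarta", "is", "big"]
print(unknown_word_stats(test, reference))

# The percentages reported above, recomputed from the reported counts.
print(f"{100.0 * 15 / 40:.2f} %")
print(f"{100.0 * 1 / 15:.2f} %")
```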

Final Evaluation

Initial Precision & Recall

  • Precision: 100.00000%
  • Recall: 78.03738%
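The page does not say which items were counted for these figures, so the sketch below only shows the standard definitions, with hypothetical counts. Under those definitions a precision of 100 % means that no false positives were recorded.

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall, as percentages.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts, not taken from the evaluation on this page.
print(precision_recall(tp=3, fp=1, fn=2))
```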

Adding Stems

136 stems were added.

Disambiguation

4 new disambiguation rules were added to handle "ane", which can mean "and" or "many".

Structural Transfer

  • Added articles in copular phrases.
    • "Ai taneyan."
    • "I am not child." -> "I am not a child."
  • Added rules for positive copular phrases.
    • "Je bwernabar namwer."
    • "He sick now." -> "He is sick now."
  • Added verbs and tense marking.
    • "Gwirmir wini as damtafa."
    • "Tomorrow woman #prpers #see." -> "Tomorrow woman #prpers will see."
  • Added prepositions for instrumental case.
    • "Je twena ginem tana."
    • "He pig #arrow #kill." -> "He pig killed with an arrow."
  • Corrected word order of intransitive clauses with instrumentals.
    • "Korano atem difnant."
    • "chief #canoe #come." -> "chief came with a canoe."
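As a rough illustration of the first rule above (articles in copular phrases), the toy sketch below inserts an indefinite article before a bare noun that follows a copula, optionally across a negator. It is plain Python rather than the transfer rules actually written for the pair, and the tag names (`prn`, `cop`, `neg`, `n`, `adj`, `adv`, `det`) are hypothetical.

```python
def insert_article(tokens):
    """Insert ("a", "det") before a noun that directly follows a copula,
    allowing one negator in between. tokens: list of (word, tag) pairs."""
    out = []
    for word, tag in tokens:
        if tag == "n":
            j = len(out) - 1
            # Skip back over an optional negator such as "not".
            if j >= 0 and out[j][1] == "neg":
                j -= 1
            if j >= 0 and out[j][1] == "cop":
                out.append(("a", "det"))
        out.append((word, tag))
    return out


sentence = [("I", "prn"), ("am", "cop"), ("not", "neg"), ("child", "n")]
print(" ".join(word for word, _ in insert_article(sentence)))
```

The sketch always inserts "a"; choosing between "a" and "an", or between definite and indefinite articles, is left out.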

Final Numbers

  • Precision: 65.88785%
  • Recall: 88.67925%
  • Large Corpus
    • Word count: 365010
    • Coverage: 55.34%
  • Stems in transducer: 400
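Coverage is normally the share of corpus tokens that receive at least one analysis. Assuming that definition, a coverage of 55.34% on 365010 words corresponds to roughly 202,000 analysed tokens. The sketch below shows the arithmetic; the function names are chosen here for illustration.

```python
def coverage_percent(n_known, n_total):
    """Share of tokens with at least one analysis, as a percentage."""
    return 100.0 * n_known / n_total if n_total else 0.0


def approx_known_tokens(n_total, coverage_pct):
    """Approximate number of analysed tokens implied by a coverage figure."""
    return round(n_total * coverage_pct / 100.0)


known = approx_known_tokens(365010, 55.34)
print(known)
print(f"{coverage_percent(known, 365010):.2f} %")
```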