Difference between revisions of "Fijian and English"

From LING073
(fij → eng evaluation)
(Additions)
 
(45 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Fijian Fijian] and English
 
Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Fijian Fijian] and English
 
==fij → eng evaluation==
 
==fij → eng evaluation==
 +
Current WER and PER:
 
<pre>
 
<pre>
 
Test file: 'fij-eng.tests.txt'
 
Test file: 'fij-eng.tests.txt'
Line 7: Line 8:
 
Statistics about input files
 
Statistics about input files
 
-------------------------------------------------------
 
-------------------------------------------------------
Number of words in reference: 57
+
Number of words in reference: 56
 
Number of words in test: 59
 
Number of words in test: 59
Number of unknown words (marked with a star) in test: 18
+
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 30.51 %
+
Percentage of unknown words: 0.00 %
  
 
Results when removing unknown-word marks (stars)
 
Results when removing unknown-word marks (stars)
 
-------------------------------------------------------
 
-------------------------------------------------------
Edit distance: 51
+
Edit distance: 50
Word error rate (WER): 89.47 %
+
Word error rate (WER): 89.29 %
 
Number of position-independent correct words: 12
 
Number of position-independent correct words: 12
Position-independent word error rate (PER): 82.46 %
+
Position-independent word error rate (PER): 83.93 %
  
 
Results when unknown-word marks (stars) are not removed
 
Results when unknown-word marks (stars) are not removed
 
-------------------------------------------------------
 
-------------------------------------------------------
Edit distance: 51
+
Edit distance: 50
Word Error Rate (WER): 89.47 %
+
Word Error Rate (WER): 89.29 %
 
Number of position-independent correct words: 12
 
Number of position-independent correct words: 12
Position-independent word error rate (PER): 82.46 %
+
Position-independent word error rate (PER): 83.93 %
  
 
Statistics about the translation of unknown words
 
Statistics about the translation of unknown words
 
-------------------------------------------------------
 
-------------------------------------------------------
 
Number of unknown words which were free rides: 0
 
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
+
Percentage of unknown words that were free rides: 0%
 +
 
 
</pre>
 
</pre>
  
 
==eng → fij evaluation==
 
==eng → fij evaluation==
 +
Current WER and PER :
 +
<pre>
 +
Test file: 'eng-fij.tests.txt'
 +
Reference file 'fij.tests.txt'
 +
 +
Statistics about input files
 +
-------------------------------------------------------
 +
Number of words in reference: 62
 +
Number of words in test: 56
 +
Number of unknown words (marked with a star) in test: 2
 +
Percentage of unknown words: 3.57 %
 +
 +
Results when removing unknown-word marks (stars)
 +
-------------------------------------------------------
 +
Edit distance: 52
 +
Word error rate (WER): 83.87 %
 +
Number of position-independent correct words: 13
 +
Position-independent word error rate (PER): 79.03 %
 +
 +
Results when unknown-word marks (stars) are not removed
 +
-------------------------------------------------------
 +
Edit distance: 52
 +
Word Error Rate (WER): 83.87 %
 +
Number of position-independent correct words: 13
 +
Position-independent word error rate (PER): 79.03 %
 +
 +
Statistics about the translation of unknown words
 +
-------------------------------------------------------
 +
Number of unknown words which were free rides: 0
 +
Percentage of unknown words that were free rides: 0.00 %
 +
 +
</pre>
  
 
==Lexical Selection==
 
==Lexical Selection==
 +
https://wikis.swarthmore.edu/ling073/Fijian_and_English/Lexical_selection#fij_.E2.86.92_eng
 
===eng → fij one-to-many mapping===
 
===eng → fij one-to-many mapping===
 
*Case 1: ''Pelu'' and ''lo’i'' describe two different kinds of bending action.
 
*Case 1: ''Pelu'' and ''lo’i'' describe two different kinds of bending action.
 +
 
{{transferTest|eng|fij|bend|pelu}} (e.g. bend of metal)
 
{{transferTest|eng|fij|bend|pelu}} (e.g. bend of metal)
  
 
{{transferTest|eng|fij|bend|lo’i}} (e.g. bend at a joint)
 
{{transferTest|eng|fij|bend|lo’i}} (e.g. bend at a joint)
  
*Case 2: The third person singular pronoun in English does not distinguish between nominative and accusative case.
+
*Case 2:  
 +
 
 +
{{transferTest|eng|fij|shine on|cina}} (light/torch shines on)
 +
 
 +
{{transferTest|eng|fij|shine on|cila}} (sun/moon/star shines on)
 +
 
 +
*(a disambiguation problem) The third person singular pronoun in English does not distinguish between nominative and accusative case.
 +
 
 
{{transferTest|eng|fij|it|e}} (subj)
 
{{transferTest|eng|fij|it|e}} (subj)
  
Line 47: Line 90:
  
 
===fij → eng one-to-many mapping===
 
===fij → eng one-to-many mapping===
*Case 1: The word ''levu'' can be used either as an adjective meaning "big", or a verb meaning "be a lot".
+
*Case 1.1:
{{transferTest|fij|eng|levu|big}} (adj)
+
 
 +
{{transferTest|fij|eng|yava|leg}}
 +
 
 +
{{transferTest|fij|eng|yava|foot}}
 +
 
 +
*Case 1.2:
 +
 
 +
{{transferTest|fij|eng|liga|arm}}
 +
 
 +
{{transferTest|fij|eng|liga|hand}}
 +
 
 +
*Case 1.3:
 +
 
 +
{{transferTest|fij|eng|mata|face}}
 +
 
 +
{{transferTest|fij|eng|mata|eye}}
 +
 
 +
*Case 2:
 +
 
 +
{{transferTest|fij|eng|vula|moon}}
 +
 
 +
{{transferTest|fij|eng|vula|month}}
 +
 
 +
*Case 3:
 +
 
 +
{{transferTest|fij|eng|basu|tear up}} (e.g. old clothes)
 +
 
 +
{{transferTest|fij|eng|basu|tear down}} (e.g. old buildings)
  
{{transferTest|fij|eng|levu|be a lot}} (v)
+
*Case 4:Fijian does not distinguish genders on pronouns.
  
*Case2: Fijian does not distinguish genders on pronouns.
 
 
{{transferTest|fij|eng|koya|him}}
 
{{transferTest|fij|eng|koya|him}}
  
Line 58: Line 127:
  
 
{{transferTest|fij|eng|koya|it}}
 
{{transferTest|fij|eng|koya|it}}
 +
 +
*Case 5: (a disambiguation problem?)
 +
The word ''levu'' can be used either as an adjective meaning "big", or a number meaning "many, much", but both numbers and adjectives can be a predicate head (like a verb).
 +
 +
{{transferTest|fij|eng|levu|big}} (adj)
 +
 +
{{transferTest|fij|eng|levu|a lot of}} (num)
 +
 +
==Additions==
 +
*126 more stems in the Fijian transducer and the bilingual dictionary (finished adding all stems to the transducer; continuing adding them to the bilingual dictionary)
 +
*Morphologies:
 +
:-Causative prefix ''vaka-''
 +
:-Collective prefix ''vei-''
 +
:-Deriving verb from noun: prefix "i-"
 +
*Lexical Selection rules:
 +
:-Case 1: Select 'arrive' as the translation of ''yaco'' when the subject NP following the verb is something that can move around; select 'happen' as the translation for ''yaco'' when the subject NP is inanimate.
 +
 +
{{transferTest|fij|eng|yaco|arrive}}
 +
 +
{{transferTest|fij|eng|yaco|happen}}
 +
 +
*Disambiguation rules:
 +
 +
:-Several verbs in Fijian can be used as post-head adverbs, such as ''oti'' ('already' or 'finish').
 +
:*Rules: If it follows a verb or an object pronoun, then choose <adv>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
 +
 +
:-The word ''soqo'' can be either a verb, meaning 'gather', or a noun, meaning 'meeting'.
 +
:*Rules: If it follows an article, choose <n>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
 +
 +
==Final Evaluation==
 +
===Fijian Transducer===
 +
*Precision: 92.19219%
 +
*Recall: 55.11670%
 +
*Coverage over the large corpus: 70.06%
 +
*Number of words in the large corpus: 1099762
 +
*Number of stems in the transducer: 280
 +
 +
===MT===
 +
====fij → eng====
 +
*WER:132.31%
 +
*PER:123.26%
 +
*Proportion of correctly translated stems: 16.7%
 +
*Trimmed coverage over <code>fij.longer.text</code>:45.49%
 +
*Trimmed coverage over <code>fij.corpus.large</code>:33.5%
 +
*Number of tokens in <code>longer</code> corpus:1076
 +
====eng → fij====
 +
*WER:92.94%
 +
*PER:85.32%
 +
*Proportion of correctly translated stems:14.68%
 +
*Trimmed coverage of <code>eng.longer.txt</code>:47.67%
 +
*Number of tokens in <code>longer</code> corpus:718.
  
 
==Contrastive Grammar==
 
==Contrastive Grammar==
Line 65: Line 185:
 
https://github.swarthmore.edu/hwang11/ling073-fij-eng  
 
https://github.swarthmore.edu/hwang11/ling073-fij-eng  
  
 +
https://github.swarthmore.edu/hwang11/ling073-fij-eng-corpus
  
  

Latest revision as of 13:42, 5 May 2018

Resources for machine translation between Fijian and English

fij → eng evaluation

Current WER and PER:

Test file: 'fij-eng.tests.txt'
Reference file 'eng.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 56
Number of words in test: 59
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 50
Word error rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 50
Word Error Rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%

eng → fij evaluation

Current WER and PER:

Test file: 'eng-fij.tests.txt'
Reference file 'fij.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 62
Number of words in test: 56
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 3.57 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 52
Word Error Rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
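
The percentages in both reports can be reproduced from the counts they list. The sketch below is a minimal illustration in Python, not the evaluation script itself: the function names are made up here, and the PER formula is an assumption, chosen because it agrees with the figures above (unmatched reference words plus any surplus words in the test output, divided by the reference length).

```python
def wer(edit_distance: int, n_reference: int) -> float:
    """Word error rate in percent: edit distance over reference length."""
    return 100.0 * edit_distance / n_reference


def per(n_reference: int, n_test: int, n_correct: int) -> float:
    """Position-independent error rate in percent (assumed formula).

    Counts reference words that have no match anywhere in the test output,
    plus a penalty for test words beyond the reference length.
    """
    surplus = max(0, n_test - n_reference)
    return 100.0 * (n_reference - n_correct + surplus) / n_reference


# Counts copied from the two reports above.
print(f"fij-eng  WER {wer(50, 56):.2f} %  PER {per(56, 59, 12):.2f} %")
print(f"eng-fij  WER {wer(52, 62):.2f} %  PER {per(62, 56, 13):.2f} %")
```

With these counts the sketch gives 89.29 % and 83.93 % for fij → eng, and 83.87 % and 79.03 % for eng → fij, which matches the reported values.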

Lexical Selection

https://wikis.swarthmore.edu/ling073/Fijian_and_English/Lexical_selection#fij_.E2.86.92_eng

eng → fij one-to-many mapping

  • Case 1: Pelu and lo’i describe two different kinds of bending action.

(eng) bend → (fij) pelu (e.g. bend of metal)

(eng) bend → (fij) lo’i (e.g. bend at a joint)

  • Case 2: Cina and cila describe shining by two different kinds of light source.

(eng) shine on → (fij) cina (light/torch shines on)

(eng) shine on → (fij) cila (sun/moon/star shines on)

  • (a disambiguation problem) The English third person singular pronoun "it" has the same form in the nominative and the accusative case, while Fijian uses different forms for subject and object.

(eng) it → (fij) e (subj)

(eng) it → (fij) koya (obj)

fij → eng one-to-many mapping

  • Case 1.1:

(fij) yava → (eng) leg

(fij) yava → (eng) foot

  • Case 1.2:

(fij) liga → (eng) arm

(fij) liga → (eng) hand

  • Case 1.3:

(fij) mata → (eng) face

(fij) mata → (eng) eye

  • Case 2:

(fij) vula → (eng) moon

(fij) vula → (eng) month

  • Case 3:

(fij) basu → (eng) tear up (e.g. old clothes)

(fij) basu → (eng) tear down (e.g. old buildings)

  • Case 4: Fijian pronouns do not distinguish gender.

(fij) koya → (eng) him

(fij) koya → (eng) her

(fij) koya → (eng) it

  • Case 5 (a disambiguation problem?):

The word levu can be either an adjective meaning "big" or a numeral meaning "many, much"; both adjectives and numerals can also head a predicate, like a verb.

(fij) levu → (eng) big (adj)

(fij) levu → (eng) a lot of (num)

Additions

  • 126 more stems for the Fijian transducer and the bilingual dictionary (all of them are now in the transducer; adding them to the bilingual dictionary is still in progress)
  • Morphology:
-Causative prefix vaka-
-Collective prefix vei-
-Deriving nouns from verbs: prefix i-
  • Lexical Selection rules:
-Case 1: Select 'arrive' as the translation of yaco when the subject NP following the verb denotes something that can move around; select 'happen' when the subject NP is inanimate.

(fij) yaco → (eng) arrive

(fij) yaco → (eng) happen
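
The yaco rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the rule file used in the project: the Token type, the can_move feature and the placeholder lemmas N_MOBILE and N_INANIMATE are hypothetical, and the subject is simply taken to be the first noun or pronoun after the verb.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    lemma: str
    pos: str        # e.g. "v", "det", "n", "prn"
    can_move: bool  # hypothetical feature, not an existing tag


def select_yaco(tokens: List[Token], verb_index: int) -> Optional[str]:
    """Choose an English translation for yaco at tokens[verb_index]."""
    if tokens[verb_index].lemma != "yaco":
        return None
    # The subject NP follows the verb: take the first noun or pronoun.
    for token in tokens[verb_index + 1:]:
        if token.pos in ("n", "prn"):
            return "arrive" if token.can_move else "happen"
    return None  # no subject found, so leave the choice to a default


# Schematic token sequences with placeholder subjects, not attested sentences.
mobile = [Token("yaco", "v", False), Token("N_MOBILE", "n", True)]
inanimate = [Token("yaco", "v", False), Token("N_INANIMATE", "n", False)]
print(select_yaco(mobile, 0))
print(select_yaco(inanimate, 0))
```

Returning None when no subject is found keeps the sketch from guessing; a real rule set would fall back to whichever translation is the default.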

  • Disambiguation rules:
-Several Fijian verbs can also be used as post-head adverbs, such as oti ('finish' as a verb, 'already' as an adverb).
  • Rule: If the word follows a verb or an object pronoun, choose <adv>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
-The word soqo can be either a verb meaning 'gather' or a noun meaning 'meeting'.
  • Rule: If it follows an article, choose <n>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
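
The two disambiguation rules above amount to looking at the category of the preceding token. A minimal Python sketch, assuming hypothetical category labels ("v", "prn.obj", "prn.subj", "asp", "tense", "det") that are not necessarily the tags used in the transducer:

```python
from typing import Optional

# Verbs that can also be post-head adverbs; only oti is named in the text.
POST_HEAD_ADVERB_VERBS = {"oti"}

# Hypothetical labels for the category of the preceding token.
ADVERB_CONTEXTS = {"v", "prn.obj"}
VERB_CONTEXTS = {"asp", "tense", "prn.subj"}
NOUN_CONTEXTS = {"det"}  # articles


def choose_reading(lemma: str, previous_category: str) -> Optional[str]:
    """Return the tag to keep for an ambiguous word, or None if undecided."""
    if previous_category in VERB_CONTEXTS:
        # Shared by both rules: after an aspect/tense marker or subject pronoun.
        if lemma in POST_HEAD_ADVERB_VERBS or lemma == "soqo":
            return "v"
    if lemma in POST_HEAD_ADVERB_VERBS and previous_category in ADVERB_CONTEXTS:
        return "adv"
    if lemma == "soqo" and previous_category in NOUN_CONTEXTS:
        return "n"
    return None


print(choose_reading("oti", "v"))
print(choose_reading("oti", "prn.subj"))
print(choose_reading("soqo", "det"))
```

Contexts that neither rule mentions return None, so the sketch leaves those readings ambiguous instead of forcing a choice.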

Final Evaluation

Fijian Transducer

  • Precision: 92.19219%
  • Recall: 55.11670%
  • Coverage over the large corpus: 70.06%
  • Number of words in the large corpus: 1099762
  • Number of stems in the transducer: 280

MT

fij → eng

  • WER: 132.31%
  • PER: 123.26%
  • Proportion of correctly translated stems: 16.7%
  • Trimmed coverage over fij.longer.text: 45.49%
  • Trimmed coverage over fij.corpus.large: 33.5%
  • Number of tokens in longer corpus: 1076

eng → fij

  • WER: 92.94%
  • PER: 85.32%
  • Proportion of correctly translated stems: 14.68%
  • Trimmed coverage over eng.longer.txt: 47.67%
  • Number of tokens in longer corpus: 718
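
A WER above 100%, as reported for fij → eng, is possible because the edit distance is divided by the length of the reference: when the system output is much longer than the reference, the extra words count as insertions. The Python sketch below illustrates this with a word-level Levenshtein distance. It is not the evaluation tool's own code, and the two English sentences are invented for the example, not taken from the test files.

```python
from typing import List


def edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1,     # delete a reference word
                               current[j - 1] + 1,  # insert a hypothesis word
                               substitution))
        previous = current
    return previous[-1]


def wer(reference: List[str], hypothesis: List[str]) -> float:
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


reference = "the moon".split()
hypothesis = "the big month shines".split()
print(edit_distance(reference, hypothesis))  # 3: one substitution, two insertions
print(wer(reference, hypothesis))            # 150.0
```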

Contrastive Grammar

https://wikis.swarthmore.edu/ling073/Fijian_and_English/Contrastive_Grammar

Developed Resources for Machine Translation

https://github.swarthmore.edu/hwang11/ling073-fij-eng

https://github.swarthmore.edu/hwang11/ling073-fij-eng-corpus