Difference between revisions of "Fijian and English"

From LING073
(eng → fij evaluation)
(Additions)
 
(33 intermediate revisions by the same user not shown)
Unknown words in English: ''my'' and ''your'' in ''my eye'' and ''your father''. (For Fijian bound nouns like 'eye' and 'father', possessive pronouns are suffixes attached to the noun stem rather than separate words. How should the English possessive pronouns be translated?)
 
  
 

Latest revision as of 13:42, 5 May 2018

Resources for machine translation between Fijian and English

fij → eng evaluation

Current WER and PER:

Test file: 'fij-eng.tests.txt'
Reference file 'eng.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 56
Number of words in test: 59
Number of unknown words (marked with a star) in test: 0
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 50
Word error rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 50
Word Error Rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%

eng → fij evaluation

Current WER and PER:

Test file: 'eng-fij.tests.txt'
Reference file 'fij.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 62
Number of words in test: 56
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 3.57 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 52
Word Error Rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
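The figures in the two reports above can be cross-checked with a short sketch. It assumes WER is the word-level edit distance divided by the number of reference words, and that PER is (max(reference words, test words) − position-independent correct words) divided by the number of reference words. The PER formula is an assumption chosen because it matches both reports; the function names are illustrative and are not taken from the evaluation tool.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_from_counts(edits, n_ref):
    """WER in percent, from an edit distance and the reference length."""
    return 100.0 * edits / n_ref


def per_from_counts(correct, n_ref, n_test):
    """PER in percent (assumed formula, see the lead-in)."""
    return 100.0 * (max(n_ref, n_test) - correct) / n_ref


def wer(ref, hyp):
    return wer_from_counts(edit_distance(ref, hyp), len(ref))


def per(ref, hyp):
    # Position-independent correct words: size of the multiset intersection.
    correct = sum((Counter(ref) & Counter(hyp)).values())
    return per_from_counts(correct, len(ref), len(hyp))


# Counts copied from the two reports above.
print(round(wer_from_counts(50, 56), 2), round(per_from_counts(12, 56, 59), 2))
print(round(wer_from_counts(52, 62), 2), round(per_from_counts(13, 62, 56), 2))
```

Because the denominator is the reference length, both rates can exceed 100% when the output needs more edits than the reference has words.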

Lexical Selection

https://wikis.swarthmore.edu/ling073/Fijian_and_English/Lexical_selection#fij_.E2.86.92_eng

eng → fij one-to-many mapping

  • Case 1: Pelu and lo’i describe two different kinds of bending action.

(eng) bend → (fij) pelu (e.g. bend of metal)

(eng) bend → (fij) lo’i (e.g. bend at a joint)

  • Case 2: Cina and cila describe shining by two different kinds of light source.

(eng) shine on → (fij) cina (light/torch shines on)

(eng) shine on → (fij) cila (sun/moon/star shines on)

  • (a disambiguation problem) The English third person singular pronoun it has the same form in subject and object position, whereas Fijian uses different forms.

(eng) it → (fij) e (subj)

(eng) it → (fij) koya (obj)

fij → eng one-to-many mapping

  • Case 1.1:

(fij) yava → (eng) leg

(fij) yava → (eng) foot

  • Case 1.2:

(fij) liga → (eng) arm

(fij) liga → (eng) hand

  • Case 1.3:

(fij) mata → (eng) face

(fij) mata → (eng) eye

  • Case 2:

(fij) vula → (eng) moon

(fij) vula → (eng) month

  • Case 3:

(fij) basu → (eng) tear up (e.g. old clothes)

(fij) basu → (eng) tear down (e.g. old buildings)

  • Case 4: Fijian pronouns do not distinguish gender.

(fij) koya → (eng) him

(fij) koya → (eng) her

(fij) koya → (eng) it

  • Case 5: (a disambiguation problem?)

The word levu can be used either as an adjective meaning "big" or as a numeral meaning "many, much". Both numerals and adjectives can head a predicate (like a verb), which makes the two readings hard to tell apart from position alone.

(fij) levu → (eng) big (adj)

(fij) levu → (eng) a lot of (num)
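The one-to-many cases listed in this section can be collected into a small lookup table, which is roughly the information a bilingual dictionary has to carry before any selection rule applies. The sketch below is only an illustration: the table layout, the function names, and the choice of the first listed candidate as the default are assumptions, not the format of the actual dictionary.

```python
# Candidate translations listed in this section, keyed by (source language, lemma).
CANDIDATES = {
    ("eng", "bend"): ["pelu", "lo’i"],
    ("eng", "shine on"): ["cina", "cila"],
    ("eng", "it"): ["e", "koya"],
    ("fij", "yava"): ["leg", "foot"],
    ("fij", "liga"): ["arm", "hand"],
    ("fij", "mata"): ["face", "eye"],
    ("fij", "vula"): ["moon", "month"],
    ("fij", "basu"): ["tear up", "tear down"],
    ("fij", "koya"): ["him", "her", "it"],
    ("fij", "levu"): ["big", "a lot of"],
}


def translations(lang, lemma):
    """All listed candidates for a lemma (empty list if it is not in the table)."""
    return CANDIDATES.get((lang, lemma), [])


def default_translation(lang, lemma):
    """Assumed fallback: the first listed candidate, or None if there is none."""
    options = translations(lang, lemma)
    return options[0] if options else None


def ambiguous_lemmas(lang):
    """Lemmas of one source language that have more than one candidate."""
    return sorted(lemma for (src, lemma), opts in CANDIDATES.items()
                  if src == lang and len(opts) > 1)


print(ambiguous_lemmas("eng"))
print(default_translation("fij", "koya"))
```

A lexical selection rule then only has to override the default in the contexts where another candidate fits better.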

Additions

  • 126 more stems in the Fijian transducer and the bilingual dictionary (all stems have been added to the transducer; adding them to the bilingual dictionary is still in progress)
  • Morphology:
-Causative prefix vaka-
-Collective prefix vei-
-Deriving nouns from verbs: prefix i-
  • Lexical Selection rules:
-Case 1: Select 'arrive' as the translation of yaco when the subject NP following the verb refers to something that can move around; select 'happen' when the subject NP is inanimate.

(fij) yaco → (eng) arrive

(fij) yaco → (eng) happen
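The yaco rule above can be sketched as a function over a tagged sentence. This is a minimal illustration, not the actual rule file: the token dictionaries, the "pos" and "mobile" keys, and the placeholder lemmas are all hypothetical, and the two conditions in the rule ("can move around" versus "inanimate") are collapsed into a single assumed boolean feature.

```python
def translate_yaco(tokens, verb_index):
    """Choose an English translation for yaco at tokens[verb_index].

    Each token is a dict with the hypothetical keys "lemma", "pos" and,
    for nouns, "mobile". The subject NP is taken to be the first noun
    after the verb, following the rule's wording.
    """
    for token in tokens[verb_index + 1:]:
        if token["pos"] == "n":
            return "arrive" if token.get("mobile", False) else "happen"
    return None  # no subject noun found: leave the choice open


# Placeholder lemmas stand in for real Fijian words.
mobile_subject = [
    {"lemma": "yaco", "pos": "v"},
    {"lemma": "<article>", "pos": "art"},
    {"lemma": "<mobile noun>", "pos": "n", "mobile": True},
]
inanimate_subject = [
    {"lemma": "yaco", "pos": "v"},
    {"lemma": "<article>", "pos": "art"},
    {"lemma": "<inanimate noun>", "pos": "n", "mobile": False},
]

print(translate_yaco(mobile_subject, 0))
print(translate_yaco(inanimate_subject, 0))
```

A real rule would need the mobility feature to come from somewhere, for example a hand-made noun list.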

  • Disambiguation rules:
-Several verbs in Fijian can be used as post-head adverbs, such as oti ('finish' as a verb, 'already' as an adverb).
  • Rule: If oti follows a verb or an object pronoun, choose <adv>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
-The word soqo can be either a verb, meaning 'gather', or a noun, meaning 'meeting'.
  • Rule: If soqo follows an article, choose <n>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
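The two disambiguation rules can be sketched as a table that maps the tag of the preceding word to the reading to keep. The tag names other than adv, v and n ("obj_pron", "subj_pron", "tam", "art") are hypothetical labels invented for this sketch, and returning None when no rule matches is an assumption about what should happen in uncovered contexts.

```python
# For each ambiguous word: (tags of the preceding word, reading to choose).
RULES = {
    "oti": [
        ({"v", "obj_pron"}, "adv"),
        ({"tam", "subj_pron"}, "v"),
    ],
    "soqo": [
        ({"art"}, "n"),
        ({"tam", "subj_pron"}, "v"),
    ],
}


def disambiguate(word, prev_tag):
    """Return the reading selected by the first matching rule, else None."""
    for prev_tags, reading in RULES.get(word, []):
        if prev_tag in prev_tags:
            return reading
    return None  # no rule applies: the word stays ambiguous


print(disambiguate("oti", "v"))
print(disambiguate("soqo", "art"))
```

Keeping the rules as data makes it easy to add further verb/adverb pairs that behave like oti.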

Final Evaluation

Fijian Transducer

  • Precision: 92.19219%
  • Recall: 55.11670%
  • Coverage over the large corpus: 70.06%
  • Number of words in the large corpus: 1099762
  • Number of stems in the transducer: 280
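For reference, the three transducer metrics can be sketched as follows. The definitions are assumptions about how the figures were obtained: precision as the share of produced analyses that also appear in a gold standard, recall as the share of gold analyses that were produced, and coverage as the share of corpus tokens that receive at least one analysis. The toy inputs are illustrative only; the last line just multiplies the two corpus figures reported above.

```python
def precision_recall(produced, gold):
    """Precision and recall of a set of produced analyses against a gold set."""
    produced, gold = set(produced), set(gold)
    correct = len(produced & gold)
    return correct / len(produced), correct / len(gold)


def coverage(tokens, is_known):
    """Share of tokens for which is_known(token) is true."""
    known = sum(1 for token in tokens if is_known(token))
    return known / len(tokens)


# Toy data, not taken from the project.
p, r = precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"})
print(p, r)
print(coverage(["x", "y", "z", "x"], lambda token: token == "x"))

# Tokens implied by the reported 70.06% coverage of 1099762 words.
implied_known_tokens = round(0.7006 * 1_099_762)
print(implied_known_tokens)
```

High precision with much lower recall, as reported here, is the pattern one would expect from a transducer whose existing analyses are mostly right but whose lexicon is still small.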

MT

fij → eng

  • WER: 132.31%
  • PER: 123.26%
  • Proportion of correctly translated stems: 16.7%
  • Trimmed coverage over fij.longer.text: 45.49%
  • Trimmed coverage over fij.corpus.large: 33.5%
  • Number of tokens in longer corpus: 1076

eng → fij

  • WER: 92.94%
  • PER: 85.32%
  • Proportion of correctly translated stems: 14.68%
  • Trimmed coverage of eng.longer.txt: 47.67%
  • Number of tokens in longer corpus: 718

Contrastive Grammar

https://wikis.swarthmore.edu/ling073/Fijian_and_English/Contrastive_Grammar

Developed Resources for Machine Translation

https://github.swarthmore.edu/hwang11/ling073-fij-eng

https://github.swarthmore.edu/hwang11/ling073-fij-eng-corpus