Fijian and English
Resources for machine translation between Fijian and English
fij → eng evaluation
Current WER and PER:
Test file: 'fij-eng.tests.txt'
Reference file 'eng.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 56
Number of words in test: 59
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 50
Word error rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 50
Word Error Rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%
eng → fij evaluation
Current WER and PER:
Test file: 'eng-fij.tests.txt'
Reference file 'fij.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 62
Number of words in test: 56
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 3.57 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 52
Word Error Rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
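The percentages in the two reports can be related back to the raw counts. The following is a minimal sketch, assuming that WER is the word-level edit distance divided by the number of reference words, and that PER is (max(reference words, test words) minus position-independent correct words) divided by the number of reference words. These formulas are inferred from the figures reported above, not taken from the evaluation script itself, and all function names are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a test word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer_from_counts(edits, ref_len):
    """WER in percent from an edit distance and the reference length."""
    return 100.0 * edits / ref_len


def per_from_counts(correct, ref_len, test_len):
    """PER in percent (assumed formula, see the lead-in)."""
    return 100.0 * (max(ref_len, test_len) - correct) / ref_len


def wer(ref, hyp):
    return wer_from_counts(edit_distance(ref, hyp), len(ref))


def per(ref, hyp):
    # Position-independent matches: size of the multiset intersection.
    correct = sum((Counter(ref) & Counter(hyp)).values())
    return per_from_counts(correct, len(ref), len(hyp))


if __name__ == "__main__":
    # Counts copied from the fij -> eng and eng -> fij reports.
    print(round(wer_from_counts(50, 56), 2), round(per_from_counts(12, 56, 59), 2))
    print(round(wer_from_counts(52, 62), 2), round(per_from_counts(13, 62, 56), 2))

    # Toy sentence pair (invented for illustration only).
    ref = "the moon shines on the house".split()
    hyp = "the month shines the house on".split()
    print(wer(ref, hyp), per(ref, hyp))
```

Because PER ignores word order, it is never higher than WER for the same pair of files under these definitions; the gap between the two gives a rough idea of how many errors are reordering rather than wrong word choice.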
Lexical Selection
https://wikis.swarthmore.edu/ling073/Fijian_and_English/Lexical_selection#fij_.E2.86.92_eng
eng → fij one-to-many mapping
- Case 1: Pelu and lo’i describe two different kinds of bending action.
(eng) bend → (fij) pelu (e.g. bend of metal)
(eng) bend → (fij) lo’i (e.g. bend at a joint)
- Case 2:
(eng) shine on → (fij) cina (light/torch shines on)
(eng) shine on → (fij) cila (sun/moon/star shines on)
- (a disambiguation problem) The English third person singular neuter pronoun "it" has the same form in the nominative and the accusative, whereas Fijian uses different forms for subject and object.
(eng) it → (fij) e (subj)
(eng) it → (fij) koya (obj)
fij → eng one-to-many mapping
- Case 1.1:
(fij) yava → (eng) leg
(fij) yava → (eng) foot
- Case 1.2:
(fij) liga → (eng) arm
(fij) liga → (eng) hand
- Case 1.3:
(fij) mata → (eng) face
(fij) mata → (eng) eye
- Case 2:
(fij) vula → (eng) moon
(fij) vula → (eng) month
- Case 3:
(fij) basu → (eng) tear up (e.g. old clothes)
(fij) basu → (eng) tear down (e.g. old buildings)
- Case 4: Fijian does not distinguish gender in pronouns.
(fij) koya → (eng) him
(fij) koya → (eng) her
(fij) koya → (eng) it
- Case 5: (a disambiguation problem?)
The word levu can be used either as an adjective meaning "big" or as a numeral meaning "many, much". Because both numerals and adjectives can head a predicate (like a verb), the two readings are hard to tell apart.
(fij) levu → (eng) big (adj)
(fij) levu → (eng) a lot of (num)
Additions
- 100 more stems in the Fijian transducer and the bilingual dictionary (all 100 stems have been added to the transducer; adding them to the bilingual dictionary is still in progress)
- Morphologies:
- -Causative prefix vaka-
- -Collective prefix vei-
Problems with adding prefixes in the transducer: the form "vakataro" is correctly analyzed as <vblex><caus>, but the plain form "taro" ('ask') receives two analyses: <vblex><iv> (the correct one) and <vblex><caus>. (The same problem occurs with "vei-".) In addition, the prefix "vaka-" is not always causative: attached to the verb "taro" ('ask'), "vaka-" only changes the meaning to 'ask many people' or 'ask many times'.
- Lexical Selection rules:
- -Case 1:
(fij) bale → (eng) fall (fall from a position of standing)
(fij) bale → (eng) die
(fij) bale → (eng) mean
- -Case 2: Select 'arrive' as the translation of yaco when the subject NP following the verb is something that can move around; select 'happen' as the translation for yaco when the subject NP is inanimate.
(fij) yaco → (eng) arrive
(fij) yaco → (eng) happen
- Disambiguation rules:
- -Several verbs in Fijian can be used as post-head adverbs, such as oti ('already' or 'finish').
- Rules: If it follows a verb or an object pronoun, then choose <adv>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
- -The word soqo can be either a verb, meaning 'gather', or a noun, meaning 'meeting'.
- Rules: If it follows an article, choose <n>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
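The yaco lexical selection rule and the oti and soqo disambiguation rules listed above can be sketched as simple lookups on context. This is only an illustration of the logic, not the rule files used in the project: the category labels ("v", "prn_obj", "prn_subj", "tam", "art") and the function names are hypothetical, and the sketch assumes that the category of the preceding word and the mobility of the subject are already known.

```python
# Reading chosen for an ambiguous word, keyed by the category of the
# word that precedes it. All labels here are hypothetical placeholders.
DISAMBIGUATION_RULES = {
    # oti: adverb after a verb or object pronoun,
    # verb after an aspect/tense marker or subject pronoun.
    "oti": {"v": "adv", "prn_obj": "adv", "tam": "v", "prn_subj": "v"},
    # soqo: noun after an article,
    # verb after an aspect/tense marker or subject pronoun.
    "soqo": {"art": "n", "tam": "v", "prn_subj": "v"},
}


def disambiguate(word, prev_category):
    """Return the selected reading, or None when no rule applies."""
    return DISAMBIGUATION_RULES.get(word, {}).get(prev_category)


def select_yaco(subject_can_move):
    """Pick the English translation of yaco from the subject NP.

    'arrive' when the subject is something that can move around,
    'happen' otherwise (the inanimate case described above).
    """
    return "arrive" if subject_can_move else "happen"


if __name__ == "__main__":
    print(disambiguate("oti", "v"))       # oti directly after a verb
    print(disambiguate("soqo", "art"))    # soqo directly after an article
    print(disambiguate("soqo", "adv"))    # no rule covers this context
    print(select_yaco(subject_can_move=True))
```

Returning None for uncovered contexts keeps the sketch honest about the gaps in the rules: a word in a context that no rule mentions stays ambiguous instead of being forced into a default reading.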
Final Evaluation
Fijian Transducer
- Precision and Recall:
- Coverage over the large corpus: 67.16%
- Number of words in the large corpus: 1092208
- Number of stems in the transducer: 249
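A coverage figure like the one above is the share of corpus tokens that receive at least one analysis. The sketch below shows one way to compute it, assuming the analyser output is in the Apertium stream format, where each token appears as ^surface/analysis1/analysis2$ and an unknown token has a single analysis beginning with a star. The function name and the sample tokens other than the taro and vakataro analyses quoted earlier are made up for illustration.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analyses$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream_text):
    """Percentage of tokens with at least one non-unknown analysis."""
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(stream_text):
        total += 1
        # Unknown words are marked with a leading star in the analysis.
        if not analyses.startswith("*"):
            known += 1
    return 100.0 * known / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "^taro/taro<vblex><iv>/taro<vblex><caus>$ "
        "^vakataro/taro<vblex><caus>$ "
        "^qwerty/*qwerty$"
    )
    # An ambiguous token still counts as covered exactly once.
    print(round(coverage(sample), 2))
```

Note that coverage counts a token as known even when its analysis is wrong or ambiguous, so it is an upper bound on how much of the corpus the transducer handles correctly.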
MT
- WER:
- PER:
Contrastive Grammar
https://wikis.swarthmore.edu/ling073/Fijian_and_English/Contrastive_Grammar
Developed Resources for Machine Translation
https://github.swarthmore.edu/hwang11/ling073-fij-eng
https://github.swarthmore.edu/hwang11/ling073-fij-eng-corpus