Difference between revisions of "Fijian and English"
Latest revision as of 13:42, 5 May 2018
Resources for machine translation between Fijian and English
fij → eng evaluation
Current WER and PER:
Test file: 'fij-eng.tests.txt'
Reference file 'eng.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 56
Number of words in test: 59
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 50
Word error rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 50
Word Error Rate (WER): 89.29 %
Number of position-independent correct words: 12
Position-independent word error rate (PER): 83.93 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%
eng → fij evaluation
Current WER and PER:
Test file: 'eng-fij.tests.txt'
Reference file 'fij.tests.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 62
Number of words in test: 56
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 3.57 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 52
Word Error Rate (WER): 83.87 %
Number of position-independent correct words: 13
Position-independent word error rate (PER): 79.03 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
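The percentages in the two reports can be recomputed from the counts they list. The sketch below is only a consistency check: the WER formula is edit distance divided by the number of reference words, and the PER formula shown is an assumption inferred from the reported figures (it reproduces both reports), not taken from the evaluation tool's source. The function names are made up for this example.

```python
# Sketch: recompute WER and PER from the counts printed in the reports.
# The PER formula is an assumption inferred from the reported numbers.

def wer(edit_distance: int, ref_words: int) -> float:
    """Word error rate as a percentage of the reference length."""
    return 100.0 * edit_distance / ref_words


def per(ref_words: int, test_words: int, correct: int) -> float:
    """Position-independent error rate as a percentage of the reference length."""
    return 100.0 * (max(ref_words, test_words) - correct) / ref_words


# fij -> eng: 56 reference words, 59 test words, edit distance 50, 12 correct
fij_eng = (round(wer(50, 56), 2), round(per(56, 59, 12), 2))
# eng -> fij: 62 reference words, 56 test words, edit distance 52, 13 correct
eng_fij = (round(wer(52, 62), 2), round(per(62, 56, 13), 2))

print(fij_eng)  # (89.29, 83.93)
print(eng_fij)  # (83.87, 79.03)
```

Both pairs match the WER and PER lines of the corresponding report.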
Lexical Selection
https://wikis.swarthmore.edu/ling073/Fijian_and_English/Lexical_selection#fij_.E2.86.92_eng
eng → fij one-to-many mapping
- Case 1: Pelu and lo’i describe two different kinds of bending action.
(eng) bend → (fij) pelu (e.g. bend of metal)
(eng) bend → (fij) lo’i (e.g. bend at a joint)
- Case 2:
(eng) shine on → (fij) cina (light/torch shines on)
(eng) shine on → (fij) cila (sun/moon/star shines on)
- (a disambiguation problem) The English third person singular pronoun "it" has the same form in the nominative and the accusative case, whereas Fijian uses e for subjects and koya for objects.
(eng) it → (fij) e (subj)
(eng) it → (fij) koya (obj)
fij → eng one-to-many mapping
- Case 1.1:
(fij) yava → (eng) leg
(fij) yava → (eng) foot
- Case 1.2:
(fij) liga → (eng) arm
(fij) liga → (eng) hand
- Case 1.3:
(fij) mata → (eng) face
(fij) mata → (eng) eye
- Case 2:
(fij) vula → (eng) moon
(fij) vula → (eng) month
- Case 3:
(fij) basu → (eng) tear up (e.g. old clothes)
(fij) basu → (eng) tear down (e.g. old buildings)
- Case 4: Fijian pronouns do not distinguish gender.
(fij) koya → (eng) him
(fij) koya → (eng) her
(fij) koya → (eng) it
- Case 5: (a disambiguation problem?)
The word levu can be used either as an adjective meaning "big" or as a numeral meaning "many, much". Because both numerals and adjectives can head a predicate (like a verb), the two readings can occur in the same position.
(fij) levu → (eng) big (adj)
(fij) levu → (eng) a lot of (num)
Additions
- 126 more stems in the Fijian transducer and the bilingual dictionary (all stems have been added to the transducer; adding them to the bilingual dictionary is still in progress)
- Morphology:
  - Causative prefix vaka-
  - Collective prefix vei-
  - Deriving verb from noun: prefix i-
- Lexical Selection rules:
  - Case 1: Select 'arrive' as the translation of yaco when the subject NP following the verb refers to something that can move around; select 'happen' when the subject NP is inanimate.
(fij) yaco → (eng) arrive
(fij) yaco → (eng) happen
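The yaco rule above can be sketched as a small selection function. This is a minimal illustration, not the rule file used in the project: the Subject type, its can_move flag and the English glosses are hypothetical, and a real system would need some lexical source for that property.

```python
# Hypothetical sketch of the yaco lexical selection rule.
# Subject, can_move and the glosses are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Subject:
    gloss: str       # English gloss of the subject NP (illustrative only)
    can_move: bool   # True if the referent is something that can move around


def select_yaco(subject: Subject) -> str:
    """Pick the English translation of yaco from the subject NP's properties."""
    return "arrive" if subject.can_move else "happen"


print(select_yaco(Subject("teacher", can_move=True)))    # arrive
print(select_yaco(Subject("accident", can_move=False)))  # happen
```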
- Disambiguation rules:
  - Several verbs in Fijian can also be used as post-head adverbs, such as oti ('already' or 'finish').
    - Rule: If the word follows a verb or an object pronoun, choose <adv>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
  - The word soqo can be either a verb meaning 'gather' or a noun meaning 'meeting'.
    - Rule: If the word follows an article, choose <n>; if it follows an aspect/tense marker or a subject pronoun, choose <v>.
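The two disambiguation rules above depend only on the category of the preceding word, so they can be sketched as a lookup table. The category labels (v, obj_pron, subj_pron, tam, art) and the function name are assumptions made for this example; the actual rules may use different tag names.

```python
# Hypothetical sketch of the oti / soqo disambiguation rules.
# Keys: ambiguous word -> category of the preceding word -> chosen tag.
# Category labels are illustrative, not the tagset of the actual transducer.
from typing import Optional

RULES = {
    "oti": {"v": "adv", "obj_pron": "adv", "tam": "v", "subj_pron": "v"},
    "soqo": {"art": "n", "tam": "v", "subj_pron": "v"},
}


def choose_tag(word: str, prev_category: str) -> Optional[str]:
    """Return the tag selected by the rules, or None if no rule applies."""
    return RULES.get(word, {}).get(prev_category)


print(choose_tag("oti", "v"))          # adv
print(choose_tag("oti", "subj_pron"))  # v
print(choose_tag("soqo", "art"))       # n
print(choose_tag("soqo", "tam"))       # v
```

Returning None when no rule matches leaves the ambiguity unresolved, which keeps the sketch from guessing in contexts the rules do not cover.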
Final Evaluation
Fijian Transducer
- Precision: 92.19219%
- Recall: 55.11670%
- Coverage over the large corpus: 70.06%
- Number of words in the large corpus: 1099762
- Number of stems in the transducer: 280
MT
fij → eng
- WER: 132.31%
- PER: 123.26%
- Proportion of correctly translated stems: 16.7%
- Trimmed coverage over fij.longer.text: 45.49%
- Trimmed coverage over fij.corpus.large: 33.5%
- Number of tokens in the longer corpus: 1076
eng → fij
- WER: 92.94%
- PER: 85.32%
- Proportion of correctly translated stems: 14.68%
- Trimmed coverage over eng.longer.txt: 47.67%
- Number of tokens in the longer corpus: 718
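A WER above 100%, as in the fij → eng result, is possible because the edit distance is divided by the length of the reference: when the system output is longer than the reference and shares few words with it, the number of edits exceeds the number of reference words. The sketch below shows this with a word-level Levenshtein distance. The two toy sentences are invented for illustration and are not taken from the test corpora.

```python
# Sketch: word-level edit distance and WER on invented toy sentences.
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a hypothesis word
                           prev[j - 1] + (r != h)))  # match or substitute
        prev = cur
    return prev[-1]


def wer_percent(reference: str, hypothesis: str) -> float:
    """WER as a percentage of the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * word_edit_distance(ref, hyp) / len(ref)


reference = "the moon shines"         # 3 words
hypothesis = "a month it shine on"    # 5 words, none matching the reference
print(word_edit_distance(reference.split(), hypothesis.split()))  # 5
print(round(wer_percent(reference, hypothesis), 2))               # 166.67
```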
Contrastive Grammar
https://wikis.swarthmore.edu/ling073/Fijian_and_English/Contrastive_Grammar
Developed Resources for Machine Translation
https://github.swarthmore.edu/hwang11/ling073-fij-eng
https://github.swarthmore.edu/hwang11/ling073-fij-eng-corpus