Fijian and English/Structural transfer

From LING073

Latest revision as of 17:00, 11 April 2019

Initial Evaluation

  • eng → fij
-WER: 75.81%
-PER: 70.97%
  • fij → eng
-WER: 82.14%
-PER: 76.79%
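
The figures above can be reproduced by hand. The evaluator's output for the initial run reported, for eng → fij, an edit distance of 47 with 62 reference words, 57 test words and 18 position-independent correct words; for fij → eng, an edit distance of 46 with 56 reference words, 57 test words and 14 position-independent correct words. WER is the edit distance divided by the reference length. The PER formula below is an assumption: it is the form that reproduces both reported PER values, not something taken from the evaluator's documentation.

```latex
\begin{align*}
\mathrm{WER} &= \frac{d_{\mathrm{edit}}}{N_{\mathrm{ref}}}, &
\mathrm{PER} &= \frac{N_{\mathrm{ref}} - c + \max(0,\, N_{\mathrm{test}} - N_{\mathrm{ref}})}{N_{\mathrm{ref}}} \\[4pt]
\text{eng} \to \text{fij:}\quad \mathrm{WER} &= \frac{47}{62} \approx 75.81\%, &
\mathrm{PER} &= \frac{62 - 18 + 0}{62} = \frac{44}{62} \approx 70.97\% \\[4pt]
\text{fij} \to \text{eng:}\quad \mathrm{WER} &= \frac{46}{56} \approx 82.14\%, &
\mathrm{PER} &= \frac{56 - 14 + 1}{56} = \frac{43}{56} \approx 76.79\%
\end{align*}
```

Here d_edit is the word-level edit distance, N_ref and N_test are the word counts of the reference and test files, and c is the number of position-independent correct words.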

eng → fij

I am working on the first contrastive grammar point: word order within NPs. In an English NP, the demonstrative and adjectives precede the head noun, while in Fijian all of the modifiers follow the head noun. In addition, the Fijian article in front of the head noun is obligatory even when a demonstrative is present, but its English counterpart, the determiner the, cannot occur when the NP contains a demonstrative. In terms of tags, Fijian has no number tags on either nouns or demonstratives.

  • Example phrase: (eng) this big village → (fij) a ’oro levu yai
  • Current translation to Fijian: #yai levu #’oro
  • Taggers: ^this/this<det><dem><sg>$ ^big/big<adj><sint>$ ^village/village<n><sg>$^./.<sent>$
  • Biltrans: ^this<det><dem><sg>/yai<det><dem><sg>$ ^big<adj><sint>/levu<adj>$ ^village<n><sg>/’oro<n><sg>$^.<sent>/.<sent>$

Step 1: remove number tags and add a<art> in the nom chunk:

  • Chunker: ^dem<det><dem>{^yai<det><dem>$}$ ^adj<adj><sint>{^levu<adj>$}$ ^nom<n>{^a<art>$ ^’oro<n>$}$^sent<SENT>{^.<sent>$}$
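
A t1x rule along the following lines could produce a nom chunk of this shape. It is a sketch under assumptions, not the rule used in the pair: the pattern-item name noun_reg and the attribute a_pos are borrowed from the fij → eng rules further down and are assumed to be defined the same way in the eng-fij file. The article has no English source word, so it is written out as a literal, and the number tag disappears simply because no number attribute is clipped.

```xml
<rule comment="sketch: noun to ART + noun, number tag dropped">
  <pattern>
    <pattern-item n="noun_reg"/>
  </pattern>
  <action>
    <out>
      <chunk name="nom" case="caseFirstWord">
        <tags>
          <tag><lit-tag v="n"/></tag>
        </tags>
        <!-- the Fijian article is inserted as a literal lemma plus tag -->
        <lu>
          <lit v="a"/>
          <lit-tag v="art"/>
        </lu>
        <b/>
        <!-- only lemma and part of speech are copied, so sg/pl is left behind -->
        <lu>
          <clip pos="1" side="tl" part="lem"/>
          <clip pos="1" side="tl" part="a_pos"/>
        </lu>
      </chunk>
    </out>
  </action>
</rule>
```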

Step 2: change the word order:

  • Interchunk: ^nom<n>{^a<art>$ ^’oro<n>$}$ ^adj<adj><sint>{^levu<adj>$}$ ^dem<det><dem>{^yai<det><dem>$}$^sent<SENT>{^.<sent>$}$
  • Postchunk: ^a<art>$ ^’oro<n>$ ^levu<adj>$ ^yai<det><dem>$^.<sent>$
  • Translation: a ’oro levu yai
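
The reordering in Step 2 could be written as an interchunk (t2x) rule roughly like the one below. This is a sketch: it assumes the t2x file defines categories named dem, adj and nom that match the three chunks shown above, and those names are hypothetical. In the interchunk stage each chunk is copied by its lem, tags and chcontent parts.

```xml
<rule comment="sketch: dem adj nom reordered as nom adj dem">
  <pattern>
    <pattern-item n="dem"/>
    <pattern-item n="adj"/>
    <pattern-item n="nom"/>
  </pattern>
  <action>
    <out>
      <!-- head noun chunk (with its article) goes first -->
      <chunk>
        <clip pos="3" part="lem"/>
        <clip pos="3" part="tags"/>
        <clip pos="3" part="chcontent"/>
      </chunk>
      <b pos="1"/>
      <!-- adjective stays in the middle -->
      <chunk>
        <clip pos="2" part="lem"/>
        <clip pos="2" part="tags"/>
        <clip pos="2" part="chcontent"/>
      </chunk>
      <b pos="2"/>
      <!-- demonstrative goes last -->
      <chunk>
        <clip pos="1" part="lem"/>
        <clip pos="1" part="tags"/>
        <clip pos="1" part="chcontent"/>
      </chunk>
    </out>
  </action>
</rule>
```

Because a pattern matches a fixed sequence of chunks, NPs with only a demonstrative or only an adjective would presumably need their own shorter two-chunk rules.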

fij → eng

Same grammar point as above

The relationship between the Fijian article a and the English definite determiner the was complicated to implement. For now, the bilingual dictionary always translates the article a to null. This works here and for most complex NPs, such as those with demonstratives and possessives, but not for NPs without any <det>, such as a ’oro levu (ART village big) or a ’oro (ART village). For these simple NPs, I will need to write separate rules for the two patterns in the fij-eng.t1x file.
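
One of the two separate rules mentioned above might look like the following sketch for the pattern ART noun adj. The chunk name nom_adj is hypothetical, the sg tag is hard-coded as an assumption because the phrase alone gives no number information, and the literal the<det><def><sp> is taken from the clause rule later on this page.

```xml
<rule comment="sketch: ART noun adj to the adj noun">
  <pattern>
    <pattern-item n="article"/>
    <pattern-item n="noun_reg"/>
    <pattern-item n="adj"/>
  </pattern>
  <action>
    <out>
      <chunk name="nom_adj" case="caseFirstWord">
        <tags>
          <tag><lit-tag v="SN"/></tag>
          <tag><lit-tag v="sg"/></tag>
        </tags>
        <!-- the article is null in the bilingual dictionary, so "the" is a literal -->
        <lu>
          <lit v="the"/>
          <lit-tag v="det.def.sp"/>
        </lu>
        <b/>
        <!-- adjective moves in front of the noun -->
        <lu>
          <clip pos="3" side="tl" part="whole"/>
        </lu>
        <b/>
        <!-- noun with an assumed singular tag -->
        <lu>
          <clip pos="2" side="tl" part="lem"/>
          <clip pos="2" side="tl" part="a_pos"/>
          <lit-tag v="sg"/>
        </lu>
      </chunk>
    </out>
  </action>
</rule>
```

The rule for a ’oro would be the same with the adj pattern-item and the adjective lexical unit removed. Since the transfer stage should prefer the longest matching pattern, a four-word NP ending in a demonstrative would still be handled by the longer rule.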

  • Example phrase: (fij) a ’oro levu yai → (eng) this big village
  • Current translation from Fijian to English: #village big #this
  • Taggers:
-Fijian: ^a/a<art>$ ^’oro/’oro<n>$ ^levu/levu<adj>$ ^yai/yai<det><dem>$^./.<sent>$
-English (desired): ^this/this<det><dem><sg>$ ^big/big<adj><sint>$ ^village/village<n><sg>$^./.<sent>$
  • Biltrans: ^a<art>/$ ^’oro<n>/village<n>$ ^levu<adj>/big<adj><sint>$ ^yai<det><dem>/this<det><dem>$^.<sent>/.<sent>$

Fijian nouns and demonstratives do not distinguish number, so I want to add a number tag to the English noun and the demonstrative this, and then change the word order from head-first in Fijian to head-final in English:

<rule comment="REGLA: NPs with demonstratives">
  <pattern>
    <pattern-item n="article"/>   <!-- pos 1: a (null in the bilingual dictionary) -->
    <pattern-item n="noun_reg"/>  <!-- pos 2: head noun -->
    <pattern-item n="adj"/>       <!-- pos 3: adjective -->
    <pattern-item n="dem"/>       <!-- pos 4: demonstrative -->
  </pattern>
  <action>
    <out>
      <chunk name="nom_adj_dem" case="caseFirstWord">
        <tags>
          <tag><lit-tag v="SN"/></tag>
          <tag><lit-tag v="sg"/></tag>
        </tags>
        <!-- demonstrative first, with a hard-coded sg tag -->
        <lu>
          <clip pos="4" side="tl" part="lem"/>
          <clip pos="4" side="tl" part="a_pos"/>
          <clip pos="4" side="tl" part="subcategories"/>
          <lit-tag v="sg"/>
        </lu>
        <b/>
        <!-- adjective copied unchanged -->
        <lu>
          <clip pos="3" side="tl" part="whole"/>
        </lu>
        <b/>
        <!-- head noun last, with a hard-coded sg tag -->
        <lu>
          <clip pos="2" side="tl" part="lem"/>
          <clip pos="2" side="tl" part="a_pos"/>
          <lit-tag v="sg"/>
        </lu>
      </chunk>
    </out>
  </action>
</rule>
  • Chunker: ^nom_adj_dem<SN><sg>{^this<det><dem><sg>$ ^big<adj><sint>$ ^village<n><sg>$}$^sent<SENT>{^.<sent>$}$
  • Interchunk: ^nom_adj_dem<SN><sg>{^this<det><dem><sg>$ ^big<adj><sint>$ ^village<n><sg>$}$^sent<SENT>{^.<sent>$}$
  • Postchunk: ^this<det><dem><sg>$ ^big<adj><sint>$ ^village<n><sg>$^.<sent>$
  • Notes and problems with the current code:

Before this, I tried to add number tags with <clip pos="2" side="tl" part="a_num"/> in the lexical unit, but since Fijian nouns do not inflect for number (e.g. ’oro can be translated as both 'village' and 'villages'), the English noun still did not know which number tag to take, so the number tag never showed up in the chunk. Instead, I added the number tag manually to the noun and the demonstrative in the English translation with <lit-tag>, setting it to "sg" in this pattern. This seems to work in this case, but without sentence context it is inaccurate to always translate "a ’oro levu yai" as "this big village": compare (fij) Au raica (koya) a ’oro levu yai → (eng) I see this big village. with (fij) Au raica ira a ’oro levu yai → (eng) I see these big villages. So I need to change my example from a phrase to a sentence with enough context and write a rule for that pattern instead.

In sentences

  • Example sentence: (fij) Au raici ira a gone. → (eng) I see the children.
  • Code:
<rule comment="REGLA: Clause_1">
  <pattern>
    <pattern-item n="pronoun"/>   <!-- pos 1: subject pronoun -->
    <pattern-item n="tense_mk"/>  <!-- pos 2: tense marker -->
    <pattern-item n="verb"/>      <!-- pos 3: verb -->
    <pattern-item n="pronoun"/>   <!-- pos 4: object pronoun, carries number -->
    <pattern-item n="article"/>   <!-- pos 5: a -->
    <pattern-item n="noun_reg"/>  <!-- pos 6: object noun -->
  </pattern>
  <action>
    <!-- tense is read from the source-side tense marker -->
    <let>
      <var n="tense"/>
      <clip pos="2" side="sl" part="a_tense"/>
    </let>
    <out>
      <chunk name="Clause_1" case="caseFirstWord">
        <tags>
          <tag><lit-tag v="SV"/></tag>
          <tag><var n="tense"/></tag>
          <tag><lit-tag v="SN"/></tag>
          <tag><lit-tag v="pl"/></tag>
        </tags>
        <!-- subject pronoun copied unchanged -->
        <lu>
          <clip pos="1" side="tl" part="whole"/>
        </lu>
        <b/>
        <!-- verb takes its tense from the tense marker -->
        <lu>
          <clip pos="3" side="tl" part="lem"/>
          <clip pos="3" side="tl" part="a_pos"/>
          <clip pos="2" side="sl" part="a_tense"/>
        </lu>
        <b/>
        <!-- the article a becomes a literal "the" -->
        <lu>
          <lit v="the"/>
          <lit-tag v="det.def.sp"/>
        </lu>
        <b/>
        <!-- noun takes its number from the source-side object pronoun -->
        <lu>
          <clip pos="6" side="tl" part="lem"/>
          <clip pos="6" side="tl" part="a_pos"/>
          <clip pos="4" side="sl" part="a_num"/>
        </lu>
      </chunk>
    </out>
  </action>
</rule>
  • Chunker: ^clause_2<SV><pres><SN><pl>{^Prpers<prn><p1><mf><sg><subj>$ ^see<vblex><pres>$ ^the<det><def><sp>$ ^child<n><pl>$}$^sent<SENT>{^.<sent>$}$^sent<SENT>{^.<sent>$}$
  • Interchunk: ^clause_2<SV><pres><SN><pl>{^Prpers<prn><p1><mf><sg><subj>$ ^see<vblex><pres>$ ^the<det><def><sp>$ ^child<n><pl>$}$^sent<SENT>{^.<sent>$}$^sent<SENT>{^.<sent>$}$
  • Postchunk: ^Prpers<prn><p1><mf><sg><subj>$ ^see<vblex><pres>$ ^the<det><def><sp>$ ^child<n><pl>$^.<sent>$^.<sent>$
  • Translation: I see the children.
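
The same idea could be carried over to the earlier NP example, so that Au raica ira a ’oro levu yai yields a plural demonstrative and noun. The rule below is only a sketch: the chunk name Clause_dem is hypothetical, the pattern simply extends Clause_1 with the adj and dem items, and it assumes that a_num on the source-side object pronoun distinguishes koya (sg) from ira (pl) and that the English generator maps this<det><dem><pl> to these.

```xml
<rule comment="sketch: clause with ART noun adj dem object">
  <pattern>
    <pattern-item n="pronoun"/>
    <pattern-item n="tense_mk"/>
    <pattern-item n="verb"/>
    <pattern-item n="pronoun"/>
    <pattern-item n="article"/>
    <pattern-item n="noun_reg"/>
    <pattern-item n="adj"/>
    <pattern-item n="dem"/>
  </pattern>
  <action>
    <out>
      <chunk name="Clause_dem" case="caseFirstWord">
        <tags>
          <tag><lit-tag v="SV"/></tag>
          <tag><clip pos="2" side="sl" part="a_tense"/></tag>
          <tag><lit-tag v="SN"/></tag>
          <tag><clip pos="4" side="sl" part="a_num"/></tag>
        </tags>
        <lu>
          <clip pos="1" side="tl" part="whole"/>
        </lu>
        <b/>
        <lu>
          <clip pos="3" side="tl" part="lem"/>
          <clip pos="3" side="tl" part="a_pos"/>
          <clip pos="2" side="sl" part="a_tense"/>
        </lu>
        <b/>
        <!-- demonstrative: number copied from the object pronoun, not hard-coded -->
        <lu>
          <clip pos="8" side="tl" part="lem"/>
          <clip pos="8" side="tl" part="a_pos"/>
          <clip pos="8" side="tl" part="subcategories"/>
          <clip pos="4" side="sl" part="a_num"/>
        </lu>
        <b/>
        <lu>
          <clip pos="7" side="tl" part="whole"/>
        </lu>
        <b/>
        <!-- noun: same number source as the demonstrative -->
        <lu>
          <clip pos="6" side="tl" part="lem"/>
          <clip pos="6" side="tl" part="a_pos"/>
          <clip pos="4" side="sl" part="a_num"/>
        </lu>
      </chunk>
    </out>
  </action>
</rule>
```

No literal the is emitted here, since English does not allow the determiner alongside a demonstrative.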

Post-evaluation

  • eng → fij
-WER: 64.52%
-PER: 56.45%
  • fij → eng
-WER: 51.79%
-PER: 46.43%