Difference between revisions of "Hokkien and English"

From LING073
(Resources Developed for Hokkien)
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
====Resources Developed for Hokkien====
 
*[https://github.swarthmore.edu/Ling073-sp22/ling073-nan Hokkien transducer] (internal Swarthmore access may be required to view)
 
*[https://github.swarthmore.edu/Ling073-sp22/ling073-nan-corpus Hokkien corpus] (internal Swarthmore access may be required to view)
 
 
 
=====10 Sentences=====
 
  
Line 8: Line 4:
  
 
* {{transferTest|nan|eng|I chit-má bô teh tha̍k Gô-bûn--ah.|He is no longer studying Russian.}}  
 
*: {{transferMorphTest|nan|eng|I{{tag|prn}} chit-má{{tag|adv}} bô{{tag|prn}} teh {{tag|v}} tha̍k{{tag|v}} Gô-bûn{{tag|n}} ah{{tag|det}}| He{{tag|prn}} is{{tag|v}} no{{tag|det}} longer{{tag|adv}} studying{{tag|v}} Russian{{tag|n}}}}
+
* output of lexical transfer: ^I<prn>/He<prn>$ ^*chit/*chit$-^*má/*má$ ^bô<adv>/not<adv>$ ^teh<v>/is<vblex>$ ^*tha/*tha$̍^*k/*k$ ^Gô-bûn<n>/Russian language<n>$--^*ah/*ah$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #He #is #not #Russian language
  
 
* {{transferTest|nan|eng|Góa kin-á-jı̍t ê kong-khò í-keng ôan-sêng--ah.|I've already finished today's homework}}  
 
*: {{transferMorphTest|nan|eng|Góa{{tag|n}} kin-á-jı̍t{{tag|adj}} ê{{tag|det}} kong-khò{{tag|n}} í-keng{{tag|adv}} ôan-sêng{{tag|v}} ah{{tag|det}}| I've{{tag|n}} already{{tag|adv}} finished{{tag|v}} today's{{tag|adj}} homework{{tag|n}}}}
+
* output of lexical transfer: ^Góa<prn>/I<prn>$ ^*kin/*kin$-^*á/*á$-^*jı/*jı$̍^*t/*t$ ^*ê/*ê$ ^kong-khò<n>/homework<n>$ ^í-keng<adv>/already<adv>$ ^ôan-sêng<v>/finish<vblex>$--^*ah/*ah$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #I #homework #already #finish.
  
  
 
* {{transferTest|nan|eng|Hit chhut tiān-iá góa jú-lâi-jú siu-beh khòa.|I would like to see that movie more and more.}}  
 
*: {{transferMorphTest|nan|eng|Hit chhut{{tag|det}} tiān-iá{{tag|n}} góa{{tag|prn}} jú-lâi-jú{{tag|adv}} siu-beh{{tag|v}} khòa{{tag|v}}| I{{tag|prn}} would{{tag|v}} like{{tag|v}} to{{tag|pr}} see{{tag|v}} that{{tag|det}} movie{{tag|n}} more and more{{tag|adv}}}}
+
* output of lexical transfer: ^*Hit/*Hit$ ^*chhut/*chhut$ ^tiān-iá<n>/movie<n>$ ^góa<prn>/I<prn>$ ^*jú/*jú$-^*lâi/*lâi$-^*jú/*jú$ ^siu-beh<v>/want<vblex>$ ^*khòa/*khòa$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #movie #I #want
  
 
* {{transferTest|nan|eng|Góa jú-lâi-jú kah-ì lí ê pêng-iú.|I like your friend more and more.}}  
 
*: {{transferMorphTest|nan|eng|Góa{{tag|prn}} jú-lâi-jú{{tag|adv}} kah-ì{{tag|v}} lí{{tag|prn}} ê{{tag|det}} pêng-iú{{tag|n}}| I{{tag|prn}} like{{tag|v}} your{{tag|prn}} friend{{tag|n}} more and more{{tag|adv}}}}
+
* output of lexical transfer: ^Góa<prn>/I<prn>$ ^*jú/*jú$-^*lâi/*lâi$-^*jú/*jú$ ^kah-ì<v>/like<vblex>$ ^lí<prn>/you<prn>$ ^*ê/*ê$ ^pêng-iú<n>/friend<n>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
+
* output of biltrans: #I #like #you #friend
  
 
* {{transferTest|nan|eng|Àm-tǹg chia̍h liáu, góa ū teh lim tê.|I drink tea after dinner.}}  
 
*: {{transferMorphTest|nan|eng| Àm-tǹg{{tag|n}} chia̍h liáu{{tag|adv}} góa{{tag|prn}} ū teh{{tag|det}} lim{{tag|v}} tê{{tag|n}} | I{{tag|prn}} drink{{tag|v}} tea{{tag|n}} after{{tag|adv}} dinner{{tag|n}}}}
+
* output of lexical transfer: ^Àm-tǹg<n>/dinner<n>$ ^*chia/*chia$̍^*h/*h$ ^liáu<adv>/after<adv>$^,<cm>/,<cm>$ ^góa<prn>/I<prn>$ ^*ū/*ū$ ^teh<v>/is<vblex>$ ^lim<v>/drink<vblex>$ ^*tê/*tê$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #dinner #after #I #is #drink  
  
 
* {{transferTest|nan|eng|Âng-e-á ū teh kóng ōe!|The baby is talking!}}  
 
*: {{transferMorphTest|nan|eng| Âng-e-á{{tag|n}} ū teh{{tag|det}} kóng{{tag|v}} ōe{{tag|det}}| The{{tag|det}} baby{{tag|n}} is{{tag|v}} talking{{tag|v}}}}
+
* output of lexical transfer: ^Âng-e-á<n>/baby<n>$ ^*ū/*ū$ ^teh<v>/is<vblex>$ ^kóng<v>/talk<vblex>$ ^ōe<det>/are<det>$^!<sent>/!<sent>$^.<sent>/.<sent>$
 
+
* output of biltrans: #baby #is #talk #are
  
 
* {{transferTest|nan|eng|Chúi-chi-teng àn chhù-téng tiàu--leh.|Crystal lights are hanging from the ceiling.}}  
 
*: {{transferMorphTest|nan|eng| Chúi-chi-teng{{tag|n}} àn{{tag|v}} chhù-téng{{tag|n}} tiàu{{tag|v}} leh{{tag|det}}| Crystal{{tag|n}} lights{{tag|n}} are{{tag|v}} hanging{{tag|v}} from{{tag|pr}} the{{tag|det}} ceiling{{tag|n}}}}
+
* output of lexical transfer: ^*Chúi/*Chúi$-^*chi/*chi$-^teng<n>/lights<n>$ ^àn<pr>/from<pr>$ ^chhù-téng<n>/ceiling<n>$ ^tiàu<v>/hang<vblex>$--^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 
+
* output of biltrans: #lights #from #ceiling #hang #are
 
* {{transferTest|nan|eng|I tī chhia thâu-chêng khiā-leh.|She is standing in front of the car.}}  
 
*: {{transferMorphTest|nan|eng| I{{tag|prn}} tī{{tag|v}} chhia{{tag|n}} thâu{{tag|pr}} chêng{{tag|n}} khiā{{tag|v}} leh{{tag|det}} | She{{tag|prn}} is{{tag|v}} standing{{tag|v}} in{{tag|pr}} front{{tag|n}} of{{tag|pr}} the{{tag|det}} car{{tag|n}}}}
+
* output of lexical transfer: ^I<prn>/He<prn>$ ^*tī/*tī$ ^chhia<n>/car<n>$ ^thâu<pr>/in<pr>$-^chêng<n>/front<n>$ ^khiā<v>/stand<vblex>$-^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #He #car #in #front #stand #are
  
  
 
* {{transferTest|nan|eng|Ha̍k-seng chheh the̍h--leh.|The students are holding their books.}}  
 
*: {{transferMorphTest|nan|eng| Ha̍k-seng{{tag|n}} chheh{{tag|n}} the̍h{{tag|v}} leh{{tag|det}}| The{{tag|det}} students{{tag|n}} are{{tag|v}} holding{{tag|v}} their{{tag|prn}} books{{tag|n}}}}
+
* output of lexical transfer: ^*Ha/*Ha$̍^*k/*k$-^*seng/*seng$ ^chheh<n>/books<n>$ ^*the/*the$̍^*h/*h$--^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #books #the #are
  
 
* {{transferTest|nan|eng|Mn̂g lóng koai--leh.|All the doors are closed.}}  
 
*: {{transferMorphTest|nan|eng| Mn̂g{{tag|n}} lóng{{tag|adv}} koai{{tag|v}} leh{{tag|det}}| All{{tag|adv}} the{{tag|det}} doors{{tag|n}} are{{tag|v}} closed{{tag|v}}}}
+
* output of lexical transfer: ^*Mn/*Mn$̂^*g/*g$ ^lóng<adv>/all<adv>$ ^koai<v>/close<vblex>$--^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
 +
* output of biltrans: #all #close #are  
 +
 
 +
===Evaluation===
 +
 
 +
*Coverage on Monolingual Transducer : 697 / 4756 (~14.7%)
 +
 
 +
**TOP UNKNOWN WORDS:
 +
    254 ^ê/*ê$
 +
    85 ^kóng/*kóng$
 +
    78 ^lâng/*lâng$
 +
    71 ^kā/*kā$
 +
    64 ^so͘/*so͘$
 +
    64 ^Iâ/*Iâ$
 +
    58 ^sī/*sī$
 +
    54 ^ōe/*ōe$
 +
    49 ^tio̍h/*tio̍h$
 +
    49 ^in/*in$
 +
    47 ^chi̍t/*chi̍t$
 +
    44 ^lâi/*lâi$
 +
    44 ^khì/*khì$
 +
    42 ^tī/*tī$
 +
    41 ^m̄/*m̄$
 +
    40 ^chiū/*chiū$
 +
    39 ^ū/*ū$
 +
    36 ^sî/*sî$
 +
    36 ^lín/*lín$
 +
    35 ^góa/*góa$
 +
coverage: 697 / 4756 (~0.14655172413793103448)
 +
remaining unknown forms: 4059
  
====Resources for Machine Translation Between Hokkien and English====
 
*
 
  
 +
*Coverage on Bilingual Transducer : 428 / 2717 (~15.8%)
 +
**TOP UNKNOWN WORDS:
 +
    156 ^ê/*ê$
 +
    40 ^lâng/*lâng$
 +
    40 ^kóng/*kóng$
 +
    36 ^sī/*sī$
 +
    36 ^ōe/*ōe$
 +
    32 ^tio̍h/*tio̍h$
 +
    29 ^kā/*kā$
 +
    26 ^m̄/*m̄$
 +
    25 ^ū/*ū$
 +
    25 ^so͘/*so͘$
 +
    25 ^in/*in$
 +
    25 ^hō͘/*hō͘$
 +
    25 ^Iâ/*Iâ$
 +
    24 ^tī/*tī$
 +
    24 ^khì/*khì$
 +
    23 ^lâi/*lâi$
 +
    22 ^tùi/*tùi$
 +
    22 ^sî/*sî$
 +
    22 ^chi̍t/*chi̍t$
 +
    20 ^tè/*tè$
 +
coverage: 428 / 2717 (~0.15752668384247331616)
  
 
[[Category:Sp22_TranslationPairs]]
 
 
[[Category:Hokkien]]
 
 
[[Category:English]]
 

Latest revision as of 22:30, 1 June 2022

10 Sentences
  • (nan) I chit-má bô teh tha̍k Gô-bûn--ah. → (eng) He is no longer studying Russian.
  • output of lexical transfer: ^I<prn>/He<prn>$ ^*chit/*chit$-^*má/*má$ ^bô<adv>/not<adv>$ ^teh<v>/is<vblex>$ ^*tha/*tha$̍^*k/*k$ ^Gô-bûn<n>/Russian language<n>$--^*ah/*ah$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #He #is #not #Russian language
  • (nan) Góa kin-á-jı̍t ê kong-khò í-keng ôan-sêng--ah. → (eng) I've already finished today's homework
  • output of lexical transfer: ^Góa<prn>/I<prn>$ ^*kin/*kin$-^*á/*á$-^*jı/*jı$̍^*t/*t$ ^*ê/*ê$ ^kong-khò<n>/homework<n>$ ^í-keng<adv>/already<adv>$ ^ôan-sêng<v>/finish<vblex>$--^*ah/*ah$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #I #homework #already #finish.


  • (nan) Hit chhut tiān-iá góa jú-lâi-jú siu-beh khòa. → (eng) I would like to see that movie more and more.
  • output of lexical transfer: ^*Hit/*Hit$ ^*chhut/*chhut$ ^tiān-iá<n>/movie<n>$ ^góa<prn>/I<prn>$ ^*jú/*jú$-^*lâi/*lâi$-^*jú/*jú$ ^siu-beh<v>/want<vblex>$ ^*khòa/*khòa$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #movie #I #want
  • (nan) Góa jú-lâi-jú kah-ì lí ê pêng-iú. → (eng) I like your friend more and more.
  • output of lexical transfer: ^Góa<prn>/I<prn>$ ^*jú/*jú$-^*lâi/*lâi$-^*jú/*jú$ ^kah-ì<v>/like<vblex>$ ^lí<prn>/you<prn>$ ^*ê/*ê$ ^pêng-iú<n>/friend<n>$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #I #like #you #friend
  • (nan) Àm-tǹg chia̍h liáu, góa ū teh lim tê. → (eng) I drink tea after dinner.
  • output of lexical transfer: ^Àm-tǹg<n>/dinner<n>$ ^*chia/*chia$̍^*h/*h$ ^liáu<adv>/after<adv>$^,<cm>/,<cm>$ ^góa<prn>/I<prn>$ ^*ū/*ū$ ^teh<v>/is<vblex>$ ^lim<v>/drink<vblex>$ ^*tê/*tê$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #dinner #after #I #is #drink
  • (nan) Âng-e-á ū teh kóng ōe! → (eng) The baby is talking!
  • output of lexical transfer: ^Âng-e-á<n>/baby<n>$ ^*ū/*ū$ ^teh<v>/is<vblex>$ ^kóng<v>/talk<vblex>$ ^ōe<det>/are<det>$^!<sent>/!<sent>$^.<sent>/.<sent>$
  • output of biltrans: #baby #is #talk #are
  • (nan) Chúi-chi-teng àn chhù-téng tiàu--leh. → (eng) Crystal lights are hanging from the ceiling.
  • output of lexical transfer: ^*Chúi/*Chúi$-^*chi/*chi$-^teng<n>/lights<n>$ ^àn<pr>/from<pr>$ ^chhù-téng<n>/ceiling<n>$ ^tiàu<v>/hang<vblex>$--^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #lights #from #ceiling #hang #are
  • (nan) I tī chhia thâu-chêng khiā-leh. → (eng) She is standing in front of the car.
  • output of lexical transfer: ^I<prn>/He<prn>$ ^*tī/*tī$ ^chhia<n>/car<n>$ ^thâu<pr>/in<pr>$-^chêng<n>/front<n>$ ^khiā<v>/stand<vblex>$-^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #He #car #in #front #stand #are


  • (nan) Ha̍k-seng chheh the̍h--leh. → (eng) The students are holding their books.
  • output of lexical transfer: ^*Ha/*Ha$̍^*k/*k$-^*seng/*seng$ ^chheh<n>/books<n>$ ^*the/*the$̍^*h/*h$--^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #books #the #are
  • (nan) Mn̂g lóng koai--leh. → (eng) All the doors are closed.
  • output of lexical transfer: ^*Mn/*Mn$̂^*g/*g$ ^lóng<adv>/all<adv>$ ^koai<v>/close<vblex>$--^leh<det>/are<det>$^.<sent>/.<sent>$^.<sent>/.<sent>$
  • output of biltrans: #all #close #are
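
The lexical transfer lines above are in the Apertium stream format, where each lexical unit is written as ^source/target$ and a leading * marks a form the dictionary did not recognise. As a rough illustration only, the following Python sketch splits one of those lines into units and lists the unknown source forms. The function names parse_units and unknown_sources are hypothetical and not part of any Apertium tool, and the sketch ignores escaped characters that the real stream format allows.

```python
import re

# Matches one lexical unit: everything between '^' and the next '$'.
# Simplification (assumption): escaped '^', '$' or '/' are not handled.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def parse_units(stream):
    """Return (source, target) pairs for each ^source/target$ unit."""
    units = []
    for body in LU_PATTERN.findall(stream):
        source, _, target = body.partition("/")
        units.append((source, target))
    return units


def unknown_sources(units):
    """Return source forms flagged as unknown (prefixed with '*')."""
    return [src.lstrip("*") for src, _ in units if src.startswith("*")]


# Lexical transfer output for the "The baby is talking!" sentence above.
example = (
    "^Âng-e-á<n>/baby<n>$ ^*ū/*ū$ ^teh<v>/is<vblex>$ "
    "^kóng<v>/talk<vblex>$ ^ōe<det>/are<det>$^!<sent>/!<sent>$^.<sent>/.<sent>$"
)

units = parse_units(example)
print(len(units), "units; unknown:", unknown_sources(units))
```

Run on that sentence, the only starred unit is ū, which matches its place in the unknown-word lists under Evaluation.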

Evaluation

  • Coverage on Monolingual Transducer : 697 / 4756 (~14.7%)
    • TOP UNKNOWN WORDS:
   254 ^ê/*ê$
    85 ^kóng/*kóng$
    78 ^lâng/*lâng$
    71 ^kā/*kā$
    64 ^so͘/*so͘$
    64 ^Iâ/*Iâ$
    58 ^sī/*sī$
    54 ^ōe/*ōe$
    49 ^tio̍h/*tio̍h$
    49 ^in/*in$
    47 ^chi̍t/*chi̍t$
    44 ^lâi/*lâi$
    44 ^khì/*khì$
    42 ^tī/*tī$
    41 ^m̄/*m̄$
    40 ^chiū/*chiū$
    39 ^ū/*ū$
    36 ^sî/*sî$
    36 ^lín/*lín$
    35 ^góa/*góa$

coverage: 697 / 4756 (~0.14655172413793103448)
remaining unknown forms: 4059


  • Coverage on Bilingual Transducer : 428 / 2717 (~15.8%)
    • TOP UNKNOWN WORDS:
   156 ^ê/*ê$
    40 ^lâng/*lâng$
    40 ^kóng/*kóng$
    36 ^sī/*sī$
    36 ^ōe/*ōe$
    32 ^tio̍h/*tio̍h$
    29 ^kā/*kā$
    26 ^m̄/*m̄$
    25 ^ū/*ū$
    25 ^so͘/*so͘$
    25 ^in/*in$
    25 ^hō͘/*hō͘$
    25 ^Iâ/*Iâ$
    24 ^tī/*tī$
    24 ^khì/*khì$
    23 ^lâi/*lâi$
    22 ^tùi/*tùi$
    22 ^sî/*sî$
    22 ^chi̍t/*chi̍t$
    20 ^tè/*tè$

coverage: 428 / 2717 (~0.15752668384247331616)
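
The coverage figures above are the number of recognised tokens divided by the total number of tokens, with the unknown forms tallied by frequency. The sketch below shows one way such numbers could be computed from analyser output in the ^surface/analysis$ format. It is not the script that produced the figures on this page; the function name coverage and the four-token sample string are made up for illustration.

```python
import re
from collections import Counter


def coverage(stream):
    """Return (known, total, unknown_counts) for ^surface/analysis$ output.

    A token counts as unknown when its analysis starts with '*'.
    """
    tokens = re.findall(r"\^([^$]*)\$", stream)
    unknown = Counter()
    for token in tokens:
        surface, _, analysis = token.partition("/")
        if analysis.startswith("*"):
            unknown[surface] += 1
    total = len(tokens)
    known = total - sum(unknown.values())
    return known, total, unknown


# Made-up sample (assumption): one recognised noun and three unknown tokens.
sample = "^kong-khò/kong-khò<n>$ ^ê/*ê$ ^lâng/*lâng$ ^ê/*ê$"

known, total, unknown = coverage(sample)
print(f"coverage: {known} / {total} (~{known / total:.3f})")
for form, count in unknown.most_common():
    print(f"{count:7d} ^{form}/*{form}$")

# Arithmetic check against the figures reported on this page.
print(round(697 / 4756 * 100, 1), round(428 / 2717 * 100, 1), 4756 - 697)
```

The last line reproduces the reported percentages (14.7 and 15.8) and the 4059 remaining unknown forms from the raw counts.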