Latin and Mandarin Chinese/Structural transfer
Latest revision as of 17:02, 30 July 2018
This is the page for the structural transfer of Latin and Mandarin Chinese. The main page for this language pair can be found here.
Pre-evaluation
Latin corpus coverage
Number of tokenised words in the corpus: 380
Coverage: 88.68%
Top unknown words in the corpus:
2 potest
2 facet
2 possit
2 quo
1 tibi
1 Mariaene
1 quid
1 audit
1 possum
1 matrae
1 James
1 loquent
1 audire
1 not
1 Videbasne
1 duo
1 poterunt
1 eae
1 aliquid
1 posset
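Coverage is the share of corpus tokens the Latin analyser recognises. The list ranks the unrecognised tokens by frequency. As a check on the arithmetic, 88.68% of 380 tokens corresponds to 337 known tokens and 43 unknown ones. The sketch below assumes that unknown tokens carry the `*` prefix Apertium's analyser uses. The function name `coverage_report` is hypothetical and is not the script used to produce these figures.

```python
from collections import Counter

def coverage_report(analysed_tokens, top_n=20):
    """Return (token count, coverage in percent, most frequent unknown words).

    Assumption: each token is a surface form, and a token the analyser did not
    recognise is prefixed with '*' (the mark Apertium's analyser uses).
    """
    total = len(analysed_tokens)
    unknown = [tok[1:] for tok in analysed_tokens if tok.startswith("*")]
    coverage = 100.0 * (total - len(unknown)) / total if total else 0.0
    # most_common breaks ties by order of first occurrence
    return total, coverage, Counter(unknown).most_common(top_n)

# Toy input, not the page's corpus
total, cov, top = coverage_report(["Maria", "*potest", "videt", "*potest", "*quo"])
print(total, f"{cov:.2f}%", top)  # 5 40.00% [('potest', 2), ('quo', 1)]

# The reported 88.68% over 380 tokens corresponds to 337 known tokens
print(f"{100 * 337 / 380:.2f}%")  # 88.68%
```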
Chinese corpus coverage
Number of tokenised words in the corpus: 447
Coverage: 100.00%
lat → zho
WER: 586.79%
PER: 586.79%
zho → lat
WER: 97.42%
PER: 93.55%
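WER counts word-level insertions, deletions and substitutions against the reference, then divides by the reference length. It can therefore exceed 100% when the output contains many more words than the reference. PER makes the same comparison but ignores word order.

One plausible reading of the 586.79% figures: the Chinese reference sentences had no spaces, so each sentence counted as a single word, while the spaced output counted as several. The post-evaluation note about adding spaces to zho.sentences.txt fits this reading. The sketch uses one common PER definition; the evaluation tool used for this page may compute it differently. The example sentences are hypothetical.

```python
from collections import Counter

def wer(reference, hypothesis):
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)  # assumes a non-empty reference

def per(reference, hypothesis):
    """Position-independent error rate in percent (one common definition)."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())  # order-free overlap
    return 100.0 * (1 - (matches - max(0, len(hyp) - len(ref))) / len(ref))

hyp = "小红 头 男孩 的 看到"                # same word order as the lat-zho output
print(wer("小红 看到 男孩 的 头", hyp))  # 40.0  (two substitutions)
print(per("小红 看到 男孩 的 头", hyp))  # 0.0   (same words, order ignored)
print(wer("小红看到男孩的头", hyp))      # 500.0 (unspaced reference is one word)
print(per("小红看到男孩的头", hyp))      # 500.0
```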
Implementation
lat → zho
For translating into Chinese, we implemented macros that strip case from nouns and insert the associative particle 的 after nouns that were genitive in Latin.
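The macros themselves live in the Apertium transfer rules, which this page does not reproduce. As an illustration only, the same two operations can be written in Python over (lemma, tags) pairs:

- nouns keep only the `<n>` tag, as the chunker trace for this example shows;
- a genitive noun is followed by ^的<pr>$.

Word order is left unchanged, which matches the lat-zho output. The function names are hypothetical.

```python
def strip_case_add_de(units):
    """Apply the two lat->zho noun operations to (lemma, tags) pairs.

    Nouns keep only <n> (gender, number and case are dropped, as in the
    chunker trace); a noun that was genitive is followed by 的<pr>.
    """
    out = []
    for lemma, tags in units:
        if tags and tags[0] == "n":
            out.append((lemma, ["n"]))
            if "gen" in tags:
                out.append(("的", ["pr"]))
        else:
            out.append((lemma, tags))  # proper nouns, verbs etc. pass through
    return out

def to_stream(units):
    """Format (lemma, tags) pairs as Apertium lexical units."""
    return " ".join("^" + lemma + "".join(f"<{t}>" for t in tags) + "$"
                    for lemma, tags in units)

units = [("小红", ["np", "ant", "f", "sg", "nom"]),
         ("头", ["n", "nt", "sg", "nom"]),
         ("男孩", ["n", "m", "sg", "gen"]),
         ("看到", ["vblex", "pri", "act", "p3", "sg"])]
print(to_stream(strip_case_add_de(units)))
# ^小红<np><ant><f><sg><nom>$ ^头<n>$ ^男孩<n>$ ^的<pr>$ ^看到<vblex><pri><act><p3><sg>$
```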
Maria caput pueri videt.
"Maria saw the boy's head."
tagger:
^Maria<np><ant><f><sg><nom>$ ^caput<n><nt><sg><nom>$ ^puer<n><m><sg><gen>$ ^videre<vblex><pri><act><p3><sg>$^.<sent>$
biltrans:
^Maria<np><ant><f><sg><nom>/小红<np><ant><f><sg><nom>$ ^caput<n><nt><sg><nom>/头<n><nt><sg><nom>$ ^puer<n><m><sg><gen>/男孩<n><m><sg><gen>$ ^videre<vblex><pri><act><p3><sg>/看到<vblex><pri><act><p3><sg>$^.<sent>/。<sent>$
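Each biltrans unit has the form ^source/target$, where each side is a lemma followed by its tags in angle brackets. The target side can be empty, as with 给 in the zho → lat example. The small parser below is a sketch that assumes one translation per unit and ignores Apertium's backslash escaping.

```python
import re

LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
TAG = re.compile(r"<([^>]+)>")

def parse_unit(text):
    """Split 'lemma<t1><t2>' into ('lemma', ['t1', 't2'])."""
    i = text.find("<")
    if i == -1:
        return text, []
    return text[:i], TAG.findall(text[i:])

def parse_biltrans(line):
    """Return [((src_lemma, src_tags), (tgt_lemma, tgt_tags)), ...]."""
    pairs = []
    for body in LEXICAL_UNIT.findall(line):
        source, _, target = body.partition("/")  # assumes a single translation
        pairs.append((parse_unit(source), parse_unit(target)))
    return pairs

line = "^我<prn>/ego<prn>$ ^给<pr>/$ ^狗<n>/canis<n><m>$ ^听<vblex>/auscultare$"
for (src, src_tags), (tgt, tgt_tags) in parse_biltrans(line):
    print(src, src_tags, "->", repr(tgt), tgt_tags)
# 我 ['prn'] -> 'ego' ['prn']
# 给 ['pr'] -> '' []
# 狗 ['n'] -> 'canis' ['n', 'm']
# 听 ['vblex'] -> 'auscultare' []
```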
chunker:
apertium-transfer: Rule 2 caput<n><nt><sg><nom>/头<n><nt><sg><nom>
apertium-transfer: Rule 2 puer<n><m><sg><gen>/男孩<n><m><sg><gen>
apertium-transfer: Rule 1 .<sent>/。<sent>
^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ ^noun<SN>{^男孩<n>$ ^的<pr>$}$ ^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$
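The chunker wraps lexical units in chunks of the form ^name<tags>{...}$, and the interchunk and postchunk stages operate on those chunks. The regex-based sketch below splits such a line into (name, tags, contents). It assumes chunk contents never contain a literal `}$`.

```python
import re

# ^name<tag1><tag2>{contents}$ ; assumes contents never contain a literal "}$"
CHUNK = re.compile(r"\^([^<{]+)((?:<[^>]+>)*)\{(.*?)\}\$")
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
TAG = re.compile(r"<([^>]+)>")

def parse_chunks(line):
    """Return [(chunk_name, chunk_tags, [lexical units inside the chunk])]."""
    return [(name, TAG.findall(tags), LEXICAL_UNIT.findall(body))
            for name, tags, body in CHUNK.findall(line)]

line = ("^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ "
        "^noun<SN>{^男孩<n>$ ^的<pr>$}$ "
        "^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$")
for chunk in parse_chunks(line):
    print(chunk)
# third chunk printed: ('noun', ['SN'], ['男孩<n>', '的<pr>'])
```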
interchunk:
Rule 1 noun<SN>{^男孩<n>$ ^的<pr>$}
^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ ^noun<SN>{^男孩<n>$ ^的<pr>$}$ ^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$
postchunk:
^小红<np><ant><f><sg><nom>$ ^头<n>$ ^男孩<n>$ ^的<pr>$ ^看到<vblex><pri><act><p3><sg>$^。<sent>$
lat-zho:
#小红 头 男孩 的 #看到。
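In Apertium's output, a leading mark tells you why a word was not handled:

- `#`: the target-language generator could not inflect the word;
- `@`: the word is missing from the bilingual dictionary;
- `*`: the analyser did not recognise the word.

Here 小红 and 看到 are marked. Both still carry Latin-derived tags (case, tense, person) in the postchunk output, which suggests the Chinese dictionary has no forms with those tags. By contrast, 头 and 男孩, reduced to `<n>`, are generated. The sketch below lists the marked words in an output line; the function name is hypothetical.

```python
import re

# Apertium debug marks on output words
MARKS = {"*": "unknown to analyser",
         "@": "missing from bilingual dictionary",
         "#": "not generated"}
# a mark followed by the word, stopping at spaces, other marks or punctuation
FLAGGED = re.compile(r"([*@#])([^\s*@#。.,，!?]+)")

def flagged_words(output):
    """Return (word, reason) for every marked word in a translated line."""
    return [(word, MARKS[mark]) for mark, word in FLAGGED.findall(output)]

print(flagged_words("#小红 头 男孩 的 #看到。"))
# [('小红', 'not generated'), ('看到', 'not generated')]
print(flagged_words("#ego cani #auscultare"))
# [('ego', 'not generated'), ('auscultare', 'not generated')]
```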
zho → lat
For translating into Latin, we implemented rules that delete the preposition 给 and put the following noun in the dative case.
我给狗听。
"I listen for the dog (i.e., for the dog's sake/at the dog's behest)."
tagger:
^我<prn>$ ^给<pr>$ ^狗<n>$ ^听<vblex>$
biltrans:
^我<prn>/ego<prn>$ ^给<pr>/$ ^狗<n>/canis<n><m>$ ^听<vblex>/auscultare$
chunker:
apertium-transfer: Rule 2 给<pr>/
apertium-transfer: Rule 3 狗<n>/canis<n><m>
^default<default>{^ego<prn>$}$ ^pr<SPR><dat>{}$ ^nom<SN><ND><CD>{^canis<n><m><2><3>$}$ ^default<default>{^auscultare$}$
interchunk:
apertium-interchunk: Rule 2 pr<SPR><dat>{} nom<SN><ND><CD>{^canis<n><m><2><3>$}
^default<default>{^ego<prn>$}$ ^nom<SN><sg><dat>{^canis<n><m><2><3>$}$ ^default<default>{^auscultare$}$
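In the trace, the chunker turns 给 into an empty preposition chunk tagged `<dat>`. Interchunk Rule 2 then folds that chunk into the following noun chunk, replacing the placeholder tags `<ND>` and `<CD>` with `<sg>` and `<dat>`.

The Python sketch below reproduces that rule over (name, tags, contents) tuples. Defaulting the number to `sg` is copied from this single trace and is an assumption about how the real rule decides number.

```python
def fold_dative_preposition(chunks):
    """Interchunk-style pass over (name, tags, contents) chunks.

    An empty 'pr' chunk tagged <dat> followed by a 'nom' chunk becomes that
    'nom' chunk alone, with its number and case slots set to sg and dat.
    Assumption: number is always 'sg' here, copied from the single trace.
    """
    out, i = [], 0
    while i < len(chunks):
        name, tags, contents = chunks[i]
        nxt = chunks[i + 1] if i + 1 < len(chunks) else None
        if name == "pr" and "dat" in tags and not contents and nxt and nxt[0] == "nom":
            nom_name, nom_tags, nom_contents = nxt
            out.append((nom_name, [nom_tags[0], "sg", "dat"], nom_contents))
            i += 2  # the preposition chunk is dropped
        else:
            out.append((name, tags, contents))
            i += 1
    return out

chunks = [("default", ["default"], ["ego<prn>"]),
          ("pr", ["SPR", "dat"], []),
          ("nom", ["SN", "ND", "CD"], ["canis<n><m><2><3>"]),
          ("default", ["default"], ["auscultare"])]
for chunk in fold_dative_preposition(chunks):
    print(chunk)
# second chunk printed: ('nom', ['SN', 'sg', 'dat'], ['canis<n><m><2><3>'])
```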
postchunk:
^ego<prn>$ ^canis<n><m><sg><dat>$ ^auscultare$
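The `<2><3>` tags on canis inside the chunk are references to the chunk's own tags, counted from 1. In nom<SN><sg><dat>, tag 2 is sg and tag 3 is dat, which gives canis<n><m><sg><dat> after postchunk. A one-function sketch of that substitution:

```python
import re

def resolve_tag_refs(chunk_tags, unit):
    """Replace numeric tags such as <2> with the chunk's 2nd tag (1-based)."""
    def lookup(match):
        n = int(match.group(1))
        if not 1 <= n <= len(chunk_tags):
            raise IndexError(f"chunk has no tag number {n}")
        return "<" + chunk_tags[n - 1] + ">"
    return re.sub(r"<(\d+)>", lookup, unit)

print(resolve_tag_refs(["SN", "sg", "dat"], "canis<n><m><2><3>"))
# canis<n><m><sg><dat>
```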
zho-lat:
#ego cani #auscultare
Post-evaluation
Latin corpus coverage
Number of tokenised words in the corpus: 380
Coverage: 88.68%
Top unknown words in the corpus:
2 possit
2 potest
2 quo
2 facet
1 abest
1 duo
1 Id
1 Te
1 quid
1 vult
1 Utrique
1 poterunt
1 accipiunt
1 se
1 veta
1 Scisne
1 tibi
1 not
1 possum
1 posset
Chinese corpus coverage
Number of tokenised words in the corpus: 447
Coverage: 100.00%
lat → zho
WER: 97.84%
PER: 97.04%
(after adding spaces to zho.sentences.txt)
zho → lat
WER: 95.45%
PER: 89.39%