Difference between revisions of "Latin and Mandarin Chinese/Structural transfer"
[[Category:Sp17_StructuralTransfer]]
Revision as of 21:34, 14 April 2018
This is the page for the structural transfer of Latin and Mandarin Chinese. The main page for this language pair can be found here.
Pre-evaluation
Latin corpus coverage
Number of tokenised words in the corpus: 380
Coverage: 88.68%
Top unknown words in the corpus:
2 potest
2 facet
2 possit
2 quo
1 tibi
1 Mariaene
1 quid
1 audit
1 possum
1 matrae
1 James
1 loquent
1 audire
1 not
1 Videbasne
1 duo
1 poterunt
1 eae
1 aliquid
1 posset
Chinese corpus coverage
Number of tokenised words in the corpus: 447
Coverage: 100.00%
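The coverage figures above are the share of corpus tokens that the analyser recognised. A minimal sketch of that arithmetic follows. The function name is illustrative, and the count of 43 unknown Latin tokens is an assumption: it is not stated on this page, but it is the only whole number that rounds to 88.68% of 380.

```python
def coverage(total_tokens, unknown_tokens):
    """Percentage of corpus tokens that received an analysis."""
    return 100.0 * (total_tokens - unknown_tokens) / total_tokens


# Latin corpus: 380 tokens, assuming 43 of them were unknown (inferred, see above).
print(f"{coverage(380, 43):.2f}%")  # 88.68%

# Chinese corpus: 447 tokens, none unknown.
print(f"{coverage(447, 0):.2f}%")   # 100.00%
```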
lat → zho
WER: 586.79%
PER: 586.79%
zho → lat
WER: 97.42%
PER: 93.55%
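This page does not say how WER and PER were computed. The sketch below assumes the common definitions: WER is the word-level edit distance divided by the reference length, and PER is a bag-of-words variant that ignores word order. The function names are illustrative, and the evaluation script actually used may define PER slightly differently.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (assumed bag-of-words form)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


# Reordering costs WER but not PER.
print(round(wer(["a", "b", "c"], ["b", "a", "c"]), 2))  # 66.67
print(per(["a", "b", "c"], ["b", "a", "c"]))            # 0.0

# A hypothesis longer than the reference can push both rates past 100%.
print(wer(["a"], ["a", "b", "c", "d"]))  # 300.0
```

Under these definitions a rate above 100% can only occur when the system output contains more tokens than the reference, so the result depends heavily on how both sides are tokenised.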
Implementation
lat → zho
For translation into Chinese, we implemented macros that strip the case tag from nouns and add the associative particle 的 after nouns that were in the genitive case in Latin.
Maria caput pueri videt.
"Maria sees the boy's head."
tagger: ^Maria<np><ant><f><sg><nom>$ ^caput<n><nt><sg><nom>$ ^puer<n><m><sg><gen>$ ^videre<vblex><pri><act><p3><sg>$^.<sent>$
biltrans: ^Maria<np><ant><f><sg><nom>/小红<np><ant><f><sg><nom>$ ^caput<n><nt><sg><nom>/头<n><nt><sg><nom>$ ^puer<n><m><sg><gen>/男孩<n><m><sg><gen>$ ^videre<vblex><pri><act><p3><sg>/看到<vblex><pri><act><p3><sg>$^.<sent>/。<sent>$
chunker: apertium-transfer: Rule 2 caput<n><nt><sg><nom>/头<n><nt><sg><nom>
apertium-transfer: Rule 2 puer<n><m><sg><gen>/男孩<n><m><sg><gen>
apertium-transfer: Rule 1 .<sent>/。<sent>
^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ ^noun<SN>{^男孩<n>$ ^的<pr>$}$ ^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$
interchunk: Rule 1 noun<SN>{^男孩<n>$ ^的<pr>$}
^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ ^noun<SN>{^男孩<n>$ ^的<pr>$}$ ^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$
postchunk: ^小红<np><ant><f><sg><nom>$ ^头<n>$ ^男孩<n>$ ^的<pr>$ ^看到<vblex><pri><act><p3><sg>$^。<sent>$
lat-zho: #小红 头 男孩 的 #看到。
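The effect of the noun macros in the trace above can be imitated outside Apertium. The following Python sketch is not the pair's actual transfer rules, which are written in Apertium's XML rule format and are not shown on this page. It only reproduces the observable behaviour on the biltrans line: nouns keep their lemma and the n tag, and a genitive noun is followed by 的 with the pr tag. All function and variable names are hypothetical.

```python
import re

# One lexical unit in biltrans form: ^source/target$
LU = re.compile(r"\^([^/$]+)/([^$]+)\$")


def split_tags(analysis):
    """Split 'lemma<t1><t2>' into the lemma and a list of tag names."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", analysis)
    return lemma, tags


def transfer(biltrans):
    """Return target-side lexical units, imitating the noun macros."""
    out = []
    for _source, target in LU.findall(biltrans):
        lemma, tags = split_tags(target)
        if tags and tags[0] == "n":
            # Keep only the noun tag, as in the chunker output above.
            out.append(f"^{lemma}<n>$")
            if "gen" in tags:
                # A Latin genitive becomes noun + associative particle.
                out.append("^的<pr>$")
        else:
            # Everything else passes through unchanged.
            out.append(f"^{target}$")
    return out


BILTRANS = (
    "^Maria<np><ant><f><sg><nom>/小红<np><ant><f><sg><nom>$ "
    "^caput<n><nt><sg><nom>/头<n><nt><sg><nom>$ "
    "^puer<n><m><sg><gen>/男孩<n><m><sg><gen>$ "
    "^videre<vblex><pri><act><p3><sg>/看到<vblex><pri><act><p3><sg>$"
    "^.<sent>/。<sent>$"
)

for unit in transfer(BILTRANS):
    print(unit)
```

Like the trace, this sketch leaves the possessor phrase after the head noun; it does not reorder 男孩 的 in front of 头.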