Latin and Mandarin Chinese/Structural transfer

From LING073
Latest revision as of 17:02, 30 July 2018

This is the page for the structural transfer of Latin and Mandarin Chinese. The main page for this language pair can be found here.

Pre-evaluation

Latin corpus coverage

Number of tokenised words in the corpus: 380

Coverage: 88.68%

Top unknown words in the corpus:

2 potest

2 facet

2 possit

2 quo

1 tibi

1 Mariaene

1 quid

1 audit

1 possum

1 matrae

1 James

1 loquent

1 audire

1 not

1 Videbasne

1 duo

1 poterunt

1 eae

1 aliquid

1 posset

Chinese corpus coverage

Number of tokenised words in the corpus: 447

Coverage: 100.00%
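As a rough check on the coverage figures above, coverage can be read as the share of tokens that received an analysis. The sketch below assumes that definition. The count of 43 unknown Latin tokens is not stated on this page: it is inferred as the only whole number consistent with 380 tokens and 88.68%. The function name `coverage` is illustrative.

```python
def coverage(total_tokens: int, unknown_tokens: int) -> float:
    """Return the percentage of tokens that received at least one analysis."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * (total_tokens - unknown_tokens) / total_tokens


# Assumption: 43 unknown tokens, inferred from the reported 88.68% of 380.
latin = coverage(380, 43)
# 100.00% coverage implies no unknown tokens in the Chinese corpus.
chinese = coverage(447, 0)

print(f"Latin coverage: {latin:.2f}%")
print(f"Chinese coverage: {chinese:.2f}%")
```

The "top unknown words" list only shows the most frequent unknown forms, so its counts need not add up to the total number of unknown tokens.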

lat → zho

WER: 586.79%

PER: 586.79%

zho → lat

WER: 97.42%

PER: 93.55%
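The scores above can be easier to interpret with the metrics written out. The following is a minimal sketch using common textbook definitions of word error rate (token-level edit distance divided by reference length) and position-independent error rate (the same idea, ignoring word order). The evaluation script actually used for this page may differ in details, and the reference sentences below are hypothetical, made up only for illustration.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (word order ignored)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


# Hypothetical reference, segmented into six tokens.
hyp = "小红 头 男孩 的 看到 。".split()
ref_segmented = "小红 看到 男孩 的 头 。".split()
# The same hypothetical reference with no spaces: a single token.
ref_unsegmented = ["小红看到男孩的头。"]

print(wer(ref_segmented, hyp), per(ref_segmented, hyp))
print(wer(ref_unsegmented, hyp), per(ref_unsegmented, hyp))
```

Because both rates are normalised by the reference length, they can exceed 100% when the hypothesis has many more tokens than the reference. Under these definitions, an unsegmented reference line counts as one token, so every hypothesis token is an error and WER equals PER. That would be consistent with the identical lat → zho figures above, though this page does not confirm the cause.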

Implementation

lat → zho

For translating into Chinese, we implemented macros that strip the case tags (along with gender and number) from nouns and add the associative particle 的 after nouns that are in the genitive case in Latin.

Maria caput pueri videt.

"Maria sees the boy's head."

tagger:

^Maria<np><ant><f><sg><nom>$ ^caput<n><nt><sg><nom>$ ^puer<n><m><sg><gen>$ ^videre<vblex><pri><act><p3><sg>$^.<sent>$

biltrans:

^Maria<np><ant><f><sg><nom>/小红<np><ant><f><sg><nom>$ ^caput<n><nt><sg><nom>/头<n><nt><sg><nom>$ ^puer<n><m><sg><gen>/男孩<n><m><sg><gen>$ ^videre<vblex><pri><act><p3><sg>/看到<vblex><pri><act><p3><sg>$^.<sent>/。<sent>$

chunker:

apertium-transfer: Rule 2 caput<n><nt><sg><nom>/头<n><nt><sg><nom>

apertium-transfer: Rule 2 puer<n><m><sg><gen>/男孩<n><m><sg><gen>

apertium-transfer: Rule 1 .<sent>/。<sent>

^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ ^noun<SN>{^男孩<n>$ ^的<pr>$}$ ^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$

interchunk:

Rule 1 noun<SN>{^男孩<n>$ ^的<pr>$}

^default<default>{^小红<np><ant><f><sg><nom>$}$ ^noun<SN>{^头<n>$ }$ ^noun<SN>{^男孩<n>$ ^的<pr>$}$ ^default<default>{^看到<vblex><pri><act><p3><sg>$}$^sent<SENT>{^。<sent>$}$

postchunk:

^小红<np><ant><f><sg><nom>$ ^头<n>$ ^男孩<n>$ ^的<pr>$ ^看到<vblex><pri><act><p3><sg>$^。<sent>$

lat-zho:

#小红 头 男孩 的 #看到。
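The effect of the noun macro in the trace above can be sketched in Python. This is only an illustration of the behaviour visible in the chunker output, not the actual rule, which is written in Apertium's transfer rule format. The function name `noun_to_zho` and the regular expression are assumptions introduced here.

```python
import re

# One lexical unit in Apertium stream format: ^lemma<tag1><tag2>...$
LU = re.compile(r"\^([^<$]+)((?:<[^>]+>)*)\$")


def noun_to_zho(lu: str) -> str:
    """Strip gender, number and case from a noun; add 的 after genitives."""
    m = LU.fullmatch(lu)
    if m is None:
        raise ValueError(f"not a lexical unit: {lu!r}")
    lemma, tag_string = m.groups()
    tags = re.findall(r"<([^>]+)>", tag_string)
    if "n" not in tags:
        # Proper nouns, verbs and punctuation are passed through unchanged.
        return lu
    out = f"^{lemma}<n>$"
    if "gen" in tags:
        out += " ^的<pr>$"
    return out


print(noun_to_zho("^头<n><nt><sg><nom>$"))
print(noun_to_zho("^男孩<n><m><sg><gen>$"))
print(noun_to_zho("^小红<np><ant><f><sg><nom>$"))
```

Note that the sketch, like the trace, leaves the word order untouched: 的 is attached to the possessor, but the possessor still follows the possessed noun.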

zho → lat

For translating into Latin, we implemented rules that remove the preposition 给 and put the noun that follows it into the dative case.

我给狗听。

"I listen for the dog (i.e., for the dog's sake/at the dog's behest)."

tagger:

^我<prn>$ ^给<pr>$ ^狗<n>$ ^听<vblex>$

biltrans:

^我<prn>/ego<prn>$ ^给<pr>/$ ^狗<n>/canis<n><m>$ ^听<vblex>/auscultare$

chunker:

apertium-transfer: Rule 2 给<pr>/

apertium-transfer: Rule 3 狗<n>/canis<n><m>

^default<default>{^ego<prn>$}$ ^pr<SPR><dat>{}$ ^nom<SN><ND><CD>{^canis<n><m><2><3>$}$ ^default<default>{^auscultare$}$

interchunk:

apertium-interchunk: Rule 2 pr<SPR><dat>{} nom<SN><ND><CD>{^canis<n><m><2><3>$}

^default<default>{^ego<prn>$}$ ^nom<SN><sg><dat>{^canis<n><m><2><3>$}$ ^default<default>{^auscultare$}$

postchunk:

^ego<prn>$ ^canis<n><m><sg><dat>$ ^auscultare$

zho-lat:

#ego cani #auscultare
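The combined effect of the chunker and interchunk rules in this trace can be sketched in Python as a single pass. This is a simplification under stated assumptions: the real rules are Apertium transfer and interchunk rules, the names `gei_to_dative` and `render` are made up for illustration, and the singular number is assumed as the default because the trace resolves the undetermined number to `<sg>`.

```python
def gei_to_dative(units):
    """Drop 给 and mark the immediately following noun as dative singular.

    Each unit is (source_lemma, target_lemma, target_tags).
    """
    out = []
    after_gei = False
    for source, target, tags in units:
        if source == "给":
            # The preposition has no Latin equivalent; it only triggers case.
            after_gei = True
            continue
        if after_gei and tags[:1] == ["n"]:
            # Assumption: number defaults to singular, as in the trace.
            tags = tags + ["sg", "dat"]
        after_gei = False
        out.append((target, tags))
    return out


def render(units):
    """Format (lemma, tags) pairs as Apertium stream lexical units."""
    return " ".join(
        "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"
        for lemma, tags in units
    )


sentence = [
    ("我", "ego", ["prn"]),
    ("给", "", ["pr"]),
    ("狗", "canis", ["n", "m"]),
    ("听", "auscultare", []),
]
print(render(gei_to_dative(sentence)))
```

For the example sentence this reproduces the postchunk line shown above. The sketch only affects a noun directly after 给; anything else after 给 is left unchanged.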

Post-evaluation

Latin corpus coverage

Number of tokenised words in the corpus: 380

Coverage: 88.68%

Top unknown words in the corpus:

2 possit

2 potest

2 quo

2 facet

1 abest

1 duo

1 Id

1 Te

1 quid

1 vult

1 Utrique

1 poterunt

1 accipiunt

1 se

1 veta

1 Scisne

1 tibi

1 not

1 possum

1 posset

Chinese corpus coverage

Number of tokenised words in the corpus: 447

Coverage: 100.00%

lat → zho

WER: 97.84%

PER: 97.04%

(after adding spaces to zho.sentences.txt)

zho → lat

WER: 95.45%

PER: 89.39%