Difference between revisions of "Latin and Mandarin Chinese/Structural transfer"

From LING073

Revision as of 19:53, 11 April 2018

This is the page for the structural transfer of Latin and Mandarin Chinese. The main page for this language pair can be found here.

Pre-evaluation

Latin corpus coverage

Number of tokenised words in the corpus: 380

Coverage: 88.68%

Top unknown words in the corpus:

2 potest

2 facet

2 possit

2 quo

1 tibi

1 Mariaene

1 quid

1 audit

1 possum

1 matrae

1 James

1 loquent

1 audire

1 not

1 Videbasne

1 duo

1 poterunt

1 eae

1 aliquid

1 posset

Chinese corpus coverage

Number of tokenised words in the corpus: 447

Coverage: 100.00%
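
The coverage figures above are the share of corpus tokens that the analyser recognises. A minimal sketch of that calculation follows. It assumes the corpus is already tokenised and that the set of known word forms is available. The function name `coverage_report` and the toy tokens are hypothetical, and this is not the script that produced the numbers on this page.

```python
from collections import Counter


def coverage_report(tokens, known_words, top_n=20):
    """Return (coverage in percent, most frequent unknown tokens).

    Hypothetical helper: `known_words` stands in for whatever the
    morphological analyser actually recognises.
    """
    unknown = [t for t in tokens if t not in known_words]
    if not tokens:
        return 0.0, []
    coverage = 100.0 * (len(tokens) - len(unknown)) / len(tokens)
    return coverage, Counter(unknown).most_common(top_n)


# Toy input for illustration only, not the real corpus.
tokens = ["puella", "potest", "audire", "potest", "et"]
known = {"puella", "et"}

cov, top_unknown = coverage_report(tokens, known)
print(f"Coverage: {cov:.2f}%")
for word, count in top_unknown:
    print(count, word)

# Consistency check on the reported Latin figures:
# 88.68% of 380 tokens implies this many unknown tokens.
implied_unknown = 380 - round(380 * 88.68 / 100)
print("Implied unknown tokens:", implied_unknown)
```

Under the reported Latin figures, about 43 of the 380 tokens are unknown. The twenty entries in the list account for 24 of those occurrences, so the list is a top-of-frequency excerpt and not the full set.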

lat → zho

WER: 586.79%

PER: 586.79%

zho → lat

WER: 97.42%

PER: 93.55%
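
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. WER is the word-level edit distance between the system output and the reference, divided by the reference length. PER ignores word order and compares the two sentences as bags of words. The PER formula used here is one common formulation and is an assumption, as are the function names and toy sentences; the evaluation tool behind the figures above may define PER slightly differently.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (assumed formulation)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


# Same words in a different order: WER penalises it, PER does not.
ref = ["a", "b", "c"]
hyp = ["c", "a", "b"]
print(f"WER: {wer(ref, hyp):.2f}%  PER: {per(ref, hyp):.2f}%")

# Output much longer than the reference with no matching words:
# both rates exceed 100% and coincide.
long_hyp = ["x", "y", "z", "w", "v", "u"]
print(f"WER: {wer(['a'], long_hyp):.2f}%  PER: {per(['a'], long_hyp):.2f}%")
```

Because insertions count as errors while the denominator is the reference length, both rates can exceed 100% when the output contains many more tokens than the reference. With this formulation, WER and PER are equal when almost no output token matches the reference, since reordering then has nothing to recover.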