Latin and Mandarin Chinese
Revision as of 22:49, 24 April 2018
Resources for machine translation between Latin and Mandarin Chinese
lat → zho evaluation
Sentences WER: 586.79%
Sentences PER: 586.79%
Tests WER: 609.9%
Tests PER: 609.9%
zho → lat evaluation
Sentences WER: 97.42%
Sentences PER: 93.55%
Tests WER: 95.45%
Tests PER: 89.39%
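A minimal sketch of how WER and PER are usually computed may help explain why the lat → zho figures can exceed 100%. This is not the evaluation script used for the numbers above. It assumes whitespace tokenisation (Chinese output would first need word segmentation) and uses one common definition of PER. The function names are illustrative.

```python
from collections import Counter


def word_edit_distance(ref, hyp):
    """Word-level Levenshtein distance; substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute (cost 0 if the words match)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: compares bags of words, ignoring order."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Because insertions count as errors while the denominator is the reference length, a hypothesis much longer than its reference pushes both measures past 100%. PER ignores word order, so it is never higher than WER.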
Proportion of stems translated correctly
One case of one-to-many mapping from Chinese to Latin involves the preposition 在. This is a general locative preposition in Chinese, and it can mean anything from 'in' to 'above' to 'outside' to 'in front of' to 'west of'. Usually, the object of 在 is either a place word (such as 'China' or 'the library') or a localizer (方位词) meaning 'inside', 'top', etc., which is modified by a preceding noun. The result is a phrase roughly equivalent to "在 the inside of the box" or "在 the top of the table" (e.g. 在桌子上 'on the table'). In contrast, Latin, much like English, has many different prepositions that specify different spatial relationships to their object (in, sub, ante, post, extra, and so on).
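To illustrate the kind of lexical selection this requires, here is a toy heuristic that picks a Latin preposition for 在 by looking ahead for a localizer such as 里, 下 or 外. It is a hypothetical sketch, not one of the pair's actual rules. It assumes pre-segmented tokens and a small illustrative table. It also ignores 在 as a verb or progressive marker (在吃饭).

```python
# Hypothetical table: localizer -> (Latin preposition, case it governs for location).
LOCALIZER_TO_LATIN = {
    "里": ("in", "abl"),
    "内": ("in", "abl"),
    "上": ("super", "acc"),  # 'above / on top of'; 'on a surface' is often in + ablative instead
    "下": ("sub", "abl"),
    "前": ("ante", "acc"),
    "后": ("post", "acc"),
    "外": ("extra", "acc"),
}
DEFAULT = ("in", "abl")  # assumed fallback for bare place words such as 图书馆


def _localizer_head(token):
    """Reduce two-character localizers like 下面 or 里边 to their head character."""
    if len(token) == 2 and token[1] in "面边":
        return token[0]
    return token


def select_latin_preposition(tokens, zai_index, window=4):
    """Choose a Latin preposition for the 在 at tokens[zai_index]."""
    if tokens[zai_index] != "在":
        raise ValueError("tokens[zai_index] must be 在")
    # Scan a few tokens to the right for the first localizer.
    for tok in tokens[zai_index + 1 : zai_index + 1 + window]:
        head = _localizer_head(tok)
        if head in LOCALIZER_TO_LATIN:
            return LOCALIZER_TO_LATIN[head]
    return DEFAULT
```

For example, `select_latin_preposition(["我", "在", "桌子", "下面"], 1)` returns `("sub", "abl")`. A place name like 上海 is left alone because it is not reduced to the localizer 上.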
Another case involves pronouns: each Latin personal pronoun has a different form for each case (ego, mei, mihi, me). In contrast, Chinese pronouns do not inflect for case, and each has at most two forms, a singular and a plural (我 'I', 我们 'we').
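The pronoun mapping can be pictured as a lookup table in which one Chinese form fans out to several Latin forms. The table and function below are hypothetical and cover only first- and second-person pronouns. They assume the case has already been decided elsewhere (for example by structural transfer), and that is the hard part.

```python
# Hypothetical table: (person, number) -> case -> Latin form.
LATIN_PERSONAL_PRONOUNS = {
    (1, "sg"): {"nom": "ego", "gen": "mei", "dat": "mihi", "acc": "me", "abl": "me"},
    (1, "pl"): {"nom": "nos", "gen": "nostri", "dat": "nobis", "acc": "nos", "abl": "nobis"},
    (2, "sg"): {"nom": "tu", "gen": "tui", "dat": "tibi", "acc": "te", "abl": "te"},
    (2, "pl"): {"nom": "vos", "gen": "vestri", "dat": "vobis", "acc": "vos", "abl": "vobis"},
}
# (The plural genitives nostrum/vestrum, used partitively, are omitted.)

# Chinese pronouns only distinguish person and number.
CHINESE_PRONOUNS = {
    "我": (1, "sg"),
    "我们": (1, "pl"),
    "你": (2, "sg"),
    "你们": (2, "pl"),
}


def latin_pronoun(zh, case):
    """Return the Latin form of a Chinese pronoun in the requested case."""
    return LATIN_PERSONAL_PRONOUNS[CHINESE_PRONOUNS[zh]][case]
```

Latin also frequently omits subject pronouns because the verb ending already marks person, so a subject 我 will often correspond to no Latin word at all.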
One-to-many mappings from Latin to Chinese are harder to find, because Latin is by far the more heavily inflected of the two languages. However, note that Chinese is a classifier language: a noun needs a classifier (measure word) before it can be combined with a numeral or determiner. Different nouns take different classifiers depending on the shape of the object they denote, among other factors (一本书 'a book', 一张纸 'a sheet of paper', 一条鱼 'a fish'). Latin has nothing comparable. When pairing a noun with a numeral, at most you have to make the numeral agree with the noun in gender and case, and only unus, duo, tres and the hundreds from ducenti upward decline at all. In that sense, this might be considered a one-to-many mapping (and sometimes literally a "one"-to-many mapping, since unus corresponds to 一 plus any of a number of classifiers).
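On the generation side, this amounts to choosing a classifier for each noun. The sketch below uses a hypothetical noun-to-classifier table with the general classifier 个 as a fallback, and it only handles the numerals 1 to 9. It also encodes the rule that "two" is normally 两 rather than 二 before a classifier. In a real system the classifier would more plausibly be stored in the bilingual or monolingual dictionary.

```python
# Hypothetical noun -> classifier table.
CLASSIFIERS = {
    "书": "本",    # books and other bound volumes
    "纸": "张",    # flat objects such as paper
    "桌子": "张",  # tables (flat surface)
    "鱼": "条",    # long, thin things such as fish
    "路": "条",    # roads
}
GENERAL_CLASSIFIER = "个"  # fallback for nouns not in the table

# Before a classifier, "two" is normally 两 rather than 二.
DIGITS = {1: "一", 2: "两", 3: "三", 4: "四", 5: "五",
          6: "六", 7: "七", 8: "八", 9: "九"}


def numeral_phrase(n, noun):
    """Build numeral + classifier + noun, e.g. 2 + 鱼 -> 两条鱼."""
    if n not in DIGITS:
        raise ValueError("this sketch only handles the numerals 1-9")
    return DIGITS[n] + CLASSIFIERS.get(noun, GENERAL_CLASSIFIER) + noun
```

For example, `numeral_phrase(2, "鱼")` gives 两条鱼, while a noun missing from the table, such as 苹果, falls back to 个 (三个苹果).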
Also note that Chinese has a number of sentence-final particles that can "soften" or otherwise modulate a sentence. For example, the particle 呢 can mark a rhetorical question. In Latin, a rhetorical question would generally have the same form as a genuine question, making this another possible one-to-many mapping from Latin to Chinese.
Added 2 new lexical selection rules
Added 2 new structural transfer rules
Precision and recall
Totals: 135 forms, 356 tp, 0 fp, 0 tn, 0 fn
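For reference, precision and recall follow directly from these counts. The helper below is an illustrative sketch. With 0 false positives and 0 false negatives, both values come out at 100%. The true-negative count does not enter either formula.

```python
def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For the totals above, `precision_recall(356, 0, 0)` returns `(1.0, 1.0)`.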
Coverage over large corpus: 100.00%
# of words in large corpus: 232,490
# of stems in transducer: ~8574
Number of words in reference: 441
Number of words in test: 516
Number of unknown words (marked with a star) in test: 139
Percentage of unknown words: 26.94 %
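The percentage of unknown words is simply the starred count divided by the number of words in the test output. Apertium marks words it cannot analyse with a leading `*`. The sketch below assumes whitespace-separated output and that every starred token is one unknown word. The function names are hypothetical.

```python
def count_unknown(translated_text):
    """Count whitespace tokens that Apertium marked as unknown with a leading '*'."""
    tokens = translated_text.split()
    unknown = sum(1 for t in tokens if t.startswith("*"))
    return unknown, len(tokens)


def unknown_percentage(unknown, total):
    """Unknown words as a percentage of all words, rounded to two decimals."""
    return round(100 * unknown / total, 2) if total else 0.0
```

With the figures above, `unknown_percentage(139, 516)` gives 26.94.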
Trimmed coverage: 100.00%
Number of tokenised words in the corpus: 4997
Top unknown words in the corpus: