Asturian and English

From LING073

Resources for machine translation between Asturian and English

Machine Translation Resources

GitHub repository containing a number of tools for machine translation, including small test corpora, a corpus of the Asturian Wikipedia, UD-annotated CoNLL-U files, and UDPipe models for UD annotation

Contrastive Grammar

Contrastive Grammar

Lexical Selection

Lexical Selection

Dependency Annotation

Dependency Annotation

ast -> eng evaluation

WER: 100%

PER: 100%

Number of words which were free rides: 0
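For reference, the two error rates reported above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenisation, WER defined as word-level edit distance divided by reference length, and PER defined as a bag-of-words comparison that ignores word order. The function names are hypothetical, and the evaluation script actually used for these figures may differ in details such as tokenisation or case handling.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: compares the two sentences as bags of words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words shared regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Under these definitions, a hypothesis that shares no words with its reference scores 100% on both measures, which is consistent with the figures above and with zero free rides.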

Additions

Stems: added 100 new stems to the dictionary file

Lexical selection: implemented 3 new lexical selection rules

Structural transfer: implemented 3 new structural transfer rules

Evaluation

Asturian Transducer

Coverage against the large corpus could not be measured because of encoding errors.
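Once the corpus can be read, coverage could be estimated along these lines. This is only a sketch, assuming the analyser output is in the Apertium stream format, where each token appears as `^surface/analyses$` and an unknown token is marked with a leading `*` in the analysis. The `coverage` function and the regular expression are hypothetical helpers, not part of any Apertium tool, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$ (escaped characters are not handled).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed):
    """Return the fraction of tokens that received at least one analysis."""
    total = known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed):
        total += 1
        # Unknown words come back as ^word/*word$.
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0
```

If the encoding errors come from a few undecodable bytes, reading the corpus with `open(path, encoding="utf-8", errors="replace")` lets Python substitute a placeholder character instead of stopping. Whether that is the cause here is not known.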

Precision and recall between the annotated corpus and the sample corpus also could not be measured, because of corpus length errors and compound words in Asturian.

There are 39,861,079 words in the longer corpus.

The total number of stems in the transducer could not be calculated: the stems in apertium-ast.ast.dix were too numerous to count by hand, and the transducer does not use a lexc file.
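A rough count can be obtained without counting by hand, since a .dix file is XML. The sketch below assumes the usual monolingual dictionary layout, in which each entry is an `<e>` element carrying an `lm` (lemma) attribute inside a `<section>`, and it treats each such entry as one stem. That equivalence is an assumption, because one lemma can have several entries. `count_stems` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def count_stems(dix_xml):
    """Count <e lm="..."> entries inside <section> elements of a .dix document."""
    root = ET.fromstring(dix_xml)
    count = 0
    for section in root.iter("section"):
        for entry in section.iter("e"):
            # Entries inside <pardef> are paradigm rows, not stems, and are
            # skipped because only <section> contents are visited.
            if entry.get("lm") is not None:
                count += 1
    return count


SAMPLE = """<dictionary>
  <pardefs>
    <pardef n="cas/a__n">
      <e><p><l>a</l><r>a<s n="n"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="casa"><i>cas</i><par n="cas/a__n"/></e>
    <e lm="gatu"><i>gat</i><par n="gat/u__n"/></e>
  </section>
</dictionary>"""

print(count_stems(SAMPLE))
```

For the real dictionary, the file contents could be read from apertium-ast.ast.dix and passed to the same function.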

Asturian-English Translation System

WER over longer corpus: 99.18%

PER over longer corpus: 98.47%

Proportion of stems translated correctly in the longer corpus: not yet measured; which script to use for this is still an open question.