Spring 2019/Structural transfer
The basic idea of structural transfer in RBMT
Structural transfer in rule-based machine translation (RBMT) handles the differences in word order and morphological tags that arise when translating between two languages.
How structural transfer works in Apertium
There are three stages of structural transfer in Apertium: chunker (t1x), interchunk (t2x), and postchunk (t3x). The effects of some rules implemented at each stage are shown below:
Chunker has access to word-level lemmas and tags, interchunk has access to chunk-level names and tags, and postchunk has access only to chunk-level names.
The structure of a transfer file
The rules in a transfer file go in
<rule>...</rule> consists of a
<pattern>...</pattern> and an action section.
The matched pattern is an ordered list of
<pattern-item>...</pattern-item>s, whose names refer to
<def-cat>...</def-cat>s, which contain
<cat-item tags=""/>s (tags defined as in lexical selection) and are defined in
The action section of a rule can contain
<out>...</out> blocks containing the general structure of what is output in place of the matched pattern,
<let>...</let> statements for setting variables (defined in
<section-def-vars>...</section-def-vars>) or mutating tags,
<choose>...</choose> conditional blocks, and
<call-macro>...</call-macro> statements for calling a macro.
Macros are defined in
<def-macro>...</def-macro> blocks inside
<section-def-macros>...</section-def-macros>. They allow any combination of parts of an action section (though
<out>...</out> blocks are to be avoided) to be used within an arbitrary action section.
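As a rough illustration of the macro mechanism just described, the fragment below sketches a macro that stores the number of a matched word in a variable, followed by a call to it from a rule's action section. The names set_number, number, and nbr are hypothetical and chosen only for this sketch; it assumes that number is declared as a variable and that nbr is declared as an attribute covering the sg and pl tags elsewhere in the same file.

```xml
<!-- Fragment only: assumes a variable "number" and an attribute "nbr"
     (covering the tags sg and pl) are defined elsewhere in the file.
     All names here are hypothetical. -->
<section-def-macros>
  <!-- npar="1": the macro takes one parameter, referred to inside as pos="1" -->
  <def-macro n="set_number" npar="1">
    <choose>
      <when>
        <test>
          <equal>
            <clip pos="1" side="tl" part="nbr"/>
            <lit-tag v="pl"/>
          </equal>
        </test>
        <!-- The word is plural: remember that in the variable -->
        <let>
          <var n="number"/>
          <lit-tag v="pl"/>
        </let>
      </when>
      <otherwise>
        <let>
          <var n="number"/>
          <lit-tag v="sg"/>
        </let>
      </otherwise>
    </choose>
  </def-macro>
</section-def-macros>

<!-- Inside the action section of some rule: pass the second matched
     pattern item to the macro as its first parameter -->
<call-macro n="set_number">
  <with-param pos="2"/>
</call-macro>
```

Note that the macro only sets a variable and produces no output itself, in line with the advice above to keep <out>...</out> blocks out of macros.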
An <out>...</out> block should immediately contain a
<chunk>...</chunk>, which in turn contains chunk
<lu>...</lu> (lexical unit) blocks (separated by
<b/> spaces) defining the lexical unit and corresponding tags to be output. For multiple units being output as a single lexical unit,
<lu>...</lu> blocks should be wrapped in an
Each lexical unit consists of
<clip/>s, which contain the attributes
pos="" for position matched in the pattern,
side="" for the side to output, and
part="" for the part of the material to output. Parts can be specified as
lem for the lemma,
whole for the entirety, and any set of tags (as a list of
<attr-item/>s) you define as
Plenty of examples are available:
- eng-kir transfer that covers the example above and basically nothing else.
- en-es: a mature translation pair with well developed structural transfer for English-Spanish and Spanish-English translation.
- eng-spa: a basic example from class showing how to transfer adjective+noun from English to Spanish ("large cats → gatos largos": number and gender agreement and reordering) using chunking (chunker+interchunk).
- And lots in between.
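To make the pieces described earlier more concrete, here is a minimal sketch of what a chunker (t1x) file for the adjective+noun case in the eng-spa item above might look like. It is not the file from class: the category names (adj, nom), attribute names (a_adj, nbr, gen), chunk name (adj_n), and chunk tag (SN) are assumptions made for illustration, and for brevity the sketch does the agreement directly in the chunker, whereas the class example spreads the work across chunker and interchunk.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal t1x sketch; all names are illustrative only -->
<transfer default="chunk">
  <!-- Categories that pattern items can refer to -->
  <section-def-cats>
    <def-cat n="adj">
      <cat-item tags="adj"/>
      <cat-item tags="adj.*"/>
    </def-cat>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
    </def-cat>
  </section-def-cats>

  <!-- Sets of tags that can be used as part="" in a clip -->
  <section-def-attrs>
    <def-attr n="a_adj">
      <attr-item tags="adj"/>
    </def-attr>
    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
    </def-attr>
    <def-attr n="gen">
      <attr-item tags="m"/>
      <attr-item tags="f"/>
    </def-attr>
  </section-def-attrs>

  <section-rules>
    <rule comment="adj noun: output noun first, adjective agrees with it">
      <pattern>
        <pattern-item n="adj"/>
        <pattern-item n="nom"/>
      </pattern>
      <action>
        <out>
          <chunk name="adj_n">
            <!-- Chunk-level tags, visible to the interchunk stage -->
            <tags>
              <tag><lit-tag v="SN"/></tag>
              <tag><clip pos="2" side="tl" part="gen"/></tag>
              <tag><clip pos="2" side="tl" part="nbr"/></tag>
            </tags>
            <!-- The noun (pattern position 2) is output first, unchanged -->
            <lu>
              <clip pos="2" side="tl" part="whole"/>
            </lu>
            <b pos="1"/>
            <!-- The adjective takes gender and number from the noun -->
            <lu>
              <clip pos="1" side="tl" part="lem"/>
              <clip pos="1" side="tl" part="a_adj"/>
              <clip pos="2" side="tl" part="gen"/>
              <clip pos="2" side="tl" part="nbr"/>
            </lu>
          </chunk>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
```

The reordering comes from the order of the <lu>...</lu> blocks inside the chunk, and the agreement comes from clipping the gender and number tags from position 2 (the noun) when building the adjective. A real pair would need more categories and attributes than this sketch includes.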
One of the best-documented features of Apertium is its transfer rules. Here are some places to read, in approximate order of complexity:
- Adding structural transfer rules to an existing pair
- Examples of transfer rules
- A long introduction to transfer rules
- Full Apertium documentation (section 3.5 covers the transfer module)
This assignment is due at the end of week 12 (this semester, noon on Friday, April 7, 2017).
- Add a page to the wiki called
Language1_and_Language2/Structural_transfer, linking to it from the main page on the language pair.
- Put the page in the category Category:Sp17_StructuralTransfer.
- Perform WER, PER, and coverage tests on your short sentences corpus, and add the results to a pre-evaluation section.
- Implement at least one item from your contrastive grammar.
- Each person in each group should implement at least one item for the direction that translates into the language that they have been primarily working with. The same item does not need to be used for each direction.
- If the contrastive grammar item only involves relabelling or reordering tags within the same form, then please do at least two items.
- Add to your structural transfer wiki page:
- Add at least one example sentence for each item you implement. Show the outputs of the following modes for your translation system: tagger, biltrans, chunker, interchunk, postchunk, and the pair itself (abc-xyz).
- Perform WER, PER, and coverage tests again, and add the results to a post-evaluation section on the wiki page.