Spring 2019/Structural transfer
The basic idea of structural transfer in RBMT
The idea of structural transfer in rule-based machine translation (RBMT) is to deal with the differences in word order and tags encountered when translating between two languages.
How structural transfer works in Apertium
There are three stages of structural transfer in Apertium: chunker (t1x), interchunk (t2x), and postchunk (t3x). The effects of some rules implemented at each stage are shown below:
Chunker has access to word-level lemmas and tags, interchunk has access to chunk-level names and tags, and postchunk has access only to chunk-level names.
The structure of a transfer file
The rules in a transfer file go in
Each <rule>...</rule> consists of a
<pattern>...</pattern> and an action section.
The matched pattern is an ordered list of
<pattern-item>...</pattern-item>s, whose names refer to
<def-cat>...</def-cat>s, which contain
<cat-item tags=""/>s (tags defined as in lexical selection) and are defined in
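As a rough illustration of how patterns and categories fit together, here is a minimal sketch of a rule that matches an adjective followed by a noun. The category names adj and nom and the rule comment are hypothetical. The enclosing section names follow the usual layout of Apertium transfer files as far as I know it, rather than anything stated above.

```xml
<!-- Sketch only: the category names "adj" and "nom" are made up for this example. -->
<section-def-cats>
  <def-cat n="adj">
    <!-- matches a word tagged exactly <adj>, or <adj> followed by further tags -->
    <cat-item tags="adj"/>
    <cat-item tags="adj.*"/>
  </def-cat>
  <def-cat n="nom">
    <!-- matches a noun with any further tags -->
    <cat-item tags="n.*"/>
  </def-cat>
</section-def-cats>

<section-rules>
  <rule comment="adjective followed by noun">
    <pattern>
      <!-- each n refers to a def-cat above; order matters -->
      <pattern-item n="adj"/>
      <pattern-item n="nom"/>
    </pattern>
    <action>
      <!-- what to output in place of the match goes here -->
    </action>
  </rule>
</section-rules>
```

The positions in the pattern (1 for the adjective, 2 for the noun) are what the action section later refers to.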
The action section of a rule can contain:
- <out>...</out> blocks containing the general structure of what is output in place of the matched pattern,
- <let>...</let> statements for setting variables (defined in <section-def-vars>...</section-def-vars>) or mutating tags,
- <choose>...</choose> conditional blocks,
- <call-macro>...</call-macro> statements for calling a macro.
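The following sketch shows how these pieces might sit together inside one action section. The macro name set_number, the variable number, and the attribute nbr are all assumptions made for the example. They would need to be defined elsewhere in the same file.

```xml
<!-- Sketch only: "set_number", "number" and "nbr" are hypothetical names. -->
<action>
  <!-- call a macro, passing the second matched word as its first parameter -->
  <call-macro n="set_number">
    <with-param pos="2"/>
  </call-macro>

  <!-- conditional: is the second matched word plural on the target side? -->
  <choose>
    <when>
      <test>
        <equal>
          <clip pos="2" side="tl" part="nbr"/>
          <lit-tag v="pl"/>
        </equal>
      </test>
      <!-- mutate a tag: make the first word plural as well -->
      <let>
        <clip pos="1" side="tl" part="nbr"/>
        <lit-tag v="pl"/>
      </let>
    </when>
    <otherwise>
      <!-- set a variable to a literal string -->
      <let>
        <var n="number"/>
        <lit v="singular"/>
      </let>
    </otherwise>
  </choose>

  <!-- an <out> block producing the final chunk would follow here -->
</action>
```

In a <let>, the first child is the thing being assigned to and the second is the value, so the same element serves both for variables and for tags.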
Macros are defined in
<def-macro>...</def-macro> blocks inside <section-def-macros>...</section-def-macros>. They allow any combination of parts of an action section (though
<out>...</out> blocks are to be avoided) to be used within an arbitrary action section.
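A macro definition might look like the sketch below. The macro name set_number, the variable number, and the attribute nbr are hypothetical and assumed to be declared in the corresponding definition sections.

```xml
<!-- Sketch only: all names here are made up for illustration. -->
<section-def-macros>
  <!-- npar is the number of parameters the macro takes -->
  <def-macro n="set_number" npar="1">
    <choose>
      <when>
        <test>
          <equal>
            <!-- inside a macro, pos="1" means the macro's first parameter -->
            <clip pos="1" side="tl" part="nbr"/>
            <lit-tag v="sg"/>
          </equal>
        </test>
        <let>
          <var n="number"/>
          <lit v="singular"/>
        </let>
      </when>
      <otherwise>
        <let>
          <var n="number"/>
          <lit v="plural"/>
        </let>
      </otherwise>
    </choose>
  </def-macro>
</section-def-macros>
```

Because pos inside the macro counts parameters rather than pattern items, the same macro can be called from different rules on whichever matched word is relevant.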
An <out>...</out> block should immediately contain a
<chunk>...</chunk>, which in turn contains chunk
<lu>...</lu> (lexical unit) blocks (separated by
<b/> spaces) defining the lexical unit and corresponding tags to be output.
Each lexical unit consists of <clip/>s, which contain the attributes
- pos="" for the position matched in the pattern,
- side="" for the side to output, and
- part="" for the part of the material to output.
Parts can include
lem for the lemma,
whole for the entirety, and any set of tags (
<attr-item/>s) you define as
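To make the output structure concrete, here is a sketch of an <out> block for a two-word pattern that swaps the order of the matched words. The attribute names a_nom and nbr, the chunk name, and the chunk tag SN are assumptions for the example. The attribute section name and the <tags> block follow the Apertium transfer format as I understand it.

```xml
<!-- Sketch only: attribute names, chunk name and chunk tag are hypothetical. -->
<section-def-attrs>
  <def-attr n="a_nom">
    <attr-item tags="n"/>
  </def-attr>
  <def-attr n="nbr">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
  </def-attr>
</section-def-attrs>

<out>
  <chunk name="nom_adj">
    <tags>
      <!-- chunk-level tags, visible to the interchunk stage -->
      <tag><lit-tag v="SN"/></tag>
      <tag><clip pos="2" side="tl" part="nbr"/></tag>
    </tags>
    <!-- second matched word first, built part by part -->
    <lu>
      <clip pos="2" side="tl" part="lem"/>
      <clip pos="2" side="tl" part="a_nom"/>
      <clip pos="2" side="tl" part="nbr"/>
    </lu>
    <b/>
    <!-- first matched word second, copied whole -->
    <lu>
      <clip pos="1" side="tl" part="whole"/>
    </lu>
  </chunk>
</out>
```

Using part="whole" copies the lemma and all tags unchanged, while listing parts individually lets a rule drop or reorder tags on output.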
Plenty of examples are available:
- eng-kir transfer that covers the example above and basically nothing else.
- en-es: a mature translation pair with well-developed structural transfer for English-Spanish and Spanish-English translation.
- And lots in between.
One of the best-documented features of Apertium is its transfer rules. Here are some places to read, in approximate order of complexity:
- Adding structural transfer rules to an existing pair
- Examples of transfer rules
- A long introduction to transfer rules
- Full apertium documentation (section 3.5 covers the transfer module)
- Implement at least one thing in your contrastive grammar.
TODO: additional requirements