Spring 2019/Structural transfer

From LING073
Revision as of 10:41, 30 March 2017 by Jwashin1 (talk | contribs) (Background)

Background

The basic idea of structural transfer in RBMT

The idea of structural transfer in RBMT is to handle the differences in word order and tags that arise when translating between two languages.

The arrows between the two tagged levels represent where structural transfer is needed. Colour coding shows [rough] correspondences.

How structural transfer works in Apertium

Three levels

There are three stages of structural transfer in Apertium: chunker (t1x), interchunk (t2x), and postchunk (t3x). The effects of some rules implemented at each stage are shown below:

Each stage of structural transfer: chunker, interchunk, postchunk

Chunker has access to word-level lemmas and tags, interchunk has access to chunk-level names and tags, and postchunk has access only to chunk-level names.

The structure of a transfer file

The rules in a transfer file go in <section-rules>...</section-rules>. Each <rule>...</rule> consists of a <pattern>...</pattern> and an <action>...</action>.
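As a rough sketch of how these pieces fit together, a minimal chunker (t1x) file might look like the following. The names noun, nbr, placeholder, and NP are hypothetical, and the tags n, sg, and pl are assumed to match what your analysers produce.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal chunker (t1x) file; every n="..." name is illustrative. -->
<transfer default="chunk">
  <section-def-cats>
    <def-cat n="noun">
      <cat-item tags="n.*"/>
    </def-cat>
  </section-def-cats>
  <section-def-attrs>
    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
    </def-attr>
  </section-def-attrs>
  <section-def-vars>
    <def-var n="placeholder"/>
  </section-def-vars>
  <section-rules>
    <!-- One rule: a pattern to match and an action to perform. -->
    <rule comment="a noun on its own">
      <pattern>
        <pattern-item n="noun"/>
      </pattern>
      <action>
        <out>
          <chunk name="n">
            <tags>
              <tag><lit-tag v="NP"/></tag>
            </tags>
            <lu>
              <clip pos="1" side="tl" part="whole"/>
            </lu>
          </chunk>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
```

The definition sections come before <section-rules>, so everything a rule refers to by name is declared earlier in the file.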

The matched pattern is an ordered list of <pattern-item>...</pattern-item>s, whose names refer to <def-cat>...</def-cat>s, which contain <cat-item tags=""/>s (tags defined as in lexical selection) and are defined in <section-def-cats>...</section-def-cats>.
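For instance, a pattern matching an adjective followed by a noun could be set up roughly as below. The category names adj and noun are made up for this sketch, and the tag strings assume your analysers use adj and n.

```xml
<section-def-cats>
  <!-- Hypothetical category: a bare adj tag, or adj followed by further tags. -->
  <def-cat n="adj">
    <cat-item tags="adj"/>
    <cat-item tags="adj.*"/>
  </def-cat>
  <!-- Hypothetical category: n followed by further tags. -->
  <def-cat n="noun">
    <cat-item tags="n.*"/>
  </def-cat>
</section-def-cats>

<!-- Inside a <rule>: items are matched in this order, so adj is position 1 and noun is position 2. -->
<pattern>
  <pattern-item n="adj"/>
  <pattern-item n="noun"/>
</pattern>
```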

The action section of a rule can contain <out>...</out> blocks containing the general structure of what is output in place of the matched pattern, <let>...</let> statements for setting variables (defined in <section-def-vars>...</section-def-vars>) or mutating tags, <choose>...</choose> conditional blocks, and <call-macro>...</call-macro> statements for calling a macro.
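A sketch of the non-output statements in an action, for a two-item pattern. It assumes a hypothetical attribute nbr (covering sg and pl) defined in <section-def-attrs> and a hypothetical variable number defined in <section-def-vars>.

```xml
<action>
  <choose>
    <when>
      <!-- Test: is the target-side number of item 2 plural? -->
      <test>
        <equal>
          <clip pos="2" side="tl" part="nbr"/>
          <lit-tag v="pl"/>
        </equal>
      </test>
      <!-- Mutate a tag: set the number of item 1 to pl. -->
      <let>
        <clip pos="1" side="tl" part="nbr"/>
        <lit-tag v="pl"/>
      </let>
    </when>
    <otherwise>
      <!-- Set a variable instead of a tag. -->
      <let>
        <var n="number"/>
        <lit-tag v="sg"/>
      </let>
    </otherwise>
  </choose>
  <!-- An <out> block would normally follow here. -->
</action>
```

In a <let>, the first child is what gets assigned to and the second child is the value.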

Macros are defined in <def-macro>...</def-macro> blocks inside <section-def-macros>...</section-def-macros>. They allow any combination of parts of an action section (though <out>...</out> blocks are to be avoided) to be used within an arbitrary action section.
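As an illustration, a macro that copies number from one word to another might be defined and called as follows. The macro name copy_number and the attribute nbr are hypothetical.

```xml
<section-def-macros>
  <!-- npar is the number of parameters; inside the macro, pos refers to the parameter index. -->
  <def-macro n="copy_number" npar="2">
    <let>
      <clip pos="1" side="tl" part="nbr"/>
      <clip pos="2" side="tl" part="nbr"/>
    </let>
  </def-macro>
</section-def-macros>

<!-- Inside a rule's <action>: here pos refers to positions in the matched pattern. -->
<call-macro n="copy_number">
  <with-param pos="1"/>
  <with-param pos="2"/>
</call-macro>
```

With this call, the number of pattern item 1 is overwritten with the number of pattern item 2.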

An <out>...</out> block should immediately contain a <chunk>...</chunk>, which in turn contains chunk <tags>...</tags> and <lu>...</lu> (lexical unit) blocks (separated by <b/> spaces) defining the lexical unit and corresponding tags to be output.
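A sketch of an <out> block for a two-item pattern (adjective, then noun) that outputs the words in the opposite order. The chunk name n_adj and the chunk tag NP are made up for this example.

```xml
<out>
  <chunk name="n_adj">
    <!-- Chunk-level tags, visible to the interchunk stage. -->
    <tags>
      <tag><lit-tag v="NP"/></tag>
    </tags>
    <!-- The noun (pattern item 2) is output first... -->
    <lu>
      <clip pos="2" side="tl" part="whole"/>
    </lu>
    <!-- ...then a space... -->
    <b/>
    <!-- ...then the adjective (pattern item 1). -->
    <lu>
      <clip pos="1" side="tl" part="whole"/>
    </lu>
  </chunk>
</out>
```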

Each lexical unit consists of <clip/>s, which contain the attributes pos="" for the position matched in the pattern, side="" for the side to output (sl for the source language, tl for the target language), and part="" for the part of the material to output. Parts can include lem for the lemma, whole for the entirety, and any set of tags (<attr-item/>s) you define as <def-attr>...</def-attr> in <section-def-attrs>...</section-def-attrs>.
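For example, with two hypothetical attributes a_noun and nbr, a lexical unit can be assembled piece by piece instead of with whole. The tag names n, sg, and pl are assumptions about what your analysers produce.

```xml
<section-def-attrs>
  <!-- Hypothetical attribute: the part-of-speech tag of a noun. -->
  <def-attr n="a_noun">
    <attr-item tags="n"/>
  </def-attr>
  <!-- Hypothetical attribute: grammatical number. -->
  <def-attr n="nbr">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
  </def-attr>
</section-def-attrs>

<!-- Inside a <chunk>: target-side lemma, then the noun tag, then number, all from pattern item 1. -->
<lu>
  <clip pos="1" side="tl" part="lem"/>
  <clip pos="1" side="tl" part="a_noun"/>
  <clip pos="1" side="tl" part="nbr"/>
</lu>
```

Only the parts you clip are output, so any tag not covered by one of the listed attributes is dropped from this lexical unit.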

Plenty of examples are available:

  • eng-kir transfer that covers the example above and basically nothing else.
  • en-es: a mature translation pair with well-developed structural transfer for English-Spanish and Spanish-English translation.
  • And lots in between.

Writing rules

The assignment

  • Implement at least one thing in your contrastive grammar.

TODO: additional requirements