Structural transfer

From LING073
Revision as of 00:40, 26 April 2021

Background

The basic idea of structural transfer in RBMT

The idea of structural transfer in RBMT is to deal with the word-order and tag differences encountered in translation between two languages.

The arrows between the two tagged levels represent where structural transfer is needed. Colour coding shows [rough] correspondences.

How structural transfer works in Apertium

Transfer takes the output of the biltrans mode (bilingual translation), matches sequences of words against patterns you define, and performs operations on the matched material before outputting it. It allows you to change the order of words, change tags, etc.
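For example, a single rule in Apertium's .t1x XML format that matches an adjective followed by a noun and outputs them in noun–adjective order might look roughly like this (a sketch only; it assumes categories named adj and nom have been defined in the file's section-def-cats):

```xml
<!-- sketch: match adjective + noun, output noun + adjective -->
<rule>
  <pattern>
    <pattern-item n="adj"/>
    <pattern-item n="nom"/>
  </pattern>
  <action>
    <out>
      <!-- output the second matched word (the noun) first... -->
      <lu><clip pos="2" side="tl" part="whole"/></lu>
      <b pos="1"/>
      <!-- ...then the first matched word (the adjective) -->
      <lu><clip pos="1" side="tl" part="whole"/></lu>
    </out>
  </action>
</rule>
```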

Syntactic Structures and Parsing

under construction

[Figure: The mapping between phrase-structure trees of the Kyrgyz and English sentence above]
[Figure: The mapping between phrase-structure trees of "in the big beautiful houses" (English) and "en las casas largas y bonitas" (Spanish)]

The structure of a transfer file

under construction
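Roughly, a one-stage transfer (.t1x) file has the following shape (a simplified sketch; the category, attribute, and variable names here are placeholders, not part of any real pair):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<transfer>
  <!-- categories: tag patterns that rule patterns can match on -->
  <section-def-cats>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
    </def-cat>
  </section-def-cats>
  <!-- attributes: sets of tags that can be read and written with <clip> -->
  <section-def-attrs>
    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
    </def-attr>
  </section-def-attrs>
  <!-- global variables usable in rule actions -->
  <section-def-vars>
    <def-var n="number"/>
  </section-def-vars>
  <!-- the rules themselves: each has a pattern to match and an action -->
  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="nom"/>
      </pattern>
      <action>
        <out>
          <lu><clip pos="1" side="tl" part="whole"/></lu>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
```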

Some things to note

  • weighting

under construction

Examples of implemented Apertium transfer systems

Some examples are available:

  • eng-spa (in-class): a basic example from class showing how to transfer adjective+noun from English to Spanish ("big houses → casas largas": number and gender agreement and reordering) using chunking (chunker+interchunk).
  • eng-kir (apertium)
  • kaz-kir (apertium)
  • br-fr

under construction

Writing rules

Documentation is available...

under construction

Evaluating

Scrape a mini test corpus

  1. First make sure you have scrapeTransferTests. Test that running scrapeTransferTests gives you information on using the tool. If not, clone the tools repo (or git pull to update it, if you already have it cloned from other assignments) and run sudo make. Test again.
  2. Scrape the transferTests from your contrastive grammar page into a small parallel corpus. E.g., scrapeTransferTests -p abc-xyz "Language1_and_Language2/Contrastive_Grammar" will result in an abc.tests.txt and xyz.tests.txt file that contain the respective sides of any transferTests on your contrastive grammar page specified as being for abc-to-xyz translation.
  3. Add these two files to your bilingual corpus repository and add mention of their origin (the wiki page) to the MANIFEST file.

WER and PER

WER, or word error rate, is a measure of how different two texts are. You will want to know how different the output of your translation pair (the "test translation") is from the known-good translation of phrases in your parallel corpus (the "reference translation").

PER (position-independent error rate) is the same kind of measurement, just not sensitive to position in a phrase. I.e., a correct translation of every word but in an entirely wrong word order will give you a high (bad) WER but a low (good) PER.
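The intuition behind the two measures can be sketched in a few lines of Python (a simplified illustration only, not the exact formulas apertium-eval-translator uses):

```python
from collections import Counter

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    # dynamic-programming table of edit distances between prefixes
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(r)][len(h)] / len(r)

def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: like WER, but compares bags of
    words, ignoring word order entirely (simplified formulation)."""
    r, h = Counter(reference.split()), Counter(hypothesis.split())
    matches = sum((r & h).values())  # words shared, regardless of position
    return 1 - matches / sum(r.values())
```

For instance, wer("the big house", "house big the") is 2/3, while per of the same pair is 0.0: every word is translated correctly but the order is entirely wrong.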

To test WER and PER:

  1. First make sure you have apertium-eval-translator. Test that running apertium-eval-translator gives you information on using the tool. If not, clone the tools repo (or git pull to update it, if you already have it cloned from other assignments) and run make.
  2. You need two files: one test translation, and one reference translation. The reference translation is the parallel text in your corpus, e.g. abc.tests.txt. To get a test translation, run the source text through apertium and direct the output into a new file, e.g. cat xyz.tests.txt | apertium -d . xyz-abc > xyz-abc.tests.txt. You should add the [final] test translation to your repository.
  3. The following command should then give you WER and PER measures and some other useful numbers:
    • apertium-eval-translator -r abc.tests.txt -t xyz-abc.tests.txt

The assignment

This assignment is due early in week 13 (this semester, noon on Monday, May 3, 2021).

Getting set up

  1. Add a page to the wiki called Language1_and_Language2/Structural_transfer, linking to it from the main page on the language pair.
    • Put the page in the category Category:Sp21_StructuralTransfer and the categories for the two languages.
    • Perform WER, PER, and coverage tests on your short sentences corpus, and add the results to a pre-evaluation section.

Adding stems

  1. Add all the words from the transfer tests (from the last assignment) to the bilingual dictionary.
    • Make sure both analysers can analyse all the sentences correctly, adding words to the relevant monolingual dictionaries as necessary.

Write structural transfer rules

  1. Implement at least one item from your contrastive grammar.
    • Each person in each group should implement at least one item for the direction that translates into the language that they have been primarily working with. The same item does not need to be used for each direction.
    • If the contrastive grammar item only involves relabelling or reordering tags within the same form, then please do at least two items.

Wrapping up

  1. Add to your structural transfer wiki page:
    • Add at least one example sentence for each item you implement. Show the outputs of the following modes for your translation system: tagger, biltrans, transfer, and the pair itself (abc-xyz).
    • Perform WER, PER, and coverage tests again, and add the results to a post-evaluation section on the wiki page.