Difference between revisions of "Apertium-init"

From LING073
m (Bootstrapping a translation pair)
 
(22 intermediate revisions by 3 users not shown)
Line 1: Line 1:
You can use the '''apertium-init''' or '''bootstrapping''' tool to create a directory for quick development of a transducer.
+
You can use the '''apertium-init''' tool to create ("bootstrap") a directory for quick development of a transducer or translation pair.
  
== Install apertium-init ==
+
== Installing apertium-init ==
Download apertium-init and install it:
+
If needed (note that <code>apertium-init</code> '''should already be installed on the CS lab machines'''!), download apertium-init and install it:
* <code>cd ~/Source; git clone https://github.com/goavki/bootstrap; cd bootstrap; sudo make install</code>
+
* <code>cd ~/ling073; git clone https://github.com/apertium/apertium-init; cd apertium-init; PREFIX=$HOME make install</code>
  
== Create a language module ==
+
== Creating a language module ==
Create an hfst-based apertium language module (in your Source directory), replacing <code>xyz</code> with the ISO code of your language in all occurrences:
+
Create an hfst-based apertium language module (in your <code>~/ling073</code> directory), replacing <code>xyz</code> with the ISO code of your language in all occurrences:
 
* <code>apertium-init -a hfst xyz</code>
 
* <code>apertium-init -a hfst xyz</code>
 
Rename the module <code>ling073-xyz</code> if you want (so that it matches what will be in github later, and any further instructions):
 
Rename the module <code>ling073-xyz</code> if you want (so that it matches what will be in github later, and any further instructions):
Line 12: Line 12:
  
 
Notes:
 
Notes:
* You will probably get an error about SVN and the directory not being a working copy—you can safely ignore this.
+
* If you get an error about SVN and the directory not being a working copy, then you have an old version of apertium-init.
 
* If something goes wrong (e.g., you make a typo), delete any directories/files that were created and try the step again.
 
* If something goes wrong (e.g., you make a typo), delete any directories/files that were created and try the step again.
  
== Commit to github ==
+
'''For the first day assignment''', skip down to [[#Pushing to github]]
If this is a language pair you would like to commit to github, do the following '''before modifying any files or compiling''' the module:
+
 
 +
== Bootstrapping a translation pair ==
 +
To bootstrap a translation pair whose primary function is to translate from language <code>xyz</code> to language <code>abc</code>, do the following:
 +
# Make sure you '''have a copy of both of the language modules''' you'll need (one for each language) and that they are both '''initialised and compiled'''.
 +
# '''Check which formalism each transducer is written in.'''
 +
#* The transducer you've written so far in this class should be written using <code>lexd</code>.  If the module you cloned for the other language has a file like <code>apertium-abc.abc.dix</code>, then its transducer is written using lttoolbox; if it has a file like <code>apertium-abc.abc.lexc</code>, then it's written using HFST.
 +
# The following command will initialise the directory for the translation pair:
 +
#* <code>apertium-init --a1 lexd --a2 hfst -t rtx --prefix ling073 xyz-abc</code>
 +
#* The <code>--a1</code> and <code>--a2</code> arguments tell <code>apertium-init</code> what formalism your transducers are written in.  You may need to say "<code>lttoolbox</code>" instead of "<code>hfst</code>" for one or more of those options. (If you are pairing with <code>apertium-eng</code> you will need <code>--a2 lttoolbox</code>.) The <code>-t</code> indicates which formalism you want to use for structural transfer.
 +
<!-- # Rename the directory to <code>ling073-xyz-abc</code>. -->
 +
# Create an empty <code>ling073-xyz-abc</code> repository in the semester's github group ([https://github.swarthmore.edu/Ling073-sp22/ Ling073-sp22]), being sure not to add a README or any other default files to it, set a remote origin in your repo, and push (for the last two, see [[#Pushing to github]] below).  Make sure all members of your group have access to the repository.
 +
# '''Initialise''' the compiler (needed once for each copy of the new directory, i.e., each member of the group will need to do it once they have a clone of the repository) with the following command:
 +
#* <code>./autogen.sh --with-lang1=/path/to/ling073-xyz --with-lang2=/path/to/apertium-abc</code>
 +
#* You'll need to replace <code>/path/to/ling073-xyz</code> and <code>/path/to/apertium-abc</code> with the paths to the source-language transducer and the target-language transducer, respectively. If you have all the directories next to each other (recommended), these paths will be <code>../ling073-xyz</code> and <code>../apertium-abc</code>.
 +
# Compile with <code>make</code> as always.  Make sure each repository has been (initialised and) compiled recently first.
 +
#* If you get an error about "<code>Empty set of final states.</code>", this just means your dictionary is empty and you need to [[lexical transfer#Add to the lexicon|add words to it]].
 +
 
 +
== Pushing to github ==
 +
If this is a language pair you would like to push to github, do the following ideally before modifying any files or compiling the module:
 
# Create an ''empty'' (no files) repository named <code>ling073-xyz</code> on github.
 
# Create an ''empty'' (no files) repository named <code>ling073-xyz</code> on github.
# Initialise your new <code>ling073-xyz</code> directory as a github repository, and commit all the files:
+
<!-- # Initialise your new <code>ling073-xyz</code> directory as a github repository, and commit all the files:
 
#* <code>cd ling073-xyz; git init ./ ; git add * ; git commit -m "initialising directory with bootstrapped module"</code>
 
#* <code>cd ling073-xyz; git init ./ ; git add * ; git commit -m "initialising directory with bootstrapped module"</code>
# Set the github repository you created as the remote origin, replacing "username" and "xyz" below as appropriate:
+
-->
#* <code>git remote add origin git@github.swarthmore.edu:username/ling073-xyz.git</code>
+
# Make sure the repository really was created correctly by running <code>git log</code>.  You should see a single commit named "initial commit".
 +
# Set the github repository you created as the remote origin:
 +
#* <code>git remote add origin git@github.swarthmore.edu:groupname/ling073-xyz.git</code> (replacing "groupname" and "xyz" as appropriate)
 
# Push the bootstrapped module to origin:
 
# Push the bootstrapped module to origin:
 
#* <code>git push --set-upstream origin master</code>
 
#* <code>git push --set-upstream origin master</code>

Latest revision as of 10:43, 3 May 2022

You can use the apertium-init tool to create ("bootstrap") a directory for quick development of a transducer or translation pair.

Installing apertium-init

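If needed (note that apertium-init should already be installed on the CS lab machines!), download apertium-init and install it:

  • cd ~/ling073; git clone https://github.com/apertium/apertium-init; cd apertium-init; PREFIX=$HOME make install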

Creating a language module

Create an hfst-based apertium language module (in your ~/ling073 directory), replacing xyz with the ISO code of your language in all occurrences:

  • apertium-init -a hfst xyz

Rename the module to ling073-xyz if you want (so that it matches what will be in github later, and any further instructions):

  • mv apertium-xyz ling073-xyz

Notes:

  • If you get an error about SVN and the directory not being a working copy, then you have an old version of apertium-init.
  • If something goes wrong (e.g., you make a typo), delete any directories/files that were created and try the step again.
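The rename step above can be wrapped in a small guard so that a typo does not silently clobber an existing directory. This is only a minimal sketch: the function name rename_module is hypothetical (it is not part of apertium-init), and it assumes you run it from the directory where apertium-init created apertium-xyz.

```shell
#!/bin/sh
# Hypothetical helper (not part of apertium-init): rename a freshly
# bootstrapped module directory from apertium-CODE to ling073-CODE.
# Usage: rename_module xyz
rename_module() {
    code=$1
    src="apertium-$code"
    dst="ling073-$code"
    # Refuse to continue if the bootstrapped directory is missing.
    if [ ! -d "$src" ]; then
        echo "no $src directory here; run apertium-init first" >&2
        return 1
    fi
    # Refuse to overwrite something that already has the target name.
    if [ -e "$dst" ]; then
        echo "$dst already exists; delete it or pick another name" >&2
        return 1
    fi
    mv "$src" "$dst"
}
```

After sourcing the file, rename_module xyz does the same thing as the mv command above, but stops with a message if apertium-xyz is missing or ling073-xyz already exists.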

For the first day assignment, skip down to #Pushing to github

Bootstrapping a translation pair

To bootstrap a translation pair whose primary function is to translate from language xyz to language abc, do the following:

  1. Make sure you have a copy of both of the language modules you'll need (one for each language) and that they are both initialised and compiled.
  2. Check which formalism each transducer is written in.
    • The transducer you've written so far in this class should be written using lexd. If the module you cloned for the other language has a file like apertium-abc.abc.dix, then its transducer is written using lttoolbox; if it has a file like apertium-abc.abc.lexc, then it's written using HFST.
  3. The following command will initialise the directory for the translation pair:
    • apertium-init --a1 lexd --a2 hfst -t rtx --prefix ling073 xyz-abc
    • The --a1 and --a2 arguments tell apertium-init what formalism your transducers are written in. You may need to say "lttoolbox" instead of "hfst" for one or more of those options. (If you are pairing with apertium-eng you will need --a2 lttoolbox.) The -t indicates which formalism you want to use for structural transfer.
  4. Create an empty ling073-xyz-abc repository in the semester's github group (Ling073-sp22), being sure not to add a README or any other default files to it, set a remote origin in your repo, and push (for the last two, see #Pushing to github below). Make sure all members of your group have access to the repository.
  5. Initialise the compiler (needed once for each copy of the new directory, i.e., each member of the group will need to do it once they have a clone of the repository) with the following command:
    • ./autogen.sh --with-lang1=/path/to/ling073-xyz --with-lang2=/path/to/apertium-abc
    • You'll need to replace /path/to/ling073-xyz and /path/to/apertium-abc with the paths to the source-language transducer and the target-language transducer, respectively. If you have all the directories next to each other (recommended), these paths will be ../ling073-xyz and ../apertium-abc.
  6. Compile with make as always. Make sure each repository has been (initialised and) compiled recently first.
    • If you get an error about "Empty set of final states.", this just means your dictionary is empty and you need to add words to it.
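The formalism check in step 2 and the command in step 3 can be sketched as a pair of shell functions. This is a rough illustration, not an official tool: the function names guess_formalism and print_init_command are hypothetical, and the sketch assumes that a lexd-based module contains a file ending in .xyz.lexd (the page only states the .dix and .lexc cases). It prints the apertium-init command instead of running it, so you can check the result first.

```shell
#!/bin/sh
# Hypothetical helper: guess which formalism a language module uses.
# Usage: guess_formalism DIR CODE   (e.g. guess_formalism ../apertium-abc abc)
guess_formalism() {
    dir=$1
    code=$2
    # extension:formalism pairs; .lexd is checked first.
    # ASSUMPTION: lexd modules contain a file named *.CODE.lexd.
    for pair in lexd:lexd lexc:hfst dix:lttoolbox; do
        ext=${pair%%:*}
        name=${pair##*:}
        # Expand the glob; if nothing matches, $1 stays a literal pattern
        # and the -e test below fails.
        set -- "$dir"/*."$code"."$ext"
        if [ -e "$1" ]; then
            echo "$name"
            return 0
        fi
    done
    echo "unknown formalism in $dir" >&2
    return 1
}

# Hypothetical helper: print (not run) the bootstrap command for a pair.
# Usage: print_init_command SRC_DIR SRC_CODE TGT_DIR TGT_CODE
print_init_command() {
    a1=$(guess_formalism "$1" "$2") || return 1
    a2=$(guess_formalism "$3" "$4") || return 1
    echo "apertium-init --a1 $a1 --a2 $a2 -t rtx --prefix ling073 $2-$4"
}
```

For example, print_init_command ../ling073-xyz xyz ../apertium-abc abc prints the command from step 3 with --a1 and --a2 filled in from the files it finds; if either module matches none of the three patterns, it prints an error and returns a non-zero status.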

Pushing to github

If this is a language pair you would like to push to github, do the following, ideally before modifying any files or compiling the module:

  1. Create an empty (no files) repository named ling073-xyz on github.
  2. Make sure the local repository really was created correctly by running git log in your ling073-xyz directory. You should see a single commit named "initial commit".
  3. Set the github repository you created as the remote origin:
    • git remote add origin git@github.swarthmore.edu:groupname/ling073-xyz.git (replacing "groupname" and "xyz" as appropriate)
  4. Push the bootstrapped module to origin:
    • git push --set-upstream origin master
  5. After this you should be able to see the same files from the github web interface and in the directory. You should also be able to commit, push, pull, etc. all normally.
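Steps 3 and 4 above can be sketched as shell functions that build the remote URL from the group name and language code, so the two placeholders are substituted in one place. The function names origin_url and add_origin are hypothetical; the URL pattern is the one given in step 3, and add_origin assumes a git version that provides git remote get-url.

```shell
#!/bin/sh
# Hypothetical helper: build the remote URL used in step 3.
# Usage: origin_url GROUPNAME CODE
origin_url() {
    group=$1
    code=$2
    echo "git@github.swarthmore.edu:$group/ling073-$code.git"
}

# Hypothetical helper: set the remote origin unless one is already set.
# Run it inside the ling073-CODE directory.
# Usage: add_origin GROUPNAME CODE
add_origin() {
    if git remote get-url origin >/dev/null 2>&1; then
        echo "origin is already set to $(git remote get-url origin)" >&2
        return 1
    fi
    git remote add origin "$(origin_url "$1" "$2")"
}
```

After add_origin groupname xyz succeeds, the push in step 4 (git push --set-upstream origin master) is run as usual.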