http://wikis.swarthmore.edu/ling073/api.php?action=feedcontributions&user=Twarner2&feedformat=atomLING073 - User contributions [en]2024-03-28T18:30:48ZUser contributionsMediaWiki 1.27.7http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa/Universal_Dependencies&diff=5506Wamesa/Universal Dependencies2017-05-15T02:53:12Z<p>Twarner2: </p>
<hr />
<div>==Corpora==<br />
I took the fully ambiguous corpus from the last assignment and disambiguated it. I then created a second corpus, which I disambiguated and annotated with dependency relations. I was unsure how to analyse some of the words; I suspected that they were modals and tagged them with the relations I thought made sense. Finally, I converted the corpora to CoNLL-U format and removed the morphological information from them.<br />
<br />
[[Category:Sp17_UD]]<br />
[[Category:Wamesa]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa/Universal_Dependencies&diff=5505Wamesa/Universal Dependencies2017-05-15T02:37:12Z<p>Twarner2: Created page with "Category:Sp17_UD"</p>
<hr />
<div>[[Category:Sp17_UD]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa&diff=5394Wamesa2017-05-11T02:15:30Z<p>Twarner2: /* Software */</p>
<hr />
<div>==External Resources==<br />
===Dictionaries===<br />
====Wamesa Talking Dictionary====<br />
by Emily Anne Gasser<br />
* Contains Wamesa words and some conjugated verbs, but only ones beginning with particular letters, as the project is not finished. All rights reserved.<br />
<br />
===Scientific Works===<br />
====Windesi Wamesa Morphophonology====<br />
by Emily Anne Gasser<br />
* 2014 dissertation on the morphology and phonology of Wamesa. Contains example sentences. All rights reserved.<br />
<br />
====Notes on Windesi Grammar====<br />
by H.K.J. Cowan<br />
* 1955 article on Wamesa. Very broad, covering topics from syntax to phonetics and phonology, with special focus on numbers, pronouns, articles, and verb conjugation. Contains a great many example sentences. The orthography may not always agree with that of the dissertation above; I have not checked. Public domain.<br />
<br />
====A Sketch Grammar of Wandamen====<br />
by Naomi Saggers<br />
* 1979 publication on Wamesa phonology. Contains a few example sentences. Focuses less on Windesi Wamesa than on the other languages in the family. Copies are quite hard to find.<br />
<br />
===Corpora===<br />
====Field Notebooks====<br />
by Emily Anne Gasser<br />
* Many (mostly handwritten) notebooks containing full sentences, sentence fragments, phrases, and words. Everything in Wamesa is translated into Indonesian, but far less of it is also translated into English. By far my most fertile source of example sentences. No license exists, since the notebooks are unpublished.<br />
<br />
===Documentation===<br />
[https://wikis.swarthmore.edu/ling073/User:Twarner2/Language_selection Language Selection]<br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa Documenting Resources]<br/><br />
<br />
==Developed Resources==<br />
===Software===<br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Keyboard Wamesa transcription keyboard]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-corpus Wamesa corpus]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-transducer Wamesa transducer]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-ton-wad-corpus Wamesa/Tongan translator] <br/><br />
[https://github.com/MidasDoas/ud-annotatrix/ UD annotatrix repo] <br/><br />
[https://midasdoas.github.io/visualise.html UD annotatrix page] <br/><br />
<br />
===Documentation===<br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Grammar Grammar]<br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Transducer Transducer] <br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Disambiguator Disambiguator] <br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa_and_Tongan Wamesa/Tongan translator] <br/><br />
[https://wikis.swarthmore.edu/ling073/User:Twarner2/Final_project UD annotatrix] <br/><br />
<br />
<br />
[[Category:sp17_ResourceDocumentation]]<br />
[[Category:Wamesa]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa&diff=5393Wamesa2017-05-11T02:15:11Z<p>Twarner2: /* Developed Resources */</p>
<hr />
<div>==External Resources==<br />
===Dictionaries===<br />
====Wamesa Talking Dictionary====<br />
by Emily Anne Gasser<br />
* Contains Wamesa words and some conjugated verbs, but only ones beginning with particular letters, as the project is not finished. All rights reserved.<br />
<br />
===Scientific Works===<br />
====Windesi Wamesa Morphophonology====<br />
by Emily Anne Gasser<br />
* 2014 dissertation on the morphology and phonology of Wamesa. Contains example sentences. All rights reserved.<br />
<br />
====Notes on Windesi Grammar====<br />
by H.K.J. Cowan<br />
* 1955 article on Wamesa. Very broad, covering topics from syntax to phonetics and phonology, with special focus on numbers, pronouns, articles, and verb conjugation. Contains a great many example sentences. The orthography may not always agree with that of the dissertation above; I have not checked. Public domain.<br />
<br />
====A Sketch Grammar of Wandamen====<br />
by Naomi Saggers<br />
* 1979 publication on Wamesa phonology. Contains a few example sentences. Focuses less on Windesi Wamesa than on the other languages in the family. Copies are quite hard to find.<br />
<br />
===Corpora===<br />
====Field Notebooks====<br />
by Emily Anne Gasser<br />
* Many (mostly handwritten) notebooks containing full sentences, sentence fragments, phrases, and words. Everything in Wamesa is translated into Indonesian, but far less of it is also translated into English. By far my most fertile source of example sentences. No license exists, since the notebooks are unpublished.<br />
<br />
===Documentation===<br />
[https://wikis.swarthmore.edu/ling073/User:Twarner2/Language_selection Language Selection]<br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa Documenting Resources]<br/><br />
<br />
==Developed Resources==<br />
===Software===<br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Keyboard Wamesa transcription keyboard]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-corpus Wamesa corpus]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-transducer Wamesa transducer]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-ton-wad-corpus Wamesa/Tongan translator] <br/><br />
[https://github.com/MidasDoas/ud-annotatrix/ UD annotatrix repo] <br/><br />
[https://midasdoas.github.io/visualise.html UD annotatrix page] <br/><br />
<br />
===Documentation===<br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Grammar Grammar]<br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Transducer Transducer] <br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa/Disambiguator Disambiguator] <br/><br />
[https://wikis.swarthmore.edu/ling073/Wamesa_and_Tongan Wamesa/Tongan translator] <br/><br />
[https://wikis.swarthmore.edu/ling073/User:Twarner2/Final_project UD annotatrix] <br/><br />
<br />
<br />
[[Category:sp17_ResourceDocumentation]]<br />
[[Category:Wamesa]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5392User:Twarner2/Final project2017-05-11T02:10:56Z<p>Twarner2: /* What I Did */</p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.load.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
*** midasdoas.github.io/visualise.html?put+words+here<br />
*** Could easily be extended to parse encoded characters in the URL<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
** '''NB:''' For some reason, you can't have line-initial spaces in CG format.<br />
The repository containing my code can be found [https://github.com/MidasDoas/MidasDoas.github.io here].<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
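The protocol-agnostic URLs described under ''What I Did'' can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the function name <code>withCurrentProtocol</code> and the example host are hypothetical, and in a browser the first argument would come from <code>window.location.protocol</code>.

```javascript
// Hypothetical helper (not taken from the annotatrix source): build a URL
// that reuses the protocol of the page currently being viewed.
// `currentProtocol` stands in for window.location.protocol, e.g. "https:".
function withCurrentProtocol(currentProtocol, host, pathname) {
  // Make sure the path starts with exactly one slash.
  const path = pathname.startsWith("/") ? pathname : "/" + pathname;
  return currentProtocol + "//" + host + path;
}

// Example with made-up values; on an HTTPS page the result is an HTTPS URL,
// so the browser does not block it as mixed content.
console.log(withCurrentProtocol("https:", "example.org", "/data/file.txt"));
```

Because the protocol is read from the current page, the same code serves both HTTP and HTTPS deployments without modification.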
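Reading a sentence out of the query part of the URL might look something like the sketch below. The function name <code>sentenceFromQuery</code> is hypothetical, and the percent-decoding step is the possible extension mentioned above, not something the page is stated to do; in a browser the argument would be <code>window.location.search</code>.

```javascript
// Hypothetical helper (an assumption, not the annotatrix's own code):
// turn a query string such as "?put+words+here" into "put words here".
function sentenceFromQuery(search) {
  // Drop the leading "?" if there is one.
  const query = search.startsWith("?") ? search.slice(1) : search;
  if (query === "") {
    return null; // no sentence was supplied in the URL
  }
  return query
    .split("+")                            // "+" separates the words
    .filter((word) => word !== "")         // ignore empty pieces from "++"
    .map((word) => decodeURIComponent(word)) // assumed extension: decode %XX
    .join(" ");
}

console.log(sentenceFromQuery("?put+words+here"));
```

Note that <code>decodeURIComponent</code> throws a <code>URIError</code> on malformed escape sequences, so real code would probably want to catch that.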
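The idea of drawing something even when a CG line is incomplete can be illustrated with the sketch below. It is only an assumption about how such lenient parsing could work: the function <code>parseReading</code>, the fields it returns, and the choice of which fields fall back to <code>UND</code> are all hypothetical, and it handles only single-word lemmas.

```javascript
// Placeholder used for any piece of information that is missing.
const PLACEHOLDER = "UND";

// Hypothetical lenient parser for one CG reading line, such as
//   "dog" n @nsubj #1->2
// Anything that is absent is stubbed out instead of causing a failure.
function parseReading(line) {
  const tokens = line.trim().split(/\s+/).filter((t) => t !== "");
  let lemma = PLACEHOLDER;
  let label = PLACEHOLDER;
  let id = null;
  let head = null;
  const tags = [];
  for (const tok of tokens) {
    const dep = /^#(\d+)->(\d+)$/.exec(tok);
    if (dep) {
      id = Number(dep[1]);   // index of this word
      head = Number(dep[2]); // index of its head
    } else if (tok.startsWith("@")) {
      label = tok.slice(1);  // syntactic function label
    } else if (/^".*"$/.test(tok)) {
      lemma = tok.slice(1, -1);
    } else {
      tags.push(tok);        // remaining tokens are treated as tags
    }
  }
  const pos = tags.length > 0 ? tags[0] : PLACEHOLDER;
  return { lemma, pos, label, id, head };
}

console.log(parseReading('"dog" #1->2'));
```

With this approach a node can be drawn as soon as any part of the line is present, and the placeholder is replaced once the user types the missing piece.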
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5389User:Twarner2/Final project2017-05-11T02:01:28Z<p>Twarner2: /* What I Did */</p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.load.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
*** midasdoas.github.io/visualise.html?put+words+here<br />
*** Could easily be extended to parse encoded characters in the URL<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
The repository containing my code can be found [https://github.com/MidasDoas/MidasDoas.github.io here].<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5388User:Twarner2/Final project2017-05-11T02:01:07Z<p>Twarner2: /* What I Did */</p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.load.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
*** midasdoas.github.io/visualise.html?put+words+here<br />
*** Could easily be extended to parse encoded characters in the URL<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
The repository containing my code can be found [https://github.com/MidasDoas/MidasDoas.github.io here].<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5387User:Twarner2/Final project2017-05-11T01:53:49Z<p>Twarner2: /* What I Did */</p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.load.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
*** midasdoas.github.io/visualise.html?put+words+here<br />
*** Could easily be extended to parse encoded characters in the URL<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5379User:Twarner2/Final project2017-05-11T00:18:19Z<p>Twarner2: /* What I Did */</p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.load.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5378User:Twarner2/Final project2017-05-11T00:17:54Z<p>Twarner2: </p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5377User:Twarner2/Final project2017-05-11T00:16:59Z<p>Twarner2: </p>
<hr />
<div>==What I Did==<br />
For my final project I worked on the UD annotatrix (linked below), improving its functionality through the following five tasks:<br />
* Merging of GitHub repositories<br />
** Copying over new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.js</code><br />
* Working with connections over HTTPS<br />
** Changed absolute URLs to protocol-agnostic ones by taking whatever protocol is in the current URL and appending the domain and pathname of the queried page to it<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> characters<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press a key to trigger the update that runs when a key is released<br />
*** This happens even though I call <code>keyUpFunc</code> at the end of checking for queries in the URL<br />
* More accommodating parsing of sentences<br />
** Now, in CG format, if you leave out some information, it will still draw the dependencies, but with some labels stubbed out as <code>UND</code>.<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are pretty trivial to evaluate. With the script moved into its own file rather than written inside a <code><script>...</script></code> section, the HTML file makes a little more sense than it did, but this is nothing game-changing.<br/><br />
The HTTPS support is extremely helpful. New GitHub Pages sites are forced onto HTTPS connections (for sites from before 2015 it is optional), so any relatively new GitHub user who forked the repository would previously have been unable to test any changes. Even more importantly, and in conjunction with reading text from the URL, a link to an example dependency drawing can now be sent reliably to anyone.<br/><br />
Loading sentences from the URL is also a great new development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one would have had to send the link to the page and then tell the recipient which text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a link containing an example sentence and its dependencies to a group of people.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn instantly; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing changes to represent it.<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Spring 2017]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5375User:Twarner2/Final project2017-05-11T00:08:15Z<p>Twarner2: </p>
<hr />
<div>==What I Did==<br />
For my final project I chose to work on the UD annotatrix (linked below) and improve a few features. I chose the following 5 tasks to work on in trying to improve some of the functionality:<br />
* Merging of Github repositories<br />
** Copying over of new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.js</code><br />
* Working over HTTPS connections<br />
** Changed absolute URLs to protocol-agnostic ones by appending the domain and pathname of the requested resource to whatever protocol the current URL uses<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> signs<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press and release a key to trigger the update, which runs on key release<br />
*** This happens even though I call <code>keyUpFunc</code> after checking the URL for queries<br />
* More accommodating parsing of sentences<br />
** Input in CG format that leaves out some information now still produces a dependency drawing, with the missing labels stubbed out as <code>UND</code>.<br />
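
The HTTPS and URL-loading items above can be sketched roughly as follows. This is a minimal illustration rather than the code in <code>body.js</code>: the function names <code>protocolAgnosticUrl</code> and <code>sentenceFromUrl</code> are hypothetical, and the sketch assumes the sentence is the entire query string, with words joined by <code>+</code>.

```javascript
// Hypothetical helpers; the names are illustrative and not taken from body.js.

// Build a URL that reuses the protocol of the page currently being viewed,
// so the same code works whether the page was served over http or https.
function protocolAgnosticUrl(currentHref, hostAndPath) {
  const protocol = new URL(currentHref).protocol; // e.g. "https:"
  return protocol + "//" + hostAndPath;
}

// Read a sentence from the query part of a URL, assuming the whole query
// is the sentence and words are separated by "+".
function sentenceFromUrl(href) {
  const query = new URL(href).search.replace(/^\?/, "");
  if (query === "") return null; // no sentence was supplied
  return query.split("+").map(decodeURIComponent).join(" ");
}
```

In a browser, <code>window.location.href</code> would be the natural argument for both functions; the result of <code>sentenceFromUrl</code> could then be written into the <code>textarea</code> before the drawing is refreshed.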
<br />
==Evaluation==<br />
The merging and cleanup tasks are trivial to evaluate. The HTML file is a little easier to follow now that the script lives in its own file rather than inside a <code><script>...</script></code> section, but the change is nothing game-changing.<br/><br />
HTTPS support is extremely helpful: because new GitHub Pages sites are forced onto HTTPS connections (whereas for pre-2015 sites HTTPS is optional), any relatively new GitHub user who forked the repository was previously unable to test any changes. Even more importantly (and in conjunction with loading text from the URL), a link to an example dependency drawing can now be sent universally and reliably.<br/><br />
Loading sentences from the URL is another great development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one had to send the link to the page and then tell the recipient what text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a group of people a link containing an example sentence and its dependencies.<br/><br />
The parsing of sparse CG formatting is perhaps the most gratifying of these changes for the user, because nodes and dependencies are drawn immediately; the user no longer has to complete a line of CG format before anything is generated. It also works seamlessly with the user: as soon as the user adds information that wasn't previously there, the drawing updates to reflect it.<br />
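
To illustrate the idea behind the sparse parsing, here is a rough sketch of a reader for a single CG reading line that tolerates missing fields. It is not the annotatrix's actual parser: the function name <code>parseReading</code>, the returned field names, and the choice to stub only the dependency label as <code>UND</code> are all assumptions, and the sketch does not handle lemmas containing spaces.

```javascript
// Hypothetical sketch: parse one CG reading line such as
//   "pimuna" n sg @obj #2->1
// while tolerating missing pieces. A missing label is stubbed out as "UND".
function parseReading(line) {
  const tokens = line.trim().split(/\s+/).filter((t) => t !== "");
  const reading = { lemma: null, tags: [], label: "UND", id: null, head: null };
  for (const token of tokens) {
    const dep = token.match(/^#(\d+)->(\d+)$/);
    if (dep) {
      // Dependency pointer of the form #id->head
      reading.id = Number(dep[1]);
      reading.head = Number(dep[2]);
    } else if (token.startsWith("@")) {
      // Dependency label, e.g. @obj
      reading.label = token.slice(1);
    } else if (/^".*"$/.test(token)) {
      // Quoted lemma
      reading.lemma = token.slice(1, -1);
    } else {
      // Anything else is treated as a morphological tag
      reading.tags.push(token);
    }
  }
  return reading;
}
```

Because every field has a default, a partially typed line still yields an object that a drawing routine could render, and the drawing can be refreshed as the user fills in the rest.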
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5262User:Twarner2/Final project2017-05-10T00:48:41Z<p>Twarner2: /* Evaluation */</p>
<hr />
<div>==What I Did==<br />
For my final project I chose to work on the UD annotatrix (linked below) and improve a few features. I chose the following 5 tasks to work on in trying to improve some of the functionality:<br />
* Merging of Github repositories<br />
** Copying over of new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.js</code><br />
* Working over HTTPS connections<br />
** Changed absolute URLs to protocol-agnostic ones by appending the domain and pathname of the requested resource to whatever protocol the current URL uses<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> signs<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press and release a key to trigger the update, which runs on key release<br />
*** This happens even though I call <code>keyUpFunc</code> after checking the URL for queries<br />
* '''TODO:''' More accommodating parsing of sentences<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are trivial to evaluate. The HTML file is a little easier to follow now that the script lives in its own file rather than inside a <code><script>...</script></code> section, but the change is nothing game-changing.<br/><br />
HTTPS support is extremely helpful: because new GitHub Pages sites are forced onto HTTPS connections (whereas for pre-2015 sites HTTPS is optional), any relatively new GitHub user who forked the repository was previously unable to test any changes. Even more importantly (and in conjunction with loading text from the URL), a link to an example dependency drawing can now be sent universally and reliably.<br/><br />
Loading sentences from the URL is another great development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one had to send the link to the page and then tell the recipient what text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a group of people a link containing an example sentence and its dependencies.<br/><br />
'''TODO:''' parsing<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5261User:Twarner2/Final project2017-05-10T00:47:32Z<p>Twarner2: </p>
<hr />
<div>==What I Did==<br />
For my final project I chose to work on the UD annotatrix (linked below) and improve a few features. I chose the following 5 tasks to work on in trying to improve some of the functionality:<br />
* Merging of Github repositories<br />
** Copying over of new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.js</code><br />
* Working over HTTPS connections<br />
** Changed absolute URLs to protocol-agnostic ones by appending the domain and pathname of the requested resource to whatever protocol the current URL uses<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> signs<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press and release a key to trigger the update, which runs on key release<br />
*** This happens even though I call <code>keyUpFunc</code> after checking the URL for queries<br />
* '''TODO:''' More accommodating parsing of sentences<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are trivial to evaluate. The HTML file is a little easier to follow now that the script lives in its own file rather than inside a '''<script>...</script>''' section, but the change is nothing game-changing.<br/><br />
HTTPS support is extremely helpful: because new GitHub Pages sites are forced onto HTTPS connections (whereas for pre-2015 sites HTTPS is optional), any relatively new GitHub user who forked the repository was previously unable to test any changes. Even more importantly (and in conjunction with loading text from the URL), a link to an example dependency drawing can now be sent universally and reliably.<br/><br />
Loading sentences from the URL is another great development for this project, because it allows dependency drawings to be shared with the simple click of a link. Previously, one had to send the link to the page and then tell the recipient what text to copy into the <code>textarea</code>. This is probably especially helpful for teachers of linguistics, who might want to send a group of people a link containing an example sentence and its dependencies.<br/><br />
'''TODO:''' parsing<br />
<br />
==Link==<br />
[https://jonorthwash.github.io/visualise.html Flagship page]<br/><br />
[https://midasdoas.github.io/visualise.html My page]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5260User:Twarner2/Final project2017-05-10T00:42:39Z<p>Twarner2: </p>
<hr />
<div>==What I Did==<br />
For my final project I chose to work on the UD annotatrix (linked below) and improve a few features. I chose the following 5 tasks to work on in trying to improve some of the functionality:<br />
* Merging of Github repositories<br />
** Copying over of new files from the UD repository<br />
* Cleanup of the main HTML file<br />
** Making the layout of the file nicer<br />
** Extraction of the script into a new file <code>body.js</code><br />
* Working over HTTPS connections<br />
** Changed absolute URLs to protocol-agnostic ones by appending the domain and pathname of the requested resource to whatever protocol the current URL uses<br />
* Loading text from the URL<br />
** Sentences can now be loaded from the query part of the URL, with words separated by <code>+</code> signs<br />
** '''NB:''' For some reason, the dependency tree doesn't generate immediately; you need to press and release a key to trigger the update, which runs on key release<br />
*** This happens even though I call <code>keyUpFunc</code> after checking the URL for queries<br />
* More accommodating parsing of sentences<br />
<br />
==Evaluation==<br />
The merging and cleanup tasks are trivial to evaluate. The HTML file is a little easier to follow now that the script lives in its own file rather than inside a '''<script>...</script>''' section, but the change is nothing game-changing.<br/><br />
HTTPS support is extremely helpful: because new GitHub Pages sites are forced onto HTTPS connections (whereas for pre-2015 sites HTTPS is optional), any relatively new GitHub user who forked the repository was previously unable to test any changes. Even more importantly (and in conjunction with loading text from the URL), a link to an example dependency sentence can now be sent universally and reliably.<br/><br />
The loading of sentence(s) from the URL <br />
<br />
==Link==</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=User:Twarner2/Final_project&diff=5258User:Twarner2/Final project2017-05-09T22:23:47Z<p>Twarner2: Created page with "==What I Did== ==Evaluation== ==Link=="</p>
<hr />
<div>==What I Did==<br />
<br />
<br />
==Evaluation==<br />
<br />
<br />
==Link==</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=5041Wamesa and Tongan/Structural transfer2017-05-05T02:54:09Z<p>Twarner2: </p>
<hr />
<div>==Pre-Evaluation==<br />
===wad→ton===<br />
On the tests file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
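
For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them. It is not the evaluation script used for the figures on this page: the function names are illustrative, WER is taken as word-level edit distance divided by reference length, and PER uses the bag-of-words definition (max(reference length, hypothesis length) minus matching words, divided by reference length). Both functions assume a non-empty reference.

```javascript
// Split a sentence into word tokens on whitespace.
function words(sentence) {
  return sentence.split(/\s+/).filter(Boolean);
}

// Word error rate: Levenshtein distance over words / reference length.
function wordErrorRate(reference, hypothesis) {
  const ref = words(reference);
  const hyp = words(hypothesis);
  // prev[j] = edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Position-independent error rate: like WER, but word order is ignored.
function positionIndependentErrorRate(reference, hypothesis) {
  const ref = words(reference);
  const hyp = words(hypothesis);
  const remaining = new Map();
  for (const w of ref) remaining.set(w, (remaining.get(w) || 0) + 1);
  let correct = 0;
  for (const w of hyp) {
    const left = remaining.get(w) || 0;
    if (left > 0) {
      correct++;
      remaining.set(w, left - 1);
    }
  }
  return (Math.max(ref.length, hyp.length) - correct) / ref.length;
}
```

Because extra words in the output count as errors while the denominator is the reference length, both rates can exceed 100%.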
<br />
===ton→wad===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7%<br />
<br />
Percentage of stems translated correctly: 93%<br />
<br />
The WER and PER are both 100%, whether or not unknown-word marks are removed.<br />
<br />
There was one bug in our pre-evaluation of the data: siʻaku was being translated to #<poss>, but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
(We resolved this: I realized that I had forgotten to add the <poss> tag in my lexc file.)<br />
<br />
<br />
==Examples==<br />
===ton→wad===<br />
*Correct Translation <br />
**(ton) ʻoku ʻikai ke puke → (wad) pota va<br />
**(ton) ʻoku<vaux><present> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg><br />
<br />
*tagger<br />
**^ʻoku<vaux><pres>$ ^ʻikai<neg>$ ^ke<prn><prepd><p2><sg>$ ^puke<adj>$^.<sent>$<br />
*biltrans<br />
**^ʻoku<vaux><pres>/$ ^ʻikai<neg>/va<neg>$ ^ke<prn><prepd><p2><sg>/$ ^puke<adj>/pota<adj>$^.<sent>/.<sent>$<br />
*chunker<br />
**apertium-transfer: Rule 1 ʻikai<neg>/va<neg><br />
**apertium-transfer: Rule 2 .<sent>/.<sent><br />
**^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ ^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$<br />
*interchunk <br />
** ^neg<NEG>{}$ ^default<default>{^pota<adj>$}$ ^sent<SENT>{^va<neg>$}$^sent<SENT>{^.<sent>$}$<br />
*postchunk<br />
**^pota<adj>$ ^va<neg>$^.<sent>$<br />
*(ton-wad)<br />
**pota va#<br />
<br />
===wad→ton===<br />
*Correct Translation <br />
**(wad) pimunapat → (ton) siʻaku puaka<br />
**(wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg><br />
<br />
*tagger<br />
**^pimuna<n><p1><sg><poss><.sent>$<br />
<br />
*biltrans<br />
**^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$<br />
<br />
*chunker<br />
**apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent> <br/> ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*interchunk <br />
**^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*postchunk<br />
**^siʻaku$ ^puaka<n><sg>$<br />
<br />
*(wad-ton)<br />
** #siʻaku puaka<br />
<br />
==Post-Evaluation==<br />
===wad→ton===<br />
On the tests file, 100% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 100%, whether or not the unknown-word marks (*) are removed.<br />
<br />
===ton→wad===<br />
On the test file, 100% of the words translate correctly. The WER and PER are both 100%, whether or not unknown-word marks are removed.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=5040Wamesa and Tongan/Lexical Selection2017-05-05T02:53:53Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation ''tō'', because ''matotap'' is preceded by ''babin'' and ''antum'', which are less likely to collapse than to fall.<br />
*''Babinpai antumpai sumatotap.'' (The woman and the child fall.)<br />
In this example sentence, the lexical selection tool should choose the translation ''holo'', because ''matotap'' is preceded by ''anio'', which means house and is more likely to collapse than to fall.<br />
*''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In this example sentence, the lexical selection tool should choose the translation ''fakaʻete'', because ''mun'' is followed by ''pimuna'', and while you could kill a pig, it is more likely that it's a pig hunt than just a pig murder.<br />
*''Imun pimunapesi.'' (I am hunting a pig.)<br />
In this example sentence, the lexical selection tool should choose the translation ''tāmateʻi'', because ''mun'' is followed by ''muan'', and you wouldn't hunt a man.<br />
*''Imun muanpai.'' (I killed the man.)<br />
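
The reasoning in these four examples can be sketched as a small context-based chooser. This is only an illustration of the logic, not the rule format that the lexical selection module reads: the rule table, the function name <code>selectTranslation</code>, and the choice of which translation is the default are assumptions, and the input is assumed to be lemmas (for example ''anio'' rather than ''aniopasiat'').

```javascript
// Hypothetical rule table: for each ambiguous Wamesa lemma, a default Tongan
// translation plus overrides triggered by a lemma in the same sentence.
const rules = {
  matotap: { fallback: "tō", contexts: { anio: "holo" } },
  mun: { fallback: "fakaʻete", contexts: { muan: "tāmateʻi" } },
};

// Pick a translation for `lemma` given the other lemmas in the sentence.
// Returns null when the lemma has no entry in the rule table.
function selectTranslation(lemma, contextLemmas) {
  const rule = rules[lemma];
  if (!rule) return null;
  for (const context of contextLemmas) {
    if (Object.prototype.hasOwnProperty.call(rule.contexts, context)) {
      return rule.contexts[context];
    }
  }
  return rule.fallback;
}
```

For instance, <code>selectTranslation("matotap", ["anio"])</code> picks ''holo'' because a house is in the context, while the same verb with ''babin'' and ''antum'' falls back to ''tō''.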
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''. Generally speaking, ''kakalu'' will translate to ''cricket'' more often than to ''whistle'', because ''whistle'' is a secondary definition. <br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation "nginggisi", because "kakalu" is preceded by the verb "angi", which means ''to blow''.<br />
*ʻOku ou angi e kakalu. (I blow the whistle.) <br />
<br />
The translation for "kakalu" here should be "sararer", because it is preceded by the verb "moloki", meaning ''to step on'', which is commonly used to refer to stepping on an insect.<br />
*ʻOku ou moloki e kakalu. (I step on a cricket.)<br />
<br />
<br />
In this example, the translation "utanusara" should be chosen, because "tangi" is followed by the word for ''question''. <br />
*ʻOku ou tangi a fehuʻi. (I ask a question.)<br />
<br />
In this example, the translation "sau" should be chosen, because "tangi" is followed by the word for ''baby'', "valevale". A baby is more likely to cry than to ask a question.<br />
*ʻOku tangi e valevale. (The baby cries.)<br />
<br />
<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4950Wamesa and Tongan2017-04-28T00:32:30Z<p>Twarner2: </p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix: '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus are both 100%. There are 940 words in the <code>wad.large.txt</code> corpus, and the coverage over it is 26.17%.<br />
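
As a reminder of what the coverage figure measures, here is a minimal sketch: the share of corpus tokens for which the transducer returns at least one analysis. The function name <code>coverage</code> and the <code>isKnown</code> callback are hypothetical stand-ins for however the transducer is actually queried.

```javascript
// Coverage = tokens with at least one analysis / all tokens.
// `isKnown` is a hypothetical stand-in for a transducer lookup.
function coverage(tokens, isKnown) {
  if (tokens.length === 0) return 0;
  const known = tokens.filter(isKnown).length;
  return known / tokens.length;
}

// Toy usage with a made-up three-word lexicon (not the real transducer).
const toyLexicon = new Set(["pimuna", "anio", "mun"]);
const toyCoverage = coverage(
  ["pimuna", "anio", "babin", "mun"],
  (token) => toyLexicon.has(token)
);
```

By the same arithmetic, 26.17% of 940 words would correspond to roughly 246 analysed tokens, although that count is inferred here rather than reported.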
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
WER and PER over the <code>wad.longer.txt</code> corpus were both 100%. The coverage over the <code>large</code> corpus was 26.17%, and was the same for the <code>longer</code> corpus.<br/><br />
There are 934 tokens in the <code>longer</code> corpus and 940 in the <code>large</code> corpus.<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad Bidirectional Wamesa/Tongan Translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4949Wamesa and Tongan2017-04-28T00:28:38Z<p>Twarner2: /* wad → ton evaluation */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix: '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus are both 100%. There are words in the <code>wad.large.txt</code> corpus, and the coverage over it is TODO.<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
WER and PER over the <code>wad.longer.txt</code> corpus were both 100%<br />
There are 934 tokens in the <code>longer</code> corpus and 940 in the <code>large</code> corpus.<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4935Wamesa and Tongan2017-04-27T15:07:53Z<p>Twarner2: /* wad → ton evaluation */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix: '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus are both 100%. There are words in the <code>wad.large.txt</code> corpus, and the coverage over it is TODO.<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
WER and PER over the <code>wad.longer.txt</code> corpus were both 100%<br />
There are 934 tokens in both the <code>longer</code> and <code>large</code> corpora.<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4934Wamesa and Tongan2017-04-27T14:56:28Z<p>Twarner2: /* wad → ton evaluation */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix: '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus are both 100%. There are words in the <code>wad.large.txt</code> corpus, and the coverage over it is TODO.<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
WER and PER over the <code>wad.longer.txt</code> corpus were both 100%<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4933Wamesa and Tongan2017-04-27T14:13:33Z<p>Twarner2: /* Wamesa Transducer */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix: '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus are both 100%. There are words in the <code>wad.large.txt</code> corpus, and the coverage over it is TODO.<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4932Wamesa and Tongan2017-04-27T14:13:19Z<p>Twarner2: /* Wamesa Transducer */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when the unknown-word marks (*) are removed, and 100% when they are kept.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix: '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing, as there is no equivalent in Wamesa. The percentages above were computed with the incorrect #<poss> taken out of the ton-wad.tests.txt file; otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus are both 100%. There are words in the <code>wad.large.txt</code> corpus, and the coverage over it is .<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4931Wamesa and Tongan2017-04-27T13:26:34Z<p>Twarner2: /* Final Evaluation */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix. '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing as there is no equivalent in Wamesa. The percentages above are when the incorrect #<poss> is taken out of the ton-wad.tests.txt file. Otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
There are about 110 stems in the transducer, and about 160 in the translation system. Precision and recall for the <code>wad.annotated.basic.txt</code> corpus is . There are words in the <code>wad.large.txt</code> corpus, and the coverage over it is .<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4930Wamesa and Tongan2017-04-27T13:10:39Z<p>Twarner2: /* Final Evaluation */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix. '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing as there is no equivalent in Wamesa. The percentages above are when the incorrect #<poss> is taken out of the ton-wad.tests.txt file. Otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===Wamesa Transducer===<br />
<br />
===Tongan Transducer===<br />
<br />
===wad → ton evaluation===<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4929Wamesa and Tongan2017-04-27T13:09:24Z<p>Twarner2: </p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
==Initial Evaluation==<br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix. '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing as there is no equivalent in Wamesa. The percentages above are when the incorrect #<poss> is taken out of the ton-wad.tests.txt file. Otherwise, the percentage increases to 114.29%.<br />
<br />
==Final Evaluation==<br />
===wad → ton evaluation===<br />
<br />
===ton → wad evaluation===<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Polished_RBMT_system&diff=4797Polished RBMT system2017-04-24T10:01:05Z<p>Twarner2: /* The assignment */</p>
<hr />
<div>== Hand-annotating corpora ==<br />
First you want to analyze your corpus and output to CG format:<br />
cat corpus.txt | apertium -d . xyz-morph | cg-conv -a > corpus.out.txt<br />
<br />
Your new file probably now looks something like this:<br />
<pre><br />
"<This>"<br />
"this" det dem sg<br />
"this" prn dem sg<br />
"<is>"<br />
"be" vbser pres p3 sg<br />
"<my>"<br />
"I" prn p1 sg pos<br />
"<house>"<br />
"house" n sg<br />
"<.>"<br />
"." sent<br />
"<I>"<br />
"I" prn p1 sg subj<br />
"<live>"<br />
"live" vblex inf<br />
"live" vblex past<br />
"<here>"<br />
"*here"<br />
"<..>"<br />
".." sent<br />
</pre><br />
<br />
In this example, you might note a few fixes:<br />
* "here" isn't being analysed; it should have an adverb reading<br />
* "house" should have a verb reading<br />
* "live" should have an adjective reading<br />
* "live" isn't the past tense form of this verb<br />
<br />
The following annotation makes these corrections:<br />
<br />
<pre><br />
"<This>"<br />
"this" det dem sg<br />
"this" prn dem sg<br />
"<is>"<br />
"be" vbser pres p3 sg<br />
"<my>"<br />
"I" prn p1 sg pos<br />
"<house>"<br />
"house" n sg<br />
"house" vblex tv inf<br />
"<.>"<br />
"." sent<br />
"<I>"<br />
"I" prn p1 sg subj<br />
"<live>"<br />
"live" vblex inf<br />
"live" adj<br />
"<here>"<br />
"here" adv<br />
"<..>"<br />
".." sent<br />
</pre><br />
<br />
Note: There should be no unknown words ("analyses" with *) when you're done.<br />
<br />
== Measuring precision and recall ==<br />
[[:wikipedia:Precision and recall|Precision and recall]] are measures of how accurate a transducer is. Precision is the proportion of returned analyses that are correct, and recall is the proportion of correct analyses that are returned.<br />
<br />
In the example above of hand annotation, the precision is 90% (there are 9 true positives and 1 false positive), meaning that 90% of the returned analyses were correct. Recall is lower, at 75% (there are 9 true positives and 3 false negatives), meaning that only 75% of the correct analyses were returned.<br />
<br />
There is a script in the [[Using the tools on your own system#Misc tools|tools repo]] called <code>precisionRecall</code>. You can update the repo (<code>git pull</code>) and run <code>sudo make</code> to ensure that you have this script installed on your system. You can then run <code>precisionRecall referencecorpus.txt annotatedcorpus.txt</code>.<br />
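The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not the <code>precisionRecall</code> script itself: the function names <code>parse_cg</code> and <code>precision_recall</code> are made up here, and the sketch assumes that a reading counts as correct only if the same reading appears for the same token in the hand-annotated reference, and that unknown-word lines (starting with <code>*</code>) do not count as returned analyses.

```python
# Hypothetical helper names; this is not the course's precisionRecall script.
OUTPUT = '''
"<This>"
"this" det dem sg
"this" prn dem sg
"<is>"
"be" vbser pres p3 sg
"<my>"
"I" prn p1 sg pos
"<house>"
"house" n sg
"<.>"
"." sent
"<I>"
"I" prn p1 sg subj
"<live>"
"live" vblex inf
"live" vblex past
"<here>"
"*here"
"<..>"
".." sent
'''

REFERENCE = '''
"<This>"
"this" det dem sg
"this" prn dem sg
"<is>"
"be" vbser pres p3 sg
"<my>"
"I" prn p1 sg pos
"<house>"
"house" n sg
"house" vblex tv inf
"<.>"
"." sent
"<I>"
"I" prn p1 sg subj
"<live>"
"live" vblex inf
"live" adj
"<here>"
"here" adv
"<..>"
".." sent
'''


def parse_cg(text):
    """Collect (token index, reading) pairs from CG-format text."""
    readings = set()
    index = -1
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('"<'):
            # A wordform line such as "<house>" starts a new token.
            index += 1
        elif line.startswith('"') and not line.startswith('"*'):
            # A reading line; unknown-word lines ("*here") are skipped.
            readings.add((index, line))
    return readings


def precision_recall(reference, output):
    """Return (precision, recall) of the output against the reference."""
    ref = parse_cg(reference)
    out = parse_cg(output)
    true_positives = len(ref & out)
    precision = true_positives / len(out) if out else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


precision, recall = precision_recall(REFERENCE, OUTPUT)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Under those assumptions, the two corpora from the hand-annotation example give 9 shared readings out of 10 returned and 12 expected.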
<br />
== Measuring trimmed coverage ==<br />
Measuring trimmed coverage is just the same as measuring coverage, but with the appropriate "trimmed" transducer (e.g., xyz-abc.automorf.bin).<br />
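As a rough sketch of what a coverage figure means, the Python below counts the share of tokens that received at least one analysis. It assumes the analyser output is in the Apertium stream format, where each token looks like <code>^surface/analysis$</code> and an unknown word is marked with <code>*</code> (e.g. <code>^here/*here$</code>). The function name <code>coverage</code> is made up for illustration, and the sketch ignores escaped characters, so it is not a replacement for the course's coverage script.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis/...$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text):
    """Return the fraction of tokens that have at least one known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # Everything after the first slash is the list of analyses.
        _surface, _slash, analyses = unit.partition("/")
        if analyses and not analyses.startswith("*"):
            known += 1
    return known / len(units)


sample = (
    "^This/this<det><dem><sg>/this<prn><dem><sg>$ "
    "^is/be<vbser><pres><p3><sg>$ "
    "^here/*here$^./.<sent>$"
)
print(coverage(sample))
```

For trimmed coverage the same count would be taken over output produced with the trimmed transducer instead of the full one.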
<br />
== The assignment ==<br />
This assignment is due at the end of week 12 (this semester, at the end of the day on '''Tuesday, 18 April 2017, before class''').<br />
<br />
# '''Before you begin''', make sure all previous assignments are done, and add a "structural_transfer" tag to your transducer repositories and your translation pair repository/ies to mark the end of previous assignments.<br />
#* Also, please remove all binaries from all repositories! See [[removing binaries from transducer repo]].<br />
# Set up some '''new corpora''' based on existing ones:<br />
#* Combine your <code>sentences</code> and <code>tests</code> corpora so you have a new '''longer parallel corpus'''. Name the files <code>abc.longer.txt</code> and <code>xyz.longer.txt</code>.<br />
#* Make a '''large monolingual corpus''' of a bunch of raw text in your language. The more the better. This step may simply consist of you cleaning up and/or combining the existing corpora from the [[initial corpus assembly]] assignment. See if you can get it over 100K words. The bigger this corpus is the better. Call it <code>abc.corpus.large.txt</code> (in your monolingual corpus repo) and add notes to your <code>MANIFEST</code> file about where the text comes from.<br />
#* A '''hand-annotated monolingual corpus''' of sentences (see above) covering at least 1000 characters (500 for syllabic scripts) of your <code>abc.corpus.basic.txt</code> file, ideally sentences you understand / have English glosses of. Put the sentences you want to annotate in <code>abc.annotated.raw.txt</code> and dump this to <code>abc.annotated.basic.txt</code> to annotate it in CG format. Add these files to your monolingual corpus repository.<br />
# If you've been working on separate MT pairs, '''combine your MT pairs''' into one repository (which you both have full access to), making sure to incorporate all of the following:<br />
#* All entries from both dictionaries in a single <code>.dix</code> file. Make sure all translations are in the default direction of the pair (e.g., <code>abc-xyz</code>) and that <code>r="RL"</code> or <code>"LR"</code> attributes are set up for the right direction.<br />
#* Both <code>lrx</code> files are there and have the right names.<br />
#* All transfer files for both directions (up to 6 files) are there and have the right names and content.<br />
#* Also (again!), make sure that there are no compiled binaries or other compiled files committed to the repo. If needed, use the <code>apertium-init</code> script to bootstrap a new pair to get the list of just the files that need to be in the repo, and use the tricks presented in [[removing binaries from transducer repo]] to clean it up.<br />
#* Make it clear on the pair's wiki page which repository link is the final resting place of all this (but leave a link to the other repo and don't remove it). Also add a link to the final repo in the <code>README</code> in the superseded repo, with a note that the code has been merged into the other pair.<br />
# '''Expand your MT pair''' in at least '''four''' of the following ways for each translation direction, listing in a "Final evaluation" section on the language pair's wiki page what you did (move existing evaluation sections under a new section called "Initial evaluation"), and for every rule (for all of the following except adding stems), list an example of what output was improved.<br />
#* At least '''100 more stems''' in the bilingual dictionary (and monolingual dictionaries). This counts for both translation directions.<br />
#* '''Expand your morphology''' to cover '''at least 6 more elements''' of some paradigm(s). This can be anything from additional verb or noun morphology, to adding all the forms of all the determiners (articles, demonstratives, etc.), to implementing nominal morphology on adjectives (e.g., if your language allows adjectives to be substantivised, which you'll want to add a tag for too).<br />
#* At least '''4 more twol rules''' that make your (analysis and) generation cleaner.<br />
#* At least '''4 new disambiguation rules''' that make the output of your tagger more accurate.<br />
#* At least '''3 new lexical selection rules''' that make more of the right stems transfer over.<br />
#* At least '''3 new transfer rules''' that make more of the output of your MT system closer to an acceptable target translation.<br />
# When you are done with the above, '''document the following measures''' in the "Final evaluation" section of the pair's wiki page:<br />
#* For each transducer:<br />
#** Precision and recall against the <code>annotated.basic</code> corpus,<br />
#** Coverage over the <code>large</code> corpus,<br />
#** The number of words in the <code>large</code> corpus,<br />
#** The number of stems in the transducer.<br />
#* For MT in each direction:<br />
#** WER and PER over <code>longer</code> corpus.<br />
#** The proportion of stems translated correctly in the <code>longer</code> corpus.<br />
#** Trimmed coverage over <code>longer</code> and <code>large</code> corpora.<br />
#** The number of tokens in <code>longer</code> and <code>large</code> corpora.<br />
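For reference, WER and PER can be sketched as follows. This is a minimal illustration under common definitions, not necessarily the exact computation done by the evaluation tool used in class: WER is taken as the word-level edit distance divided by the reference length, and PER compares the two sentences as bags of words, ignoring order. The function names and the toy sentences are made up.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # delete a reference word
                current[j - 1] + 1,      # insert a hypothesis word
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate; the reference must contain at least one word."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate (bag-of-words comparison)."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Toy sentences: same words, two of them swapped.
print(wer("a b c d", "b a c d"), per("a b c d", "b a c d"))
```

Because both rates are divided by the reference length, either one can exceed 100% when the system output contains more words than the reference.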
<br />
[[Category:Assignments]]<br />
[[Category:Tutorials]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4475Wamesa and Tongan/Structural transfer2017-04-11T16:16:14Z<p>Twarner2: /* wad→ton */</p>
<hr />
<div>==Pre-Evaluation==<br />
===wad→ton===<br />
On the tests file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton→wad===<br />
<br />
<br />
==Examples==<br />
===ton→wad===<br />
*Correct Translation <br />
**(ton) ʻoku ʻikai ke puke → (wad) pota va<br />
**(ton) ʻoku<vaux><present> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg><br />
<br />
*tagger<br />
**<br />
*biltrans<br />
**<br />
*chunker<br />
**<br />
*interchunk <br />
**<br />
*postchunk<br />
**<br />
*(ton-wad)<br />
**<br />
<br />
===wad→ton===<br />
*Correct Translation <br />
**(wad) pimunapat → (ton) siʻaku puaka<br />
**(wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg><br />
<br />
*tagger<br />
**^pimuna<n><p1><sg><poss><.sent>$<br />
<br />
*biltrans<br />
**^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$<br />
<br />
*chunker<br />
**apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent> <br/> ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*interchunk <br />
**^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*postchunk<br />
**^siʻaku$ ^puaka<n><sg>$<br />
<br />
*(wad-ton)<br />
** #siʻaku puaka<br />
<br />
==Post-Evaluation==<br />
===wad→ton===<br />
On the tests file, 100% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 100% when *'d words are removed, and when not removing *'s.<br />
<br />
===ton→wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4474Wamesa and Tongan/Structural transfer2017-04-11T16:15:45Z<p>Twarner2: /* wad→ton */</p>
<hr />
<div>==Pre-Evaluation==<br />
===wad→ton===<br />
On the tests file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton→wad===<br />
<br />
<br />
==Examples==<br />
===ton→wad===<br />
*Correct Translation <br />
**(ton) ʻoku ʻikai ke puke → (wad) pota va<br />
**(ton) ʻoku<vaux><present> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg><br />
<br />
*tagger<br />
**<br />
*biltrans<br />
**<br />
*chunker<br />
**<br />
*interchunk <br />
**<br />
*postchunk<br />
**<br />
*(ton-wad)<br />
**<br />
<br />
===wad→ton===<br />
*Correct Translation <br />
**(wad) pimunapat → (ton) siʻaku puaka<br />
**(wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg><br />
<br />
*tagger<br />
**^pimuna<n><p1><sg><poss><.sent>$<br />
<br />
*biltrans<br />
**^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$<br />
<br />
*chunker<br />
**apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent> <br/> ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*interchunk <br />
**^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*postchunk<br />
**^siʻaku$ ^puaka<n><sg>$<br />
<br />
*(wad-ton)<br />
**\#siʻaku puaka<br />
<br />
==Post-Evaluation==<br />
===wad→ton===<br />
On the tests file, 100% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 100% when *'d words are removed, and when not removing *'s.<br />
<br />
===ton→wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4473Wamesa and Tongan/Structural transfer2017-04-11T16:15:21Z<p>Twarner2: </p>
<hr />
<div>==Pre-Evaluation==<br />
===wad→ton===<br />
On the tests file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton→wad===<br />
<br />
<br />
==Examples==<br />
===ton→wad===<br />
*Correct Translation <br />
**(ton) ʻoku ʻikai ke puke → (wad) pota va<br />
**(ton) ʻoku<vaux><present> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg><br />
<br />
*tagger<br />
**<br />
*biltrans<br />
**<br />
*chunker<br />
**<br />
*interchunk <br />
**<br />
*postchunk<br />
**<br />
*(ton-wad)<br />
**<br />
<br />
===wad→ton===<br />
*Correct Translation <br />
**(wad) pimunapat → (ton) siʻaku puaka<br />
**(wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg><br />
<br />
*tagger<br />
**^pimuna<n><p1><sg><poss><.sent>$<br />
<br />
*biltrans<br />
**^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$<br />
<br />
*chunker<br />
**apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent> <br/> ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*interchunk <br />
**^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*postchunk<br />
**^siʻaku$ ^puaka<n><sg>$<br />
<br />
*(wad-ton)<br />
**#siʻaku puaka<br />
<br />
==Post-Evaluation==<br />
===wad→ton===<br />
On the tests file, 100% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 100% when *'d words are removed, and when not removing *'s.<br />
<br />
===ton→wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4460Wamesa and Tongan/Structural transfer2017-04-11T15:51:29Z<p>Twarner2: /* wad→ton */</p>
<hr />
<div>==Examples==<br />
===ton→wad===<br />
*Correct Translation <br />
**(ton) ʻoku ʻikai ke puke → (wad) pota va<br />
**(ton) ʻoku<vaux><present> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg><br />
<br />
*tagger<br />
**<br />
*biltrans<br />
**<br />
*chunker<br />
**<br />
*interchunk <br />
**<br />
*postchunk<br />
**<br />
*(ton-wad)<br />
**<br />
<br />
===wad→ton===<br />
*Correct Translation <br />
**(wad) pimunapat → (ton) siʻaku puaka<br />
**(wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg><br />
<br />
*tagger<br />
**^pimuna<n><p1><sg><poss><.sent>$<br />
<br />
*biltrans<br />
**^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$<br />
<br />
*chunker<br />
**apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent> <br/> ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*interchunk <br />
**^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*postchunk<br />
**^siʻaku$ ^puaka<n><sg>$<br />
<br />
*(wad-ton)<br />
**#siʻaku puaka<br />
<br />
==Pre-Evaluation==<br />
===wad→ton===<br />
The WER <br />
<br />
===ton→wad===<br />
<br />
<br />
==Post-Evaluation==<br />
===wad→ton===<br />
The WER <br />
<br />
===ton→wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4455Wamesa and Tongan/Structural transfer2017-04-11T15:49:29Z<p>Twarner2: </p>
<hr />
<div>==Examples==<br />
===ton→wad===<br />
*Correct Translation <br />
**(ton) ʻoku ʻikai ke puke → (wad) pota va<br />
**(ton) ʻoku<vaux><present> ʻikai<neg> ke<prn><prepd><p3><sg> puke<adj> → (wad) pota<adj><p3><sg> va<neg><br />
<br />
*tagger<br />
**<br />
*biltrans<br />
**<br />
*chunker<br />
**<br />
*interchunk <br />
**<br />
*postchunk<br />
**<br />
*(ton-wad)<br />
**<br />
<br />
===wad→ton===<br />
*Correct Translation <br />
**(wad) pimunapat → (ton) siʻaku puaka<br />
**(wad) pimuna<n><p1><sg><poss> → (ton) siʻaku<p1><poss> puaka<n><sg><br />
<br />
*tagger<br />
**^pimuna<n><p1><sg><poss><.sent>$<br />
<br />
*biltrans<br />
**^pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent>$<br />
<br />
*chunker<br />
**apertium-transfer: Rule 2 pimuna<n><p1><sg><poss><.sent>/puaka<n><p1><sg><poss><.sent> <br/> ^poss<SA>{^siʻaku$ }$^n<SA>{^puaka<n><sg>$}$<br />
<br />
*interchunk <br />
**<br />
*postchunk<br />
**<br />
*(ton-wad)<br />
**<br />
<br />
==Pre-Evaluation==<br />
===wad→ton===<br />
The WER <br />
<br />
===ton→wad===<br />
<br />
<br />
==Post-Evaluation==<br />
===wad→ton===<br />
The WER <br />
<br />
===ton→wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4445Wamesa and Tongan/Structural transfer2017-04-11T07:56:15Z<p>Twarner2: </p>
<hr />
<div>==Pre-Evaluation==<br />
===wad-ton===<br />
The WER <br />
<br />
===ton-wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]<br />
[[Category:Sp17_StructuralTransfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4433Wamesa and Tongan/Structural transfer2017-04-10T19:39:03Z<p>Twarner2: </p>
<hr />
<div>==Pre-Evaluation==<br />
===wad-ton===<br />
<br />
===ton-wad===<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Structural transfer]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4335Wamesa and Tongan/Structural transfer2017-04-06T15:42:50Z<p>Twarner2: </p>
<hr />
<div>stub<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Structural transfer]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_transfer&diff=4334Wamesa and Tongan/Structural transfer2017-04-06T15:40:51Z<p>Twarner2: Created page with "stub Category:Structural transfer"</p>
<hr />
<div>stub<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Structural transfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_Transfer&diff=4333Wamesa and Tongan/Structural Transfer2017-04-06T15:40:29Z<p>Twarner2: Blanked the page</p>
<hr />
<div></div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4332Wamesa and Tongan2017-04-06T15:39:51Z<p>Twarner2: /* Documentation */</p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix. '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing as there is no equivalent in Wamesa. The percentages above are when the incorrect #<poss> is taken out of the ton-wad.tests.txt file. Otherwise, the percentage increases to 114.29%.<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan&diff=4331Wamesa and Tongan2017-04-06T15:38:53Z<p>Twarner2: </p>
<hr />
<div>This page is a resource for machine translation between [https://wikis.swarthmore.edu/ling073/Wamesa Wamesa] and [https://wikis.swarthmore.edu/ling073/Tongan Tongan].<br />
<br/><br />
===wad → ton evaluation===<br />
On the <code>tests</code> file, 77.47% of the words translate correctly, and the unknown word rate is 28.57%. The WER and PER are both 71.43% when *'d words are removed, and 100% when not removing *'s.<br />
<br />
===ton → wad evaluation===<br />
Percentage of unknown words in ton.sentences.txt (bilingual corpus): 7% <br />
<br />
Percentage of stems translated correctly: 93% <br />
<br />
Percentage of unknown words in test file: 0% <br />
<br />
Results when removing unknown word marks (stars) |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
Number of position-independent correct words: 1<br />
<br />
Results when unknown word marks (stars) not removed |<br />
'''WER:''' 100%<br />
'''PER:''' 100%<br />
<br />
*There is still one bug that we are trying to fix. '''siʻaku''' is being translated to '''#<poss>''', but it should translate to nothing as there is no equivalent in Wamesa. The percentages above are when the incorrect #<poss> is taken out of the ton-wad.tests.txt file. Otherwise, the percentage increases to 114.29%.<br />
<br />
==Documentation==<br />
[[Wamesa_and_Tongan/Contrastive_Grammar | Contrastive Grammar]]<br/><br />
[[Wamesa_and_Tongan/Lexical_Selection | Lexical Selection]]<br/><br />
[[Wamesa_and_Tongan/Structural_Transfer | Structural Transfer]]<br/><br />
<br />
==Developed Resources==<br />
[https://github.swarthmore.edu/mcostag1/ling073-ton-wad.git ton→wad translator]<br/><br />
[https://github.swarthmore.edu/twarner2/ling073-wad-ton.git wad→ton translator]<br/><br />
<br />
<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Structural_Transfer&diff=4330Wamesa and Tongan/Structural Transfer2017-04-06T15:38:06Z<p>Twarner2: Created page with "stub Category:Structural transfer"</p>
<hr />
<div>stub<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Structural transfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4231Wamesa and Tongan/Lexical Selection2017-03-31T15:58:00Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation ''tō'', because ''matotap'' is preceded by ''babin'' and ''antum'', which are less likely to collapse than to fall.<br />
*''Babinpai antumpai sumatotap.'' (The woman and the child fall.)<br />
In this example sentence, the lexical selection tool should choose the translation ''holo'', because ''matotap'' is preceded by ''anio'', which means house and is more likely to collapse than to fall.<br />
*''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In this example sentence, the lexical selection tool should choose the translation ''fakaʻete'', because ''mun'' is followed by ''pimuna'', and while you could kill a pig, it is more likely that it's a pig hunt than just a pig murder.<br />
*''Imun pimunapesi.'' (I am hunting a pig.)<br />
In this example sentence, the lexical selection tool should choose the translation ''tāmateʻi'', because ''mun'' is followed by ''muan'', and you wouldn't hunt a man.<br />
*''Imun muanpai.'' (I killed the man.)<br />
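The choices above could be written as Apertium lexical selection (<code>.lrx</code>) rules roughly as follows. This is only a sketch: the tags <code>n.*</code> and <code>v.*</code> are assumptions about the pair's tagset, and the rules assume that the context noun is directly adjacent to the verb in the analysed text, which may not hold if determiner or agreement morphemes are analysed as separate units.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rules>
  <!-- matotap after anio (house): choose holo (collapse) -->
  <rule>
    <match lemma="anio" tags="n.*"/>
    <match lemma="matotap" tags="v.*">
      <select lemma="holo" tags="v.*"/>
    </match>
  </rule>
  <!-- matotap after antum (child): choose tō (fall) -->
  <rule>
    <match lemma="antum" tags="n.*"/>
    <match lemma="matotap" tags="v.*">
      <select lemma="tō" tags="v.*"/>
    </match>
  </rule>
  <!-- mun before pimuna (pig): choose fakaʻete (hunt) -->
  <rule>
    <match lemma="mun" tags="v.*">
      <select lemma="fakaʻete" tags="v.*"/>
    </match>
    <match lemma="pimuna" tags="n.*"/>
  </rule>
  <!-- mun before muan (man): choose tāmateʻi (kill) -->
  <rule>
    <match lemma="mun" tags="v.*">
      <select lemma="tāmateʻi" tags="v.*"/>
    </match>
    <match lemma="muan" tags="n.*"/>
  </rule>
</rules>
```

Each rule lists its context words as sibling <code>match</code> elements; the <code>select</code> sits inside the <code>match</code> for the ambiguous word.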
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''. Generally speaking, ''kakalu'' will translate to ''cricket'' more often than to ''whistle'', because ''whistle'' is a secondary meaning. <br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation "nginggisi", because "kakalu" is preceded by the verb "angi", which means "to blow".<br />
*ʻOku ou angi e kakalu. (I blow the whistle.) <br />
<br />
The translation for "kakalu" here should be "sararer", because it is preceded by the verb "moloki", meaning "to step on", which is commonly used for stepping on an insect.<br />
*ʻOku ou moloki e kakalu. (I step on a cricket.)<br />
<br />
<br />
In this example, the translation "utanusara" should be chosen, because "tangi" is followed by the word for question. <br />
*ʻOku ou tangi a fehuʻi. (I ask a question.)<br />
<br />
In this example, the translation "sau" should be chosen, because "tangi" is followed by the word for baby, "valevale". A baby is more likely to cry than to ask a question.<br />
*ʻOku tangi e valevale. (The baby cries.)<br />
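As a rough illustration, the four choices above might be encoded as Apertium lexical selection (<code>.lrx</code>) rules like these. The tags <code>n.*</code> and <code>v.*</code> are assumed rather than taken from the actual ton-wad tagset, and the small words ''e'' and ''a'' are matched by lemma exactly as they appear in the example sentences.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rules>
  <!-- angi e kakalu: kakalu after "blow" is a whistle -->
  <rule>
    <match lemma="angi" tags="v.*"/>
    <match lemma="e"/>
    <match lemma="kakalu" tags="n.*">
      <select lemma="nginggisi" tags="n.*"/>
    </match>
  </rule>
  <!-- moloki e kakalu: kakalu after "step on" is a cricket -->
  <rule>
    <match lemma="moloki" tags="v.*"/>
    <match lemma="e"/>
    <match lemma="kakalu" tags="n.*">
      <select lemma="sararer" tags="n.*"/>
    </match>
  </rule>
  <!-- tangi a fehuʻi: tangi before "question" means ask -->
  <rule>
    <match lemma="tangi" tags="v.*">
      <select lemma="utanusara" tags="v.*"/>
    </match>
    <match lemma="a"/>
    <match lemma="fehuʻi" tags="n.*"/>
  </rule>
  <!-- tangi e valevale: tangi before "baby" means cry -->
  <rule>
    <match lemma="tangi" tags="v.*">
      <select lemma="sau" tags="v.*"/>
    </match>
    <match lemma="e"/>
    <match lemma="valevale" tags="n.*"/>
  </rule>
</rules>
```

These rules only fire on the exact three-word contexts shown; a real rule set would probably need to be more general.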
<br />
<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4223Wamesa and Tongan/Lexical Selection2017-03-31T03:05:57Z<p>Twarner2: /* Example sentences */</p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
===Example sentences===<br />
*''Babinpai antumpai sumatotap.'' (The woman and the child fall.)<br />
*''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
*''Imun pimunapesi.'' (I am hunting a pig.)<br />
*''Imun muanpai.'' (I killed the man.)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''. Generally speaking, kakalu will likely translate to cricket more often than whistle, because it is a secondary definition for whistle. <br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation "nginggisi", because the "kakalu" is preceded by the verb "angi", which means to blow.<br />
*ʻOku ou angi e kakalu. (I blow the whistle.) <br />
<br />
*ʻOku ou moloki e kakalu. (I step on a cricket.)<br />
<br />
<br />
In this example, the translation "utanusara" should be chosen, because "tangi" is followed by the word for question. <br />
*ʻOku ou tangi a fehuʻi. (I ask a question.)<br />
*ʻOku tangi <br />
<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4222Wamesa and Tongan/Lexical Selection2017-03-31T03:05:22Z<p>Twarner2: /* wad → ton */</p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
===Example sentences===<br />
**'''Ex:''' ''Babinpai antumpai sumatotap.'' (The woman and the child fall.)<br />
**'''Ex:''' ''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
**'''Ex:''' ''Imun pimunapesi.'' (I am hunting a pig.)<br />
**'''Ex:''' ''Imun muanpai.'' (I killed the man.)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''. Generally speaking, kakalu will likely translate to cricket more often than whistle, because it is a secondary definition for whistle. <br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation "nginggisi", because the "kakalu" is preceded by the verb "angi", which means to blow.<br />
*ʻOku ou angi e kakalu. (I blow the whistle.) <br />
<br />
*ʻOku ou moloki e kakalu. (I step on a cricket.)<br />
<br />
<br />
In this example, the translation "utanusara" should be chosen, because "tangi" is followed by the word for question. <br />
*ʻOku ou tangi a fehuʻi. (I ask a question.)<br />
*ʻOku tangi <br />
<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4221Wamesa and Tongan/Lexical Selection2017-03-31T03:04:04Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
**'''Ex:''' ''Babinpai antumpai sumatotap.'' (The woman and the child fall.)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
===Example sentences===<br />
**'''Ex:''' ''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
**'''Ex:''' ''Imun pimunapesi.'' (I am hunting a pig.)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
**'''Ex:''' ''Imun muanpai.'' (I killed the man.)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''. Generally speaking, kakalu will likely translate to cricket more often than whistle, because it is a secondary definition for whistle. <br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
===Example sentences===<br />
In this example sentence, the lexical selection tool should choose the translation "nginggisi", because the "kakalu" is preceded by the verb "angi", which means to blow.<br />
*ʻOku ou angi e kakalu. (I blow the whistle.) <br />
<br />
*ʻOku ou moloki e kakalu. (I step on a cricket.)<br />
<br />
<br />
In this example, the translation "utanusara" should be chosen, because "tangi" is followed by the word for question. <br />
*ʻOku ou tangi a fehuʻi. (I ask a question.)<br />
*ʻOku tangi <br />
<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Spring_2019/Structural_transfer&diff=4219Spring 2019/Structural transfer2017-03-31T03:01:10Z<p>Twarner2: /* The structure of a transfer file */</p>
<hr />
<div><br />
== Background ==<br />
<br />
=== The basic idea of structural transfer in RBMT ===<br />
The idea of structural transfer in RBMT is to deal with the order and tag differences encountered in translation between two languages.<br />
<br />
[[File:Transfer - basic idea.png|thumb|The arrows between the two tagged levels represent where structural transfer is needed. Colour coding shows [rough] correspondences.]]<br />
<br />
=== How structural transfer works in Apertium ===<br />
<br />
==== Three levels ====<br />
There are three stages of structural transfer in Apertium: chunker (t1x), interchunk (t2x), and postchunk (t3x). The effects of some rules implemented at each stage are shown below:<br />
<br />
[[File:Transfer - full.png|thumb|Each stage of structural transfer: chunker, interchunk, postchunk]]<br />
<br />
Chunker has access to word-level lemmas and tags, interchunk has access to chunk-level names and tags, and postchunk has access only to chunk-level names.<br />
<br />
==== The structure of a transfer file ====<br />
<br />
The rules in a transfer file go in '''<code><section-rules>...</section-rules></code>'''. Each <code><rule>...</rule></code> consists of a <code><pattern>...</pattern></code> and an <code><action>...</action></code>.<br />
<br />
The matched pattern is an ordered list of <code><pattern-item>...</pattern-item></code>s, whose names refer to <code><def-cat>...</def-cat></code>s, which contain <code><cat-item tags=""/></code>s (tags defined as in [[lexical selection]]) and are defined in '''<code><section-def-cats>...</section-def-cats></code>'''.<br />
<br />
The action section of a rule can contain <code><out>...</out></code> blocks containing the general structure of what is output in place of the matched pattern, <code><let>...</let></code> statements for setting variables (defined in '''<code><section-def-vars>...</section-def-vars></code>''') or mutating tags, <code><choose>...</choose></code> conditional blocks, and <code><call-macro>...</call-macro></code> statements for calling a macro.<br />
<br />
Macros are defined in <code><def-macro>...</def-macro></code> blocks inside '''<code><section-def-macros>...</section-def-macros></code>'''. They allow any combination of parts of an action section (though <code><out>...</out></code> blocks are to be avoided) to be used within an arbitrary action section.<br />
<br />
An <code><out>...</out></code> block should immediately contain a <code><chunk>...</chunk></code>, which in turn contains chunk <code><tags>...</tags></code> and <code><lu>...</lu></code> (lexical unit) blocks (separated by <code><b/></code> spaces) defining the lexical unit and corresponding tags to be output.<br />
<br />
Each lexical unit consists of <code><clip/></code>s, which contain the attributes <code>pos=""</code> for position matched in the pattern, <code>side=""</code> for the side to output, and <code>part=""</code> for the part of the material to output. Parts can include <code>lem</code> for the lemma, <code>whole</code> for the entirety, and any set of tags (<code><attr-item/></code>s) you define as <code><def-attr>...</def-attr></code> in '''<code><section-def-attrs>...</section-def-attrs></code>'''.<br />
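To make the pieces above concrete, here is a minimal sketch of a chunker (t1x) file with a single rule that matches an adjective followed by a noun and outputs them in the opposite order. All names in it (<code>adj</code>, <code>nom</code>, <code>a_adj</code>, <code>a_nom</code>, <code>nbr</code>, <code>number</code>, <code>n_adj</code>, <code>SN</code>) and the tags <code>adj</code>, <code>n</code>, <code>sg</code>, <code>pl</code> are made up for illustration and are not taken from any particular language pair.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<transfer default="chunk">
  <!-- categories that pattern-items refer to -->
  <section-def-cats>
    <def-cat n="adj">
      <cat-item tags="adj"/>
      <cat-item tags="adj.*"/>
    </def-cat>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
    </def-cat>
  </section-def-cats>
  <!-- sets of tags that clips can pick out with part="..." -->
  <section-def-attrs>
    <def-attr n="a_adj">
      <attr-item tags="adj"/>
    </def-attr>
    <def-attr n="a_nom">
      <attr-item tags="n"/>
    </def-attr>
    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
    </def-attr>
  </section-def-attrs>
  <!-- variables; unused here, declared only to show where they go -->
  <section-def-vars>
    <def-var n="number"/>
  </section-def-vars>
  <section-rules>
    <rule comment="adj nom becomes nom adj">
      <pattern>
        <pattern-item n="adj"/>
        <pattern-item n="nom"/>
      </pattern>
      <action>
        <out>
          <chunk name="n_adj" case="caseFirstWord">
            <tags>
              <tag><lit-tag v="SN"/></tag>
            </tags>
            <!-- the noun (position 2) is output first -->
            <lu>
              <clip pos="2" side="tl" part="lem"/>
              <clip pos="2" side="tl" part="a_nom"/>
              <clip pos="2" side="tl" part="nbr"/>
            </lu>
            <b pos="1"/>
            <!-- then the adjective (position 1) -->
            <lu>
              <clip pos="1" side="tl" part="lem"/>
              <clip pos="1" side="tl" part="a_adj"/>
            </lu>
          </chunk>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
```

The <code>side="tl"</code> clips take the target-language material; using <code>side="sl"</code> instead would output the source-language side.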
<br />
Plenty of examples are available:<br />
* [https://github.com/jonorthwash/apertium-kir-eng eng-kir] transfer that covers the example above and basically nothing else.<br />
* [http://svn.code.sf.net/p/apertium/svn/trunk/apertium-en-es/ en-es]: a mature translation pair with well developed structural transfer for English-Spanish and Spanish-English translation.<br />
* And lots in between.<br />
<br />
==== Writing rules ====<br />
Transfer rules are among the best documented features of Apertium. Here are some places to read, in approximate order of complexity:<br />
* [[:apertium:Contributing to an existing pair#Adding structural transfer (grammar) rules|Adding structural transfer rules to an existing pair]]<br />
* [[:apertium:Transfer rules examples|Examples of transfer rules]]<br />
* [[:apertium:A long introduction to transfer rules|A long introduction to transfer rules]]<br />
* [http://xixona.dlsi.ua.es/~fran/apertium2-documentation.pdf Full apertium documentation] (section 3.5 covers the transfer module)<br />
<br />
== The assignment ==<br />
* Add a page to the wiki called <code>Language1_and_Language2/Structural_transfer</code>, linking to it from the main page on the language pair.<br />
** Put the page in the category [[:Category:Sp17_StructuralTransfer]].<br />
** Perform WER, PER, and coverage tests on your short sentences corpus, and add the results to a pre-evaluation section.<br />
<br />
* Implement at least one item from your contrastive grammar.<br />
** Each person in each group should implement at least one item for the direction that translates into the language that they have been primarily working with. The same item does not need to be used for each direction.<br />
** If the contrastive grammar item only involves relabelling or reordering tags within the same form, then please do at least two items.<br />
<br />
* Add to your structural transfer wiki page:<br />
** Add at least one example sentence for each item you implement. Show the outputs of the following modes for your translation system: tagger, biltrans, chunker, postchunk, and the pair itself (abc-xyz).<br />
** Perform WER, PER, and coverage tests again, and add the results to a post-evaluation section on the wiki page.<br />
<br />
<br />
[[Category:Assignments]]<br />
[[Category:Structural transfer]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4215Wamesa and Tongan/Lexical Selection2017-03-31T01:28:49Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
**'''Ex:''' ''Babinpai antumpai sumatotap.'' (The woman and child fall.)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
**'''Ex:''' ''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
**'''Ex:''' ''Imun pimunapesi.'' (I am hunting a pig.)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
**'''Ex:''' ''Imun muanpai.'' (I killed the man.)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''.<br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4214Wamesa and Tongan/Lexical Selection2017-03-31T01:27:51Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
**'''Ex:''' ''Kankanipasanu sumatotap.'' (The (two) eagles are falling.)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
**'''Ex:''' ''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
**'''Ex:''' ''Imun pimunapesi.'' (I am hunting a pig.)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
**'''Ex:''' ''Imun muanpai.'' (I killed the man.)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''.<br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4213Wamesa and Tongan/Lexical Selection2017-03-31T01:26:30Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
**'''Ex:''' ''Antumpai miatotap.'' (The child is falling.)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
**'''Ex:''' ''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
**'''Ex:''' ''Imun pimunapesi.'' (I am hunting a pig.)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
**'''Ex:''' ''Imun muanpai.'' (I killed the man.)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''.<br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2http://wikis.swarthmore.edu/ling073/index.php?title=Wamesa_and_Tongan/Lexical_Selection&diff=4208Wamesa and Tongan/Lexical Selection2017-03-31T00:59:41Z<p>Twarner2: </p>
<hr />
<div>==wad → ton==<br />
The Wamesa word ''matotap'' can mean ''to fall'' or ''to collapse''.<br />
*(wad) matotap (to fall) → tō (ton)<br />
**'''Ex:''' ''Antumpai miatotap.'' (The child is falling.)<br />
*(wad) matotap (to collapse) → holo (ton)<br />
**'''Ex:''' ''Aniopasiat simatotap.'' (The houses are collapsing.)<br />
In Wamesa, ''mun'' means ''to hunt'' or ''to kill''.<br />
*(wad) mun (hunt) → fakaʻete (ton)<br />
**'''Ex:''' ''Imun pimunapesi.'' (I hunt pigs.)<br />
*(wad) mun (kill) → tāmateʻi (ton)<br />
<br />
==ton → wad==<br />
The Tongan word ''tangi'' means ''to cry/weep'' as well as ''to ask/appeal''.<br />
* (ton) (to cry/weep) tangi → sau (wad)<br />
* (ton) (to ask/appeal) tangi → utanusara (wad)<br />
In Tongan, ''kakalu'' means either ''cricket'' or ''whistle''.<br />
* (ton) (large cricket or cicada) kakalu → sararer (wad) <br />
* (ton) (whistle) kakalu → nginggisi (wad)<br />
<br />
[[Category:Lexical selection]]<br />
[[Category:Sp17_TranslationPairs]]<br />
[[Category:Wamesa]]<br />
[[Category:Tongan]]</div>Twarner2