http://wikis.swarthmore.edu/ling073/api.php?action=feedcontributions&user=Tjones5&feedformat=atom
LING073 - User contributions [en]
2024-03-29T11:42:34Z
User contributions
MediaWiki 1.27.7
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5449
Warlpiri/Universal Dependencies
2017-05-12T07:58:26Z
<p>Tjones5: /* obj */</p>
<hr />
<div>=Evaluation=<br />
==Corpora==<br />
{| class="wikitable"<br />
|+Information<br />
|-<br />
|<br />
|Number of sentences<br />
|Number of forms<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 34<br />
| 353<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 8<br />
| 74<br />
|}<br />
<br />
==Withmorph==<br />
Results from using <code>wbp.withmorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 93.75%<br />
| 88.12%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 48.72%<br />
| 28.21%<br />
|}<br />
<br />
==Nomorph==<br />
Results from using <code>wbp.nomorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 71.25%<br />
| 65.00%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 35.90%<br />
| 23.08%<br />
|}<br />
<br />
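For reference, UAS (unlabelled attachment score) is the share of tokens whose predicted head matches the gold head, and LAS (labelled attachment score) also requires the dependency label to match. A minimal sketch of that computation, assuming the gold and predicted analyses are already aligned token by token. The function name and the toy data are illustrative only; this is not the evaluation script that produced the figures above.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold and pred are equal-length lists of (head, deprel) pairs,
    one pair per token, in the same token order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts matching heads; LAS counts matching (head, label) pairs.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * head_ok / total, 100.0 * both_ok / total


# Toy three-token sentence (hypothetical analyses, not corpus data).
gold = [(0, "root"), (1, "aux"), (1, "nsubj")]
pred = [(0, "root"), (1, "aux"), (1, "obj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS {uas:.2f}%  LAS {las:.2f}%")
```

In the toy sentence every head is right but one label is wrong, so LAS drops below UAS. LAS can never exceed UAS, which matches the pattern in the tables.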
=Dependency Relations=<br />
==aux==<br />
*An auxiliary is defined as a "function word associated with a verbal predicate that expresses categories such as tense, mood, aspect, voice or evidentiality" ([http://universaldependencies.org/u/dep/aux_.html]). In Warlpiri, auxiliaries are used in every sentence, and they carry suffixes that signify tense and the number/person of subjects and objects.<br />
*Yani ''karna'' ngajulu ("I'm going") is glossed as:<br />
go + ''aux'' (present + I) + I<br />
*Yani ''kanpa'' nyuntulu ("You're going") is glossed as:<br />
go + ''aux'' (present + you) + you<br />
==nsubj==<br />
*A nominal subject is a "nominal which is the syntactic subject and the proto-agent of a clause" ([http://universaldependencies.org/u/dep/nsubj_.html]). In Warlpiri, the subject is often but not always included as its own word -- for example, pronouns are often omitted.<br />
*Yani karna ''ngajulu'' ("I'm going"):<br />
go + (present + I) + ''nsubj'' (I)<br />
*Yani ka ("He/she/it's going"):<br />
go + present<br />
==obj==<br />
*An object of a verb is usually "the noun phrase that denotes the entity acted upon or which undergoes a change of state or motion (the proto-patient)" ([http://universaldependencies.org/u/dep/obj_.html]). In Warlpiri, this noun is typically in the absolutive case when a transitive verb is used.<br />
*Nyanyi kanpa ''wawirri'' ("You can see a kangaroo"):<br />
see + (present + you) + ''obj'' (kangaroo)<br />
*Nyanyi kangku wawirrirli ''nyuntulu'' ("The kangaroo can see you"):<br />
see + (present + you) + kangaroo (ergative) + ''obj'' (you)<br />
<br />
==iobj==<br />
*An indirect object of a verb is "any nominal phrase that is a core argument of the verb but is not its subject or (direct) object" ([http://universaldependencies.org/u/dep/iobj_.html]). In Warlpiri, the dative case is used.<br />
*Nangala-rlu rla yungu pipa ''Jangala-ku''. ("Nangala gave the book to Jangala").<br />
Nangala + (dative subj3sg) + give + book + ''iobj'' (Jangala).<br />
*Karnta-patu-rlu-lu-jana ''kurdu-kurdu'' miyi yinyi. ("The women gave the food to the children")<br />
women + (past + subj3pl obj3pl) + ''iobj'' (children) + food + give<br />
<br />
==obl==<br />
*"The obl relation is used for a nominal (noun, pronoun, noun phrase) functioning as a non-core (oblique) argument or adjunct" ([http://universaldependencies.org/u/dep/obl_.html]). In Warlpiri, an instrument used to perform a task takes the ergative case, and the allative case marks the object that the action is directed into.<br />
*Jakamarra-rlu ka Jupurrula luwarni ''karli-kirlirli'' ("Jakamarra is hitting Jupurrula with a boomerang").<br />
Jakamarra + pres + Jupurrula + hit + ''obl'' (boomerang + ergative)<br />
*Wakati ka-rnalu lurlurl-pi-nyi ''parraja-kurra'' ngurlu. ("We shake the seeds of the pigweed into a coolamon.")<br />
pigweed + present (subj1plexcl) + shake + ''obl'' (coolamon + allative) + seeds<br />
<br />
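As a rough illustration of how the relations above combine in one tree, here is a sketch that encodes "Yani karna ngajulu" as reduced CoNLL-U-style rows. The heads follow the usual UD convention that aux and nsubj attach to the verb. The row layout and the helper name are my own assumptions, and the actual annotation in <code>wbp.annotated.ud.conllu</code> may differ.

```python
# Reduced CoNLL-U-style rows: (ID, FORM, HEAD, DEPREL).
# HEAD 0 marks the root; the other tokens attach to the verb "Yani".
rows = [
    (1, "Yani", 0, "root"),
    (2, "karna", 1, "aux"),
    (3, "ngajulu", 1, "nsubj"),
]


def dependents(rows, head_id):
    """Return {deprel: form} for every token attached to head_id."""
    return {deprel: form for _, form, head, deprel in rows if head == head_id}


# Print the rows tab-separated, the way CoNLL-U separates its columns.
for row in rows:
    print("\t".join(str(field) for field in row))

print(dependents(rows, 1))
```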
[[Category:sp17_UD]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5448
Warlpiri/Universal Dependencies
2017-05-12T07:57:20Z
<p>Tjones5: /* Dependency Relations */</p>
<hr />
<div>=Evaluation=<br />
==Corpora==<br />
{| class="wikitable"<br />
|+Information<br />
|-<br />
|<br />
|Number of sentences<br />
|Number of forms<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 34<br />
| 353<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 8<br />
| 74<br />
|}<br />
<br />
==Withmorph==<br />
Results from using <code>wbp.withmorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 93.75%<br />
| 88.12%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 48.72%<br />
| 28.21%<br />
|}<br />
<br />
==Nomorph==<br />
Results from using <code>wbp.nomorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 71.25%<br />
| 65.00%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 35.90%<br />
| 23.08%<br />
|}<br />
<br />
=Dependency Relations=<br />
==aux==<br />
*An auxiliary is defined as a "function word associated with a verbal predicate that expresses categories such as tense, mood, aspect, voice or evidentiality" ([http://universaldependencies.org/u/dep/aux_.html]). In Warlpiri, auxiliaries are used in every sentence, and they carry suffixes that signify tense and the number/person of subjects and objects.<br />
*Yani ''karna'' ngajulu ("I'm going") is glossed as:<br />
go + ''aux'' (present + I) + I<br />
*Yani ''kanpa'' nyuntulu ("You're going") is glossed as:<br />
go + ''aux'' (present + you) + you<br />
==nsubj==<br />
*A nominal subject is a "nominal which is the syntactic subject and the proto-agent of a clause" ([http://universaldependencies.org/u/dep/nsubj_.html]). In Warlpiri, the subject is often but not always included as its own word -- for example, pronouns are often omitted.<br />
*Yani karna ''ngajulu'' ("I'm going"):<br />
go + (present + I) + ''nsubj'' (I)<br />
*Yani ka ("He/she/it's going"):<br />
go + present<br />
==obj==<br />
*An object of a verb is usually "the noun phrase that denotes the entity acted upon or which undergoes a change of state or motion (the proto-patient)" ([http://universaldependencies.org/u/dep/obj_.html]). In Warlpiri, this noun is typically in the absolutive case when a transitive verb is used.<br />
*Nyanyi kanpa ''wawirri'' ("You can see a kangaroo"):<br />
see + (present + you) + ''obj'' (kangaroo)<br />
*Nyanyi kangku wawirrirli ''nyuntulu'' ("The kangaroo can see you"):<br />
see + (present + you) + kangaroo (ergative) + ''obj'' (you)<br />
==iobj==<br />
*An indirect object of a verb is "any nominal phrase that is a core argument of the verb but is not its subject or (direct) object" ([http://universaldependencies.org/u/dep/iobj_.html]). In Warlpiri, the dative case is used.<br />
*Nangala-rlu rla yungu pipa ''Jangala-ku''. ("Nangala gave the book to Jangala").<br />
Nangala + (dative subj3sg) + give + book + ''iobj'' (Jangala).<br />
*Karnta-patu-rlu-lu-jana ''kurdu-kurdu'' miyi yinyi. ("The women gave the food to the children")<br />
women + (past + subj3pl obj3pl) + ''iobj'' (children) + food + give<br />
<br />
==obl==<br />
*"The obl relation is used for a nominal (noun, pronoun, noun phrase) functioning as a non-core (oblique) argument or adjunct" ([http://universaldependencies.org/u/dep/obl_.html]). In Warlpiri, an instrument used to perform a task takes the ergative case, and the allative case marks the object that the action is directed into.<br />
*Jakamarra-rlu ka Jupurrula luwarni ''karli-kirlirli'' ("Jakamarra is hitting Jupurrula with a boomerang").<br />
Jakamarra + pres + Jupurrula + hit + ''obl'' (boomerang + ergative)<br />
*Wakati ka-rnalu lurlurl-pi-nyi ''parraja-kurra'' ngurlu. ("We shake the seeds of the pigweed into a coolamon.")<br />
pigweed + present (subj1plexcl) + shake + ''obl'' (coolamon + allative) + seeds<br />
<br />
[[Category:sp17_UD]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5447
Warlpiri/Universal Dependencies
2017-05-12T07:41:44Z
<p>Tjones5: /* Dependency Relations */</p>
<hr />
<div>=Evaluation=<br />
==Corpora==<br />
{| class="wikitable"<br />
|+Information<br />
|-<br />
|<br />
|Number of sentences<br />
|Number of forms<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 34<br />
| 353<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 8<br />
| 74<br />
|}<br />
<br />
==Withmorph==<br />
Results from using <code>wbp.withmorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 93.75%<br />
| 88.12%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 48.72%<br />
| 28.21%<br />
|}<br />
<br />
==Nomorph==<br />
Results from using <code>wbp.nomorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 71.25%<br />
| 65.00%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 35.90%<br />
| 23.08%<br />
|}<br />
<br />
=Dependency Relations=<br />
==aux==<br />
*An auxiliary is defined as a "function word associated with a verbal predicate that expresses categories such as tense, mood, aspect, voice or evidentiality" ([http://universaldependencies.org/u/dep/aux_.html]). In Warlpiri, auxiliaries are used in every sentence, and they carry suffixes that signify tense and the number/person of subjects and objects.<br />
*Yani ''karna'' ngajulu ("I'm going") is glossed as:<br />
go + ''aux'' (present + I) + I<br />
*Yani ''kanpa'' nyuntulu ("You're going") is glossed as:<br />
go + ''aux'' (present + you) + you<br />
==nsubj==<br />
*A nominal subject is a "nominal which is the syntactic subject and the proto-agent of a clause" ([http://universaldependencies.org/u/dep/nsubj_.html]). In Warlpiri, the subject is often but not always included as its own word -- for example, pronouns are often omitted.<br />
*Yani karna ''ngajulu'' ("I'm going"):<br />
go + (present + I) + ''nsubj'' (I)<br />
*Yani ka ("He/she/it's going"):<br />
go + present<br />
==obj==<br />
*An object of a verb is usually "the noun phrase that denotes the entity acted upon or which undergoes a change of state or motion (the proto-patient)" ([http://universaldependencies.org/u/dep/obj_.html]). In Warlpiri, this noun is typically in the absolutive case when a transitive verb is used.<br />
*Nyanyi kanpa ''wawirri'' ("You can see a kangaroo"):<br />
see + (present + you) + ''obj'' (kangaroo)<br />
*Nyanyi kangku wawirrirli ''nyuntulu'' ("The kangaroo can see you"):<br />
see + (present + you) + kangaroo (ergative) + ''obj'' (you)<br />
==iobj==<br />
<br />
==obl==<br />
<br />
<br />
[[Category:sp17_UD]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5446
Warlpiri/Universal Dependencies
2017-05-12T07:11:49Z
<p>Tjones5: </p>
<hr />
<div>=Evaluation=<br />
==Corpora==<br />
{| class="wikitable"<br />
|+Information<br />
|-<br />
|<br />
|Number of sentences<br />
|Number of forms<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 34<br />
| 353<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 8<br />
| 74<br />
|}<br />
<br />
==Withmorph==<br />
Results from using <code>wbp.withmorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 93.75%<br />
| 88.12%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 48.72%<br />
| 28.21%<br />
|}<br />
<br />
==Nomorph==<br />
Results from using <code>wbp.nomorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 71.25%<br />
| 65.00%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 35.90%<br />
| 23.08%<br />
|}<br />
<br />
=Dependency Relations=<br />
<br />
<br />
<br />
[[Category:sp17_UD]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5440
Warlpiri/Universal Dependencies
2017-05-12T01:16:55Z
<p>Tjones5: </p>
<hr />
<div>=Evaluation=<br />
==Corpora==<br />
{| class="wikitable"<br />
|+Information<br />
|-<br />
|<br />
|Number of sentences<br />
|Number of forms<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 33<br />
| 158<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 23<br />
| 103<br />
|}<br />
<br />
==Withmorph==<br />
Results from using <code>wbp.withmorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 91.14%<br />
| 78.48%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 60.19%<br />
| 47.57%<br />
|}<br />
<br />
==Nomorph==<br />
Results from using <code>wbp.nomorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 60.13%<br />
| 50.00%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 33.01%<br />
| 26.21%<br />
|}<br />
<br />
=Dependency Relations=<br />
<br />
<br />
<br />
[[Category:sp17_UD]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5439
Warlpiri/Universal Dependencies
2017-05-12T01:16:20Z
<p>Tjones5: </p>
<hr />
<div>=Evaluation=<br />
==Corpora==<br />
{| class="wikitable"<br />
|+Information<br />
|-<br />
|<br />
|Number of sentences<br />
|Number of forms<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 33<br />
| 158<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 23<br />
| 103<br />
|}<br />
<br />
==Withmorph==<br />
Here are the results from using <code>wbp.withmorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 91.14%<br />
| 78.48%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 60.19%<br />
| 47.57%<br />
|}<br />
<br />
==Nomorph==<br />
Here are the results from using <code>wbp.nomorph.udpipe</code> on the two conllu files:<br />
<br />
{| class="wikitable"<br />
|+Results<br />
|-<br />
|<br />
|UAS<br />
|LAS<br />
|-<br />
|<code>wbp.annotated.ud.conllu</code><br />
| 60.13%<br />
| 50.00%<br />
|-<br />
|<code>wbp.annotated2.ud.conllu</code><br />
| 33.01%<br />
| 26.21%<br />
|}<br />
<br />
=Dependency Relations=<br />
<br />
<br />
<br />
[[Category:sp17_UD]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri&diff=5438
Warlpiri
2017-05-12T01:14:07Z
<p>Tjones5: </p>
<hr />
<div><br />
== Developed Resources ==<br />
*Keyboard layout wiki page: [https://wikis.swarthmore.edu/ling073/Warlpiri/Keyboard]<br />
*Keyboard layout repository: [https://github.swarthmore.edu/tjones5/ling073-wbp-keyboard]<br />
*Repository containing corpus of Warlpiri and English [https://github.swarthmore.edu/tjones5/ling073-wbp-corpus]<br />
*Grammar documentation page: [https://wikis.swarthmore.edu/ling073/Warlpiri/Grammar]<br />
*Morphological Transducer Code: [https://github.swarthmore.edu/tjones5/ling073-wbp]<br />
*Morphological Transducer Documentation: [https://wikis.swarthmore.edu/ling073/Warlpiri/Transducer]<br />
*Morphological Disambiguator: [https://wikis.swarthmore.edu/ling073/Warlpiri/Disambiguation]<br />
*Resources for machine translation between Guarani and Warlpiri [https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri]<br />
*Dependency syntax: [https://wikis.swarthmore.edu/ling073/Warlpiri/Universal_Dependencies]<br />
*Link to final project: [https://wikis.swarthmore.edu/ling073/User:Tjones5/Final_project]<br />
<br />
== External Resources ==<br />
<br />
<br />
Kinds of resources:<br />
*computational resources (spell checkers, orthography converters, speech recognition software),<br />
*dictionaries/phrasebooks/glossaries (multilingual and monolingual, online and paper),<br />
*grammatical descriptions (theoretical and pædagogical),<br />
*scientific works (papers, books, websites), and<br />
*corpora (any collection of authentic text, linguistically annotated or not)<br />
<br />
<br />
<br />
===Computational Resources===<br />
*Kirrkirr is software, created by Kevin Jansz, Christopher Manning, and Nitin Indurkhya and targeted at indigenous languages, that transforms lexical databases to present information about a language. [https://web.archive.org/web/20090521110425/http://www-nlp.stanford.edu/kirrkirr] It's available for download on Linux. The specifics of the authors' work on Warlpiri are detailed here: [https://web.archive.org/web/20090506054726/http://www.sultry.arts.usyd.edu.au/SULTRY/warlpiri.html]<br />
*Christopher Horsethief has developed a Warlpiri Font Keyboard available on iOS devices. Available for purchase. [https://appadvice.com/app/warlpiri-font-keyboard/962527420]<br />
===Dictionaries/Phrasebooks===<br />
*The AuSIL (Australian Society for Indigenous Languages) has an interactive Warlpiri-English dictionary [http://ausil.org/Dictionary/Warlpiri/lexicon/index.htm] that includes a Warlpiri lexicon and English to Warlpiri translations. It appears to be free to download, yet download is not available on the current server. <br />
*The Glosbe Dictionary contains 55 phrases translated in English and Warlpiri. The phrases contain usage examples, and anyone can edit their translations [https://glosbe.com/en/wbp/]. Note about licenses: "Some of data we present is licensed with CC-BY-SA, some is FDL, some comes with custom license. Data source is always indicated next to data if it is needed due to the license." [https://glosbe.com/about] There is also a free API that can be used [https://glosbe.com/a-api].<br />
*A list of Warlpiri words, including English translations: [http://catalogue.aiatsis.gov.au/client/en_AU/external/search/detailnonmodal/ent:$002f$002fSD_ILS$002f0$002fSD_ILS:401933/ada?qu=warlpiri&te=ILS] No license necessary.<br />
*A concise and comprehensive edition of the Warlpiri dictionaries used to build KirrKirr for Warlpiri: [https://web.archive.org/web/20090506054726/http://www.sultry.arts.usyd.edu.au/SULTRY/warlpiri.html] Available for free download.<br />
===Grammatical Descriptions===<br />
*The Omniglot encyclopedia includes a short grammatical description of Warlpiri: [http://www.omniglot.com/writing/warlpiri.htm] No license necessary.<br />
*The AuSIL has a description of Warlpiri, including its sounds and abbreviations:[http://ausil.org/Dictionary/Warlpiri/aboutwarlpiri.htm] No license necessary.<br />
*''Syntax - Theory and Analysis'' (Kiss, 2015) includes a chapter about the grammatical structures of Warlpiri [https://books.google.com/books?id=HABfCAAAQBAJ&pg=PA1677&lpg=PA1677&dq=steve+swartz+warlpiri+bible&source=bl&ots=fFrQBJrOAJ&sig=TGGWn3GfUpGRA60w3OLzDq_Yaak&hl=en&sa=X&ved=0ahUKEwjMn_Gp69jRAhWBeyYKHY0aBE0Q6AEIPDAI#v=onepage&q=steve%20swartz%20warlpiri%20bible&f=false]<br />
*''Edinburgh Handbook of Evaluative Morphology'' (Bowler, 2015) [https://books.google.com/books?id=nDckDQAAQBAJ&pg=PA438&lpg=PA438&dq=nominal+warlpiri&source=bl&ots=4vQ5MdRO8z&sig=K3POmlFL24SSi_gCy3cR6rMwEqo&hl=en&sa=X&ved=0ahUKEwiH9b-GwP7RAhVkslQKHWqQBKUQ6AEIOzAF#v=onepage&q=nominal%20warlpiri&f=false] All rights reserved. Available for viewing on Google Books.<br />
===Scientific Works===<br />
*The AuSIL also contains links to four papers by Stephen Swartz (1982), regarding Warlpiri clauses, propositional particles, verb roots, and verbal clauses, in photocopied PDF form [http://ausil.org/Dictionary/Warlpiri/aboutwarlpiri.htm] No license is necessary to download it.<br />
*Jane Simpson is a researcher at the Australian National University who has published many papers on Warlpiri, including her PhD thesis "Aspects of Warlpiri Morphology and Syntax" [https://dspace.mit.edu/handle/1721.1/15468] No license is necessary to download it.<br />
*Jane Simpson's book ''Warlpiri Morpho-Syntax'' (1991) is available in McCabe: [https://catalog.tricolib.brynmawr.edu/find/Record/.b2064178]<br />
===Corpora===<br />
*Sample paragraph of Warlpiri from Omniglot Encyclopedia, including the English translation. Copyrighted, but usable as long as reference is given and it's not used commercially. [http://www.omniglot.com/writing/warlpiri.htm]<br />
*New Testament in Warlpiri, available in other translations including English. All rights reserved, © Wycliffe Bible Translators, Inc. [https://www.bible.com/bible/1355/jhn.1]<br />
*A series of short stories by AIATSIS Australian Indigenous Languages Collection Informants. Open access, available for download: [http://catalogue.aiatsis.gov.au/client/en_AU/external/search/detailnonmodal/ent:$002f$002fSD_ILS$002f0$002fSD_ILS:400826/ada?qu=warlpiri&te=ILS]<br />
===Other===<br />
*David Nash has compiled a list of Warlpiri references, dictionaries, and literature: [http://www.anu.edu.au/linguistics/nash/aust/wlp/wlp-lx-ref.html]<br />
*A 4-minute news program ''Indigenous Language News Radio'', available online in audio form: [http://www.abc.net.au/news/indigenous/]<br />
*Midterm overview notes: https://wikis.swarthmore.edu/ling073/Warlpiri/Midterm<br />
<br />
<br />
[[Category:warlpiri]]<br />
<br />
[[Category:sp17_ResourceDocumentation]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Warlpiri/Universal_Dependencies&diff=5437
Warlpiri/Universal Dependencies
2017-05-12T01:13:55Z
<p>Tjones5: Created page with "Placeholder for UD"</p>
<hr />
<div>Placeholder for UD</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5382
User:Tjones5/Final project
2017-05-11T00:40:41Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.com/tkjones117/ling073-wbp<br />
<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/8cd682e70a81378184b6a90f381f16e6d130459c<br />
*.lexc:<br />
**added a few more stems<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
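A sketch of how a figure like this can be computed, assuming "ambiguity" means the average number of analyses per recognised form in Apertium stream format (<code>^surface/analysis1/analysis2$</code>). The function name is hypothetical, and whether punctuation and unknown words were counted in the numbers above is not known; here punctuation is counted and unknown words (analyses starting with <code>*</code>) are skipped.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def mean_ambiguity(stream):
    """Average number of analyses per recognised form.

    Forms whose only "analysis" starts with '*' are unknown words
    and are left out of the average.
    """
    counts = []
    for _surface, analyses in LEXICAL_UNIT.findall(stream):
        readings = analyses.split("/")
        if readings[0].startswith("*"):
            continue
        counts.append(len(readings))
    if not counts:
        raise ValueError("no analysed forms in stream")
    return sum(counts) / len(counts)


# Analyser output for "rdaka": two readings for the word, one for the full stop.
sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"
print(mean_ambiguity(sample))
```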
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
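To make the effect of the last rule concrete, here is a small Python simulation of a rule shaped like <code>REMOVE (vaux) IF (-1 Aux)</code>. It is only a sketch: it assumes the <code>Aux</code> set matches the tag <code>vaux</code> (the set definitions are not shown on this page), the <code>pres</code> tag and the example readings are hypothetical, and the real constraint grammar engine does considerably more than this.

```python
def remove_if_left_is(cohorts, target, context):
    """Apply a rule like REMOVE (target) IF (-1 context) once, left to right.

    cohorts: list of cohorts; each cohort is a list of readings;
    each reading is a set of tags. A reading is removed when it carries
    `target` and some reading of the cohort to its left carries `context`.
    A cohort is left untouched if the rule would remove all of its readings.
    """
    result = [list(cohort) for cohort in cohorts]
    for i in range(1, len(result)):
        left_matches = any(context in reading for reading in result[i - 1])
        if not left_matches:
            continue
        kept = [reading for reading in result[i] if target not in reading]
        if kept:  # never strip a cohort of its last reading
            result[i] = kept
    return result


# Two-word example: an auxiliary followed by a form that is ambiguous
# between an auxiliary reading and a noun reading (hypothetical readings).
cohorts = [
    [{"vaux", "pres"}],
    [{"vaux"}, {"n", "sg", "abs"}],
]
disambiguated = remove_if_left_is(cohorts, "vaux", "vaux")
print(disambiguated[1])
```

After the rule applies, the second word keeps only its noun reading, which is the behaviour the comment on the rule describes.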
<br />
==Final Evaluation==<br />
Number of stems: ~340<br />
<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
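A minimal sketch of the three measures, under the assumption that coverage is the share of corpus tokens receiving at least one analysis, and that precision and recall compare sets of (form, analysis) pairs from the transducer against a hand-annotated gold set. The function names are hypothetical, and the toy pairs reuse forms from the case table on this page rather than anything from new_corpus.txt; the <code>pirli&lt;num&gt;</code> analysis is a made-up wrong answer.

```python
def precision_recall(gold, predicted):
    """gold and predicted are sets of (surface form, analysis) pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def coverage(analysed_tokens, total_tokens):
    """Share of corpus tokens that received at least one analysis."""
    if total_tokens <= 0:
        raise ValueError("corpus has no tokens")
    return analysed_tokens / total_tokens


gold = {
    ("pirli", "pirli<n><sg><abs>"),
    ("pirli-ku", "pirli<n><sg><dat>"),
    ("pirli-ngku", "pirli<n><sg><erg>"),
    ("warlkurru", "warlkurru<n><sg><abs>"),
}
predicted = {
    ("pirli", "pirli<n><sg><abs>"),
    ("pirli-ku", "pirli<n><sg><dat>"),
    ("pirli", "pirli<num>"),  # deliberately wrong analysis
}
precision, recall = precision_recall(gold, predicted)
print(f"precision {precision:.2%}, recall {recall:.2%}")
```

High precision with low recall, as in the figures above, means that most analyses the transducer produces are right but many gold analyses are still missing.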
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
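The two levels can be pictured with a toy lookup table in Python. This is only an illustration of the form/analysis pairing: a real transducer is a finite-state machine compiled from lexc and twol files, not a list, and the Apertium-style tags (<code>&lt;n&gt;</code>, <code>&lt;pl&gt;</code>) stand in for the "NOUN" and "PLURAL" labels above.

```python
# Toy "transducer": a table pairing lexical analyses with surface forms.
PAIRS = [
    ("wolf<n><sg>", "wolf"),
    ("wolf<n><pl>", "wolves"),
]


def analyse(surface):
    """Surface form -> list of lexical analyses (morphological parsing)."""
    return [lexical for lexical, form in PAIRS if form == surface]


def generate(lexical):
    """Lexical analysis -> list of surface forms (generation)."""
    return [form for lex, form in PAIRS if lex == lexical]


print(analyse("wolves"))
print(generate("wolf<n><pl>"))
```

The same table is read in both directions, which is the sense in which a transducer "connects" forms with analyses.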
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5370
User:Tjones5/Final project
2017-05-10T23:58:16Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/8cd682e70a81378184b6a90f381f16e6d130459c<br />
*.lexc:<br />
**added a few more stems<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
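
A minimal sketch of how an ambiguity figure like the ones above could be computed, assuming it means the average number of analyses per form in Apertium stream output (<code>^form/analysis1/analysis2$</code>). The name <code>mean_ambiguity</code> and the regular expression are illustrative assumptions, not the script used for this project, and escaped slashes inside forms are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped '/' or '$' characters occur inside a unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def mean_ambiguity(stream: str) -> float:
    """Return the average number of analyses per lexical unit in the stream."""
    counts = []
    for match in LEXICAL_UNIT.finditer(stream):
        # group(2) looks like "/analysis1/analysis2"; drop the empty first piece.
        analyses = [a for a in match.group(2).split("/") if a]
        # Unknown forms (marked with '*') still count as one reading here.
        counts.append(len(analyses))
    return sum(counts) / len(counts) if counts else 0.0


if __name__ == "__main__":
    sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"
    print(mean_ambiguity(sample))
```

On the sample, "rdaka" has two analyses and the full stop has one, so the mean is 1.5. Running the same function on the analyser output before and after the constraint grammar would give the kind of before/after comparison listed above.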
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Number of stems: ~340<br />
<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
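
A minimal sketch of how precision and recall could be computed, assuming both the hand annotation and the analyser output are reduced to sets of (form, analysis) pairs. The function name and the sample pairs are illustrative assumptions (the <code>kurdu<num></code> reading is deliberately made up as a wrong answer); this is not the evaluation script that produced the numbers above.

```python
def precision_recall(gold: set, predicted: set) -> tuple:
    """Compare predicted (form, analysis) pairs against hand-annotated ones."""
    true_positives = len(gold & predicted)
    # Precision: share of the analyser's answers that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the gold-standard answers that the analyser produced.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = {
        ("rdaka", "rdaka<n><sg><abs>"),
        ("rdaka", "rdaka<num>"),
        ("kurdu", "kurdu<n><sg><abs>"),
        ("pirli-ku", "pirli<n><sg><dat>"),
    }
    predicted = {
        ("rdaka", "rdaka<num>"),
        ("kurdu", "kurdu<n><sg><abs>"),
        ("kurdu", "kurdu<num>"),  # hypothetical wrong analysis
    }
    print(precision_recall(gold, predicted))
```

High precision with low recall, as reported above, means that most analyses the transducer gives are correct but many annotated analyses are missing from its output.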
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
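
The allomorph choice in the table can be sketched in Python, assuming the rule is simply "count the vowel letters a, i, u in the stem". This is only an illustration of the pattern: the real implementation lives in the lexc and twol files, the names below are hypothetical, and long vowels written as double letters would be counted twice here.

```python
VOWELS = set("aiu")

# Cases whose suffix depends on stem length: (form for <=2 vowels, form for >2 vowels)
ALTERNATING = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}

# Cases shown with a single suffix in the table
INVARIANT = {"abs": "", "dat": "ku", "all": "kurra", "ela": "ngurlu"}


def count_vowels(stem: str) -> int:
    """Count vowel letters; an assumed stand-in for the real length condition."""
    return sum(ch in VOWELS for ch in stem.lower())


def inflect(stem: str, case: str) -> str:
    """Attach the case suffix from the table to a noun stem."""
    if case in INVARIANT:
        suffix = INVARIANT[case]
    else:
        short_form, long_form = ALTERNATING[case]
        suffix = short_form if count_vowels(stem) <= 2 else long_form
    return f"{stem}-{suffix}" if suffix else stem


if __name__ == "__main__":
    print(inflect("pirli", "erg"))        # pirli-ngku
    print(inflect("warlkurru", "erg"))    # warlkurru-rlu
    print(inflect("pirli-jarra", "erg"))  # pirli-jarra-rlu
```

Because the count is taken over everything before the case suffix, adding -jarra to pirli pushes it into the longer class, which matches pirli-ngku → pirli-jarra-rlu.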
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
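
A rough Python sketch of the intended reduplication behaviour, assuming full reduplication of a known noun stem (with an optional dash) marks the plural. The function names and the <code>known_stems</code> argument are hypothetical; in the transducer itself this would have to be expressed in lexc/twol rather than in Python.

```python
from typing import Optional


def reduplication_base(form: str) -> Optional[str]:
    """Return the base if form is base-base or basebase, otherwise None."""
    if "-" in form:
        left, _, right = form.partition("-")
        return left if left and left == right else None
    half, remainder = divmod(len(form), 2)
    if remainder == 0 and half > 0 and form[:half] == form[half:]:
        return form[:half]
    return None


def analyse_noun(form: str, known_stems: set) -> Optional[str]:
    """Give a singular or (reduplicated) plural absolutive analysis, if any."""
    if form in known_stems:
        return f"{form}<n><sg><abs>"
    base = reduplication_base(form)
    if base is not None and base in known_stems:
        return f"{base}<n><pl><abs>"
    return None


if __name__ == "__main__":
    stems = {"kurdu"}
    print(analyse_noun("kurdu", stems))        # kurdu<n><sg><abs>
    print(analyse_noun("kurdu-kurdu", stems))  # kurdu<n><pl><abs>
```

Forms such as pirli-ngku are left alone, since the part after the dash is not a copy of the stem.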
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
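
Coverage can be sketched as the share of tokens that receive at least one analysis, relying on Apertium's convention of marking unknown forms with a leading asterisk (<code>^form/*form$</code>). The function name is an assumption, "xyz" is a made-up unknown token, and punctuation is counted as a token here, which may differ from the script actually used.

```python
import re

# One lexical unit: ^surface/everything-up-to-the-closing-dollar$
# Assumption: no escaped '/' or '$' characters occur inside a unit.
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str) -> float:
    """Return the fraction of lexical units with at least one known analysis."""
    total = 0
    known = 0
    for match in UNIT.finditer(stream):
        total += 1
        # Apertium prefixes the analysis of an unknown form with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    return known / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "^kurdu/kurdu<n><sg><abs>$ "
        "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ "
        "^xyz/*xyz$^./.<sent>$"
    )
    print(f"{coverage(sample):.2%}")
```

Three of the four units in the sample are analysed, so the sketch reports 75.00%.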
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5369
User:Tjones5/Final project
2017-05-10T23:52:55Z
<p>Tjones5: /* Final Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/8cd682e70a81378184b6a90f381f16e6d130459c<br />
*.lexc:<br />
**added a few more stems<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Number of stems: ~340<br />
<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5368
User:Tjones5/Final project
2017-05-10T23:52:31Z
<p>Tjones5: /* Final Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/8cd682e70a81378184b6a90f381f16e6d130459c<br />
*.lexc:<br />
**added a few more stems<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Number of stems: ~340<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
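
As a rough illustration of what these two numbers measure, the sketch below compares the analyser's output with hand-annotated analyses. The dictionary layout and the function name <code>precision_recall</code> are assumptions for illustration; the actual evaluation script may count things differently.

```python
def precision_recall(gold, predicted):
    """Compare predicted analyses with gold analyses, form by form.

    gold, predicted: dicts mapping a surface form to a set of analyses
    (a hypothetical layout chosen for this sketch).
    """
    tp = fp = fn = 0
    for form, gold_analyses in gold.items():
        pred = predicted.get(form, set())
        tp += len(gold_analyses & pred)   # analyses the transducer got right
        fp += len(pred - gold_analyses)   # analyses it produced wrongly
        fn += len(gold_analyses - pred)   # analyses it failed to produce
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


gold = {
    "pirli-ngku": {"pirli<n><sg><erg>"},
    "rdaka": {"rdaka<num>", "rdaka<n><sg><abs>"},
    "kurdu-kurdu": {"kurdu<n><pl><abs>"},
}
predicted = {
    "pirli-ngku": {"pirli<n><sg><erg>"},
    "rdaka": {"rdaka<num>"},
}
print(precision_recall(gold, predicted))
```

In this toy data everything the analyser outputs is correct, but half of the gold analyses are missing, which gives the same pattern as above: high precision with low recall.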
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
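
To make the two levels concrete, here is a toy sketch of a transducer as a two-way lookup between lexical and surface forms. A real transducer is a compiled finite-state machine built from lexc and twol files; the table <code>PAIRS</code> and the functions <code>analyse</code> and <code>generate</code> are purely illustrative.

```python
from collections import defaultdict

# (lexical form, surface form) pairs; a real transducer encodes these as paths
# through a finite-state machine rather than as an explicit list.
PAIRS = [
    ("wolf<n><sg>", "wolf"),
    ("wolf<n><pl>", "wolves"),
]

GENERATE = dict(PAIRS)
ANALYSE = defaultdict(list)
for lexical, surface in PAIRS:
    ANALYSE[surface].append(lexical)


def analyse(surface):
    """Surface form -> list of lexical analyses (empty if unknown)."""
    return ANALYSE.get(surface, [])


def generate(lexical):
    """Lexical analysis -> surface form (None if unknown)."""
    return GENERATE.get(lexical)


print(analyse("wolves"))
print(generate("wolf<n><pl>"))
```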
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
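
The vowel-count alternation in the table can be sketched as follows. In the transducer itself this is handled by twol rules; the Python below only illustrates the pattern. It assumes that counting the letters a, i and u in the base is a good enough stand-in for the real condition, and the names <code>ALLOMORPHS</code> and <code>attach_case</code> are made up for this sketch.

```python
VOWELS = set("aiu")

# case tag -> (suffix after a base with 2 vowels, suffix after a longer base),
# taken from the table above
ALLOMORPHS = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}


def count_vowels(base):
    """Count vowel letters in the base, including any suffixes already attached."""
    return sum(ch in VOWELS for ch in base)


def attach_case(base, case):
    """Pick the case allomorph by the number of vowels in the base."""
    short, long_ = ALLOMORPHS[case]
    suffix = short if count_vowels(base) == 2 else long_
    return f"{base}-{suffix}"


print(attach_case("pirli", "erg"))        # pirli-ngku
print(attach_case("warlkurru", "erg"))    # warlkurru-rlu
print(attach_case("pirli-jarra", "erg"))  # pirli-jarra-rlu
```

Because the count is taken over the whole base, adding -jarra to pirli moves it into the longer class, which matches the pirli-jarra-rlu example above.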
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
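
A minimal sketch of the intended behaviour for full reduplication, assuming that a noun stem repeated across a dash is analysed as a plural. This is not how it would be written in lexc/twol; the function <code>analyse_reduplication</code> and the one-word lexicon are hypothetical.

```python
def analyse_reduplication(surface, lexicon):
    """Return a plural analysis for STEM-STEM if STEM is a known noun stem.

    Assumption: only full reduplication of a bare (absolutive) stem is handled.
    """
    left, dash, right = surface.partition("-")
    if dash and left == right and left in lexicon:
        return f"{left}<n><pl><abs>"
    return None


noun_stems = {"kurdu"}  # hypothetical lexicon for this sketch
print(analyse_reduplication("kurdu-kurdu", noun_stems))
print(analyse_reduplication("kurdu", noun_stems))
```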
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
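
Coverage here means the share of corpus tokens that receive at least one analysis. A rough sketch of the calculation over analyser output in the Apertium stream format, where unknown words appear as <code>^word/*word$</code>, is shown below. The function name <code>coverage</code> is an assumption, and so is the decision to count punctuation as tokens.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of tokens with at least one real analysis (hypothetical helper)."""
    total = known = 0
    for unit in LEXICAL_UNIT.findall(stream):
        _surface, *analyses = unit.split("/")
        total += 1
        # A token is covered if some analysis does not start with "*".
        if any(not a.startswith("*") for a in analyses):
            known += 1
    return known / total if total else 0.0


sample = "^kurdu/kurdu<n><sg><abs>$ ^xyz/*xyz$ ^rdaka/rdaka<num>$ ^./.<sent>$"
print(coverage(sample))  # 3 of 4 tokens are analysed
```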
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5367
User:Tjones5/Final project
2017-05-10T23:50:54Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/8cd682e70a81378184b6a90f381f16e6d130459c<br />
*.lexc:<br />
**added a few more stems<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5366
User:Tjones5/Final project
2017-05-10T23:50:38Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/8cd682e70a81378184b6a90f381f16e6d130459c<br />
*.lexc:<br />
**added a few more stems<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5365
User:Tjones5/Final project
2017-05-10T23:46:27Z
<p>Tjones5: /* Final Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.4108<br />
*Ambiguity after disambiguation (Pre-final project): ~1.3583<br />
*Ambiguity after disambiguation (Final): ~1.3473<br />
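
The ambiguity figures above can be read as the mean number of analyses per token. The following is a minimal sketch of that calculation, not the script used for this project: it assumes the analyser output is in Apertium stream format (<code>^surface/analysis1/analysis2$</code>), that unknown tokens (analysis starting with <code>*</code>) are skipped, and that punctuation counts as a token. The name <code>mean_ambiguity</code> is hypothetical, and escaped slashes inside a token are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def mean_ambiguity(stream: str) -> float:
    """Return the average number of analyses per analysed token.

    Assumption: tokens whose first analysis starts with '*' are unknown
    and are left out of the average.
    """
    counts = []
    for _surface, analyses in LEXICAL_UNIT.findall(stream):
        readings = analyses.split("/")
        if readings[0].startswith("*"):
            continue  # unknown word: no analyses to count
        counts.append(len(readings))
    return sum(counts) / len(counts) if counts else 0.0


# "rdaka" has two readings and "." has one, so the mean is 1.5.
sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"
print(mean_ambiguity(sample))
```

Running the same function on the output before and after the constraint grammar step would give the two numbers to compare.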
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.48%<br />
<br />
Against new_corpus.txt:<br />
*Precision: 84.21053%<br />
*Recall: 32.98969%<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
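
The vowel-count alternation in the table can be illustrated outside the transducer. The sketch below is only an illustration in Python, not the twol rules from the repository. It assumes that counting the letters a, i, u is a good enough stand-in for the "2 vowels" versus ">2 vowels" distinction, and the names <code>SUFFIXES</code>, <code>count_vowels</code> and <code>inflect</code> are hypothetical.

```python
VOWELS = set("aiu")

# (suffix for stems with up to 2 vowels, suffix for stems with more than 2),
# taken from the "possible forms" column of the case table.
SUFFIXES = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}


def count_vowels(stem: str) -> int:
    """Count vowel letters; an assumed proxy for the vowel count in the table."""
    return sum(ch in VOWELS for ch in stem.lower())


def inflect(stem: str, case: str) -> str:
    """Attach the case suffix chosen by the number of vowels in the stem."""
    short_stem_suffix, long_stem_suffix = SUFFIXES[case]
    suffix = short_stem_suffix if count_vowels(stem) <= 2 else long_stem_suffix
    return f"{stem}-{suffix}"


print(inflect("pirli", "erg"))        # pirli-ngku
print(inflect("warlkurru", "erg"))    # warlkurru-rlu
print(inflect("pirli-jarra", "erg"))  # pirli-jarra-rlu
```

Because the count is taken over everything before the case suffix, adding -jarra to pirli moves it into the ">2 vowels" group, which matches pirli-jarra-rlu above.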
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
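
The intended behaviour for reduplication could be prototyped before it is written in lexc. This is a rough sketch under stated assumptions: full reduplication is written as two identical halves joined by a dash, it marks the plural of a known noun, and the full surface form is kept before the slash (as in the rdaka-pala example). The function <code>analyse_reduplication</code> and the <code>known_nouns</code> set are hypothetical and are not part of the transducer.

```python
def analyse_reduplication(form: str, known_nouns: set) -> str:
    """Return an Apertium-style lexical unit for a plain or reduplicated noun."""
    left, dash, right = form.partition("-")
    # Assumed pattern: STEM-STEM, where STEM is a noun already in the lexicon.
    if dash and left == right and left in known_nouns:
        return f"^{form}/{left}<n><pl><abs>$"
    if form in known_nouns:
        return f"^{form}/{form}<n><sg><abs>$"
    return f"^{form}/*{form}$"  # unknown word, marked with '*'


nouns = {"kurdu"}
print(analyse_reduplication("kurdu", nouns))        # ^kurdu/kurdu<n><sg><abs>$
print(analyse_reduplication("kurdu-kurdu", nouns))  # ^kurdu-kurdu/kurdu<n><pl><abs>$
```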
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
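
Both measures can be computed from analyser output. The sketch below is one plausible way to do it, not the evaluation script used here. It assumes Apertium stream format with unknown words marked by <code>*</code>, and it assumes precision and recall are counted over individual analyses of each hand-annotated form. The names <code>coverage</code> and <code>precision_recall</code> and the placeholder token <code>xyz</code> are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str) -> float:
    """Percentage of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


def precision_recall(gold: dict, predicted: dict) -> tuple:
    """Compare predicted analyses with hand-annotated ones, form by form."""
    tp = fp = fn = 0
    for form, gold_analyses in gold.items():
        pred = predicted.get(form, set())
        tp += len(gold_analyses & pred)   # analyses the annotator also gave
        fp += len(pred - gold_analyses)   # analyses the annotator did not give
        fn += len(gold_analyses - pred)   # annotated analyses that are missing
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(coverage("^kurdu/kurdu<n><sg><abs>$ ^xyz/*xyz$"))  # 50.0

gold = {"rdaka": {"rdaka<n><sg><abs>"}, "pirli-ku": {"pirli<n><sg><dat>"}}
predicted = {"rdaka": {"rdaka<num>", "rdaka<n><sg><abs>"}}
print(precision_recall(gold, predicted))  # (0.5, 0.5)
```

Under this counting, an analyser that returns few but correct analyses scores high precision and low recall.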
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5317
User:Tjones5/Final project
2017-05-10T07:18:38Z
<p>Tjones5: /* Final Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.4108<br />
*Ambiguity after disambiguation (Pre-final project): ~1.3583<br />
*Ambiguity after disambiguation (Final): ~1.3473<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.33%<br />
TODO<br />
-precision/recall against annotated corpus<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5316
User:Tjones5/Final project
2017-05-10T07:16:16Z
<p>Tjones5: /* Improvements Made to Disambiguation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
==Improvements Made to Disambiguation==<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/6edc1d225ba43f369468448b83d009616a9c6c41<br />
<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
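The ambiguity figures above are presumably the mean number of analyses per analysed form. A minimal sketch of that calculation over Apertium stream output follows; the function name <code>mean_ambiguity</code> is made up for illustration, and skipping unknown forms (those marked with <code>*</code>) is an assumption rather than necessarily how the numbers above were produced.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def mean_ambiguity(stream: str) -> float:
    """Average number of analyses per analysed form in an Apertium stream."""
    counts = []
    for unit in LEXICAL_UNIT.findall(stream):
        _surface, *analyses = unit.split("/")
        # Assumption: unknown forms (analysis starts with '*') are skipped.
        if not analyses or analyses[0].startswith("*"):
            continue
        counts.append(len(analyses))
    return sum(counts) / len(counts) if counts else 0.0


if __name__ == "__main__":
    sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"
    print(mean_ambiguity(sample))
```

Running the same function on the analyser output before and after the constraint grammar step would give comparable before/after figures.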
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.33%<br />
TODO:<br />
*precision/recall against annotated corpus<br />
*Author file, etc.<br />
<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
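As a toy illustration of the two levels described above, the sketch below maps between lexical and surface forms with a plain lookup table. This is not how lexc/twol work internally (they compile to finite-state transducers); the table, the Apertium-style tags and the function names are assumptions made only for this example.

```python
# Toy "transducer": a lookup table from lexical-level analyses to surface forms.
# Hypothetical entries using the "wolves" example from the list above.
LEXICON = {
    "wolf<n><sg>": "wolf",
    "wolf<n><pl>": "wolves",
}


def generate(analysis: str):
    """Lexical level -> surface level (None if the analysis is unknown)."""
    return LEXICON.get(analysis)


def analyse(surface: str):
    """Surface level -> all matching lexical-level analyses."""
    return sorted(a for a, s in LEXICON.items() if s == surface)


if __name__ == "__main__":
    print(analyse("wolves"))
    print(generate("wolf<n><sg>"))
```

A real transducer encodes the same relation compactly, so that stems and suffixes combine without listing every word form.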
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
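The suffix alternation in the table can be sketched as a small function that picks the ending by counting vowels in the stem. This follows the vowel-count description given above, not the actual twol rules; the names <code>SUFFIXES</code>, <code>count_vowels</code> and <code>inflect</code> are invented for the example, and counting the letters a, i, u is a simplification.

```python
# Cases whose suffix alternates in the table: (2-vowel form, >2-vowel form).
SUFFIXES = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}

VOWELS = set("aiu")  # assumption: only these letters count as vowels


def count_vowels(stem: str) -> int:
    """Count vowel letters in a stem (dashes and consonants are ignored)."""
    return sum(ch in VOWELS for ch in stem.lower())


def inflect(stem: str, case: str) -> str:
    """Attach the case suffix that matches the stem's vowel count."""
    short_form, long_form = SUFFIXES[case]
    suffix = short_form if count_vowels(stem) <= 2 else long_form
    return f"{stem}-{suffix}"


if __name__ == "__main__":
    for stem in ("pirli", "warlkurru", "pirli-jarra"):
        print(inflect(stem, "erg"))
```

Because the count is taken over the whole stem including earlier suffixes, pirli-jarra comes out with the longer-stem ending, matching the pirli-jarra-rlu example above.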
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
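Both measures listed above can be sketched in a few lines. This is an illustration under assumptions, not the evaluation script used for the reported numbers: coverage is taken as the share of lexical units in Apertium stream output without an unknown-word mark (<code>*</code>), punctuation is counted like any other token, and precision/recall compare sets of analyses per form. The function names and the sample data (including the unknown token "xyz") are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


def precision_recall(gold: dict, predicted: dict):
    """Precision and recall of predicted analyses against a gold standard.

    Both arguments map a surface form to a set of analyses.
    """
    true_positives = sum(len(predicted.get(f, set()) & a) for f, a in gold.items())
    n_predicted = sum(len(predicted.get(f, set())) for f in gold)
    n_gold = sum(len(a) for a in gold.values())
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    print(coverage("^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^xyz/*xyz$"))
    gold = {"rdaka": {"rdaka<n><sg><abs>"}, "kurdu": {"kurdu<n><sg><abs>"}}
    predicted = {
        "rdaka": {"rdaka<num>", "rdaka<n><sg><abs>"},
        "kurdu": {"kurdu<n><sg><abs>"},
    }
    print(precision_recall(gold, predicted))
```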
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5315
User:Tjones5/Final project
2017-05-10T07:14:25Z
<p>Tjones5: /* Improvements Made to Disambiguation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
==Improvements Made to Disambiguation==<br />
*Ambiguity before disambiguation: ~1.41077302631578947368<br />
*Ambiguity after disambiguation (Pre-final project): ~1.35831766917293233083<br />
*Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.33%<br />
TODO:<br />
*precision/recall against annotated corpus<br />
*Author file, etc.<br />
<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5314
User:Tjones5/Final project
2017-05-10T07:13:48Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append another object suffix to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
==Improvements Made to Disambiguation==<br />
*Initial ambiguity:<br />
**Ambiguity before disambiguation: ~1.41077302631578947368<br />
**Ambiguity after disambiguation (Initial): ~1.35831766917293233083<br />
**Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Final Evaluation==<br />
Final coverage: 61.33%<br />
TODO:<br />
*precision/recall against annotated corpus<br />
*Author file, etc.<br />
<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
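The choice between the two forms of each ending can be sketched as a vowel count over everything that precedes the case suffix. This is a minimal sketch, not the twol implementation: the function names are hypothetical, only the suffixes from the table are covered, and vowels are counted as letters (a, i, u), which is an assumption.<br />

```python
VOWELS = set("aiu")

# Suffix pairs from the table: (form after a 2-vowel base, form after a longer base).
ALLOMORPHS = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}
# Suffixes that have a single form in the table.
INVARIANT = {"dat": "ku", "all": "kurra", "ela": "ngurlu"}


def count_vowels(form):
    """Count vowel letters; hyphens and consonants are ignored."""
    return sum(1 for ch in form if ch in VOWELS)


def inflect(stem, case, extra_suffixes=()):
    """Attach optional suffixes (e.g. 'jarra') and then the case ending."""
    base = "-".join((stem, *extra_suffixes))
    if case == "abs":
        return base  # absolutive is unmarked
    if case in INVARIANT:
        return f"{base}-{INVARIANT[case]}"
    short_form, long_form = ALLOMORPHS[case]
    # The count runs over the whole base, so pirli-jarra behaves like a long noun.
    ending = short_form if count_vowels(base) <= 2 else long_form
    return f"{base}-{ending}"


print(inflect("pirli", "erg"))
print(inflect("pirli", "erg", ("jarra",)))
```

Counting over the whole base rather than the bare stem is what reproduces the switch from pirli-ngku to pirli-jarra-rlu.<br />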
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
 echo "rdaka" | apertium -d . wbp-morph<br />
 ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "rdaka-pala" | apertium -d . wbp-morph<br />
 ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns; the numeral reading is sometimes marked explicitly with the number marker -pala:<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
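The intended rule can be prototyped outside the constraint grammar by working directly on the analyser's stream output. This is a rough sketch under assumptions: the helper names are hypothetical, the two-word input is invented for illustration, and the real rule would be written in the CG file rather than in Python.<br />

```python
import re


def parse_stream(stream):
    """Split Apertium stream output into (surface, [readings]) pairs."""
    units = []
    for body in re.findall(r"\^(.*?)\$", stream):
        surface, *readings = body.split("/")
        units.append((surface, readings))
    return units


def has_tag(reading, tag):
    return f"<{tag}>" in reading


def drop_numeral_next_to_noun(units):
    """Remove <num> readings when the word to the left or right has a noun reading."""
    result = []
    for i, (surface, readings) in enumerate(units):
        neighbours = units[max(i - 1, 0):i] + units[i + 1:i + 2]
        noun_nearby = any(has_tag(r, "n") for _, rs in neighbours for r in rs)
        kept = [r for r in readings if not has_tag(r, "num")]
        # As in CG, never delete the last remaining reading of a word.
        if noun_nearby and kept:
            readings = kept
        result.append((surface, readings))
    return result


# Invented two-word input, for illustration only.
stream = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^kurdu/kurdu<n><sg><abs>$^./.<sent>$"
print(drop_numeral_next_to_noun(parse_stream(stream)))
```

Neighbours are read from the input as it was before any removal, so the order in which words are visited does not change the result.<br />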
<br />
<br />
*Next to implement: perhaps reduplication?<br />
 echo "kurdu" | apertium -d . wbp-morph<br />
 ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph<br />
 ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
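One way to think about the reduplication step before writing it in lexc is as a wrapper around the existing singular analysis. This is a sketch under assumptions: <code>analyse_reduplication</code> and <code>toy_analyse_stem</code> are hypothetical names, the one-word lexicon is a stand-in for the real analyser, and only full reduplication written with a dash is handled.<br />

```python
def analyse_reduplication(surface, analyse_stem):
    """Analyse X-X as the plural of X, given a function that analyses the bare stem."""
    left, dash, right = surface.partition("-")
    if not dash or left != right:
        return []  # not a fully reduplicated form
    return [
        reading.replace("<sg>", "<pl>")
        for reading in analyse_stem(left)
        if "<sg>" in reading
    ]


def toy_analyse_stem(stem):
    """Stand-in for the real analyser: a one-word lexicon for illustration."""
    lexicon = {"kurdu": ["kurdu<n><sg><abs>"]}
    return lexicon.get(stem, [])


print(analyse_reduplication("kurdu-kurdu", toy_analyse_stem))
```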
<br />
*Need to write a scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
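Coverage here means the share of corpus tokens that receive at least one analysis. A minimal sketch of that calculation over analyser output follows; it relies on the Apertium stream convention of marking unknown words with a leading asterisk, and the function name and the sample input (including the unknown token "xyz") are made up for illustration.<br />

```python
import re


def coverage(stream):
    """Fraction of tokens in Apertium stream output that have at least one analysis."""
    tokens = re.findall(r"\^(.*?)\$", stream)
    if not tokens:
        return 0.0
    known = 0
    for token in tokens:
        _surface, *readings = token.split("/")
        # Unknown words come back as a single reading that starts with "*".
        if readings and not readings[0].startswith("*"):
            known += 1
    return known / len(tokens)


sample = "^kurdu/kurdu<n><sg><abs>$ ^xyz/*xyz$ ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"
print(f"{coverage(sample):.2%}")
```

Whether punctuation counts as a token changes the figure, so the same tokenisation should be used every time coverage is compared across versions.<br />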
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5313
User:Tjones5/Final project
2017-05-10T07:11:52Z
<p>Tjones5: /* Improvements Made to Disambiguation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (verbs can apparently sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Final coverage: 61.33%<br />
<br />
TODO:<br />
*add more verb stems<br />
*evaluate coverage, precision/recall against annotated corpus<br />
*remember author, etc<br />
<br />
==Improvements Made to Disambiguation==<br />
*Ambiguity:<br />
**Ambiguity before disambiguation: ~1.41077302631578947368<br />
**Ambiguity after disambiguation (Initial): ~1.35831766917293233083<br />
**Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
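Figures like these are typically the mean number of analyses per form. The sketch below computes that mean from stream output; it is an assumption that the numbers above were obtained this way, and the function name is hypothetical.<br />

```python
import re


def mean_ambiguity(stream):
    """Average number of readings per token in Apertium stream output."""
    tokens = re.findall(r"\^(.*?)\$", stream)
    if not tokens:
        return 0.0
    # Each token is surface/reading1/reading2/..., so readings = parts - 1.
    counts = [len(token.split("/")) - 1 for token in tokens]
    return sum(counts) / len(counts)


print(mean_ambiguity("^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"))
```

A value of 1.0 would mean every form has exactly one reading, so each disambiguation rule that fires correctly moves the figure closer to 1.<br />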
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
 echo "rdaka" | apertium -d . wbp-morph<br />
 ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "rdaka-pala" | apertium -d . wbp-morph<br />
 ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns; the numeral reading is sometimes marked explicitly with the number marker -pala:<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
 echo "kurdu" | apertium -d . wbp-morph<br />
 ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph<br />
 ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write a scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
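Precision and recall against hand annotation can be sketched by comparing sets of analyses per form. The data layout (a dictionary from surface form to a set of analyses) and the function name are assumptions for illustration, and the example annotations are invented.<br />

```python
def precision_recall(gold, predicted):
    """Compare analyser output with hand annotation.

    gold, predicted: dicts mapping a surface form to a set of analyses.
    """
    true_pos = false_pos = false_neg = 0
    for form, gold_set in gold.items():
        pred_set = predicted.get(form, set())
        true_pos += len(gold_set & pred_set)   # analyses the transducer got right
        false_pos += len(pred_set - gold_set)  # analyses it should not have produced
        false_neg += len(gold_set - pred_set)  # analyses it missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented annotations, for illustration only.
gold = {
    "rdaka": {"rdaka<num>", "rdaka<n><sg><abs>"},
    "kurdu": {"kurdu<n><sg><abs>"},
}
predicted = {
    "rdaka": {"rdaka<n><sg><abs>"},
    "kurdu": {"kurdu<n><sg><abs>", "kurdu<n><pl><abs>"},
}
print(precision_recall(gold, predicted))
```

Precision drops when the transducer overgenerates, and recall drops when a valid analysis is missing, so the two numbers point at different kinds of fixes in the lexc file.<br />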
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5312
User:Tjones5/Final project
2017-05-10T07:11:14Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (verbs can apparently sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Final coverage: 61.33%<br />
<br />
TODO:<br />
*add more verb stems<br />
*evaluate coverage, precision/recall against annotated corpus<br />
*remember author, etc<br />
<br />
==Improvements Made to Disambiguation==<br />
*Ambiguity:<br />
**Ambiguity before disambiguation: ~1.41077302631578947368<br />
**Ambiguity after disambiguation (Initial): ~1.35831766917293233083<br />
**Ambiguity after disambiguation (Final): ~1.34727443609022556391<br />
<br />
# If there is a transitive verb two words to the left, I don't take a subject.<br />
REMOVE (n sg prf) IF (-2 Vtv) ;<br />
<br />
# If there is a transitive verb two words to the left, I'm in the absolutive case.<br />
SELECT (n sg abs) IF (-2 Vtv) ;<br />
<br />
# If there is an intransitive verb two words to the left, I'm the subject.<br />
SELECT (n sg abs) IF (-2 Viv) ;<br />
<br />
# If there is an auxiliary to the right or left, I cannot be a numeral.<br />
REMOVE (num) IF ((1 Aux) OR (-1 Aux)) ;<br />
<br />
# If there is an auxiliary to the left, I cannot be an auxiliary.<br />
REMOVE (vaux) IF (-1 Aux) ;<br />
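The effect of a REMOVE rule with a single positional context can be imitated in a few lines. This is only a sketch of the semantics, not of the CG engine: readings are modelled as tuples of tags, the function name is hypothetical, the <code>Aux</code> set (defined elsewhere in the grammar) is assumed to match the tag <code>vaux</code>, and every word is tested against the unmodified input in a single pass.<br />

```python
def apply_remove(cohorts, target, offset, context):
    """Imitate `REMOVE (target) IF (offset context)`.

    cohorts: list of words, each a list of readings; a reading is a tuple of tags.
    """
    result = []
    for i, readings in enumerate(cohorts):
        j = i + offset
        # The context holds if any reading of the word at the given offset has the tag.
        context_ok = 0 <= j < len(cohorts) and any(context in r for r in cohorts[j])
        kept = [r for r in readings if target not in r]
        # A REMOVE rule never deletes the last remaining reading of a word.
        result.append(kept if context_ok and kept else list(readings))
    return result


sentence = [
    [("vaux",)],
    [("vaux",), ("n", "sg", "abs")],
]
print(apply_remove(sentence, "vaux", -1, "vaux"))
```

The first word keeps its auxiliary reading because nothing stands to its left, while the second word loses its auxiliary reading and keeps the noun reading.<br />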
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
 echo "rdaka" | apertium -d . wbp-morph<br />
 ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "rdaka-pala" | apertium -d . wbp-morph<br />
 ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns; the numeral reading is sometimes marked explicitly with the number marker -pala:<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5297
User:Tjones5/Final project
2017-05-10T05:52:56Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
Final coverage: 61.33%<br />
<br />
TODO:<br />
-add more verb stems<br />
-evaluate coverage, precision/recall against annotated corpus<br />
-remember author, etc<br />
<br />
==Improvements Made to Disambiguation==<br />
*Initial ambiguity:<br />
**Ambiguity before disambiguation: ~1.411<br />
**Ambiguity after disambiguation: ~1.358<br />
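The page does not say how the ambiguity figures were computed. A minimal sketch, assuming ambiguity means the mean number of analyses per lexical unit in Apertium stream output (<code>^surface/analysis1/analysis2$</code>), with unknown forms counted as a single analysis. The function name <code>mean_ambiguity</code> is hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def mean_ambiguity(stream: str) -> float:
    """Mean number of analyses per lexical unit (assumed definition of ambiguity)."""
    counts = []
    for unit in LEXICAL_UNIT.findall(stream):
        # The first field is the surface form; the rest are analyses.
        _surface, *analyses = unit.split("/")
        counts.append(max(len(analyses), 1))
    return sum(counts) / len(counts) if counts else 0.0


# Sample built from the analyser outputs shown elsewhere on this page.
sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^kurdu/kurdu<n><sg><abs>$^./.<sent>$"
print(round(mean_ambiguity(sample), 3))
```

Running the same function on the analyser output and on the disambiguator output would give a before/after pair comparable to the one above, if this is indeed the measure that was used.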
<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
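The two levels can be illustrated with a toy lookup table. This is only a sketch: a real transducer encodes the pairs as paths in a finite-state network rather than a dictionary, and the Apertium-style tags and the names <code>PAIRS</code>, <code>generate</code> and <code>analyse</code> are assumptions made for the example.

```python
# Toy lexical-level -> surface-level pairs (illustrative English data only).
PAIRS = {
    "wolf<n><sg>": "wolf",
    "wolf<n><pl>": "wolves",
    "cat<n><sg>": "cat",
    "cat<n><pl>": "cats",
}


def generate(analysis):
    """Lexical level to surface level; None if the analysis is not in the table."""
    return PAIRS.get(analysis)


def analyse(form):
    """Surface level to every matching lexical-level analysis (morphological parsing)."""
    return sorted(a for a, surface in PAIRS.items() if surface == form)


print(analyse("wolves"))
print(generate("wolf<n><pl>"))
```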
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
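The suffix choice in the table can be sketched outside lexc/twol as a simple vowel count. This is an illustration, not the transducer's implementation: it assumes that counting the letters a, i and u is a good enough stand-in for stem length (long vowels written with doubled letters would be over-counted), and the names <code>CASE_SUFFIXES</code> and <code>inflect</code> are hypothetical.

```python
VOWELS = set("aiu")

# Suffix pairs from the table: (after stems with 2 vowels, after longer stems).
CASE_SUFFIXES = {
    "abs": ("", ""),
    "dat": ("-ku", "-ku"),
    "erg": ("-ngku", "-rlu"),
    "all": ("-kurra", "-kurra"),
    "com": ("-ngkajinta", "-rlajinta"),
    "ela": ("-ngurlu", "-ngurlu"),
    "loc": ("-ngka", "-rla"),
}


def count_vowels(stem):
    """Count vowel letters; hyphens and consonants are ignored."""
    return sum(ch in VOWELS for ch in stem)


def inflect(stem, case):
    """Attach the case suffix that matches the stem's vowel count."""
    short_form, long_form = CASE_SUFFIXES[case]
    return stem + (short_form if count_vowels(stem) <= 2 else long_form)


print(inflect("pirli", "erg"))        # 2 vowels
print(inflect("warlkurru", "erg"))    # 3 vowels
print(inflect("pirli-jarra", "erg"))  # -jarra pushes the count above 2
```

Because the count is taken over the whole string, a stem that already carries -jarra falls into the longer-stem class, which matches the pirli-jarra-rlu pattern noted above.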
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
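The intended rule can be prototyped in plain Python before it is written as a disambiguation rule. The wording above leaves open which reading is discarded, so the sketch takes the tag to drop as a parameter; removing the noun reading <code><n></code> by default is purely an assumption, and the function names are hypothetical.

```python
def has_tag(analysis, tag):
    """True if the analysis string carries the given tag, e.g. 'n' or 'num'."""
    return f"<{tag}>" in analysis


def is_num_noun_ambiguous(readings):
    """True if a token has both a numeral reading and a noun reading."""
    return (any(has_tag(r, "num") for r in readings)
            and any(has_tag(r, "n") for r in readings))


def apply_rule(sentence, drop_tag="n"):
    """For num/noun-ambiguous tokens next to a noun, drop readings with drop_tag.

    sentence is a list of (surface, [analyses]) pairs; a new list is returned.
    """
    result = []
    for i, (surface, readings) in enumerate(sentence):
        # Immediate left and right neighbours, taken from the unmodified input.
        neighbours = sentence[max(i - 1, 0):i] + sentence[i + 1:i + 2]
        noun_nearby = any(has_tag(r, "n") for _, rs in neighbours for r in rs)
        if is_num_noun_ambiguous(readings) and noun_nearby:
            kept = [r for r in readings if not has_tag(r, drop_tag)]
            readings = kept or readings  # never delete the last reading
        result.append((surface, readings))
    return result


sentence = [
    ("rdaka", ["rdaka<num>", "rdaka<n><sg><abs>"]),
    ("kurdu", ["kurdu<n><sg><abs>"]),
]
print(apply_rule(sentence))
```

Passing <code>drop_tag="num"</code> tests the opposite interpretation without changing the rest of the code.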
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
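Until reduplication is handled in the transducer itself, the target behaviour can be sketched as a post-processing step. This is an assumption-laden illustration, not the planned lexc solution: it only covers full reduplication written with a dash, and the names <code>analyse_reduplication</code> and <code>singular_lookup</code> are hypothetical.

```python
def analyse_reduplication(form, singular_lookup):
    """Return a plural analysis for a fully reduplicated form, else None.

    singular_lookup maps a surface stem to its singular analysis string.
    """
    left, dash, right = form.partition("-")
    if dash and left == right and left in singular_lookup:
        # Reuse the singular analysis, swapping the number tag.
        return singular_lookup[left].replace("<sg>", "<pl>")
    return None


lookup = {"kurdu": "kurdu<n><sg><abs>"}
print(analyse_reduplication("kurdu-kurdu", lookup))
print(analyse_reduplication("kurdu", lookup))
```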
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
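Both measures can be sketched in a few lines. The sketch assumes Apertium stream output in which unknown forms appear as <code>^form/*form$</code>, and it assumes precision and recall are counted over (form, analysis) pairs; the page does not state the exact definitions used, and the function names are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream):
    """Percentage of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


def precision_recall(gold, predicted):
    """Precision and recall over (form, analysis) pairs.

    gold and predicted map each form to a set of analysis strings.
    """
    true_pos = sum(len(gold[f] & predicted.get(f, set())) for f in gold)
    n_pred = sum(len(a) for a in predicted.values())
    n_gold = sum(len(a) for a in gold.values())
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


stream = "^kurdu/kurdu<n><sg><abs>$ ^xyz/*xyz$ ^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^abc/*abc$"
print(coverage(stream))

gold = {"rdaka": {"rdaka<num>"}, "kurdu": {"kurdu<n><sg><abs>"}}
pred = {"rdaka": {"rdaka<num>", "rdaka<n><sg><abs>"}, "kurdu": {"kurdu<n><sg><abs>"}}
print(precision_recall(gold, pred))
```

The forms <code>xyz</code> and <code>abc</code> are made-up placeholders for unanalysed words, not Warlpiri data.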
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
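The thresholds in the table can be turned into a small lookup. This is a sketch under one assumption the table does not state: a module must meet both the stem threshold and the coverage threshold to reach a tier. It also ignores the non-numeric condition that a production module is used in a released pair. The name <code>module_status</code> is hypothetical.

```python
# (status, minimum stems, minimum coverage in %), highest tier first.
TIERS = [
    ("production", 10_000, 90.0),
    ("working", 8_000, 80.0),
    ("development", 1_000, 60.0),
]


def module_status(stems, coverage_pct):
    """Highest tier whose stem AND coverage thresholds are both met."""
    for name, min_stems, min_coverage in TIERS:
        if stems >= min_stems and coverage_pct >= min_coverage:
            return name
    return "prototype"


# Illustrative inputs: a few hundred stems with coverage just above 60%.
print(module_status(300, 61.33))
```

Under this reading, coverage above 60% is not enough on its own; the stem count also has to reach 1,000 before a module counts as being in development.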
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5296
User:Tjones5/Final project
2017-05-10T05:15:49Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/2857cffb76e0014b8ea90b21f40e4cc957ae280f<br />
*.lexc:<br />
**added "yungu" as a variation of the auxiliary "yunga"<br />
**added more optional dashes in the auxiliary suffixes<br />
**added ability to append additional object suffixes to the auxiliary verb<br />
**added auxiliary suffixes to adverbs<br />
<br />
TODO:<br />
-add more verb stems<br />
-evaluate coverage, precision/recall against annotated corpus<br />
-remember author, etc<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5293
User:Tjones5/Final project
2017-05-10T04:40:22Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
TODO:<br />
-add more verb stems<br />
-evaluate coverage, precision/recall against annotated corpus<br />
-remember author, etc<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
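The vowel-count pattern in the table can be sketched in a few lines of Python. This is a rough model under stated assumptions, not the twol rule itself: it counts the vowel letters a, i, u in everything before the case suffix (so a long vowel written double would count twice), it treats stems with two or fewer vowels as the short class, and <code>CASE_SUFFIXES</code>, <code>count_vowels</code> and <code>inflect</code> are hypothetical names.

```python
VOWELS = set("aiu")

# Suffixes copied from the table: (form after a 2-vowel stem, form after a longer stem).
CASE_SUFFIXES = {
    "abs": ("", ""),
    "dat": ("-ku", "-ku"),
    "erg": ("-ngku", "-rlu"),
    "all": ("-kurra", "-kurra"),
    "com": ("-ngkajinta", "-rlajinta"),
    "ela": ("-ngurlu", "-ngurlu"),
    "loc": ("-ngka", "-rla"),
}


def count_vowels(form):
    """Count vowel letters; this is an approximation of the real conditioning."""
    return sum(1 for ch in form if ch in VOWELS)


def inflect(stem, case):
    """Attach the case suffix whose shape depends on the vowel count of the stem."""
    short_form, long_form = CASE_SUFFIXES[case]
    suffix = short_form if count_vowels(stem) <= 2 else long_form
    return stem + suffix
```

Because the count is taken over the whole form before the case ending, <code>inflect("pirli-jarra", "erg")</code> picks -rlu, which matches the pirli-jarra-rlu example.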
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
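To see which forms still need a disambiguation rule, the analyser output can be split into lexical units and checked for more than one reading. The sketch below assumes the <code>^surface/analysis/...$</code> layout seen in the output above and ignores escaped characters; <code>parse_stream</code> and <code>ambiguous</code> are illustrative names, not part of Apertium.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$ (escapes are ignored here).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_stream(line):
    """Return a list of (surface, [analyses]) pairs found in one line of output."""
    units = []
    for match in LEXICAL_UNIT.finditer(line):
        surface = match.group(1)
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((surface, analyses))
    return units


def ambiguous(units):
    """Surface forms that received more than one analysis."""
    return [surface for surface, analyses in units if len(analyses) > 1]
```

Run on the rdaka output above, this reports rdaka as ambiguous (numeral vs. noun), while rdaka-pala has a single reading.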
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
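Before writing this in lexc, the intended behaviour can be prototyped in Python. This is only a sketch of the target, assuming full reduplication written with a hyphen and the plural absolutive noun reading shown above; <code>analyse_reduplication</code> is a hypothetical helper and does not check that the stem is actually a noun in the lexicon.

```python
def analyse_reduplication(form):
    """Return a plural noun analysis for fully reduplicated forms such as X-X."""
    parts = form.split("-")
    if len(parts) == 2 and parts[0] and parts[0] == parts[1]:
        # Assumed target analysis: the bare stem tagged as a plural absolutive noun.
        return parts[0] + "<n><pl><abs>"
    return None
```

Forms like rdaka-pala, where the two halves differ, are left for the ordinary suffix analysis.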
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
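Precision and recall can be computed per analysis once the hand-annotated forms are available. The sketch below assumes both the transducer output and the gold annotation are stored as a mapping from a form to a set of analyses; the function name and data layout are illustrative, not the format of the course scripts.

```python
def precision_recall(predicted, gold):
    """Compare analyses for the annotated forms only.

    predicted, gold: dicts mapping a surface form to a set of analysis strings.
    Returns (precision, recall) as fractions between 0 and 1.
    """
    true_pos = sum(len(predicted.get(form, set()) & analyses)
                   for form, analyses in gold.items())
    n_predicted = sum(len(predicted.get(form, set())) for form in gold)
    n_gold = sum(len(analyses) for analyses in gold.values())
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall
```

Precision drops when the transducer proposes analyses the annotator rejected; recall drops when it misses analyses the annotator listed.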
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
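Coverage and the status bands in the table can be checked with a short script. This is a naive sketch: it assumes the corpus has already been analysed into (surface, analyses) pairs, that unknown words carry an analysis starting with <code>*</code>, and that a status needs both its stem threshold and its coverage threshold (the table does not say whether both are required, and "used in a released pair" is not checked). The function names are illustrative.

```python
def coverage(units):
    """Percentage of tokens with at least one analysis that is not marked unknown."""
    if not units:
        return 0.0
    known = sum(
        1 for _, analyses in units
        if analyses and not all(a.startswith("*") for a in analyses)
    )
    return 100.0 * known / len(units)


def status(stems, coverage_pct):
    """Map stem count and coverage to the status names used in the table."""
    thresholds = [
        ("production", 10000, 90.0),
        ("working", 8000, 80.0),
        ("development", 1000, 60.0),
    ]
    for name, min_stems, min_coverage in thresholds:
        if stems >= min_stems and coverage_pct >= min_coverage:
            return name
    return "prototype"
```

Under these assumptions a module with a few hundred stems and roughly half the corpus covered falls in the prototype band.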
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5292
User:Tjones5/Final project
2017-05-10T04:37:31Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
*add more verb stems<br />
*remember files<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5273
User:Tjones5/Final project
2017-05-10T02:49:59Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: nyina = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5272
User:Tjones5/Final project
2017-05-10T02:48:37Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
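<br />
The vowel-count pattern in the table above can be sketched in Python. This is only an illustration of the alternation, not the lexc/twol implementation in the repository: the function names and the suffix table are assumptions made for this example, and vowels are counted as written letters (a, i, u), which may not be the right measure for long vowels.<br />

```python
# Suffix pairs taken from the case table: (form after a 2-vowel base, form otherwise).
ALTERNATING = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}
# Suffixes that the table shows with a single form.
INVARIANT = {"dat": "ku", "all": "kurra", "ela": "ngurlu"}

VOWELS = set("aiu")


def count_vowels(text):
    """Count vowel letters; an assumption, since long vowels are not treated specially."""
    return sum(1 for ch in text if ch in VOWELS)


def inflect(stem, case, extra_suffixes=()):
    """Attach a case suffix to a stem plus any intervening suffixes such as 'jarra'."""
    base = "-".join((stem, *extra_suffixes))
    if case == "abs":
        return base  # absolutive is unmarked in the table
    if case in INVARIANT:
        return f"{base}-{INVARIANT[case]}"
    short_form, long_form = ALTERNATING[case]
    # The choice depends on the whole base, so pirli-jarra patterns with longer nouns.
    suffix = short_form if count_vowels(base) <= 2 else long_form
    return f"{base}-{suffix}"


if __name__ == "__main__":
    for args in [("pirli", "erg"), ("warlkurru", "erg"), ("pirli", "erg", ("jarra",))]:
        print(inflect(*args))
```

With the forms from the table, <code>inflect("pirli", "erg")</code> gives pirli-ngku, and adding -jarra first gives pirli-jarra-rlu.<br />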
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
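<br />
A minimal sketch of how coverage might be computed, assuming the analyser output is in the Apertium stream format shown in the examples above and that unknown forms are marked with a leading asterisk in the analysis (e.g. ^xyz/*xyz$). The function name is hypothetical, and this sketch counts punctuation as tokens, which the evaluation script actually used may not do.<br />

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$ in the stream format.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(stream):
    """Return the fraction of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        parts = unit.split("/")
        # parts[0] is the surface form; an analysis starting with '*' means unknown.
        if len(parts) > 1 and not parts[1].startswith("*"):
            known += 1
    return known / len(units)


if __name__ == "__main__":
    sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^xyz/*xyz$^./.<sent>$"
    print(f"coverage: {coverage(sample):.2%}")
```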
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5271
User:Tjones5/Final project
2017-05-10T02:48:22Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
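<br />
One possible reading of the rule above, sketched in Python rather than as an actual constraint grammar rule. The interpretation is an assumption: when a word has both a {{tag|num}} and a {{tag|n}} reading and a neighbouring word is unambiguously a noun, the {{tag|n}} reading is dropped and the numeral reading kept. The data layout (a list of surface forms with their readings) and the function names are hypothetical.<br />

```python
def has_tag(reading, tag):
    """True if the reading string contains the given tag, e.g. 'rdaka<num>' and 'num'."""
    return f"<{tag}>" in reading


def disambiguate(cohorts):
    """cohorts: list of (surface, [readings]); returns a new list with readings filtered."""
    result = []
    for i, (surface, readings) in enumerate(cohorts):
        ambiguous = (any(has_tag(r, "num") for r in readings)
                     and any(has_tag(r, "n") for r in readings))
        # Readings of the words immediately to the left and right, if they exist.
        neighbours = [cohorts[j][1] for j in (i - 1, i + 1) if 0 <= j < len(cohorts)]
        # A neighbour counts as a noun only if all of its readings are noun readings.
        noun_nearby = any(rs and all(has_tag(r, "n") for r in rs) for rs in neighbours)
        if ambiguous and noun_nearby:
            readings = [r for r in readings if not has_tag(r, "n")]
        result.append((surface, readings))
    return result


if __name__ == "__main__":
    sentence = [
        ("rdaka", ["rdaka<num>", "rdaka<n><sg><abs>"]),
        ("pirli", ["pirli<n><sg><abs>"]),
    ]
    print(disambiguate(sentence))
```

Because word order is flexible, checking only the immediate neighbours is a simplification; a real rule may need a wider context window.<br />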
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
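<br />
A minimal sketch of the precision and recall calculation, assuming the hand-annotated forms and the transducer output have each been loaded into a mapping from surface form to a set of analysis strings. The function name and the data layout are assumptions for illustration, not the format of the annotated file used in the course.<br />

```python
def precision_recall(gold, predicted):
    """gold, predicted: dicts mapping a surface form to a set of analysis strings."""
    true_pos = false_pos = false_neg = 0
    for form, gold_analyses in gold.items():
        system_analyses = predicted.get(form, set())
        true_pos += len(gold_analyses & system_analyses)   # analyses both agree on
        false_pos += len(system_analyses - gold_analyses)  # produced but not annotated
        false_neg += len(gold_analyses - system_analyses)  # annotated but not produced
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = {
        "pirli": {"pirli<n><sg><abs>"},
        "rdaka": {"rdaka<num>", "rdaka<n><sg><abs>"},
    }
    predicted = {
        "pirli": {"pirli<n><sg><abs>"},
        "rdaka": {"rdaka<num>"},
    }
    p, r = precision_recall(gold, predicted)
    print(f"precision={p:.2%} recall={r:.2%}")
```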
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5270
User:Tjones5/Final project
2017-05-10T02:48:00Z
<p>Tjones5: /* Notes */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
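Before writing such a rule, it helps to list which forms are ambiguous between a numeral and a noun reading. The following is a minimal Python sketch, not part of the transducer: it assumes the analyser output uses the <code>^surface/analysis/analysis$</code> stream format shown above, and the function names are hypothetical. It does not handle escaped slashes.<br />

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$ (as in the output above).
# Simplification: escaped characters inside a unit are not handled.
UNIT_RE = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_units(stream):
    """Split analyser output into (surface, [analyses]) pairs."""
    units = []
    for match in UNIT_RE.finditer(stream):
        surface = match.group(1)
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((surface, analyses))
    return units


def first_tag(analysis):
    """Return the first tag of an analysis, e.g. 'num' for 'rdaka<num>'."""
    match = re.search(r"<([^>]+)>", analysis)
    return match.group(1) if match else None


def num_noun_ambiguous(stream):
    """Surface forms that have both a <num> and an <n> reading."""
    ambiguous = []
    for surface, analyses in parse_units(stream):
        tags = {first_tag(a) for a in analyses}
        if {"num", "n"} <= tags:
            ambiguous.append(surface)
    return ambiguous
```

On the two outputs shown above, this would flag rdaka but not rdaka-pala, since the latter only has the numeral reading.<br />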
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write a scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
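Coverage here means the share of corpus tokens that receive at least one analysis. A rough Python sketch of that calculation is given below. It assumes the analyser output is in Apertium stream format and that unknown tokens are marked with a leading asterisk (<code>^word/*word$</code>); the function name and the choice to count punctuation as tokens are assumptions, not the course's evaluation script.<br />

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$
UNIT_RE = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def coverage(stream):
    """Fraction of lexical units that have at least one non-'*' analysis."""
    total = 0
    known = 0
    for match in UNIT_RE.finditer(stream):
        analyses = [a for a in match.group(2).split("/") if a]
        total += 1
        # Assumption: unknown tokens carry a single analysis starting with '*'.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0
```

A result below 0.60 would place the module in the "prototype" row of the table that follows.<br />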
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5269
User:Tjones5/Final project
2017-05-10T02:47:47Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) takes different endings than warlkurru (which has more than 2 vowels)<br />
**However, when a suffix such as -jarra is added, pirli takes the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
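The vowel-count condition can be illustrated outside the transducer with a short Python sketch. This is not how the lexc/twol files implement it; it is an illustration that assumes the vowels are written a, i, u, counts each vowel letter separately (so a long vowel spelled with two letters would count twice), and uses hypothetical function names. The suffix pairs are the ones listed in the case table.<br />

```python
VOWELS = set("aiu")

# (ending after stems with 2 vowels, ending after stems with more than 2 vowels)
ALLOMORPHS = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}


def count_vowels(stem):
    """Count vowel letters in a stem, ignoring dashes and consonants."""
    return sum(1 for ch in stem.lower() if ch in VOWELS)


def attach_case(stem, case):
    """Attach the case ending selected by the stem's vowel count."""
    short_form, long_form = ALLOMORPHS[case]
    suffix = short_form if count_vowels(stem) <= 2 else long_form
    return f"{stem}-{suffix}"
```

Because the count is taken over everything already attached, pirli-jarra has four vowels and therefore selects -rlu, matching the pirli-jarra-rlu example.<br />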
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write a scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
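One common way to compute precision and recall for an analyser is to compare, form by form, the set of analyses it produces with the hand-annotated set. The Python sketch below follows that definition. Whether it matches the script used for this project is an assumption, and the function name is hypothetical.<br />

```python
def precision_recall(gold, predicted):
    """Compare analyses per form.

    gold and predicted map a surface form to a set of analysis strings.
    Returns (precision, recall) over all forms in gold.
    """
    true_pos = false_pos = false_neg = 0
    for form, gold_set in gold.items():
        pred_set = predicted.get(form, set())
        true_pos += len(gold_set & pred_set)    # analyses in both
        false_pos += len(pred_set - gold_set)   # produced but not annotated
        false_neg += len(gold_set - pred_set)   # annotated but not produced
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Precision drops when the analyser proposes readings the annotator rejected; recall drops when it misses annotated readings.<br />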
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5268
User:Tjones5/Final project
2017-05-10T02:47:31Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
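The relation between the two levels can be sketched as a tiny lookup that runs in both directions. The Python below is a toy illustration with a hand-listed lexicon, not a finite-state implementation; the tag notation and function names are assumptions chosen to resemble the analyser output used elsewhere on this page.<br />

```python
# Toy lexicon of (lexical form, surface form) pairs; illustration only.
PAIRS = [
    ("wolf<n><sg>", "wolf"),
    ("wolf<n><pl>", "wolves"),
]


def analyse(surface):
    """Surface form -> all lexical analyses (morphological parsing)."""
    return [lexical for lexical, surf in PAIRS if surf == surface]


def generate(lexical):
    """Lexical analysis -> all surface forms (generation)."""
    return [surf for lex, surf in PAIRS if lex == lexical]
```

A real transducer encodes the same pairing as paths through a finite-state network rather than as a list, which lets it cover productive morphology without enumerating every form.<br />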
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) takes different endings than warlkurru (which has more than 2 vowels)<br />
**However, when a suffix such as -jarra is added, pirli takes the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
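As a starting point for reduplication, fully reduplicated forms of the shape X-X can be detected before lookup. The Python sketch below is an illustration only: it assumes the two copies are joined by a dash and are identical, and that the base is a noun in the absolutive, as in the kurdu example. The function names are hypothetical.<br />

```python
def reduplication_base(form):
    """Return X if form is exactly X-X, otherwise None."""
    left, dash, right = form.partition("-")
    if dash and left and left == right:
        return left
    return None


def plural_analysis(form):
    """Lexical analysis for a reduplicated noun, or None if not reduplicated.

    Assumption: reduplication marks plural on an absolutive noun.
    """
    base = reduplication_base(form)
    if base is None:
        return None
    return f"{base}<n><pl><abs>"
```

Forms such as rdaka-pala are left alone because the two halves differ.<br />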
<br />
*Need to write a scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<!--<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
--><br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5267
User:Tjones5/Final project
2017-05-10T02:47:03Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Most of the work so far has been in the lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
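The vowel-count pattern in the table can be sketched in a few lines of Python. This is only an illustration, not the lexc/twol implementation: the names <code>count_vowels</code>, <code>SUFFIXES</code> and <code>inflect</code> are hypothetical, and counting the letters a, i, u is a simplifying assumption (a long vowel written with two letters would be counted twice).<br />

```python
# Hypothetical sketch: choose a case suffix from the number of vowels in the stem.
VOWELS = set("aiu")


def count_vowels(stem: str) -> int:
    """Count vowel letters in a stem; hyphens and consonants are ignored."""
    return sum(1 for ch in stem.lower() if ch in VOWELS)


# (form after a 2-vowel stem, form after a longer stem), taken from the table.
# Cases with a single form simply repeat it.
SUFFIXES = {
    "abs": ("", ""),
    "dat": ("-ku", "-ku"),
    "erg": ("-ngku", "-rlu"),
    "all": ("-kurra", "-kurra"),
    "com": ("-ngkajinta", "-rlajinta"),
    "ela": ("-ngurlu", "-ngurlu"),
    "loc": ("-ngka", "-rla"),
}


def inflect(stem: str, case: str) -> str:
    """Attach the case suffix that matches the stem's vowel count."""
    short_form, long_form = SUFFIXES[case]
    return stem + (short_form if count_vowels(stem) <= 2 else long_form)


print(inflect("pirli", "erg"))        # stem with 2 vowels
print(inflect("warlkurru", "erg"))    # stem with 3 vowels
print(inflect("pirli-jarra", "erg"))  # -jarra pushes the stem past 2 vowels
```

Because the count is taken over the whole stem, including any suffix already attached, the pirli-jarra-rlu case falls out without a separate rule.<br />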
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
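The rule being attempted can be prototyped outside the disambiguator. The Python sketch below is a hypothetical illustration, not the project's actual rule file: it assumes each token is a list of reading strings in Apertium tag notation, and it treats "to the right or left" as the immediately adjacent tokens.<br />

```python
# Hypothetical sketch of the rule: "If there is a noun to the right or left,
# I cannot be a numeral-like noun."
def is_noun(reading: str) -> bool:
    return "<n>" in reading


def is_numeral(reading: str) -> bool:
    return "<num>" in reading


def disambiguate(sentence):
    """sentence is a list of tokens; each token is a list of readings."""
    result = []
    for i, readings in enumerate(sentence):
        # Collect the readings of the adjacent tokens (original, unfiltered).
        neighbours = []
        if i > 0:
            neighbours.extend(sentence[i - 1])
        if i + 1 < len(sentence):
            neighbours.extend(sentence[i + 1])

        kept = list(readings)
        if any(is_noun(r) for r in neighbours):
            without_num = [r for r in readings if not is_numeral(r)]
            # Never remove the last remaining reading of a token.
            if without_num:
                kept = without_num
        result.append(kept)
    return result


sentence = [["rdaka<num>", "rdaka<n><sg><abs>"], ["pirli<n><sg><abs>"]]
print(disambiguate(sentence))
```

A form such as rdaka-pala, which only has a numeral reading, is left untouched because the rule never deletes a token's only reading.<br />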
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
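As a way of pinning down the intended behaviour before writing it in lexc, full reduplication of a noun stem can be mocked up in Python. This is a hypothetical sketch: the function name and the <code>noun_stems</code> set are assumptions, only full stem-stem reduplication of known nouns is handled, and unknown forms are marked with <code>*</code> as in the Apertium stream format.<br />

```python
# Hypothetical sketch: treat STEM-STEM as the plural of a known noun stem.
def analyse_reduplication(form: str, noun_stems: set) -> str:
    left, sep, right = form.partition("-")
    if sep and left == right and left in noun_stems:
        # Fully reduplicated noun: analysed as plural.
        return f"^{form}/{left}<n><pl><abs>$"
    if form in noun_stems:
        # Plain noun stem: singular absolutive.
        return f"^{form}/{form}<n><sg><abs>$"
    # Anything else is reported as unknown.
    return f"^{form}/*{form}$"


stems = {"kurdu"}
print(analyse_reduplication("kurdu", stems))
print(analyse_reduplication("kurdu-kurdu", stems))
```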
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
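Both measures are simple to compute once the analyser output is available. The Python sketch below is an assumption about how they could be calculated, not the evaluation script actually used: <code>coverage</code> counts every lexical unit in Apertium stream output (punctuation included) and treats a unit as unknown when its first analysis starts with <code>*</code>; <code>precision_recall</code> compares sets of analyses per form against a hand-annotated gold standard.<br />

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        _surface, _, analyses = unit.partition("/")
        if analyses and not analyses.startswith("*"):
            known += 1
    return known / len(units)


def precision_recall(gold: dict, predicted: dict):
    """gold and predicted map each form to a set of analysis strings."""
    tp = fp = fn = 0
    for form, gold_set in gold.items():
        pred_set = predicted.get(form, set())
        tp += len(gold_set & pred_set)   # analyses that are right
        fp += len(pred_set - gold_set)   # analyses that should not be there
        fn += len(gold_set - pred_set)   # analyses that are missing
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(coverage("^kurdu/kurdu<n><sg><abs>$ ^foo/*foo$"))
```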
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5266
User:Tjones5/Final project
2017-05-10T02:46:32Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
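To make the two levels concrete, here is a deliberately tiny Python sketch. It is hypothetical: a real transducer is a compiled finite-state machine, whereas this stores the relation as a plain list of pairs, and the tag notation (<code><n></code>, <code><sg></code>, <code><pl></code>) is borrowed from the Apertium output elsewhere on this page.<br />

```python
# Hypothetical sketch: a transducer seen as a relation between
# lexical forms (analyses) and surface forms (spellings).
PAIRS = [
    ("wolf<n><sg>", "wolf"),
    ("wolf<n><pl>", "wolves"),
]


def analyse(surface: str) -> list:
    """Surface form -> all lexical forms (morphological parsing)."""
    return [lexical for lexical, surf in PAIRS if surf == surface]


def generate(lexical: str) -> list:
    """Lexical form -> all surface forms (morphological generation)."""
    return [surf for lex, surf in PAIRS if lex == lexical]


print(analyse("wolves"))
print(generate("wolf<n><sg>"))
```

The same relation is read in both directions, which is what lets one transducer serve as both analyser and generator.<br />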
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
<!--<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5265
User:Tjones5/Final project
2017-05-10T02:45:58Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
Commit: next commit<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
 echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<!--<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5264
User:Tjones5/Final project
2017-05-10T02:38:23Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: next commit https://github.swarthmore.edu/tjones5/ling073-wbp/commit/f8841c64db1c248815bc5c5a3ce4a23b8d30f641<br />
*.lexc:<br />
**added present verb analysis (ex: piyani = present form of "to be", not piyanimi)<br />
**added verb tense and suffix analysis (apparently verbs can sometimes take object suffixes; previously I thought only auxiliaries could do this)<br />
**added "kirra" as another possible allative object ending<br />
**added clitic analysis and several clitics -- something needs to be changed so that they increase coverage<br />
**added 6 noun stems, ~55 verb stems, 3 adverb stems<br />
*.twol:<br />
**eliminated bi-directionality of some rules<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]: <br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) takes different ergative, locative, and comitative endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
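The allomorph choice in the table above can be sketched in Python. This is only an illustration of the pattern, not how the lexc/twol files encode it: the function names are hypothetical, vowels are counted as single orthographic letters (a, i, u), and the suffix forms are copied from the table.

```python
import re

# Cases whose suffix depends on stem length, copied from the case table:
# (form after 2-vowel stems, form after stems with more than 2 vowels).
ALLOMORPHS = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}

# Cases with a single suffix form regardless of stem length.
INVARIANT = {"abs": "", "dat": "ku", "all": "kurra", "ela": "ngurlu"}


def count_vowels(form):
    """Count vowel letters (assumption: each a/i/u letter counts once)."""
    return len(re.findall(r"[aiu]", form))


def inflect(stem, case):
    """Attach the case suffix to `stem`, using a dash as in the table."""
    if case in INVARIANT:
        suffix = INVARIANT[case]
    else:
        short_form, long_form = ALLOMORPHS[case]
        suffix = short_form if count_vowels(stem) <= 2 else long_form
    return f"{stem}-{suffix}" if suffix else stem


print(inflect("pirli", "erg"))        # 2 vowels
print(inflect("warlkurru", "erg"))    # 3 vowels
print(inflect("pirli-jarra", "erg"))  # -jarra makes the stem longer
```

Because the vowel count is taken over the whole stem, adding -jarra to pirli moves it into the longer class, which matches the pirli-jarra-rlu example.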
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
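A rough Python sketch of the rule being attempted, applied to analyser output in the Apertium stream format shown above. Several things here are assumptions: "numeral-like noun" is read as the `<num>` analysis, a neighbour counts as a noun if any of its analyses carries `<n>`, the last remaining analysis is never removed, and the two-word input is a made-up example. A real rule would be written in the disambiguation grammar rather than in Python.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$
UNIT = re.compile(r"\^(.*?)\$")


def parse(stream):
    """Split analyser output into (surface, [analyses]) pairs."""
    units = []
    for body in UNIT.findall(stream):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def has_tag(analysis, tag):
    return f"<{tag}>" in analysis


def drop_num_next_to_noun(units):
    """Remove <num> analyses from tokens that have a noun directly beside them."""
    result = []
    for i, (surface, analyses) in enumerate(units):
        neighbours = units[max(i - 1, 0):i] + units[i + 1:i + 2]
        noun_nearby = any(
            has_tag(a, "n") for _, readings in neighbours for a in readings
        )
        kept = [a for a in analyses if not has_tag(a, "num")]
        # Only apply the rule if at least one analysis survives.
        if noun_nearby and kept:
            analyses = kept
        result.append((surface, analyses))
    return result


# Hypothetical input: rdaka followed by a noun.
stream = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^pirli/pirli<n><sg><abs>$^./.<sent>$"
for surface, analyses in drop_num_next_to_noun(parse(stream)):
    print(surface, analyses)
```

With flexible word order, a window of one token on each side may be too narrow or too broad; the window size is the first thing to experiment with.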
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
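One way to prototype the reduplication analysis before touching the transducer is a small Python check that recognises a fully reduplicated form and rewrites the singular analysis of its base as plural. This is a sketch under assumptions: the function names are hypothetical, only full reduplication (with or without the optional dash) is handled, and the undashed check will misfire on any word whose two halves happen to be identical.

```python
def reduplication_base(surface):
    """Return the base of a fully reduplicated form, or None if it is not one."""
    first, dash, second = surface.partition("-")
    if dash and first and first == second:
        return first
    # Dashes are optional, so also accept an undashed doubled form.
    half, remainder = divmod(len(surface), 2)
    if remainder == 0 and half and surface[:half] == surface[half:]:
        return surface[:half]
    return None


def pluralise(analysis):
    """Turn a singular analysis of the base into the plural one."""
    return analysis.replace("<sg>", "<pl>")


base = reduplication_base("kurdu-kurdu")
if base is not None:
    print(pluralise(f"{base}<n><sg><abs>"))
```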
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
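For reference, coverage here means the share of tokens that receive at least one analysis. A minimal Python sketch of that calculation follows, assuming the analyser output is in the stream format shown earlier and that unknown words are marked with a leading `*` in the analysis (as in `^xyz/*xyz$`). The function name is hypothetical, and the course's own coverage script may count tokens differently, for example in how it treats punctuation.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$
UNIT = re.compile(r"\^(.*?)\$")


def coverage(stream):
    """Percentage of lexical units that have at least one analysis."""
    units = UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for body in units if "/*" not in body)
    return 100.0 * known / len(units)


sample = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^xyz/*xyz$"
print(f"{coverage(sample):.2f}%")
```

At 51.26% of 18,427 tokens, about 9,446 tokens currently receive an analysis.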
<br />
<br />
<!--<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
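A possible starting point for the regular-expression cleanup, sketched in Python. The function name and the specific cleanup steps (strip markup tags, decode HTML entities, normalise spacing, drop lines with no letters) are assumptions about what scraped pages will need; real pages will probably call for more rules.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")              # leftover markup tags
SPACES = re.compile(r"[ \t\u00a0]+")      # runs of spaces, tabs, non-breaking spaces
LETTER = re.compile(r"[^\W\d_]")          # any letter


def clean_scraped_text(raw):
    """Return the text of `raw` with markup removed, one cleaned line per line."""
    text = TAG.sub(" ", raw)
    # Decode entities only after stripping tags, so escaped "<" is kept as text.
    text = html.unescape(text)
    lines = []
    for line in text.splitlines():
        line = SPACES.sub(" ", line).strip()
        if LETTER.search(line):
            lines.append(line)
    return "\n".join(lines)


print(clean_scraped_text("<p>Yani  karna&nbsp;ngajulu.</p>\n<div> 12 </div>\n"))
```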
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5153
User:Tjones5/Final project
2017-05-09T00:01:19Z
<p>Tjones5: /* Improvements Made to the Transducer */</p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
*.lexc:<br />
**optional dashes throughout (verb suffixes, pronouns)<br />
**improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
**added several noun stems<br />
*.twol:<br />
**added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: next commit?<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]: <br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) takes different ergative, locative, and comitative endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<!--<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5152
User:Tjones5/Final project
2017-05-09T00:00:48Z
<p>Tjones5: </p>
<hr />
<div>==Improvements Made to the Transducer==<br />
Final version:<br />
https://github.swarthmore.edu/tjones5/ling073-wbp<br />
<br />
Commit: https://github.swarthmore.edu/tjones5/ling073-wbp/commit/125bb63f33f5f50caf682b746b972c549b4da7a0<br />
.lexc:<br />
*optional dashes throughout (verb suffixes, pronouns)<br />
*improved organization of pronoun analysis: separation of 1st person from 2nd/3rd person<br />
*added several noun stems<br />
<br />
.twol:<br />
*added %{U%} archiphoneme to appear in suffixes on nouns in dative case<br />
<br />
Commit: next commit?<br />
<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]: <br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) takes different ergative, locative, and comitative endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<!--<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5142
User:Tjones5/Final project
2017-05-08T02:21:26Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
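The surface and lexical levels described above can be illustrated with a toy lookup table. This is only a minimal sketch, not how Apertium or HFST actually store a transducer; <code>LEXICON</code>, <code>generate</code> and <code>analyse</code> are hypothetical names, and the tag strings simply imitate the Apertium-style output shown elsewhere on this page.<br />

```python
# Toy two-level mapping: lexical form (stem + tags) <-> surface form.
# LEXICON, generate and analyse are illustrative names, not Apertium APIs.
LEXICON = {
    "wolf<n><sg>": "wolf",
    "wolf<n><pl>": "wolves",
    "kurdu<n><sg><abs>": "kurdu",
}


def generate(lexical):
    """Lexical level -> surface level (None if the form is unknown)."""
    return LEXICON.get(lexical)


def analyse(surface):
    """Surface level -> sorted list of every lexical form that yields it."""
    return sorted(lex for lex, surf in LEXICON.items() if surf == surface)
```

Morphological parsing corresponds to <code>analyse</code>: given "wolves", it recovers the stem and the plural tag. A real transducer encodes the same relation as a finite-state network rather than a table.<br />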
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
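The choice between the two sets of endings in the table above can be sketched as a function of the number of vowels in the stem. This is a rough illustration, not the lexc/twol implementation: <code>SUFFIXES</code>, <code>count_vowels</code> and <code>inflect</code> are hypothetical names, and counting vowel letters (so that a doubled long vowel counts twice) is an assumption.<br />

```python
# Warlpiri orthography writes the vowels a, i, u.
VOWELS = set("aiu")

# case tag -> (ending after a 2-vowel stem, ending after a longer stem),
# copied from the case table above.
SUFFIXES = {
    "abs": ("", ""),
    "dat": ("-ku", "-ku"),
    "erg": ("-ngku", "-rlu"),
    "all": ("-kurra", "-kurra"),
    "com": ("-ngkajinta", "-rlajinta"),
    "ela": ("-ngurlu", "-ngurlu"),
    "loc": ("-ngka", "-rla"),
}


def count_vowels(stem):
    """Count vowel letters; assumption: a long vowel (aa) counts twice."""
    return sum(ch in VOWELS for ch in stem.lower())


def inflect(stem, case):
    """Attach the case ending that matches the length of the stem."""
    short, long_ = SUFFIXES[case]
    return stem + (short if count_vowels(stem) <= 2 else long_)
```

Because the count is taken over everything before the case ending, <code>inflect("pirli-jarra", "erg")</code> gives pirli-jarra-rlu, matching the note about -jarra above.<br />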
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
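To see which forms still need a disambiguation rule, the analyser output shown above can be split into its readings. The sketch below assumes the plain Apertium stream format (<code>^surface/analysis1/analysis2$</code>) and does not handle escaped characters; <code>parse_stream</code> and <code>is_ambiguous</code> are hypothetical helper names.<br />

```python
import re

# One lexical unit is everything between ^ and $ in the stream format.
UNIT = re.compile(r"\^(.*?)\$")


def parse_stream(text):
    """Split analyser output into (surface, [analyses]) pairs."""
    units = []
    for body in UNIT.findall(text):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def is_ambiguous(analyses):
    """A form is ambiguous when the analyser returns more than one reading."""
    return len(analyses) > 1
```

For the output of "rdaka" this yields two readings (<code>rdaka&lt;num&gt;</code> and <code>rdaka&lt;n&gt;&lt;sg&gt;&lt;abs&gt;</code>), whereas "rdaka-pala" yields only the numeral reading.<br />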
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
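The target behaviour for reduplication shown above can be sketched outside the transducer as follows. This is only an illustration of the intended analysis, assuming full reduplication written with a hyphen; <code>analyse_reduplicated</code> and the <code>noun_stems</code> argument are hypothetical.<br />

```python
def analyse_reduplicated(surface, noun_stems):
    """Return the plural analysis of a fully reduplicated noun, else None."""
    left, sep, right = surface.partition("-")
    # Both halves must be identical and must be a known noun stem.
    if sep and left == right and left in noun_stems:
        return f"{left}<n><pl><abs>"
    return None
```

With <code>noun_stems = {"kurdu"}</code>, "kurdu-kurdu" is analysed as <code>kurdu&lt;n&gt;&lt;pl&gt;&lt;abs&gt;</code>, while "kurdu" and "rdaka-pala" are left to the ordinary analyses.<br />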
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
Initial Eval:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18481<br />
**Coverage: 49.67%<br />
<br />
Checkpoint at presentation:<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Number of stems: ~250<br />
<br />
<br />
<!--<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
--><br />
<br />
<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
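The thresholds in the table above can be applied to the checkpoint figures with a small helper. This is a sketch under two assumptions: a status is reached only when both the stem count and the coverage meet its threshold, and the "used in a released pair" condition for production is not modelled. <code>THRESHOLDS</code>, <code>coverage</code> and <code>status</code> are hypothetical names.<br />

```python
# (status, minimum stems, minimum coverage in percent), strictest first.
THRESHOLDS = [
    ("production", 10_000, 90.0),
    ("working", 8_000, 80.0),
    ("development", 1_000, 60.0),
]


def coverage(known_tokens, total_tokens):
    """Percentage of corpus tokens that receive at least one analysis."""
    return 100.0 * known_tokens / total_tokens


def status(stems, coverage_pct):
    """Return the highest status whose stem and coverage thresholds are met."""
    for name, min_stems, min_cov in THRESHOLDS:
        if stems >= min_stems and coverage_pct >= min_cov:
            return name
    return "prototype"
```

With roughly 250 stems and 51.26% coverage, <code>status(250, 51.26)</code> returns "prototype".<br />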
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
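A possible starting point for the regular-expression cleanup, assuming the scraped text contains HTML tags and standalone numbers (such as verse numbers) that should not count as tokens. The patterns and the <code>clean_text</code> name are illustrative and would need adjusting to the actual source.<br />

```python
import re


def clean_text(raw):
    """Strip markup and numbering from scraped text, keeping only words."""
    text = re.sub(r"<[^>]+>", " ", raw)    # drop HTML tags
    text = re.sub(r"\b\d+\b", " ", text)   # drop standalone numbers
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip()
```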
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Guarani_and_Warlpiri&diff=5137
Guarani and Warlpiri
2017-05-08T01:42:48Z
<p>Tjones5: /* gug → wbp */</p>
<hr />
<div>Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Guarani Guarani] and [https://wikis.swarthmore.edu/ling073/Warlpiri Warlpiri]<br />
<br />
==Corpora==<br />
*[https://www.bible.com/bible/1355/rev.1 Warlpiri Bible]<br />
<br />
*[https://www.bible.com/bible/66/rev.1 Guarani Bible]<br />
<br />
*[https://www.bible.com/bible/111/rev.1 English Bible]<br />
<br />
*[http://unicode.org/udhr/d/udhr_gug.txt Universal Declaration of Human Rights in Guarani]<br />
<br />
==Developed Resources==<br />
*[https://github.swarthmore.edu/tjones5/ling073-gug-wbp-corpus Corpus Repository]<br />
*[https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri/Contrastive_Grammar Contrastive Grammar]<br />
*[https://github.swarthmore.edu/tjones5/ling073-gug-wbp Guarani to Warlpiri analyzer repository]<br />
**This repo will hold the fully functioning rule-based machine translation system.<br />
*[https://github.swarthmore.edu/cpillsb1/ling073-wbp-gug Warlpiri to Guarani analyzer repository]<br />
**This repo will not be developed further.<br />
*[https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri/Lexical_selection Lexical selection Wiki page]<br />
*[https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri/Structural_transfer Structural transfer Wiki page]<br />
<br />
==Lexical Selection==<br />
* See Lexical Selection page linked above for more detail<br />
<br />
* (gug) ahy'o → (wbp) ngukarna "throat"; lirra "voice"<br />
* (gug) ha'anga → (wbp) japirdimi "threaten"; walaparrirni "imitate"<br />
* (gug) ajúra → (wbp) japirdimi "collar"; waninja "neck"<br />
* (wbp) rdaka → (gug) avatisoka "hand"; po "five"<br />
* (wbp) kirdirrpa → (gug) itakua "cave"; ka'irãi "jail"<br />
<br />
==Initial Evaluation==<br />
===wbp → gug evaluation===<br />
====Results when running on tests====<br />
*Total number of words: 14<br />
*Total number of unknown words: 4<br />
*Percentage of unknown words: 28.57%<br />
*Percentage of stems translated correctly: 71.43%<br />
<br />
When removing unknown words:<br />
*WER = 78.57%<br />
*PER = 78.57%<br />
When unknown words are not removed:<br />
*WER = 100%<br />
*PER = 100%<br />
<br />
====Results when running on sentences (corpus) ====<br />
*Total number of words: 58<br />
*Total number of unknown words: 68<br />
*Percentage of unknown words: 84.45%<br />
*Percentage of stems translated correctly: 15.52%<br />
<br />
When removing unknown words:<br />
*WER = 22.41%<br />
*PER = 22.41%<br />
When unknown words are not removed:<br />
*WER = 100%<br />
*PER = 100%<br />
<br />
<br />
===gug → wbp evaluation===<br />
====Results when running on tests====<br />
*Total number of words: 13<br />
*Total number of unknown words: 0<br />
*Percentage of unknown words: 0%<br />
*Percentage of stems translated correctly: 100% <br />
<br />
When removing unknown words:<br />
*WER = 92.86%<br />
*PER = 92.86%<br />
When unknown words are not removed:<br />
*WER = 92.86%<br />
*PER = 92.86%<br />
<br />
====Results when running on sentences (corpus) ====<br />
*Total number of words: 66<br />
*Total number of unknown words: 60<br />
*Percentage of unknown words: 90.9%<br />
*Percentage of stems translated correctly: 9.09%<br />
<br />
When removing unknown words:<br />
*WER = 0.0%<br />
*PER = 0.0%<br />
When unknown words are not removed:<br />
*WER = 100%<br />
*PER = 100%<br />
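For reference, WER and PER can be computed over word lists as sketched below. This is a minimal illustration using one common formulation of each metric; the evaluation script used for the figures above may differ in details such as tokenisation or how unknown words are treated. The function names are hypothetical.<br />

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (word order is ignored)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matches
    return 100.0 * errors / len(ref)
```

Both rates are divided by the reference length, so they can exceed 100% when the output is longer than the reference.<br />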
<br />
==Final Evaluation==<br />
<br />
===gug → wbp and wbp → gug===<br />
====100 more stems====<br />
Added 100 more stems (mostly nouns, verbs, and adjectives) to the bilingual and monolingual dictionaries.<br />
<br />
===gug → wbp===<br />
<br />
====6 new lexical selection rules====<br />
* (gug) akã → (wbp) jurru "head"; payarrpa "hill"<br />
** Rule: Select "jurru" (Eng: head) as the translation of "akã" when it is preceded by "ha'e" (Eng: his/her). Otherwise, select "payarrpa" as the default.<br />
<br />
* (gug) asu → (wbp) jampu "left"; jampu-karra "left-handed"<br />
** Rule: Select "jampu-karra" (Eng: left-handed) as the translation of "asu" when it is followed by person words such as "ava" (Eng: person) or "kuña" (Eng: woman) or "mitã" (Eng: child). Otherwise, select "jampu" (Eng: left) as the default.<br />
<br />
* (gug) henyhẽ → (wbp) jaka-ngalya "full"; ngayarrka "pregnant"<br />
** Rule: Select "ngayarrka" (Eng: pregnant) as the translation of "henyhẽ" when it is followed/preceded by "kuña" (Eng: woman) or other words describing female people. Otherwise, select "jaka-ngalya" (Eng: full) as the default.<br />
<br />
* (gug) mongúi → (wbp) kijirni "to throw"; pata-karrimi "to fall"<br />
** Rule: Select "kijirni" (Eng: to throw) as the translation of "mopẽ" when it is followed by "ita" (Eng: rock) or "apu'a" (Eng: ball). Otherwise, select "pata-karrimi" as the default.<br />
<br />
* (gug) moha'anga → (wbp) pantirni "to draw"; jarntirni "to carve"<br />
** Rule: Select "pantirni" (Eng: to draw) as the translation of "moha'anga" when it is followed by "ha'anga" (Eng: drawing). Otherwise, select "jarntirni" as the default.<br />
<br />
* (gug) mbogyapy → (wbp) pipa-kurra-mani "to write"; nyinanja-wantimi "to sit down"<br />
** Rule: Select "pipa-kurra-mani" (Eng: to write) as the translation of "mbogyapy" when it is followed by "aranduka" (Eng: book). Otherwise, select "nyinanja-wantimi" as the default.<br />
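The logic of these rules (a default translation plus an alternative triggered by a context word) can be sketched as below. This is not the Apertium lexical selection rule format; <code>RULES</code> and <code>select</code> are hypothetical, only three of the rules are included, and "preceded/followed by" is assumed to mean the immediately adjacent token.<br />

```python
# source word -> (default, alternative, trigger words, side to inspect)
RULES = {
    "akã": ("payarrpa", "jurru", {"ha'e"}, "before"),
    "moha'anga": ("jarntirni", "pantirni", {"ha'anga"}, "after"),
    "mbogyapy": ("nyinanja-wantimi", "pipa-kurra-mani", {"aranduka"}, "after"),
}


def select(tokens, i):
    """Pick the Warlpiri translation for tokens[i] from its neighbour."""
    default, alternative, triggers, side = RULES[tokens[i]]
    j = i - 1 if side == "before" else i + 1
    if 0 <= j < len(tokens) and tokens[j] in triggers:
        return alternative
    return default
```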
<br />
====6 additions to the morphology====<br />
*Added numeral analysis:<br />
**Several number words in Warlpiri also have meanings as nouns (e.g. rdaka can mean "hand" or "five")<br />
**the -pala ending distinguishes them as number words, but it is not always necessary<br />
**"rdaka-pala": ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
**"rdaka": ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
*Added adverb analysis:<br />
**"nyurruwiyi": ^nyurruwiyi/nyurruwiyi<adv>$^./.<sent>$<br />
<br />
*Added elative/source case for verb auxiliaries :<br />
**"kajangka": ^kajangka/ka<vaux><src><subj3sg>$^./.<sent>$<br />
<br />
*Added inceptive aspect for verbs:<br />
**"pakarnunjunu": ^pakarnunjunu/pakarni<v><tv><past><inc>$^./.<sent>$<br />
<br />
*Added possessive marker "nyanu" for nouns:<br />
**"kaja-nyanu-rlu": ^kaja-nyanu-rlu/kaja<n><pl><poss><erg>$^./.<sent>$<br />
<br />
*Added the topicalizing particle grammatical marker for nouns. -ja is used to mark known information or to indicate the topic, although sometimes it has no function.<br />
**"kurdukuju": ^kurdukuju/kurdu<n><sg><dat><top>$^./.<sent>$<br />
<br />
<!--<br />
-----------------------------------<br />
*Improved pronominal analysis:<br />
**Before: "<ngaju>"<br />
***"ngaju" prn pers p3 sg<br />
***"ngaju" prn pers p3 du<br />
***"ngaju" prn pers p2 sg<br />
***"ngaju" prn pers p2 du<br />
***"ngaju" prn pers p2 pl<br />
***"ngaju" prn pers p1 sg<br />
***"ngaju" prn pers p1 du excl<br />
***"ngaju" prn pers p1 du incl<br />
***"ngaju" prn pers p1 pl excl<br />
***"ngaju" prn pers p1 pl incl<br />
*After: <br />
**"<ngaju>"<br />
<br />
<br />
*Clitics<br />
** (wbp) -yijala cl. also, too, likewise.<br />
** (wbp) -nyayirni Variant: -nyarrirni. cl. very, really -> (gug) adverb: asy<br />
<br />
*Conjunction??<br />
**ngula cj. 1. that, which, who. Syn: kuja. 2.he, she, it, they, them. Referring back to something in previous context.<br />
<br />
*Cs?<br />
** -jangka cs. 1. from. 2. because. Syn: -ngurlu~-ngirli, -warnu.<br />
<br />
*Reflexive?<br />
**nyanungu pn.<br />
<br />
*ngana - who<br />
<br />
*yangka nm. aforementioned one, same one mentioned before.<br />
*Numbers<br />
*-juku cl. still, yet. Syn: -kirli.<br />
*-pinki ns. such-like, things pertaining to.<br />
*-kari ns. another, other.<br />
*nyurru av. already, short time ago.<br />
*-wiyi (first)<br />
*-wana ns. along, alongside, beside.<br />
--><br />
<br />
<!--<br />
====4 new disambiguation rules====<br />
*Number vs Noun (e.g. rdaka = hand or five), has the number marker -pala<br />
*Manu = cnjcoo vs v tv past?<br />
*Kuja = "ka" vaux rel subj3sg OR kuja = kuja cj. relative pronoun, what, which, that, who.<br />
**Kuja = "ka" vaux rel subj3sg OR kuja = kuja cj. relative pronoun, what, which, that, who.<br />
**Ngula = "ka" vaux rel subj3sg OR kuja = kuja cj. relative pronoun, what, which, that, who. Find by searching AUX:COMP in new dictionary<br />
--><br />
<br />
===wbp → gug===<br />
====6 new lexical selection rules====<br />
* (wbp) nyurltu-nyurltu → (gug) apañuãi "confusing, garbled"; apokytã "tangled"<br />
** Rule: select "apokytã" if "nyurltu-nyurltu" is followed by the Warlpiri word "marnilpa" (hair). In other cases, select "apañuãi" as the default.<br />
<br />
* (wbp) warlu → (gug) tini "hot"; aku "angry"<br />
** Rule: select "aku" if "warlu" is followed by the Warlpiri word "miparrpa" (face) or "yapa" (person). In other cases, select "tini" as the default.<br />
<br />
* (wbp) yajarni → (gug) henoi "fetch/invite (someone)"; kakuaa "grow (a plant)"<br />
** Rule: select "kakuaa" if "yajarni" is followed by the Warlpiri word "watiya" (plant). In other cases, select "henoi" as the default.<br />
<br />
* (wbp) paarr-paarr-jankami → (gug) hapy "to burn, singe (hair or fur)"; jahéi "to hurt"<br />
** Rule: select "hapy" if "paarr-paarr-jankami" is followed by the Warlpiri word "marnilpa" (hair). In other cases, select "jahéi" as the default.<br />
<br />
* (wbp) pajirni → (gug) su'u "to bite"; hupi "to pick/gather"<br />
** Rule: select "hupi" if "pajirni" is followed by the Warlpiri word "mangarri" (fruit). In other cases, select "su'u" as the default.<br />
<br />
* (wbp) pakarli → (gug) kuatia "headdress"; kuatia "paper"<br />
** Rule: select "kuatia" if "pakarli" is preceded by the Warlpiri word "yangkarni" (to wear). In other cases, select "kuatia" as the default.<br />
<br />
<br />
====4 new twol rules====<br />
*Nasal harmony on dative suffix -pe:<br />
**-pe now properly changes to -me after a nasal vowel.<br />
**Ex: havõ (soap). Suffix is regularly -pe, but now becomes -me<br />
echo "havõme" | apertium -d . gug-morph<br />
^havõme/havõ<n><sg><dat>$^./.<sent>$<br />
<br />
*Nasal harmony on accusative suffix -ve:<br />
**-ve now properly changes to -me on accusative forms of pronouns<br />
**Ex: peẽ (second person plural pronoun). Suffix is regularly -ve, but now becomes -me<br />
echo "peẽme" | apertium -d . gug-morph<br />
^peẽme/peẽ<prn><pers><p2><sg><acc>$^./.<sent>$<br />
<br />
*Stress on accusative forms of pronouns:<br />
**The e at the end of pronouns should be stressed (é) when followed by the acc suffix -ve<br />
**Ex: che (first person singular pronoun). With suffix -ve, becomes chéve<br />
echo "chéve" | apertium -d . gug-morph<br />
^chéve/che<prn><pers><p1><sg><acc>$^./.<sent>$<br />
<br />
*Nasal harmony on reflexive prefix je- :<br />
**je- now properly changes to ñe- before a nasal.<br />
**Ex: ñeñapytĩ (“to tie”). <br />
echo "oñeñapytĩ" | apertium -d . gug-morph<br />
^oñeñapytĩ/ñapytĩ<v><tv><ar><pres><p3><sp><nowiki><ref></nowiki>$^./.<sent>$<br />
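The suffix alternations handled by the first three rules can be sketched procedurally as follows. This only mirrors the examples given above and is not the twol implementation; the function names are hypothetical, and treating a stem-final nasal vowel as the sole trigger is an assumption.<br />

```python
import unicodedata


def nfc(text):
    """Normalise to composed characters so that e.g. õ is one code point."""
    return unicodedata.normalize("NFC", text)


NASAL_VOWELS = set(nfc("ãẽĩõũỹ"))


def dative(stem):
    """-pe surfaces as -me after a stem-final nasal vowel."""
    stem = nfc(stem)
    return stem + ("me" if stem[-1] in NASAL_VOWELS else "pe")


def accusative_pronoun(stem):
    """-ve surfaces as -me after a nasal vowel; otherwise final e is stressed."""
    stem = nfc(stem)
    if stem[-1] in NASAL_VOWELS:
        return stem + "me"
    if stem.endswith("e"):
        stem = stem[:-1] + nfc("é")
    return stem + "ve"
```

The three examples above come out as havõme, peẽme and chéve.<br />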
<br />
<br />
===Tests results===<br />
<br />
====gug====<br />
*Precision and recall:<br />
**Results:<br />
precisionRecall ../corpus/ling073-gug-corpus/ling073-gug-corpus/gug.annotated.basic.txt ../bootstrap/ling073-gug/corpus.out.txt<br />
Totals: 89 tp, 87 fp, 0 tn, 19 fn<br />
Precision: 82.40741%<br />
Recall: 50.56818%<br />
<br />
*Coverage over large corpus:<br />
**Results:<br />
aq-covtest ~/Source/corpus/ling073-gug-corpus/ling073-gug-corpus/gug.corpus.large.txt gug.automorf.bin<br />
Number of tokenised words in the corpus: 522284<br />
Coverage: 28.96%<br />
<br />
*Number of stems in transducer: ~270 (didn't count by hand)<br />
<br />
====wbp====<br />
*Precision and recall:<br />
**Results:<br />
precisionRecall ../ling073-wbp-corpus/wbp.annotated.basic.txt corpus.out.txt <br />
Totals: 199 tp, 5 fp, 0 tn, 6 fn<br />
Precision: 97.54902%<br />
Recall: 97.07317%<br />
<br />
*Coverage over large corpus:<br />
**Results:<br />
aq-covtest ../ling073-wbp-corpus/wbp.corpus.large.txt wbp.automorf.bin<br />
Number of tokenised words in the corpus: 18427<br />
Coverage: 51.26%<br />
<br />
*Number of stems in transducer: ~250<br />
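The precision and recall figures follow from the reported totals by the standard definitions, sketched here as a check (the function names are illustrative).<br />

```python
def precision(tp, fp):
    """Share of the analyses produced that are correct, in percent."""
    return 100.0 * tp / (tp + fp)


def recall(tp, fn):
    """Share of the expected analyses that were produced, in percent."""
    return 100.0 * tp / (tp + fn)


# wbp totals reported above: 199 tp, 5 fp, 6 fn
print(f"Precision: {precision(199, 5):.5f}%")  # Precision: 97.54902%
print(f"Recall: {recall(199, 6):.5f}%")        # Recall: 97.07317%
```

Applying the same formulas to the gug totals as printed (89 tp, 87 fp, 19 fn) gives 50.57% precision and 82.41% recall, the reverse of the reported figures, so the fp and fn counts or the two percentages may be swapped there.<br />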
<br />
====wbp → gug====<br />
*WER and PER<br />
Number of words in reference: 76<br />
Number of words in test: 69<br />
Number of unknown words (marked with a star) in test: 45<br />
Percentage of unknown words: 65.22 %<br />
Word error rate (WER): 100.00 %<br />
Position-independent word error rate (PER): 94.74 %<br />
*Proportion of stems translated correctly in longer: 31/71 = 40.8%<br />
<br />
Unsure... Need wbp.longer.txt to test in longer and large<br />
<br />
====gug → wbp====<br />
*WER and PER over gug.longer.txt<br />
Number of words in reference: 72<br />
Number of words in test: 81<br />
Number of unknown words (marked with a star) in test: 44<br />
Percentage of unknown words: 54.32 %<br />
 Word error rate (WER): 112.50 %<br />
Position-independent word error rate (PER): 108.33 %<br />
*Proportion of stems translated correctly in longer: (81 - 44) / 81 = 37/81 = 45.7%<br />
*Trimmed coverage and number of tokens in longer and large corpora<br />
aq-covtest ../ling073-gug-wbp-corpus/gug.longer.txt gug-wbp.automorf.bin<br />
**Number of tokenised words in the longer corpus: 97<br />
**Coverage: 50.52%<br />
<br />
<br />
aq-covtest ../ling073-gug-wbp-corpus/gug.corpus.large.txt gug-wbp.automorf.bin<br />
**Number of tokenised words in the large corpus: 501127<br />
**Coverage: 24.74%<br />
<br />
[[Category:Sp17_TranslationPairs]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5071
User:Tjones5/Final project
2017-05-05T06:08:53Z
<p>Tjones5: /* Presentation */</p>
<hr />
<div>==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
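<br />
The choice between the two sets of endings in the table can be sketched in Python. This is only an illustration of the pattern, not the lexc/twol implementation: the function names are hypothetical, and counting the letters a, i, u as vowels is an assumption (long vowels written with two letters would be counted twice).<br />

```python
VOWELS = set("aiu")  # assumption: count vowel letters only

# (ending after a stem with up to 2 vowels, ending after a longer stem),
# copied from the "possible forms" column of the case table
ALTERNATING = {
    "erg": ("ngku", "rlu"),
    "loc": ("ngka", "rla"),
    "com": ("ngkajinta", "rlajinta"),
}
# Cases with a single ending in the table
INVARIANT = {"dat": "ku", "all": "kurra", "ela": "ngurlu"}


def count_vowels(form):
    """Count vowel letters in a form; hyphens and consonants are ignored."""
    return sum(1 for ch in form.lower() if ch in VOWELS)


def add_case(stem, case):
    """Attach a case ending to a stem, which may already carry suffixes."""
    if case == "abs":
        return stem  # absolutive is unmarked
    if case in ALTERNATING:
        short, long_ = ALTERNATING[case]
        suffix = long_ if count_vowels(stem) > 2 else short
    else:
        suffix = INVARIANT[case]
    return f"{stem}-{suffix}"


if __name__ == "__main__":
    for stem in ("pirli", "warlkurru", "pirli-jarra"):
        print(add_case(stem, "erg"))
```

Because the count is taken over the whole form, pirli-jarra is treated like a long stem, which matches pirli-jarra-rlu above.<br />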
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
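<br />
A rough Python sketch of the neighbour rule, working on analyser output in Apertium stream format. It is not the actual rule file. The note above does not spell out which reading should be discarded, so the tag to drop is left as a parameter; all function names are hypothetical.<br />

```python
import re

# One cohort in Apertium stream format: ^surface/reading1/reading2$
COHORT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse_stream(text):
    """Return a list of (surface, [readings]) pairs."""
    return [(m.group(1), m.group(2).split("/")) for m in COHORT.finditer(text)]


def has_tag(reading, tag):
    return f"<{tag}>" in reading


def is_plain_noun(readings):
    """True if every reading of a word is a noun reading."""
    return all(has_tag(r, "n") for r in readings)


def apply_neighbour_rule(cohorts, drop_tag):
    """For words ambiguous between <num> and <n>, drop readings carrying
    drop_tag when the word directly to the left or right is a plain noun."""
    result = []
    for i, (surface, readings) in enumerate(cohorts):
        neighbours = []
        if i > 0:
            neighbours.append(cohorts[i - 1][1])
        if i + 1 < len(cohorts):
            neighbours.append(cohorts[i + 1][1])
        ambiguous = (any(has_tag(r, "num") for r in readings)
                     and any(has_tag(r, "n") for r in readings))
        if ambiguous and any(is_plain_noun(n) for n in neighbours):
            kept = [r for r in readings if not has_tag(r, drop_tag)]
            readings = kept or readings  # never delete the last reading
        result.append((surface, readings))
    return result


if __name__ == "__main__":
    stream = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$ ^kurdu/kurdu<n><sg><abs>$^./.<sent>$"
    print(apply_neighbour_rule(parse_stream(stream), drop_tag="n"))
```

Since word order is flexible, checking only the immediate neighbours is a simplification.<br />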
<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
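<br />
The intended behaviour for reduplication can be mocked up in Python before it is implemented in the transducer. This sketch assumes full reduplication written with a hyphen and a toy lexicon passed in as a dictionary; the names are hypothetical and this is not how lexc/twol would express it.<br />

```python
def split_reduplication(surface):
    """Return the base if surface is 'base-base', otherwise None."""
    left, sep, right = surface.partition("-")
    if sep and left and left == right:
        return left
    return None


def pluralise(analysis):
    """Turn a singular analysis into the corresponding plural one."""
    return analysis.replace("<sg>", "<pl>", 1)


def analyse(surface, lexicon):
    """Look up a form, treating a reduplicated noun as the plural of its base."""
    base = split_reduplication(surface)
    if base is not None and base in lexicon:
        return [pluralise(a) for a in lexicon[base] if "<sg>" in a]
    return lexicon.get(surface, [])


if __name__ == "__main__":
    toy_lexicon = {"kurdu": ["kurdu<n><sg><abs>"]}  # illustrative entry only
    print(analyse("kurdu", toy_lexicon))
    print(analyse("kurdu-kurdu", toy_lexicon))
```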
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
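<br />
A minimal sketch of how the two measures can be computed, assuming the analyser output is in Apertium stream format (unknown words appear with a reading that starts with *) and that the gold annotations are available as sets of analyses per form. The function names and data layout are assumptions, not the scripts actually used here.<br />

```python
import re

# One cohort in Apertium stream format: ^surface/reading1/reading2$
COHORT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream):
    """Share of tokens that get at least one analysis.
    Note: punctuation cohorts are counted as tokens too."""
    total = known = 0
    for m in COHORT.finditer(stream):
        readings = m.group(2).split("/")
        total += 1
        if not all(r.startswith("*") for r in readings):
            known += 1
    return known / total if total else 0.0


def precision_recall(gold, predicted):
    """gold and predicted map each form to a set of analyses."""
    tp = fp = fn = 0
    for form, gold_set in gold.items():
        pred_set = predicted.get(form, set())
        tp += len(gold_set & pred_set)   # analyses both agree on
        fp += len(pred_set - gold_set)   # returned but not in gold
        fn += len(gold_set - pred_set)   # in gold but not returned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(coverage("^kurdu/kurdu<n><sg><abs>$ ^xyz/*xyz$"))
```

The reported figures are numerically consistent with 199 correct analyses out of 204 returned and 205 expected, although the raw counts are not recorded on this page.<br />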
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Maybe build a scraper to get the corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
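<br />
A possible starting point for the regular-expression cleanup, as a sketch only. It assumes the scraped text is HTML-like and that hyphenated forms such as rdaka-pala should stay together as one token; both function names are hypothetical.<br />

```python
import html
import re

TAG = re.compile(r"<[^>]+>")          # anything that looks like an HTML tag
SPACE = re.compile(r"\s+")            # runs of whitespace, including newlines
# letters, optionally joined by single hyphens (kurdu-kurdu stays one token)
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def clean_text(raw):
    """Strip tags, decode HTML entities and normalise whitespace."""
    text = TAG.sub(" ", raw)
    text = html.unescape(text)
    return SPACE.sub(" ", text).strip()


def tokenise(text):
    """Return word tokens, dropping punctuation and digits."""
    return WORD.findall(text)


if __name__ == "__main__":
    cleaned = clean_text("<p>Yani   karna\n ngajulu.</p>")
    print(cleaned)
    print(len(tokenise(cleaned)))
```

Counting the tokens of the cleaned text gives a quick check on how close the corpus is to the target size.<br />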
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5070
User:Tjones5/Final project
2017-05-05T06:07:06Z
<p>Tjones5: /* Current Evaluation */</p>
<hr />
<div>==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
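<br />
The relation between the two levels can be illustrated with a toy lookup table in Python. A real transducer is a finite-state machine compiled from the lexc and twol files, not a list of pairs; the pairs and function names below are made up for illustration and use Apertium-style tags.<br />

```python
# Each pair links a lexical-level analysis to a surface-level form.
PAIRS = [
    ("wolf<n><sg>", "wolf"),
    ("wolf<n><pl>", "wolves"),
]


def analyse(surface):
    """Surface form -> all matching lexical analyses (morphological parsing)."""
    return [lexical for lexical, form in PAIRS if form == surface]


def generate(lexical):
    """Lexical analysis -> all matching surface forms."""
    return [form for analysis, form in PAIRS if analysis == lexical]


if __name__ == "__main__":
    print(analyse("wolves"))
    print(generate("wolf<n><pl>"))
```

The same table is read in both directions, which is what makes a transducer usable for analysis and for generation.<br />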
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Maybe build a scraper to get the corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5069
User:Tjones5/Final project
2017-05-05T05:57:55Z
<p>Tjones5: </p>
<hr />
<div>==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri stories, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
*Next to implement: perhaps reduplication?<br />
echo "kurdu" | apertium -d . wbp-morph: ^kurdu/kurdu<n><sg><abs>$^./.<sent>$<br />
<br />
echo "kurdu-kurdu" | apertium -d . wbp-morph: ^kurdu-kurdu/kurdu<n><pl><abs>$^./.<sent>$<br />
<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RBMT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Maybe build a scraper to get the corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5068
User:Tjones5/Final project
2017-05-05T05:45:49Z
<p>Tjones5: </p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Maybe build a scraper to get the corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers].<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph: ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph: ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
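The rule above could be prototyped outside the Apertium pipeline before writing it as a real disambiguation rule. The following is only a Python sketch of the logic, not constraint grammar syntax. It assumes one possible reading of the rule: when a form has both a <code>&lt;num&gt;</code> and an <code>&lt;n&gt;</code> analysis and a neighbouring token has an <code>&lt;n&gt;</code> analysis, the <code>&lt;n&gt;</code> reading is dropped. The function names are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LU_RE = re.compile(r"\^([^/$]+)((?:/[^/$]+)*)\$")


def parse_stream(text):
    """Return a list of (surface, [analyses]) pairs from analyser output."""
    units = []
    for match in LU_RE.finditer(text):
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((match.group(1), analyses))
    return units


def has_noun(analyses):
    """True if any analysis carries the <n> tag."""
    return any("<n>" in a for a in analyses)


def disambiguate(units):
    """Assumed reading of the rule: drop the <n> analysis of a num/noun
    ambiguous form when the token to its left or right can be a noun."""
    result = []
    for i, (surface, analyses) in enumerate(units):
        ambiguous = has_noun(analyses) and any("<num>" in a for a in analyses)
        neighbours = [units[j][1] for j in (i - 1, i + 1) if 0 <= j < len(units)]
        if ambiguous and any(has_noun(n) for n in neighbours):
            analyses = [a for a in analyses if "<n>" not in a]
        result.append((surface, analyses))
    return result
```

Because neighbours are checked against the original analyses, two adjacent ambiguous forms would both lose their noun reading under this sketch; a real rule would need to decide how to treat that case.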
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
===Current Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
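For reference, coverage as reported above can be computed from the analyser's output. The page does not show the script that was actually used, so this is a minimal sketch. It relies on the Apertium stream convention that an unknown form is returned as <code>^form/*form$</code>; the function name is hypothetical, and punctuation tokens are counted like any other token.

```python
import re

# ^surface/analyses$ ; group 2 holds everything after the first slash
LU_RE = re.compile(r"\^([^/$]+)/([^$]*)\$")


def coverage(analysed_text):
    """Fraction of tokens that received at least one analysis."""
    total = 0
    known = 0
    for match in LU_RE.finditer(analysed_text):
        total += 1
        # Unknown forms come back with a leading asterisk, e.g. ^foo/*foo$
        if not match.group(2).startswith("*"):
            known += 1
    return known / total if total else 0.0
```

For example, an output containing one analysed form and one unknown form gives a coverage of 0.5.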
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5066
User:Tjones5/Final project
2017-05-05T05:44:53Z
<p>Tjones5: /* Description of the problem */</p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. wolves<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-wolf + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "wolves"<br />
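The relation between the two levels can be illustrated with a toy lookup table that works in both directions. This is only a sketch in Python, not how lexc/twol compile a transducer; it uses two noun stems and a few invariant case suffixes from this project, and the names <code>analyse</code> and <code>generate</code> are hypothetical.

```python
# Lexical side of each stem and suffix, paired with its surface side.
STEMS = {"pirli": "pirli<n>", "warlkurru": "warlkurru<n>"}
SUFFIXES = {
    "": "<sg><abs>",
    "-ku": "<sg><dat>",
    "-kurra": "<sg><all>",
    "-ngurlu": "<sg><ela>",
}

# The transducer is modelled as a finite set of (surface, lexical) pairs.
PAIRS = [
    (stem + suffix, lexical + tags)
    for stem, lexical in STEMS.items()
    for suffix, tags in SUFFIXES.items()
]


def analyse(surface):
    """Surface form -> list of lexical analyses (morphological parsing)."""
    return [lexical for form, lexical in PAIRS if form == surface]


def generate(lexical):
    """Lexical analysis -> list of surface forms (generation)."""
    return [form for form, analysis in PAIRS if analysis == lexical]
```

Here <code>analyse("pirli-ku")</code> returns the dative analysis and <code>generate("warlkurru<n><sg><all>")</code> returns <code>warlkurru-kurra</code>. A real transducer encodes the same relation as a finite-state network instead of an explicit list.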
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
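The allomorph choice in the table can be sketched as a small function. This follows the description on this page (two vowels versus more than two) and is not the actual twol rule. It assumes that counting the letters a, i, u is enough, so long vowels are not handled, and it counts vowels over the whole form so that an added suffix such as -jarra switches the ending. The names are hypothetical.

```python
import re

VOWEL_RE = re.compile(r"[aiu]")

# (ending after a 2-vowel form, ending after a longer form), from the table
ALLOMORPHS = {
    "erg": ("-ngku", "-rlu"),
    "loc": ("-ngka", "-rla"),
    "com": ("-ngkajinta", "-rlajinta"),
}


def attach(form, case):
    """Append the case ending selected by the number of vowels in `form`."""
    short_ending, long_ending = ALLOMORPHS[case]
    n_vowels = len(VOWEL_RE.findall(form))
    return form + (short_ending if n_vowels <= 2 else long_ending)
```

With these assumptions, <code>attach("pirli", "erg")</code> gives <code>pirli-ngku</code>, while <code>attach("pirli-jarra", "erg")</code> gives <code>pirli-jarra-rlu</code>, matching the examples above.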
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph<br />
^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph<br />
^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
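Precision and recall over analyses can be computed as below. The evaluation script behind the figures above is not shown on this page, so this sketch only uses the standard definitions: precision is the share of produced analyses that appear in the hand annotation, and recall is the share of hand-annotated analyses that were produced. The data layout (a dict from form to a set of analyses) is an assumption.

```python
def precision_recall(gold, predicted):
    """gold, predicted: dict mapping each form to a set of analyses."""
    true_pos = false_pos = false_neg = 0
    for form, gold_analyses in gold.items():
        produced = predicted.get(form, set())
        true_pos += len(gold_analyses & produced)   # correct analyses
        false_pos += len(produced - gold_analyses)  # produced but not in gold
        false_neg += len(gold_analyses - produced)  # in gold but not produced
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Only forms present in the hand annotation are scored, so output for unannotated forms does not affect either number.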
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
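The thresholds in the table can be applied to the current figures with a short function. This is a sketch under one assumption that the table itself does not state: a module reaches a status only when both the stem count and the coverage meet that row's minimum. It also ignores the non-numeric criterion for production (use in a released pair). The function name is hypothetical.

```python
# (status, minimum stems, minimum coverage in percent), highest status first
THRESHOLDS = [
    ("production", 10000, 90.0),
    ("working", 8000, 80.0),
    ("development", 1000, 60.0),
]


def module_status(stems, coverage_pct):
    """Return the highest status whose two minimums are both met."""
    for name, min_stems, min_coverage in THRESHOLDS:
        if stems >= min_stems and coverage_pct >= min_coverage:
            return name
    return "prototype"
```

With roughly 250 stems and 51.26% coverage, <code>module_status(250, 51.26)</code> returns <code>prototype</code>.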
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5064
User:Tjones5/Final project
2017-05-05T05:44:03Z
<p>Tjones5: </p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
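The regular-expression cleanup mentioned in the notes above could look roughly like this. It is a minimal sketch that assumes the scraped text is HTML-like; the exact patterns would depend on the sources actually scraped, and the function name is hypothetical.

```python
import re


def clean(text):
    """Strip markup residue from scraped text and normalise whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)           # HTML tags
    text = re.sub(r"&[a-z]+;|&#\d+;", " ", text)   # HTML entities such as &nbsp;
    text = re.sub(r"https?://\S+", " ", text)      # bare URLs
    text = re.sub(r"\s+", " ", text)               # runs of whitespace
    return text.strip()
```

Hyphens are left untouched on purpose, since they mark morpheme boundaries in forms such as <code>pirli-ku</code>.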
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. foxes<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-fox + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "foxes"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
*pirli (a noun with just 2 vowels) has different endings than warlkurru (which has more than two vowels)<br />
**however when suffixes such as -jarra are added, pirli gets the endings of nouns with >2 vowels: pirli-ngku → pirli-jarra-rlu <br />
<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
|"agent of a transitive verb" || {{tag|erg}} || style="background-color: teal;" | -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
<!-- *adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR --><br />
<br />
*Disambiguation is a challenge because of flexible word order<br />
echo "rdaka" | apertium -d . wbp-morph<br />
^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
echo "rdaka-pala" | apertium -d . wbp-morph<br />
^rdaka-pala/rdaka<num>$^./.<sent>$<br />
<br />
*Trying to write the following disambiguation rule: # If there is a noun to the right or left, I cannot be a numeral-like noun.<br />
*Examples of numeral-like nouns, sometimes distinguished by adding the number marker -pala.<br />
**mirdi = "four" or "elbow"<br />
**rdaka = "five" or "hand"<br />
**wirlki = "seven" or "boomerang"<br />
**narntirnki = "nine" or "curled"<br />
<br />
*Need to write scraper to test on a larger corpus (current corpus is ~13,000 words)<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5061
User:Tjones5/Final project
2017-05-05T05:23:15Z
<p>Tjones5: </p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. foxes<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-fox + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "foxes"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]<br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
*Other work?<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
**Noun_Number 2v<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru "axe" (>2 vowels)<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
!style="background-color: teal;"| ergative<br />
| "agent of a transitive verb" || {{tag|erg}} || -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
<br />
<br />
***wati-ngku<br />
***wati-jarra-rlu<br />
**adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR<br />
*Disambiguation is a challenge because of flexible word order<br />
**Numerals are one thing to deal with<br />
*Need to write scraper to test on a larger corpus<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5060
User:Tjones5/Final project
2017-05-05T05:20:51Z
<p>Tjones5: /* Approaches */</p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. foxes<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-fox + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "foxes"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]: <br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
*Other work?<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
<br />
*Pronouns (optional dash not included in table):<br />
{|class="wikitable sortable"<br />
! Meaning !! Word !! Form<br />
|-<br />
! I, me<br />
| ngaju(lu) || {{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngaju}}<br />{{morphTest|ngaju{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|sg}}|ngajulu}}<br />
|-<br />
! you<br />
| nyuntu(lu) || {{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntu}} <br />{{morphTest|nyuntu{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|sg}}|nyuntulu}}<br />
|-<br />
! he/she/it; to him/her/it<br />
| nyanungu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|sg}}|nyanungu}}<br />
|-<br />
! we (you & me)<br />
| ngali(jarra) || {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngali}} <br /> {{morphTest|ngali{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|incl}}|ngalijarra}}<br />
|-<br />
! we (him/her/it & me)<br />
| ngajarra || {{morphTest|ngajarra{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|du}}{{tag|excl}}|ngajarra}}<br />
|-<br />
! we (you & me & other(s))<br />
| ngalipa || {{morphTest|ngalipa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|incl}}|ngalipa}}<br />
|-<br />
! we (them & me)<br />
| nganimpa || {{morphTest|nganimpa{{tag|prn}}{{tag|pers}}{{tag|p1}}{{tag|pl}}{{tag|excl}}|nganimpa}}<br />
|-<br />
! you (both/two)<br />
| nyumpala || {{morphTest|nyumpala{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|du}}|nyumpala}}<br />
|-<br />
! you (more than 2)<br />
| nyurrarla || {{morphTest|nyurrarla{{tag|prn}}{{tag|pers}}{{tag|p2}}{{tag|pl}}|nyurrarla}}<br />
|-<br />
! they/them (both/two)<br />
| nyanungu-jarra || {{morphTest|nyanungu-jarra{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|du}}|nyanungu-jarra}}<br />
|-<br />
! they/them (more than 2)<br />
| nyanungu-rra/nyanungu-patu || {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-rra}} <br /> {{morphTest|nyanungu{{tag|prn}}{{tag|pers}}{{tag|p3}}{{tag|pl}}|nyanungu-patu}}<br />
|-<br />
|}<br />
<br />
**Noun_Number 2v<br />
{|class="wikitable sortable"<br />
! case name !! ~meaning !! tag !! possible forms !! pirli "rock" (2 vowels) !! warlkurru (>2 vowels) "axe"<br />
|-<br />
! absolutive<br />
| subject of intransitive verbs and object of transitive verbs || {{tag|abs}} || — || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|abs}}|pirli}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|abs}}|warlkurru}}<br />
|-<br />
! dative<br />
| "to" || {{tag|dat}} || -ku || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|dat}}|pirli-ku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|dat}}|warlkurru-ku}}<br />
|-<br />
! ergative<br />
| "agent of a transitive verb" || {{tag|erg}} || -ngku, -rlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|erg}}|pirli-ngku}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|erg}}|warlkurru-rlu}}<br />
|-<br />
! allative<br />
| "onto" || {{tag|all}} || -kurra || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|all}}|pirli-kurra}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|all}}|warlkurru-kurra}}<br />
|-<br />
! comitative<br />
| "with" || {{tag|com}} || -ngkajinta, -rlajinta || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|com}}|pirli-ngkajinta}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|com}}|warlkurru-rlajinta}}<br />
|-<br />
! elative<br />
| "out of" || {{tag|ela}} || -ngurlu || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|ela}}|pirli-ngurlu}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|ela}}|warlkurru-ngurlu}}<br />
|-<br />
! locative<br />
| "at, in, on" || {{tag|loc}} || -ngka, -rla || {{morphTest|pirli{{tag|n}}{{tag|sg}}{{tag|loc}}|pirli-ngka}} || {{morphTest|warlkurru{{tag|n}}{{tag|sg}}{{tag|loc}}|warlkurru-rla}}<br />
|}<br />
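<br />
The ergative, comitative and locative suffixes in the table above alternate with the number of vowels in the stem. A minimal Python sketch of that choice follows. It assumes vowels can be counted as the single letters a, i, u (long vowels written double would be miscounted), and the names <code>CASE_SUFFIXES</code> and <code>inflect</code> are hypothetical, not part of the lexc/twol files:<br />

```python
VOWELS = set("aiu")

# Suffix pairs copied from the table above:
# (form used after 2-vowel stems, form used after longer stems).
CASE_SUFFIXES = {
    "abs": ("", ""),
    "dat": ("-ku", "-ku"),
    "erg": ("-ngku", "-rlu"),
    "all": ("-kurra", "-kurra"),
    "com": ("-ngkajinta", "-rlajinta"),
    "ela": ("-ngurlu", "-ngurlu"),
    "loc": ("-ngka", "-rla"),
}


def count_vowels(stem: str) -> int:
    """Count vowel letters; an assumption that ignores long vowels."""
    return sum(ch in VOWELS for ch in stem)


def inflect(stem: str, case: str) -> str:
    """Attach the case suffix that matches the stem's vowel count."""
    short_form, long_form = CASE_SUFFIXES[case]
    suffix = short_form if count_vowels(stem) <= 2 else long_form
    return stem + suffix


print(inflect("pirli", "erg"))      # pirli-ngku
print(inflect("warlkurru", "erg"))  # warlkurru-rlu
```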
<br />
<br />
***wati-ngku<br />
***wati-jarra-rlu<br />
**adding (optional) dashes between everything; selecting only one analysis by using ! Dir/LR<br />
*Disambiguation is a challenge because of flexible word order<br />
**Numerals are one thing to deal with<br />
*Need to write scraper to test on a larger corpus<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
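<br />
The thresholds in the table above can be applied mechanically. The sketch below is only an illustration: treating a status as reached when both the stem and the coverage threshold are met is an assumption, the "used in a released pair" condition for production is ignored, and <code>module_status</code> is a hypothetical name:<br />

```python
# Thresholds copied from the status table; checked from highest to lowest.
STATUS_THRESHOLDS = [
    ("production", 10_000, 90.0),
    ("working", 8_000, 80.0),
    ("development", 1_000, 60.0),
]


def module_status(stems: int, coverage_percent: float) -> str:
    """Return the highest status whose stem AND coverage thresholds are both met."""
    for name, min_stems, min_coverage in STATUS_THRESHOLDS:
        if stems >= min_stems and coverage_percent >= min_coverage:
            return name
    return "prototype"


# Figures reported above for the Warlpiri transducer: ~250 stems, 51.26% coverage.
print(module_status(250, 51.26))  # prototype
```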
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=Guarani_and_Warlpiri&diff=5059
Guarani and Warlpiri
2017-05-05T05:00:44Z
<p>Tjones5: /* wbp */</p>
<hr />
<div>Resources for machine translation between [https://wikis.swarthmore.edu/ling073/Guarani Guarani] and [https://wikis.swarthmore.edu/ling073/Warlpiri Warlpiri]<br />
<br />
==Corpora==<br />
*[https://www.bible.com/bible/1355/rev.1 Warlpiri Bible]<br />
<br />
*[https://www.bible.com/bible/66/rev.1 Guarani Bible]<br />
<br />
*[https://www.bible.com/bible/111/rev.1 English Bible]<br />
<br />
*[http://unicode.org/udhr/d/udhr_gug.txt Universal Declaration of Human Rights in Guarani]<br />
<br />
==Developed Resources==<br />
*[https://github.swarthmore.edu/tjones5/ling073-gug-wbp-corpus Corpus Repository]<br />
*[https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri/Contrastive_Grammar Contrastive Grammar]<br />
*[https://github.swarthmore.edu/tjones5/ling073-gug-wbp Guarani to Warlpiri analyzer repository]<br />
**This repo will hold the fully functioning rule-based machine translation system.<br />
*[https://github.swarthmore.edu/cpillsb1/ling073-wbp-gug Warlpiri to Guarani analyzer repository]<br />
**This repo will not be developed further.<br />
*[https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri/Lexical_selection Lexical selection Wiki page]<br />
*[https://wikis.swarthmore.edu/ling073/Guarani_and_Warlpiri/Structural_transfer Structural transfer Wiki page]<br />
<br />
==Lexical Selection==<br />
* See Lexical Selection page linked above for more detail<br />
<br />
* (gug) ahy'o → (wbp) ngukarna "throat"; lirra "voice"<br />
* (gug) ha'anga → (wbp) japirdimi "threaten"; walaparrirni "imitate"<br />
* (gug) ajúra → (wbp) japirdimi "collar"; waninja "neck"<br />
* (wbp) rdaka → (gug) avatisoka "hand"; po "five"<br />
* (wbp) kirdirrpa → (gug) itakua "cave"; ka'irãi "jail"<br />
<br />
==Initial Evaluation==<br />
===wbp → gug evaluation===<br />
====Results when running on tests====<br />
*Total number of words: 14<br />
*Total number of unknown words: 4<br />
*Percentage of unknown words: 28.57%<br />
*Percentage of stems translated correctly: 71.43%<br />
<br />
When removing unknown words:<br />
*WER = 78.57%<br />
*PER = 78.57%<br />
When unknown words are not removed:<br />
*WER = 100%<br />
*PER = 100%<br />
<br />
====Results when running on sentences (corpus) ====<br />
*Total number of words: 58<br />
*Total number of unknown words: 68<br />
*Percentage of unknown words: 84.45%<br />
*Percentage of stems translated correctly: 15.52%<br />
<br />
When removing unknown words:<br />
*WER = 22.41%<br />
*PER = 22.41%<br />
When unknown words are not removed:<br />
*WER = 100%<br />
*PER = 100%<br />
<br />
<br />
===gug → wbp evaluation===<br />
====Results when running on tests====<br />
*Total number of words: 13<br />
*Total number of unknown words: 0<br />
*Percentage of unknown words: 0%<br />
*Percentage of stems translated correctly: 100% <br />
<br />
When removing unknown words:<br />
*WER = 92.86%<br />
*PER = 92.86%<br />
When unknown words are not removed:<br />
*WER = 92.86%<br />
*PER = 92.86%<br />
<br />
====Results when running on sentences (corpus) ====<br />
*Total number of words: 66<br />
*Total number of unknown words: 60<br />
*Percentage of unknown words: 90.9%<br />
*Percentage of stems translated correctly: 9.09%<br />
<br />
When removing unknown words:<br />
*WER = 0.0%<br />
*PER = 0.0%<br />
When unknown words are not removed:<br />
*WER = 100%<br />
*PER = 100%<br />
<br />
==Final Evaluation==<br />
<br />
===gug → wbp and wbp → gug===<br />
====100 more stems====<br />
Added 100 more stems (mostly nouns, verbs, and adjectives) to the bilingual and monolingual dictionaries.<br />
<br />
===gug → wbp===<br />
<br />
====6 new lexical selection rules====<br />
* (gug) akã → (wbp) jurru "head"; payarrpa "hill"<br />
** Rule: Select "jurru" (Eng: head) as the translation of "akã" when it is preceded by "ha'e" (Eng: his/her). Otherwise, select "payarrpa" as the default.<br />
<br />
* (gug) asu → (wbp) jampu "left"; jampu-karra "left-handed"<br />
** Rule: Select "jampu-karra" (Eng: left-handed) as the translation of "asu" when it is followed by person words such as "ava" (Eng: person) or "kuña" (Eng: woman) or "mitã" (Eng: child). Otherwise, select "jampu" (Eng: left) as the default.<br />
<br />
* (gug) henyhẽ → (wbp) jaka-ngalya "full"; ngayarrka "pregnant"<br />
** Rule: Select "ngayarrka" (Eng: pregnant) as the translation of "henyhẽ" when it is followed/preceded by "kuña" (Eng: woman) or other words describing female people. Otherwise, select "jaka-ngalya" (Eng: full) as the default.<br />
<br />
* (gug) mongúi → (wbp) kijirni "to throw"; pata-karrimi "to fall"<br />
** Rule: Select "kijirni" (Eng: to throw) as the translation of "mongúi" when it is followed by "ita" (Eng: rock) or "apu'a" (Eng: ball). Otherwise, select "pata-karrimi" as the default.<br />
<br />
* (gug) moha'anga → (wbp) pantirni "to draw"; jarntirni "to carve"<br />
** Rule: Select "pantirni" (Eng: to draw) as the translation of "moha'anga" when it is followed by "ha'anga" (Eng: drawing). Otherwise, select "jarntirni" as the default.<br />
<br />
* (gug) mbogyapy → (wbp) pipa-kurra-mani "to write"; nyinanja-wantimi "to sit down"<br />
** Rule: Select "pipa-kurra-mani" (Eng: to write) as the translation of "mbogyapy" when it is followed by "aranduka" (Eng: book). Otherwise, select "nyinanja-wantimi" as the default.<br />
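<br />
Each rule above has the same shape: pick an alternative translation when a trigger word is adjacent, otherwise fall back to a default. A minimal Python sketch of that logic follows. It is not the Apertium rule format; the <code>RULES</code> layout and the <code>select</code> function are hypothetical, and only two of the six rules are encoded:<br />

```python
# Each entry: default translation, alternative, trigger words, and the side
# of the source word on which a trigger must appear.
RULES = {
    "akã": {
        "default": "payarrpa",
        "alt": "jurru",
        "triggers": {"ha'e"},
        "side": "before",
    },
    "mbogyapy": {
        "default": "nyinanja-wantimi",
        "alt": "pipa-kurra-mani",
        "triggers": {"aranduka"},
        "side": "after",
    },
}


def select(tokens, i):
    """Return the chosen translation for tokens[i], or None if no rule applies."""
    rule = RULES.get(tokens[i])
    if rule is None:
        return None
    j = i - 1 if rule["side"] == "before" else i + 1
    if 0 <= j < len(tokens) and tokens[j] in rule["triggers"]:
        return rule["alt"]
    return rule["default"]


print(select(["ha'e", "akã"], 1))          # jurru
print(select(["akã"], 0))                  # payarrpa
print(select(["mbogyapy", "aranduka"], 0)) # pipa-kurra-mani
```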
<br />
====6 additions to the morphology====<br />
*Added numeral analysis:<br />
**Several number words in Warlpiri also have meanings as nouns (e.g. rdaka can mean "hand" or "five")<br />
**the -pala ending distinguishes them as number words, but it is not always necessary<br />
**"rdaka-pala": ^rdaka-pala/rdaka<num>$^./.<sent>$<br />
**"rdaka": ^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$<br />
<br />
*Added adverb analysis:<br />
**"nyurruwiyi": ^nyurruwiyi/nyurruwiyi<adv>$^./.<sent>$<br />
<br />
*Added elative/source case for verb auxiliaries:<br />
**"kajangka": ^kajangka/ka<vaux><src><subj3sg>$^./.<sent>$<br />
<br />
*Added inceptive aspect for verbs:<br />
**"pakarnunjunu": ^pakarnunjunu/pakarni<v><tv><past><inc>$^./.<sent>$<br />
<br />
*Added possessive marker "nyanu" for nouns:<br />
**"kaja-nyanu-rlu": ^kaja-nyanu-rlu/kaja<n><pl><poss><erg>$^./.<sent>$<br />
<br />
*Added the topicalizing particle grammatical marker for nouns. -ja is used to mark known information or to indicate the topic, although sometimes it has no function.<br />
**"kurdukuju": ^kurdukuju/kurdu<n><sg><dat><top>$^./.<sent>$<br />
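<br />
The analyses listed above use the analyser's stream notation, where each unit is written <code>^surface/analysis1/analysis2$</code>. A minimal Python sketch for splitting such a line into surface forms, lemmas and tags follows. It is a simplification that ignores escaped characters, and <code>parse_stream</code> and <code>split_analysis</code> are hypothetical helper names:<br />

```python
import re

# One lexical unit runs from "^" to the next "$".
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_stream(line):
    """Return a list of (surface form, [analyses]) pairs."""
    units = []
    for body in LEXICAL_UNIT.findall(line):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def split_analysis(analysis):
    """Split 'rdaka<n><sg><abs>' into ('rdaka', ['n', 'sg', 'abs'])."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", analysis)
    return lemma, tags


line = "^rdaka/rdaka<num>/rdaka<n><sg><abs>$^./.<sent>$"
for surface, analyses in parse_stream(line):
    print(surface, [split_analysis(a) for a in analyses])
```

For "rdaka" this yields two readings, a numeral and a singular absolutive noun, which is the ambiguity described in the numeral item above.<br />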
<br />
<!--<br />
-----------------------------------<br />
*Improved pronominal analysis:<br />
**Before: "<ngaju>"<br />
***"ngaju" prn pers p3 sg<br />
***"ngaju" prn pers p3 du<br />
***"ngaju" prn pers p2 sg<br />
***"ngaju" prn pers p2 du<br />
***"ngaju" prn pers p2 pl<br />
***"ngaju" prn pers p1 sg<br />
***"ngaju" prn pers p1 du excl<br />
***"ngaju" prn pers p1 du incl<br />
***"ngaju" prn pers p1 pl excl<br />
***"ngaju" prn pers p1 pl incl<br />
*After: <br />
**"<ngaju>"<br />
<br />
<br />
*Clitics<br />
** (wbp) -yijala cl. also, too, likewise.<br />
** (wbp) -nyayirni Variant: -nyarrirni. cl. very, really -> (gug) adverb: asy<br />
<br />
*Conjunction??<br />
**ngula cj. 1. that, which, who. Syn: kuja. 2.he, she, it, they, them. Referring back to something in previous context.<br />
<br />
*Cs?<br />
** -jangka cs. 1. from. 2. because. Syn: -ngurlu~-ngirli, -warnu.<br />
<br />
*Reflexive?<br />
**nyanungu pn.<br />
<br />
*ngana - who<br />
<br />
*yangka nm. aforementioned one, same one mentioned before.<br />
*Numbers<br />
*-juku cl. still, yet. Syn: -kirli.<br />
*-pinki ns. such-like, things pertaining to.<br />
*-kari ns. another, other.<br />
*nyurru av. already, short time ago.<br />
*-wiyi (first)<br />
*-wana ns. along, alongside, beside.<br />
--><br />
<br />
<!--<br />
====4 new disambiguation rules====<br />
*Number vs Noun (e.g. rdaka = hand or five), has the number marker -pala<br />
*Manu = cnjcoo vs v tv past?<br />
*Kuja = "ka" vaux rel subj3sg OR kuja = kuja cj. relative pronoun, what, which, that, who.<br />
**Kuja = "ka" vaux rel subj3sg OR kuja = kuja cj. relative pronoun, what, which, that, who.<br />
**Ngula = "ka" vaux rel subj3sg OR kuja = kuja cj. relative pronoun, what, which, that, who. Find by searching AUX:COMP in new dictionary<br />
--><br />
<br />
===wbp → gug===<br />
====6 new lexical selection rules====<br />
* (wbp) nyurltu-nyurltu → (gug) apañuãi "confusing, garbled"; apokytã "tangled"<br />
** Rule: select "apokytã" if "nyurltu-nyurltu" is followed by the Warlpiri word "marnilpa" (hair). In other cases, select "apañuãi" as the default.<br />
<br />
* (wbp) warlu → (gug) tini "hot"; aku "angry"<br />
** Rule: select "aku" if "warlu" is followed by the Warlpiri word "miparrpa" (face) or "yapa" (person). In other cases, select "tini" as the default.<br />
<br />
* (wbp) yajarni → (gug) henoi "fetch/invite (someone)"; kakuaa "grow (a plant)"<br />
** Rule: select "kakuaa" if "yajarni" is followed by the Warlpiri word "watiya" (plant). In other cases, select "henoi" as the default.<br />
<br />
* (wbp) paarr-paarr-jankami → (gug) hapy "to burn, singe (hair or fur)"; jahéi "to hurt"<br />
** Rule: select "hapy" if "paarr-paarr-jankami" is followed by the Warlpiri word "marnilpa" (hair). In other cases, select "jahéi" as the default.<br />
<br />
* (wbp) pajirni → (gug) su'u "to bite"; hupi "to pick/gather"<br />
** Rule: select "hupi" if "pajirni" is followed by the Warlpiri word "mangarri" (fruit). In other cases, select "su'u" as the default.<br />
<br />
* (wbp) pakarli → (gug) kuatia "headdress"; kuatia "paper"<br />
** Rule: select "kuatia" if "pakarli" is preceded by the Warlpiri word "yangkarni" (to wear). In other cases, select "kuatia" as the default.<br />
<br />
<br />
====4 new twol rules====<br />
*Nasal harmony on dative suffix -pe:<br />
**-pe now properly changes to -me after a nasal vowel.<br />
**Ex: havõ (soap). Suffix is regularly -pe, but now becomes -me<br />
echo "havõme" | apertium -d . gug-morph<br />
^havõme/havõ<n><sg><dat>$^./.<sent>$<br />
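<br />
The -pe/-me alternation in the havõ example can be sketched in Python as below. This is not the twol rule itself: it only checks whether the stem ends in a nasal vowel, the vowel inventory in <code>NASAL_VOWELS</code> is an assumption, and <code>add_pe</code> is a hypothetical name:<br />

```python
import unicodedata

# Assumed set of nasal vowel letters, normalised to composed (NFC) form so
# that comparison works however the input was typed.
NASAL_VOWELS = set(unicodedata.normalize("NFC", "ãẽĩõũỹ"))


def add_pe(stem: str) -> str:
    """Attach -pe, or -me when the stem ends in a nasal vowel."""
    stem = unicodedata.normalize("NFC", stem)
    suffix = "me" if stem[-1] in NASAL_VOWELS else "pe"
    return stem + suffix


print(add_pe("havõ"))  # havõme
```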
<br />
*Nasal harmony on accusative suffix -ve:<br />
**-ve now properly changes to -me on accusative forms of pronouns<br />
**Ex: peẽ (second person plural pronoun). Suffix is regularly -ve, but now becomes -me<br />
echo "peẽme" | apertium -d . gug-morph<br />
^peẽme/peẽ<prn><pers><p2><sg><acc>$^./.<sent>$<br />
<br />
*Stress on accusative forms of pronouns:<br />
**The e at the end of pronouns should be stressed (é) when followed by the acc suffix -ve<br />
**Ex: che (first person singular pronoun). With suffix -ve, becomes chéve<br />
echo "chéve" | apertium -d . gug-morph<br />
^chéve/che<prn><pers><p1><sg><acc>$^./.<sent>$<br />
<br />
*Nasal harmony on reflexive prefix je- :<br />
**je- now properly changes to ñe- before a nasal.<br />
**Ex: ñeñapytĩ (“to tie”). <br />
echo "oñeñapytĩ" | apertium -d . gug-morph<br />
^oñeñapytĩ/ñapytĩ<v><tv><ar><pres><p3><sp><nowiki><ref></nowiki>$^./.<sent>$<br />
<br />
<br />
===Tests results===<br />
<br />
====gug====<br />
*Precision and recall:<br />
**Results:<br />
precisionRecall ../corpus/ling073-gug-corpus/ling073-gug-corpus/gug.annotated.basic.txt ../bootstrap/ling073-gug/corpus.out.txt<br />
Totals: 89 tp, 87 fp, 0 tn, 19 fn<br />
Precision: 82.40741%<br />
Recall: 50.56818%<br />
<br />
*Coverage over large corpus:<br />
**Results:<br />
aq-covtest ~/Source/corpus/ling073-gug-corpus/ling073-gug-corpus/gug.corpus.large.txt gug.automorf.bin<br />
Number of tokenised words in the corpus: 522284<br />
Coverage: 28.96%<br />
<br />
*Number of stems in transducer: ~270 (didn't count by hand)<br />
<br />
====wbp====<br />
*Precision and recall:<br />
**Results:<br />
precisionRecall ../ling073-wbp-corpus/wbp.annotated.basic.txt corpus.out.txt <br />
Totals: 199 tp, 5 fp, 0 tn, 6 fn<br />
Precision: 97.54902%<br />
Recall: 97.07317%<br />
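<br />
The percentages above follow from the reported totals under the usual definitions, precision = tp / (tp + fp) and recall = tp / (tp + fn). A short Python check, where <code>precision_recall</code> is a hypothetical helper and not the <code>precisionRecall</code> script itself:<br />

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Standard precision and recall from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Totals reported above for wbp: 199 tp, 5 fp, 6 fn.
p, r = precision_recall(tp=199, fp=5, fn=6)
print(f"Precision: {p:.5%}")  # Precision: 97.54902%
print(f"Recall: {r:.5%}")     # Recall: 97.07317%
```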
<br />
*Coverage over large corpus:<br />
**Results:<br />
aq-covtest ../ling073-wbp-corpus/wbp.corpus.large.txt wbp.automorf.bin<br />
Number of tokenised words in the corpus: 18427<br />
Coverage: 51.26%<br />
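<br />
Coverage is the share of corpus tokens that receive at least one analysis. The Python sketch below shows the idea on analyser output, assuming unknown tokens appear as <code>^form/*form$</code>; whether <code>aq-covtest</code> counts in exactly this way is an assumption, and <code>coverage</code> is a hypothetical name:<br />

```python
import re


def coverage(stream: str) -> float:
    """Fraction of lexical units whose analysis is not marked unknown with '*'."""
    units = re.findall(r"\^(.*?)\$", stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


# Toy input: one analysed token and one unknown token.
sample = "^wati/wati<n><sg><abs>$ ^xyz/*xyz$"
print(f"Coverage: {coverage(sample):.2%}")  # Coverage: 50.00%
```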
<br />
*Number of stems in transducer: ~250<br />
<br />
====wbp → gug====<br />
*WER and PER<br />
Number of words in reference: 76<br />
Number of words in test: 69<br />
Number of unknown words (marked with a star) in test: 45<br />
Percentage of unknown words: 65.22 %<br />
Word error rate (WER): 100.00 %<br />
Position-independent word error rate (PER): 94.74 %<br />
*Proportion of stems translated correctly in longer: 31/71 = 40.8%<br />
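<br />
For reference, WER and PER can be computed as in the Python sketch below. WER is word-level edit distance divided by reference length; the PER formula used here is one common formulation and is an assumption, since the evaluation script's exact definition is not given on this page. Both can exceed 100% when the test output is longer than the reference:<br />

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate relative to the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: word order is ignored."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


ref = "a b c d".split()
hyp = "a c b d".split()
print(f"WER: {wer(ref, hyp):.2%}")  # WER: 50.00%
print(f"PER: {per(ref, hyp):.2%}")  # PER: 0.00%
```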
<br />
Unsure... Need wbp.longer.txt to test in longer and large<br />
<br />
====gug → wbp====<br />
*WER and PER over gug.longer.txt<br />
Number of words in reference: 72<br />
Number of words in test: 81<br />
Number of unknown words (marked with a star) in test: 44<br />
Percentage of unknown words: 54.32 %<br />
 Word error rate (WER): 112.50 %<br />
Position-independent word error rate (PER): 108.33 %<br />
*Proportion of stems translated correctly in longer: (81 - 44) / 81 = 37/81 = 45.7%<br />
*Trimmed coverage and number of tokens in longer and large corpora<br />
aq-covtest ../ling073-gug-wbp-corpus/gug.longer.txt gug-wbp.automorf.bin<br />
**Number of tokenised words in the longer corpus: 97<br />
**Coverage: 50.52%<br />
<br />
<br />
**Number of tokenised words in the LARGE corpus: ?<br />
**Coverage: ?<br />
<br />
[[Category:Sp17_TranslationPairs]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5058
User:Tjones5/Final project
2017-05-05T04:54:39Z
<p>Tjones5: /* Presentation */</p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
<br />
==Presentation==<br />
<!--What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem.--> <br />
===Description of the problem===<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. foxes<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-fox + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "foxes"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
<!--* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing.--><br />
*Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]: <br />
*Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
*Other work?<br />
<br />
===Benefits of this project===<br />
<!--* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?--><br />
*Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
**All of these applications might also be helpful in starting an analyzer for other indigenous languages that have the same orthography, are highly agglutinative, and have other grammatical similarities <br />
<br />
===Approaches===<br />
<!--* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.--><br />
<br />
*Mostly have worked within lexc and twol files<br />
**Examples??<br />
*Disambiguation is a challenge because of flexible word order<br />
**Numerals are one thing to deal with<br />
*Need to build a scraper to test on a larger corpus<br />
<br />
===Evaluation===<br />
<!--* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P --><br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5
http://wikis.swarthmore.edu/ling073/index.php?title=User:Tjones5/Final_project&diff=5056
User:Tjones5/Final project
2017-05-05T04:50:15Z
<p>Tjones5: **</p>
<hr />
<div>Placeholder page for final project<br />
<br />
==Notes==<br />
<br />
*To implement:<br />
**Reduplication<br />
<br />
*Get a big corpus:<br />
**Build a scraper? to get corpus up to 25K<br />
**Use regular expressions to clean up scraped (or copied/pasted) text<br />
<br />
*For evaluation:<br />
**Go back and look at coverage<br />
<br />
*Other ling to do: UD pipeline<br />
<br />
==Presentation==<br />
What will be expected for this assignment is a '''10 minute''' presentation in which you share the following with your classmates:<br />
* A '''description of the problem''' addressed by your project. For some projects this may require thinking about it in a somewhat new light. Whether or not the problem is something we've dealt with in class, provide a brief but complete introduction to the problem. <br />
===Description of the problem===<br />
* What is a transducer? What does it do?<br />
* Word structure is analyzed by composition of morphemes (the smallest units with grammatical meaning)<br />
* A morphological transducer connects forms with analyses. <br />
* Surface level -- represents the concatenation of letters which make up the actual spelling of the word, e.g. foxes<br />
* Lexical level -- represents the concatenation of morphemes making up a word, e.g. NOUN-fox + PLURAL-es <br />
* Morphological parsing -- the problem of recognizing that a word breaks down into component morphemes, as above with "foxes"<br />
<br />
[[Image:Simple transducer.png|1000px]]<br />
<br />
===Previous work===<br />
* Any '''previous work''' you've found that also addresses this problem or similar problems, and how it relates to what you're doing. <br />
**Many languages have working [http://wiki.apertium.org/wiki/Languages Apertium transducers]: <br />
**Warlpiri doesn't have a working transducer -- in fact, no languages from the Pama–Nyungan family have a transducer<br />
<br />
===Benefits of this project===<br />
* Your thoughts on who might '''benefit from your project''' and in what way. E.g., might a language community be able to find a use for what you're doing, or maybe [computational] linguists working on a language or issue?<br />
**Why is a transducer useful to a community of Warlpiri speakers?<br />
**Search engine<br />
**Spell checker<br />
**Machine Translation, especially to/from English (the primary language of most Warlpiri speakers). Many are trying to revive the use of Warlpiri, so there are many potentially useful applications.<br />
***An eng → wbp MT system could be used to look up how to say/write a certain word or phrase<br />
***A wbp → eng MT system could be used to analyze (and then translate) untranslated Warlpiri phrases, or give alternate translations<br />
<br />
===Approaches===<br />
* '''How you're approaching the solution''' to the overall problem, including how you're implementing the solution. You can talk here about smaller individual issues that have arisen as well.<br />
<br />
*Mostly have worked within lexc and twol files<br />
**Examples??<br />
*Disambiguation is a challenge because of flexible word order<br />
**Numerals are one thing to deal with<br />
*Need to build a scraper to test on a larger corpus<br />
<br />
===Evaluation===<br />
* '''How you're evaluating the effectiveness''' of the solution, and some preliminary look at results of the evaluation. We'll understand that the project isn't yet complete, so the evaluation may show that the project is entirely ineffective and useless :-P<br />
* Coverage over a large corpus <br />
* Precision and recall against hand-annotated, randomly selected forms<br />
<br />
<br />
* Coverage over large corpus<br />
**Number of tokenised words in the corpus: 18427<br />
**Coverage: 51.26%<br />
* Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
**Precision: 97.54902%<br />
**Recall: 97.07317%<br />
* Number of stems: ~250<br />
<!--<br />
*Pre Final Project<br />
** Coverage over large corpus<br />
***Number of tokenised words in the corpus: 18427<br />
***Coverage: 51.26%<br />
** Precision/Recall on wbp.annotated.basic.txt from Polished RMBT:<br />
***Precision: 97.54902%<br />
***Recall: 97.07317%<br />
** Number of stems: ~250<br />
--><br />
<br />
<br />
<br />
{|class="wikitable"<br />
|-<br />
! status !! description !! stems !! coverage<br />
|-<br />
!style="background-color: red;"| '''prototype'''<br />
| language module that has not received heavy development<br />
| <1,000<br />
| <60% <br />
|-<br />
!style="background-color: orange;"| '''development'''<br />
| language module under development <br />
| ≥1,000 <br />
| ≥60% <br />
|-<br />
!style="background-color: yellow;"| '''working'''<br />
| language module with near-production-quality performance <br />
| ≥8,000<br />
| ≥80% <br />
|-<br />
!style="background-color: green;"| '''production''' <br />
| language module used in a released pair<br />
| ≥10,000<br />
| ≥90%<br />
|}<br />
<br />
*Table Source: http://wiki.apertium.org/wiki/Languages<br />
*Background on analyzers: http://www.phil.uu.nl/tst/2012/Slides/SLP_Lecture2.pdf<br />
<br />
[[Category:sp17_FinalProjects]]</div>
Tjones5