Khasi and Wôpanâak
From LING073
Revision as of 01:42, 19 April 2017 by Jmalin1 (talk | contribs) (→Improvements to Wôpanâak Transducer)
Resources for machine translation between Khasi and Wôpanâak
Final Evaluation
Improvements to Wôpanâak Transducer
- Updates to lexc
- Removed {u} from first- and second-person suffixes, so that words such as "nutunantam" now analyze correctly. Also added {u} to third-person suffixes, so that words like "nupuwak" analyze correctly.
- Fixed an overgeneration issue with intransitive verbs: inanimate intransitive verbs can no longer receive first- or second-person morphology.
- Reworked transitive verbs with inanimate objects, splitting the inflection lexicon into two, one for absolute forms and one for objective forms
- Added the "TI2" paradigm of verb stems to the transducer by sending them to a different lexicon before inflection. This has allowed the analyzer to handle words such as "ahtaw" ("have/own")
- Added the verbalizing suffix -w to the noun pathway, allowing nouns with it to analyze as verbs (e.g. "sôtyumâw" - "he/she is sachem")
- Moved a number of words that should have been receiving the -m possession suffix into the correct category. Also updated the -m suffix to surface as -um after a consonant. This is consistent with phonology elsewhere in the language, but it is largely conjecture on my part and may need a further update.
- Updates to twol
- Updated rules for deletion of {m} and {w} to work after vowel archiphonemes as well as vowels.
- Consolidated multiple rule sets that did the same thing to the same archiphoneme into single rules, clearing up rule conflicts in the compiler
- Added a rule that deletes w from the prefix of 3p possessed dependent nouns beginning with ȣ. The transducer now correctly analyzes and generates words such as "ȣshah" instead of "wȣshah" for "his father".
- Updates to twoc
- Added tags and rules to prevent overgeneration of forms associated with the -w verbalizing suffix: nouns will not receive an alternate 3p possession reading without the appropriate prefix, and the verb form will never receive the 3p possession prefix incorrectly.
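The two surface alternations described above (-m becoming -um after a consonant, and the 3p prefix w dropping before ȣ) can be sketched in Python as a rough illustration. This is not the lexc/twol code itself: the function names and the vowel set are assumptions made for the example, and the prefix is treated as a bare w with every other alternation ignored.

```python
# Assumed set of vowel letters; a rough stand-in for illustration,
# not a vetted inventory of the orthography.
VOWELS = set("aâôuȣe")


def add_possession_suffix(stem, vowels=VOWELS):
    """Append -m after a vowel-final stem and -um after a consonant-final one."""
    if not stem:
        raise ValueError("stem must be non-empty")
    return stem + ("m" if stem[-1] in vowels else "um")


def add_third_person_prefix(stem):
    """Prefix w to a stem, dropping the w when the stem begins with ȣ."""
    return stem if stem.startswith("ȣ") else "w" + stem
```

Under these assumptions, `add_third_person_prefix("ȣshah")` returns "ȣshah" rather than "wȣshah", matching the behaviour described for "his father". In the transducer the same effect comes from two-level rules over archiphonemes rather than from string functions.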
Improvements to Khasi Transducer
- Updates to lexc
- Completely restructured the file and the paths the transducer takes, making the code much more readable, and fixed prefixation (I essentially rewrote most of the lexc file)
- Added prefixed nominalization with 'kaba' (verbs and adverbs) and 'nym' and 'nong' (verbs)
- Added progressives (prefixed) with 'nang' and 'iai'
- Added future aspects with 'la'
- Added past aspects with 'myn'
- Added causative marking on verbs with 'pyn'
- Added personal pronoun emphasis ('ma')
- Added ungendered nouns and figured out how to work with them
- Updates to disambiguator
- Fixed the 4th disambiguator rule to work with verbs (if the first word is a pronoun or an article and the next word is a noun/adj/verb, select article for the first word).
- All rules under #----------------------------- in the rlx file were written earlier, but not as part of the disambiguator assignment: as I found words that needed disambiguation, I added rules to handle them. Three of the rules present were not required for the first assignment.
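The logic of the 4th disambiguator rule can be sketched in Python as a rough illustration. This is not the rule as written in the rlx file: the tag names ("det", "prn", "n", "adj", "v") and the representation of a sentence as a list of reading sets are assumptions made for the example.

```python
# Hypothetical tag names; the actual rlx file may use different ones.
ARTICLE = "det"
PRONOUN = "prn"
CONTENT_TAGS = {"n", "adj", "v"}


def select_article(cohorts):
    """Apply the rule to each adjacent pair of words.

    `cohorts` is a list with one set of candidate tags per word.
    A word ambiguous between pronoun and article keeps only the
    article reading when the next word can be a noun, adjective or verb.
    """
    result = [set(readings) for readings in cohorts]
    for i in range(len(result) - 1):
        current, following = result[i], result[i + 1]
        ambiguous = ARTICLE in current and PRONOUN in current
        if ambiguous and following & CONTENT_TAGS:
            result[i] = {ARTICLE}
    return result
```

For instance, `select_article([{"prn", "det"}, {"v"}])` leaves the first word with only the article reading, which is the verb case the fix was meant to cover. A word that is unambiguously a pronoun is left alone.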