Tongan/Transducer

From LING073

Notes

Non-Passing Tests

  • Verb tenses
    • Past
    • Present
    • Future
    • Imperative
  • Pluralization
    • Plural Noun Markers
    • Irregular Noun Pluralization
  • Adjectivizer Suffix
  • Articles

These tests do not pass currently because I have not implemented their respective lexicon classes yet.

Evaluation

Initial aq-cov test

Coverage: 12.15%

Top unknown words in the corpus:

Word     Word Count
e        76
he       70
naʻe     47
ʻae      47
pea      44
ʻi       43
Pea      33
ʻOtua    32
p        30
ʻe       25
ki       22
ʻoku     19
hono     19
ʻoe      17
ʻo       17
ngaahi   17
kuo      16
ai       14
meʻa     14
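As a rough illustration of what a coverage figure and a top-unknown-words list measure, here is a minimal Python sketch. It assumes coverage is simply the share of corpus tokens that receive at least one analysis. The token list and the set of known forms are toy data built from words in the table above (in arbitrary order, not a real sentence), and `coverage_report` is a hypothetical helper, not part of the aq-cov tool.

```python
from collections import Counter


def coverage_report(tokens, known_forms, top_n=5):
    """Return (coverage percentage, most frequent unknown tokens).

    Assumption: a token counts as covered if its exact surface form
    is in known_forms. This only approximates a real coverage test.
    """
    unknown = Counter(t for t in tokens if t not in known_forms)
    known_count = len(tokens) - sum(unknown.values())
    coverage = 100.0 * known_count / len(tokens) if tokens else 0.0
    return coverage, unknown.most_common(top_n)


# Toy data: words taken from the table above, not the real corpus.
tokens = ["pea", "naʻe", "e", "he", "pea", "ki", "e", "vai"]
known = {"pea"}

coverage, top_unknown = coverage_report(tokens, known)
print(f"Coverage: {coverage:.2f}%")
for word, count in top_unknown:
    print(word, count)
```

Adding a frequent form to `known` raises the percentage by that form's share of the tokens, which is why defining high-frequency function words first tends to pay off most.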


Analysis of certain unknown words

e<adj><def>: e

he<adj><sdef>: he

ngaahi<cl><pl>: ngaahi

pea<n><sg>: pea
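The four analyses above are direct form-to-analysis mappings, so they can be mimicked with a plain lookup table. The Python sketch below is only an illustration of that idea, not the transducer itself: `ANALYSES`, `analyse`, and `generate` are hypothetical names, and the lookup is case-sensitive by assumption.

```python
# Surface form -> list of analyses, copied from the entries above.
ANALYSES = {
    "e": ["e<adj><def>"],
    "he": ["he<adj><sdef>"],
    "ngaahi": ["ngaahi<cl><pl>"],
    "pea": ["pea<n><sg>"],
}


def analyse(surface):
    """Return all analyses for a surface form (empty list if unknown)."""
    return ANALYSES.get(surface, [])


def generate(analysis):
    """Return all surface forms that carry the given analysis."""
    return [form for form, tags in ANALYSES.items() if analysis in tags]


print(analyse("ngaahi"))       # known form
print(analyse("vai"))          # form with no entry yet
print(generate("pea<n><sg>"))  # generation is the reverse lookup
```

Because every entry is a one-to-one mapping, analysis and generation are the same table read in opposite directions.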


Current Evaluation

After defining the noun pea<n>:pea (English gloss is either bear, pear, or and), coverage increased from 12.15% to 17.29%.

Current number of stems in my transducer is 75.

Current top unknown words

Now the top unknown words are as follows:

Word Word Count
e 76
he 70
naʻe 47
ʻae 47
pea 44
ʻi 43
Pea 33
ʻOtua 32
p 30
ʻe 25
ki 22
ʻoku 19
hono 19
ʻoe 17
ʻo 17
ngaahi 17
kuo 16
ai 14
vai 13
ʻaho 12


yaml Test Files

yaml test file     Tests passing (passing/total)
ton.yaml           75/146
commonwords.yaml   1/20
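For reference, the pass counts above convert to percentages as in the following small Python sketch. The helper name `format_result` is hypothetical; the counts are the ones reported in the table.

```python
def format_result(name, passing, total):
    """Format a test result as 'name: passing/total (NN%)'."""
    rate = 100.0 * passing / total
    return f"{name}: {passing}/{total} ({rate:.0f}%)"


# Counts taken from the table above.
results = {"ton.yaml": (75, 146), "commonwords.yaml": (1, 20)}

for name, (passing, total) in results.items():
    print(format_result(name, passing, total))
```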

Generator Evaluation

Initial evaluation of morphological generation

Tests                Tests passing (passing/total)
ton.yaml             75/146 (51%)
Coverage             17.72%
Generational tests   Tests passing (passing/total)
ton.yaml             75/147 (51%)

Final evaluation of morphological generation

Notes: Tongan morphology makes little use of affixes. There are a limited number of suffixes and prefixes, and they only surface in certain words. Instead, Tongan relies heavily on "markers", such as tense markers (auxiliary verbs) and plural markers (plural classifiers), that appear before a given noun or verb. So most of the work I did for this morphological transducer involved direct mappings for all words. This also means there were no twol tests that I could do.

The total number of tests has also changed since the first evaluation of the morphological generator, because a number of my tests were formatted incorrectly. They were formatted as entire phrases and sentences, which this software cannot analyze. So I removed these tests and added additional possessive pronouns, because there were many I had not yet included in the lexicon. I now have 85% of my tests passing.

I began working on a twol rule: when a noun is preceded by a definite article (he or e), stress falls on the ultimate syllable of that noun, that is, on its last letter, which is always a vowel. The issue I found with this rule is that the accent marker (as in á) is not always included across the orthographies of Tongan. In fact, after cross-referencing this information in my corpus, I could not find an instance of this rule taking effect. This may be because only semi-definite articles were used, which also take the form he or e. Therefore the number of working twol rules as of now is 0, although I hope to finish the rule soon, as well as show that it is necessary.
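To make the intended stress rule concrete, here is a minimal Python sketch of its effect on a token sequence. It is not twol and not the rule as implemented (no such rule works yet); it only illustrates the behaviour described above. It assumes that every token directly following he or e is a noun, that the accent is written as an acute on the final vowel, and it cannot tell definite from semi-definite articles, since both have the same form. The names `ACUTE`, `ARTICLES`, and `mark_final_stress` are hypothetical.

```python
# Plain vowel -> accented vowel (assumed orthographic convention).
ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}

# Article forms named in the notes; definite and semi-definite
# articles share these forms, so this sketch treats them alike.
ARTICLES = {"he", "e"}


def mark_final_stress(tokens):
    """Accent the final vowel of any token that follows he or e."""
    marked = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1] if i > 0 else None
        if previous in ARTICLES and token and token[-1] in ACUTE:
            token = token[:-1] + ACUTE[token[-1]]
        marked.append(token)
    return marked


# Toy input built from words in the unknown-word lists.
print(mark_final_stress(["ki", "he", "vai"]))
```

A real twol rule would state the same context (a preceding article) as a two-level constraint on the final vowel, and would need some way to separate definite from semi-definite uses.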

Tests                Tests passing (passing/total)
ton.yaml             122/142 (85%)
Coverage             11.75% (I am a little unsure why the coverage is lower than earlier)
Generational tests   Tests passing (passing/total)
ton.yaml             75/147 (51%)