Eastern Burushaski/Transducer
From LING073
Code
Repo on GitHub [1]
Generator Evaluation
Initial evaluation of morphological generation:
- 204/232 (88%) analysis forms passing
- 176/844 (21%) coverage of corpus
- 102/141 (72%) initial generation
- 102/116 (88%) generation forms passing
- one twol rule
- coverage has not changed
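The percentages above are simple pass rates rounded to the nearest whole number. A minimal Python sketch that recomputes them from the raw counts (the `pct` helper and the `results` dictionary are our own illustration, not part of the repo):

```python
def pct(passed: int, total: int) -> int:
    """Return passed/total as a percentage rounded to the nearest integer."""
    return round(100 * passed / total)


# Raw counts copied from the evaluation list above.
results = {
    "analysis forms passing": (204, 232),
    "coverage of corpus": (176, 844),
    "initial generation": (102, 141),
    "generation forms passing": (102, 116),
}

for label, (passed, total) in results.items():
    print(f"{passed}/{total} ({pct(passed, total)}%) {label}")
```

Each printed line has the same `passed/total (pct%)` shape as the list.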
Analyzer Evaluation
Passes 88 tests, with 13% coverage of the corpus (844 tokenised words).
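Coverage here is the share of corpus tokens for which the analyzer returns at least one analysis, so 13% of 844 tokens is roughly 110 analysed tokens. The sketch below assumes the analyzer can be wrapped as a Python function from a surface form to a list of analyses. `toy_analyse` and `TOY_LEXICON` are hypothetical stand-ins for the compiled transducer, not how the repo runs its evaluation.

```python
from typing import Callable


def coverage(tokens: list[str], analyse: Callable[[str], list[str]]) -> float:
    """Fraction of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    analysed = sum(1 for tok in tokens if analyse(tok))
    return analysed / len(tokens)


# Hypothetical stand-in for the compiled analyzer: surface form -> analyses.
TOY_LEXICON = {
    "ke": ["ke<conj>"],
    "ine": ["ine<prn><dem><dst><sg><mf>"],
}


def toy_analyse(form: str) -> list[str]:
    return TOY_LEXICON.get(form, [])


tokens = ["ke", "ine", "inar", "ke", "nuse"]
print(f"{coverage(tokens, toy_analyse):.0%}")  # 3 of 5 tokens analysed -> 60%
```

Counting tokens rather than distinct word types means frequent words like "ke" weigh heavily, which is why adding a handful of common words can move coverage noticeably.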
Top Unknown Words
Word | Occurrences | Gloss | Analysis ↔ surface form |
---|---|---|---|
ke | 47 | "and" | ke<conj> ↔ ke |
ine | 19 | "that" (human) | ine<prn><dem><dst><sg><mf> ↔ ine |
kaa | 17 | "with" | kaa<conj> ↔ kaa |
daa | 16 | "and" | daa<conj> ↔ daa |
it̪e | 14 | "that" (class y) | ine<prn><dem><dst><sg><cly> ↔ it̪e |
inar | 12 | | <unk> ↔ inar |
e | 10 | "me" (abs) | a<prn><p1><sg><abs> ↔ e |
nuse | 9 | | <unk> ↔ nuse |
in | 8 | "he/she" | in<pron><pers><dst><sg><mf> ↔ in |
u | 8 | | <unk> ↔ u |
besan | 7 | | <unk> ↔ besan |
yuu | 7 | | <unk> ↔ yuu |
a | 7 | | <unk> ↔ a |
t̪ačume | 7 | | <unk> ↔ t̪ačume |
aa | 6 | "me" (erg/gen) | a<prn><p1><sg><erg> ↔ aa |
ečam | 6 | | <unk> ↔ ečam |
but | 6 | | <unk> ↔ but |
senasar | 6 | | <unk> ↔ senasar |
bas | 6 | | <unk> ↔ bas |
ise | 6 | "that" (class x) | ine<prn><dem><dst><sg><clx> ↔ ise |
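A list like the one above can be produced by counting every token the analyzer fails on and sorting by frequency. A minimal sketch with `collections.Counter`, again assuming a hypothetical dictionary-backed stand-in (`toy_analyse`) for the real analyzer:

```python
from collections import Counter
from typing import Callable


def top_unknown(
    tokens: list[str], analyse: Callable[[str], list[str]], n: int = 20
) -> list[tuple[str, int]]:
    """Most frequent tokens for which `analyse` returns no analysis."""
    unknown = Counter(tok for tok in tokens if not analyse(tok))
    return unknown.most_common(n)


# Hypothetical stand-in analyzer that only knows "ke".
def toy_analyse(form: str) -> list[str]:
    return {"ke": ["ke<conj>"]}.get(form, [])


tokens = ["inar", "nuse", "inar", "ke", "u"]
for word, count in top_unknown(tokens, toy_analyse):
    print(word, count)
```

`Counter.most_common` keeps tied words in first-seen order, so the output is `inar 2`, `nuse 1`, `u 1`.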
After adding 3 words from commonwords.yaml, coverage rose to 21%.
Lexccounter: our lexc file contains # stems.
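For a rough cross-check, stems can also be approximated straight from lexc source. The sketch below is a simplified heuristic of our own, not the lexccounter script. It assumes entries of the form `upper:lower Continuation ;` inside `LEXICON` blocks, treats any entry whose upper side starts with `%` (tags, morpheme boundaries) as an affix rather than a stem, and does not handle escaped `%!` in comments. The `count_stems` name and the `SAMPLE` lexicon are hypothetical.

```python
def count_stems(lexc_text: str) -> int:
    """Rough count of distinct stem entries in a lexc source string.

    Heuristic (an assumption, not lexccounter's logic): an entry line inside a
    LEXICON block counts as a stem if it has both a form and a continuation
    class and its upper side does not start with "%" (tags, boundaries).
    """
    stems = set()
    in_lexicon = False
    for raw in lexc_text.splitlines():
        # Drop "!" comments; an escaped "%!" is not handled in this sketch.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("LEXICON"):
            in_lexicon = True
            continue
        if not in_lexicon or not line.endswith(";"):
            continue
        parts = line[:-1].split()
        if len(parts) < 2:
            continue  # bare continuation such as "Pronouns ;"
        upper = parts[0].split(":", 1)[0]
        if upper and upper != "0" and not upper.startswith("%"):
            stems.add(parts[0])
    return len(stems)


# Hypothetical miniature lexicon using forms from the table above.
SAMPLE = """
Multichar_Symbols
%<prn%> %<conj%>

LEXICON Root
Pronouns ;
Conjunctions ;

LEXICON Pronouns
ine:ine PrnTags ;   ! "that" (human)

LEXICON PrnTags
%<prn%>:0 # ;

LEXICON Conjunctions
ke%<conj%>:ke # ;
daa%<conj%>:daa # ;
"""

print(count_stems(SAMPLE))  # 3
```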
Notes
Tests that currently fail cover forms that have not been implemented yet. Tests for implemented forms pass, but those forms may need further refinement (e.g. adding archiphonemes).