Eastern Burushaski/Transducer

From LING073

Code

Repo on GitHub [1]

Generator Evaluation

Initial evaluation of morphological generation:

  • 204/232 (88%) analysis forms passing
  • 176/844 (21%) coverage of corpus
  • 102/141 (72%) initial generation
  • 102/116 (88%) generation forms passing
    • one twol rule
    • coverage has not changed
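The percentages above are pass counts divided by totals, rounded to the nearest whole percent. A minimal sketch that reproduces them from the figures in the list; the helper name `percent` and the dictionary layout are illustrative assumptions, not part of the project's tooling.

```python
def percent(passed: int, total: int) -> int:
    """Return passed/total as a percentage rounded to the nearest integer."""
    return round(100 * passed / total)

# Figures copied from the list above: label -> (passed, total).
figures = {
    "analysis forms passing": (204, 232),
    "coverage of corpus": (176, 844),
    "initial generation": (102, 141),
    "generation forms passing": (102, 116),
}

for label, (passed, total) in figures.items():
    print(f"{passed}/{total} ({percent(passed, total)}%) {label}")
```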

Analyzer Evaluation

The analyzer passes 88 tests. Corpus coverage is 13% of 844 tokenised words.

Top Unknown Words

Word     Occurrences  Gloss             Tags
ke       47           "and"             ke<conj> ↔ ke
ine      19           "that" (human)    ine<prn><dem><dst><sg><mf> ↔ ine
kaa      17           "with"            kaa<conj> ↔ kaa
daa      16           "and"             daa<conj> ↔ daa
it̪e      14           "that" (class y)  ine<prn><dem><dst><sg><cly> ↔ it̪e
inar     12           -                 <unk> ↔ inar
e        10           "me" (abs)        a<prn><p1><sg><abs> ↔ e
nuse     9            -                 <unk> ↔ nuse
in       8            "he/she"          in<pron><pers><dst><sg><mf> ↔ in
u        8            -                 <unk> ↔ u
besan    7            -                 <unk> ↔ besan
yuu      7            -                 <unk> ↔ yuu
a        7            -                 <unk> ↔ a
t̪ačume   7            -                 <unk> ↔ t̪ačume
aa       6            "me" (erg/gen)    a<prn><p1><sg><erg> ↔ aa
ečam     6            -                 <unk> ↔ ečam
but      6            -                 <unk> ↔ but
senasar  6            -                 <unk> ↔ senasar
bas      6            -                 <unk> ↔ bas
ise      6            "that" (class x)  ine<prn><dem><dst><sg><clx> ↔ ise
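A frequency list like the one above can be derived from analyser output. The following is a minimal sketch, assuming the output is in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token's analysis begins with `*`. The function name `coverage_and_unknowns` and the sample string are hypothetical, and escaped delimiters inside tokens are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
UNIT = re.compile(r"\^([^/^$]+)/([^$]*)\$")

def coverage_and_unknowns(stream: str, top: int = 20):
    """Return (coverage, most frequent unknown surface forms)."""
    known = 0
    unknown = Counter()
    for surface, analyses in UNIT.findall(stream):
        if analyses.startswith("*"):
            # The analyser marks tokens it cannot analyse with a leading '*'.
            unknown[surface] += 1
        else:
            known += 1
    total = known + sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown.most_common(top)

# Hypothetical sample for illustration only, not real corpus output.
sample = "^ke/*ke$ ^ine/*ine$ ^ke/*ke$ ^e/a<prn><p1><sg><abs>$"
cov, unk = coverage_and_unknowns(sample)
print(f"{cov:.0%}", unk)
```

Counting by surface form, as here, matches the layout of the table: each row is one unanalysed word with its number of occurrences in the corpus.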

After adding 3 words from commonwords.yaml, corpus coverage is 21%.

Lexccounter: Our corpus contains # stems.

Notes

Tests that currently fail cover forms that have not been implemented yet. Implemented tests pass, but may need further refinement (adding archiphonemes, etc.).