Neo-Aramaic/Transducer

From LING073
Revision as of 16:49, 13 March 2018 by Cdalton2

The code for the transducer can be found in this GitHub repository.

Evaluation

Number of tokenised words in the corpus: 2556

Coverage: 9.66%
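
Coverage here presumably means the share of corpus tokens that receive at least one analysis from the transducer. A minimal Python sketch of that calculation, assuming this definition: the function name is illustrative, and the count of 247 analysed tokens is back-calculated from the two reported figures rather than taken from the evaluation output.

```python
def coverage(analysed_tokens: int, total_tokens: int) -> float:
    """Return the percentage of tokens that received at least one analysis."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * analysed_tokens / total_tokens


# 2556 is the reported corpus size; 247 is an estimate derived from 9.66%,
# not a number reported on this page.
print(f"{coverage(247, 2556):.2f}%")  # prints 9.66%
```

Put the other way round, roughly 2309 of the 2556 tokens are still unknown to the transducer.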

Top unknown words in the corpus:

114 ܐ

86 ܕ

66 ܠܹܗ

62 ܡ

54 ܒ

46 ܢ

44 ܘ

44 ܠ

39 ܡܘܼܠܸܕ

36 ܝ

31 ܪ

30 ܡܢ

23 ܵܐ

21 ܗ

20 ܹܐ

18 ܠܐ

17 ܫ

14 ܵ

13 ܚ

12 ܡܕܝܢܬܐ
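
A frequency list like the one above can be produced by counting the tokens the analyser marks as unknown. The sketch below is only an illustration and assumes the analyser output uses the Apertium stream format, in which each token appears as `^surface/analysis$` and an unknown token has an analysis beginning with `*`. The function name and the sample tokens are hypothetical placeholders, and escaped characters in the stream are not handled.

```python
import re
from collections import Counter

# Assumption: Apertium stream format, one ^surface/analyses$ unit per token.
# An unknown token looks like ^surface/*surface$.
TOKEN_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def top_unknown(analysed_text: str, n: int = 20) -> list:
    """Return the n most frequent surface forms whose analysis starts with '*'."""
    counts = Counter(
        surface
        for surface, analyses in TOKEN_RE.findall(analysed_text)
        if analyses.startswith("*")
    )
    return counts.most_common(n)


# Hypothetical sample output; "foo" and "baz" are unknown, "bar" is analysed.
sample = "^foo/*foo$ ^bar/bar<n><sg>$ ^foo/*foo$ ^baz/*baz$"
print(top_unknown(sample, 2))
```

On real data, `analysed_text` would be the analyser's output for the whole corpus, and `n=20` matches the length of the list above.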

Notes

Only 38 of our 96 tests currently pass. We changed many of the lemmas in the .yaml file to match what our transducer produces, but a few lemmas still need changing, and we need to remove many tags that our transducer does not currently implement. A few roots are also missing from the transducer. We expect more tests to pass as we make these changes. Eventually, we also hope to modify the transducer so that it re-implements some of the tags we removed and uses the canonical roots of words as lemmas, which will require introducing more archiphonemes.
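
Removing unimplemented tags from the expected analyses could be scripted rather than done by hand. The following is a rough sketch, assuming analyses are written as a lemma followed by tags in angle brackets. The function name, the tag names, and the set of supported tags are all hypothetical and would need to be replaced with the tags the transducer actually implements.

```python
import re

# Matches a single tag such as <n> or <pl>.
TAG_RE = re.compile(r"<([^<>]+)>")


def strip_unsupported_tags(analysis: str, supported: set) -> str:
    """Drop every <tag> whose name is not in `supported`; keep the rest in order."""
    return TAG_RE.sub(
        lambda m: m.group(0) if m.group(1) in supported else "",
        analysis,
    )


# Hypothetical tag set and analysis string, for illustration only.
supported_tags = {"n", "pl"}
print(strip_unsupported_tags("lemma<n><f><pl><def>", supported_tags))  # prints lemma<n><pl>
```

Keeping the supported tags in a single set would also make it easy to restore tags in the tests later, once the transducer implements them again.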