Neo-Aramaic/Transducer
The code for the transducer can be found in this GitHub repository.
Evaluation
Number of tokenised words in the corpus: 2556
Coverage: 9.66%
Top unknown words in the corpus:
114 ܐ
86 ܕ
66 ܠܹܗ
62 ܡ
54 ܒ
46 ܢ
44 ܘ
44 ܠ
39 ܡܘܼܠܸܕ
36 ܝ
31 ܪ
30 ܡܢ
23 ܵܐ
21 ܗ
20 ܹܐ
18 ܠܐ
17 ܫ
14 ܵ
13 ܚ
12 ܡܕܝܢܬܐ
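For reference, figures like the ones above can be computed from analyser output along the following lines. This is a minimal sketch, not the script actually used for this page. It assumes the analyser prints Apertium stream format, where each token appears as ^surface/analysis$ and an unknown token has an analysis beginning with *. The function name coverage_report and the sample input are hypothetical. With the numbers reported here, 9.66% of 2556 tokens corresponds to roughly 247 analysed tokens.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumed output format; adjust the pattern if the analyser prints something else).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage_report(analysed_text, top_n=20):
    """Return (total tokens, coverage in percent, most frequent unknown forms)."""
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # An analysis starting with '*' marks a token the transducer does not know.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = 100.0 * known / total if total else 0.0
    return total, coverage, unknown.most_common(top_n)


# Hypothetical sample: three unknown tokens and one analysed token.
sample = "^aa/*aa$ ^bb/bb<n>$ ^aa/*aa$ ^cc/*cc$"
total, coverage, top_unknown = coverage_report(sample, top_n=2)
print(f"Number of tokenised words in the corpus: {total}")
print(f"Coverage: {coverage:.2f}%")
for form, count in top_unknown:
    print(count, form)
```

Counting unknown surface forms with a Counter yields the same frequency-sorted listing as the table above, so the two outputs can be compared directly after each change to the lexicon.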
Notes
Sadly, only 38 of our 96 tests pass at the moment. We have already changed many of the lemmas in the .yaml file to match what our transducer produces, but three things are still outstanding: a few more lemmas need to be changed, many tags that the transducer does not currently implement need to be removed from the tests, and a few roots are still missing from the transducer. As we work through these, we expect more tests to pass. Eventually, we also hope to modify the transducer to re-implement some of the tags we removed and to use the canonical roots of words as the lemmas, which will require introducing more archiphonemes.