For the final project of this class, I expanded my transducer and added new spellrelax and TWOL rules to account for the language's multiple orthographies and verb inflections. I also added more than 700 new stems to both the transducer file and the yrl-por dictionary.
Test corpus: The Bible (Gospel of Matthew)
Coverage: 18085/26015 (~69.5%)
Remaining unknown forms: 7930
Generation tests passed: 75/108
Precision: ~91%
Recall: ~28%
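The coverage and generation figures above are simple ratios, and the sketch below shows how they can be recomputed from the raw counts. The function name `ratio_report` is hypothetical and is not part of the project's tooling. Precision and recall are left out because the underlying true-positive and false-positive counts are not reported here.

```python
def ratio_report(passed: int, total: int) -> tuple[float, int]:
    """Return (percentage of `passed` out of `total`, remaining count).

    Hypothetical helper for illustration; not taken from the project's scripts.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return 100.0 * passed / total, total - passed


# Counts taken from the figures reported above.
coverage_pct, unknown_forms = ratio_report(18085, 26015)
generation_pct, generation_failed = ratio_report(75, 108)

print(f"Coverage: {coverage_pct:.1f}% ({unknown_forms} unknown forms)")
print(f"Generation tests: {generation_pct:.1f}% passed ({generation_failed} failed)")
```

Recomputing the ratios this way makes it easy to check that the unknown-form count and the coverage percentage stay consistent as new stems are added.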
Adding more spellrelax rules would ensure that all three dialects of the language are accepted as valid input. To do so, it would be ideal to work with literate native speakers of each dialect in order to describe the differences accurately. Additionally, modeling reduplication and other derivational processes of the language would yield a more powerful transducer.