Difference between revisions of "Pcruz1/Final Project"

From LING073
 
==Transducer Code==
* [https://github.com/pedroborgescruz/nheengatu-transdutor.git The final version of the transducer for Nheengatu.]
* [https://www.linkedin.com/posts/activity-6931026114911903744-yZhK?utm_source=linkedin_share&utm_medium=member_desktop_web Promotion of Transducer on LinkedIn + invite for collaboration]
  
 
[[Category:sp22_FinalProjects]]
 

Latest revision as of 19:47, 13 May 2022

Final Project

For the final project of this class, I chose to expand my Nheengatu transducer, adding new spellrelax and TWOL rules to account for the language's multiple orthographies and its verb inflections. I also added 700+ new stems to both the transducer file and the yrl-por dictionary.

Evaluation

Test corpus: the Bible (Gospel of Matthew)

Coverage: 18085/26015 (~69.5%)

Remaining unknown forms: 7930

Generation tests passed: 75/108

Precision: ~91%

Recall: ~28%
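
The coverage arithmetic above can be checked with a short script. This is only a sketch: it assumes the two coverage numbers are the count of corpus forms that received an analysis and the total count of forms, and the function name `share` is hypothetical, not part of the project's tooling. Precision and recall are not recomputed, since the counts behind them are not reported here.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a fraction; both counts come from the report above."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


# Figures copied from the Evaluation section.
total_forms = 26015      # assumed: all forms in the test corpus
analysed_forms = 18085   # assumed: forms that received an analysis
unknown_forms = total_forms - analysed_forms

print(f"Coverage: {analysed_forms}/{total_forms} (~{share(analysed_forms, total_forms):.1%})")
print(f"Remaining unknown forms: {unknown_forms}")
print(f"Generation tests passed: 75/108 (~{share(75, 108):.1%})")
```

Under that reading, the unknown-form count is simply the total minus the analysed forms, which agrees with the 7930 reported.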

Future Work

Adding more spellrelax rules would help ensure that all three dialects of the language are accepted as valid input. Ideally, this would be done together with literate native speakers of each dialect, so that the differences between them are described accurately. Additionally, modeling reduplication and other derivational processes of the language would make the transducer more powerful.

Transducer Code