User:Nfeldba1/Final project
Revision as of 13:20, 11 May 2017
The Project
For my final project, I expanded my Khasi transducer until it reached 85% coverage of my corpus. This involved adding new words, disambiguations, and additional suffixes.
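When extending a transducer toward a coverage target, one common workflow is to repeatedly look at the most frequent forms it fails to analyse and add those first. The sketch below is hypothetical and is not the code in the repository. It assumes the analyser's output is in Apertium stream format, where each token is written as `^surface/analysis$` and an unknown token as `^surface/*surface$`. The function name `top_unknown` is invented for illustration, and escaped `^`/`$` characters inside tokens are ignored for simplicity.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format, e.g. ^surface/analysis$.
# Assumption: unknown tokens are written as ^surface/*surface$.
UNIT_RE = re.compile(r"\^(.*?)\$")


def top_unknown(analysed_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent surface forms with no analysis."""
    counts: Counter = Counter()
    for unit in UNIT_RE.findall(analysed_text):
        surface, _, analyses = unit.partition("/")
        if analyses.startswith("*"):  # '*' marks an unanalysed token
            counts[surface.lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Toy input with placeholder tokens and invented tags (not real Khasi).
    sample = "^w1/w1<n>$ ^w2/*w2$ ^w3/w3<v>$ ^w2/*w2$ ^W4/*W4$"
    print(top_unknown(sample, 2))
```

On the toy input this prints `[('w2', 2), ('w4', 1)]`: `w2` is the most frequent unanalysed form, so it would be the first candidate to add to the lexicon.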
My code can be found here: https://github.com/nfeldbaum/Khasi_Transducer
The Results
I managed to reach 85.15% coverage on my initial 50,000-word corpus! However, to reach this figure I ended up adding a significant number of the English words present in the corpus. A truer measure of corpus coverage is 80.50%, which is the coverage I get when I delete all the words that aren't Khasi.
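Coverage here means the percentage of corpus tokens that the transducer gives at least one analysis. A minimal sketch of that calculation follows. It assumes Apertium stream output (unknown tokens written as `^surface/*surface$`), and the `exclude` parameter is a hypothetical stand-in for a list of English words. Skipping those tokens when counting is only one possible way to get a Khasi-only figure and is not necessarily how the numbers above were produced.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^surface/analysis$.
# Assumption: unknown tokens are written as ^surface/*surface$.
UNIT_RE = re.compile(r"\^(.*?)\$")


def coverage(analysed_text: str, exclude: frozenset = frozenset()) -> float:
    """Percentage of tokens that received at least one analysis.

    Tokens whose lowercased surface form is in `exclude` (e.g. a
    hypothetical English word list) are left out of both counts.
    """
    total = known = 0
    for unit in UNIT_RE.findall(analysed_text):
        surface, _, analyses = unit.partition("/")
        if surface.lower() in exclude:
            continue  # not counted as known or unknown
        total += 1
        if not analyses.startswith("*"):
            known += 1
    return 100.0 * known / total if total else 0.0


if __name__ == "__main__":
    # Placeholder tokens with invented tags; "the" plays an English word.
    sample = "^w1/w1<n>$ ^w2/*w2$ ^the/*the$ ^w3/w3<v>$"
    print(f"{coverage(sample):.2f}")                        # all tokens
    print(f"{coverage(sample, frozenset({'the'})):.2f}")    # English skipped
```

On the toy input this prints `50.00` and then `66.67`. Because the function only reads analyser output, running it on both the training corpus and a held-out test corpus gives directly comparable figures.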
Instead of testing precision and recall against randomly selected, hand-annotated forms, I decided to gather a further 25,000 words so that I could test my transducer on a corpus I had not used while developing it. This new corpus can be found in the repository linked above at ling073-kha-corpus/kha.corpus.large.test.txt. On this test corpus, I achieved 83.39% coverage with English words included and 80.50% coverage without English words. The latter figure is exactly the same as on my training corpus.