Hwang11/Final project
Overview
This final project extends the last assignment on Universal Dependencies. I continue working on Fijian, the language I chose at the beginning. For this project, I plan to annotate 232 sentences from a text in the main grammar book, A Grammar of Boumaa Fijian by R.M.W. Dixon. In the process, I also expect to enrich my morphological transducer as I encounter more sentences and new words. After annotating the sentences, I will train a parser and evaluate its accuracy on both the training data and other data. By annotating a larger corpus, I hope to develop a more accurate parser than the one from my last assignment and to explore the dependency syntax of Fijian further.
Evaluation
The current evaluation covers the transducer, with 260 more stems added, and a model trained on a .conllu file containing 90 annotated sentences from the text in the grammar book. The "other data" currently used for testing is fij.annotated.ud.conllu, which was annotated in the previous assignment.
- Current coverage over a large corpus: 75.26%
  - This rate is expected: although many more stems have been added to the transducer, almost all of them are in Boumaa Fijian, a dialect that overlaps 80% with Standard Fijian, the language of the large corpus (the Bible, New Testament).
- Accuracy for the trained UD-relations parser:
| | fij.withmorph.udpipe | fij.nomorph.udpipe | Number of forms | Number of sentences |
|---|---|---|---|---|
| Training data | UAS: 96.83%, LAS: 94.96% | UAS: 86.02%, LAS: 82.65% | 1481 | 90 |
| Other data | UAS: 73.05%, LAS: 59.53% | UAS: 67.02%, LAS: 57.80% | 282 | 30 |
The relatively low accuracy for the models on the other data is expected, since that corpus also includes several sentences in Standard Fijian.
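For readers unfamiliar with the figures above, here is a minimal Python sketch of how such numbers are conventionally defined. It is not the script used for this project. Coverage is taken as the share of tokens that receive at least one analysis, UAS as the share of tokens with the correct head, and LAS as the share with both the correct head and the correct relation. The function names, the `is_analysed` callback (a stand-in for a real transducer lookup), and the toy data are all assumptions made for illustration.

```python
def coverage(tokens, is_analysed):
    """Percentage of tokens for which is_analysed(token) is true.

    is_analysed is a hypothetical stand-in for a transducer lookup.
    """
    if not tokens:
        raise ValueError("no tokens to score")
    known = sum(1 for token in tokens if is_analysed(token))
    return 100.0 * known / len(tokens)


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold and predicted are token-aligned lists of (head, relation) pairs.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be token-aligned")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS: the head matches. LAS: head and relation both match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * head_hits / len(gold), 100.0 * full_hits / len(gold)


# Toy data, invented for illustration only (not project results).
toy_lexicon = {"e", "sara", "o"}
toy_tokens = ["e", "sara", "o", "xyz"]
print(coverage(toy_tokens, toy_lexicon.__contains__))

toy_gold = [(2, "aux"), (0, "root"), (2, "advmod"), (2, "nsubj")]
toy_pred = [(2, "aux"), (0, "root"), (2, "obl"), (1, "nsubj")]
print(attachment_scores(toy_gold, toy_pred))
```

Since LAS requires everything UAS requires plus a matching relation, LAS can never exceed UAS, which is consistent with every cell in the table.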
UD Guidelines
This section lists several frequently used dependencies in Fijian, including a description and one or two examples for each of them.
root
In Fijian, not only verbs but words of any part of speech except time words can head a predicate, and thus can be a root.
- Example: number as a root
nsubj, expl, csubj
obl
xcomp
ccomp
aux, advmod
nmod
acl
advcl
obj
Issues and Future Work
- The current annotations mainly cover Boumaa Fijian. Although its syntax is very similar to that of Standard Fijian, it is still necessary and worthwhile to annotate more sentences in Standard Fijian.
- The grammar book I am using does not distinguish adverbs from aspect markers very well and puts them all into the "modifier" category, which is sometimes confusing.
License and Copyright
The annotation is licensed under an open-source license, GNU GPL v3. The original text is protected by copyright: all rights reserved by R.M.W. Dixon, the author of the book A Grammar of Boumaa Fijian.
References and Resources
For more details: https://github.swarthmore.edu/hwang11/fij-final/tree/master.
Churchward, C. M. (1941). A new Fijian grammar. Suva (Fiji): Government of Fiji.
Dixon, R.M.W. (1988). A Grammar of Boumaa Fijian. Chicago: University of Chicago Press.
"<Aa>" "aa" past @aux #1->2 "<’eneii>" "’eneii" vblex iv @root #2->0 "<sara>" "sara" adv @advmod #3->2 "<’eneiiqee>" "’eneiiqee" vblex iv @advmod #4->2 "<o>" "o" art @det #5->6 "<Tabu>" "Tabu" np al @nsubj #6->2 "<:>" ":" sent @punct #7->2
"<E>" "e" prn pers p3 sg subj @expl #1->2 "<tu’una>" "tu’u" vblex tv @root #2->0 "<sara>" "sara" adv @advmod #3->2 "<o>" "o" art @det #4->5 "<RaavouvouniBoumaa>" "RaavouvouniBoumaa" np al @nsubj #5->2 "<:>" ":" sent @punct #6->2
"<">" sent @punct #1->2 "<Qawa>" "qawa" vblex iv @root #2->0 "<i>" "i" pr @case #3->4 "<yai>" "yai" prn dem @obl #4->2 "<a>" "a" art @det #5->7 "<ootaru>" "taru" prn pers p1 inc du cl1_pos2 @nmod #6->7 "<bu’a>" "bu’a" n @nsubj #7->2 "<!>" "!" sent @punct #8->2 "<">" sent @punct #9->2
"<Rau>" "rau" prn pers p3 du subj2 @nsubj #1->4 "<saa>" "saa" asp @aux #2->4 "<mai>" "mai" adv @advmod #3->4 "<to’a>" "to’a" vblex iv @root #4->0 "<yane>" "yane" adv @advmod #5->4 "<i>" "i" pr @case #6->7 "<vanua>" "vanua" n @obl #7->4 "<mai>" "mai" pr @case #8->9 "<Nagasau>" "Nagasau" np top @nmod #9->7 "<,>" "," cm @punct #10->13 "<rau>" "rau" prn pers p3 du subj2 @nsubj #11->13 "<va’aqawabu’a>" "bu’a" n @compound #12->13 "va’aqawa" vblex caus @conj #13->4 "<.>" "." sent @punct #14->13
"<Tu’una>" "tu’u" vblex tv @root #1->0 "<sara>" "sara" adv @advmod #2->1 "<mai>" "mai" adv @advmod #3->1 "<e>" "e" prn pers p3 sg subj @expl #4->6 "<daa>" "a" art @compound #5->6 "dua" num @csubj #6->1 "<marama>" "marama" n @nsubj #7->6 "<:>" ":" sent @punct #8->1