Hwang11/Final project

From LING073
Revision as of 14:45, 12 May 2018 by Hwang11 (talk | contribs)


This final project extends the last assignment on Universal Dependencies. I continue working on Fijian, the language I chose at the beginning. For this project, I plan to annotate 232 sentences from a text in the main grammar book, A Grammar of Boumaa Fijian by R.M.W. Dixon. In the process, I also expect to enrich my morphological transducer as I encounter more sentences and new words. After annotating the sentences, I will train a parser and evaluate its accuracy on both the training data and other data. By annotating a larger corpus, I hope to develop a more accurate parser than the one from my last assignment and to explore the dependency syntax of Fijian further.


The current evaluation uses the transducer with 260 more stems added and a model trained on a .conllu file containing 90 annotated sentences from the text in the grammar book. The "other data" used in the test right now is fij.annotated.ud.conllu, the file annotated in the previous assignment.

  • Current coverage over a large corpus: 75.26%
    - This rate is expected: although many more stems have been added to the transducer, almost all of them are Boumaa Fijian, a dialect that overlaps 80% with Standard Fijian, the variety in which the large corpus (the Bible, New Testament) is written.
  • Accuracy for the trained UD-relations parser:
                | fij.withmorph.udpipe     | fij.nomorph.udpipe       | Number of forms | Number of sentences
  Training data | UAS: 96.83%, LAS: 94.96% | UAS: 86.02%, LAS: 82.65% | 1481            | 90
  Other data    | UAS: 73.05%, LAS: 59.53% | UAS: 67.02%, LAS: 57.80% | 282             | 30

The relatively low accuracy for the models on the other data is expected, since that corpus also includes several sentences in Standard Fijian.
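
For readers unfamiliar with the two scores: UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct, and LAS (labeled attachment score) is the share whose head and dependency relation are both correct. The sketch below illustrates the arithmetic only. It is not the tool that produced the figures above, the function name `attachment_scores` is hypothetical, and the toy data is invented for illustration rather than drawn from the project's corpus.

```python
from typing import List, Tuple

# One token's analysis: (head index, dependency relation).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over two aligned token lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS: the head matches, regardless of the relation label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head and the relation label match.
    label_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * label_hits / total


# Hypothetical toy data, invented for illustration only.
gold = [(2, "aux"), (0, "root"), (2, "advmod"), (2, "nsubj")]
predicted = [(2, "aux"), (0, "root"), (2, "obl"), (3, "nsubj")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS: {uas:.2f}%, LAS: {las:.2f}%")  # UAS: 75.00%, LAS: 50.00%
```

In the toy data the third token has the right head but the wrong label, and the fourth has the wrong head, so three of four heads are correct (UAS) and two of four head-and-label pairs are correct (LAS). LAS can never exceed UAS, which matches the pattern in the table.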

UD Guidelines

This section lists several frequently used dependencies in Fijian, including a description and one or two examples for each of them.


In Fijian, not only verbs but any part of speech except time words can head a predicate, and so can be a root.

  • Example: number as a root

nsubj, expl, csubj




aux, advmod





Issues and Future Work

  • The current annotations are mainly of Boumaa Fijian. Although its syntax is very similar to that of Standard Fijian, it is still necessary and worthwhile to annotate more sentences in Standard Fijian.
  • The grammar book I am using does not clearly distinguish adverbs from aspect markers and puts them all into the "modifier" category, which is sometimes confusing.

License and Copyright

The annotation is licensed under an open-source license, GNU GPL v3. The original text is protected by copyright: all rights reserved by R.M.W. Dixon, the author of the book A Grammar of Boumaa Fijian.

References and Resources

For more details: https://github.swarthmore.edu/hwang11/fij-final/tree/master.

Churchward, C. M. (1941). A new Fijian grammar. Suva (Fiji): Government of Fiji.

Dixon, R.M.W. (1988). A Grammar of Boumaa Fijian. Chicago: University of Chicago Press.
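
The annotated token lines listed at the end of this page each appear to carry a quoted lemma, morphological tags, an `@relation` label, and a `#index->head` pair. The sketch below shows one way such lines could be read into structured records. The line shape is an assumption inferred from those lines, not a documented specification, and the names `Reading` and `parse_reading` are hypothetical. Lines without a quoted lemma are skipped rather than guessed at.

```python
import re
from typing import List, NamedTuple, Optional


class Reading(NamedTuple):
    lemma: str
    tags: List[str]
    relation: str
    index: int
    head: int


# Assumed line shape: "lemma" tag ... @relation #index->head
READING_RE = re.compile(
    r'^\s*"(?P<lemma>[^"]*)"\s+(?P<tags>.*?)\s*'
    r'@(?P<rel>\S+)\s+#(?P<idx>\d+)->(?P<head>\d+)\s*$'
)


def parse_reading(line: str) -> Optional[Reading]:
    """Parse one token line; return None if it does not fit the assumed shape."""
    match = READING_RE.match(line)
    if match is None:
        return None
    return Reading(
        lemma=match["lemma"],
        tags=match["tags"].split(),
        relation=match["rel"],
        index=int(match["idx"]),
        head=int(match["head"]),
    )


# Three lines copied from the listing, plus one that lacks a quoted lemma.
sample = [
    '\t"aa" past @aux #1->2',
    '\t"sara" adv @advmod #3->2',
    '\t"Tabu" np al @nsubj #6->2',
    '\t sent @punct #1->2',
]

readings = [r for r in map(parse_reading, sample) if r is not None]
for r in readings:
    print(r.index, r.lemma, r.relation, r.head)
```

A head of 0 marks the root of a sentence, so a new sentence can be detected wherever the index resets to 1.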

	"aa" past @aux #1->2
	"’eneii" vblex iv @root #2->0
	"sara" adv @advmod #3->2
	"’eneiiqee" vblex iv @advmod #4->2
	"o" art @det #5->6
	"Tabu" np al @nsubj #6->2
	":" sent @punct #7->2
	"e" prn pers p3 sg subj @expl #1->2
	"tu’u" vblex tv @root #2->0
	"sara" adv @advmod #3->2
	"o" art @det #4->5
	"RaavouvouniBoumaa" np al @nsubj #5->2
	":" sent @punct #6->2
	 sent @punct #1->2
	"qawa" vblex iv @root #2->0
	"i" pr @case #3->4
	"yai" prn dem @obl #4->2
	"a" art @det #5->7
	"taru" prn pers p1 inc du cl1_pos2 @nmod #6->7
	"bu’a" n @nsubj #7->2
	"!" sent @punct #8->2
	 sent @punct #9->2
	"rau" prn pers p3 du subj2 @nsubj #1->4
	"saa" asp @aux #2->4
	"mai" adv @advmod #3->4
	"to’a" vblex iv @root #4->0
	"yane" adv @advmod #5->4
	"i" pr @case #6->7
	"vanua" n @obl #7->4
	"mai" pr @case #8->9
	"Nagasau" np top @nmod #9->7
	"," cm @punct #10->13
	"rau" prn pers p3 du subj2 @nsubj #11->13
	"bu’a" n @compound #12->13
		"va’aqawa" vblex caus @conj #13->4
	"." sent @punct #14->13
	"tu’u" vblex tv @root #1->0
	"sara" adv @advmod #2->1
	"mai" adv @advmod #3->1
	"e" prn pers p3 sg subj @expl #4->6
	"a" art @compound #5->6
		"dua" num @csubj #6->1
	"marama" n @nsubj #7->6
	":" sent @punct #8->1