Hwang11/Final project
Overview
This final project extends the last assignment on Universal Dependencies. I continue working on Fijian, the language I chose at the beginning. For this project, I plan to annotate 232 sentences of a text drawn from the main grammar book, A Grammar of Boumaa Fijian by R.M.W. Dixon (currently only 110 sentences are annotated). In the process, I also expect to enrich my morphological transducer as I encounter more sentences and new words. After annotating the sentences, I will train a parser and evaluate its accuracy on both the training data and other data. By annotating a larger corpus, I hope to develop a more accurate parser than the one from my last assignment and to explore the dependency syntax of Fijian further.
Evaluation
The current evaluation uses the transducer with 260 more stems added and a model trained on a .conllu file containing 90 annotated sentences from the text in the grammar book. The "other data" currently used for testing is fij.annotated.ud.conllu, which was annotated in the previous assignment.
- Current coverage over a large corpus: 75.26%
  - This rate is expected: although many more stems have been added to the transducer, almost all of them are Boumaa Fijian, a dialect that overlaps 80% with Standard Fijian, the variety in which the large corpus (Bible New Testament) is written.
- Accuracy for the trained UD-relations parser:
Data | fij.withmorph.udpipe | fij.nomorph.udpipe | Number of forms | Number of sentences |
---|---|---|---|---|
Training data | UAS: 96.83%, LAS: 94.96% | UAS: 86.02%, LAS: 82.65% | 1481 | 90 |
Other data | UAS: 73.05%, LAS: 59.53% | UAS: 67.02%, LAS: 57.80% | 282 | 30 |
The relatively low accuracy for the models on the other data is expected, since that corpus also includes several sentences in Standard Fijian.
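As a rough illustration of how the UAS and LAS figures above are defined, the sketch below scores a predicted CoNLL-U file against a gold one. It is not the evaluation script used for this project; the function names `read_conllu` and `attachment_scores`, the helper `row`, and the placeholder tokens `w1`..`w4` are all hypothetical, and the sketch assumes that gold and predicted files share the same tokenisation.

```python
def read_conllu(text):
    """Return a list of sentences; each sentence is a list of (HEAD, DEPREL) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[6], cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold_text, pred_text):
    """Return (UAS, LAS) as percentages over all tokens."""
    total = head_ok = both_ok = 0
    for g_sent, p_sent in zip(read_conllu(gold_text), read_conllu(pred_text)):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted tokenisation differ")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                head_ok += 1          # counts towards UAS
                if g_rel == p_rel:
                    both_ok += 1      # counts towards LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * head_ok / total, 100.0 * both_ok / total


def row(idx, head, rel):
    """Build one 10-column CoNLL-U line with a placeholder token (not real Fijian)."""
    return "\t".join([str(idx), f"w{idx}", "_", "_", "_", "_", str(head), rel, "_", "_"])


gold = "\n".join([row(1, 2, "nsubj"), row(2, 0, "root"), row(3, 2, "obl"), row(4, 3, "case")]) + "\n"
pred = "\n".join([row(1, 2, "expl"), row(2, 0, "root"), row(3, 2, "obl"), row(4, 2, "case")]) + "\n"

uas, las = attachment_scores(gold, pred)
print(f"UAS: {uas:.2f}%, LAS: {las:.2f}%")
```

In this toy sentence, three of the four heads match and two of those also carry the right label, which is why LAS can never exceed UAS.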
UD Guidelines
This section lists several frequently used dependencies in Fijian, including a description and one or two examples for each of them.
root
In Fijian, not only verbs but words of any part of speech except time words can be a predicate head, and can therefore be a root.
- Example: number as the root
nsubj, expl, csubj
A predicate usually requires a subject pronoun preceding the predicate head (although it can sometimes be omitted, e.g. when a cardinal pronoun follows the predicate head), while a subject NP or subject clause (with the same reference as the subject pronoun) always follows the predicate. When a subject NP or a subject clause is present in a clause, the subject pronoun gets <expl>, and the subject NP gets <nsubj> or the subject clause gets <csubj>. When neither is present and the clause contains only the subject pronoun, the pronoun gets <nsubj> instead of <expl>.
For <expl>, the dependent is a pronoun (subj, obj, card); for <nsubj>, the dependent must be a nominal; for <csubj>, the dependent must be a predicate head.
- Example: A clause containing all three dependencies. For the matrix clause, the subject is a clause and within the subject clause, the subject is a noun.
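The labelling rule described in this section can be summarised as a small decision function. This is only a sketch of the rule as stated above, not part of the annotation tooling; the function name `subject_relations` and its boolean parameters are hypothetical.

```python
def subject_relations(has_pronoun, has_np=False, has_clause=False):
    """Map each subject element present in a clause to its UD relation."""
    relations = {}
    if has_np:
        relations["subject NP"] = "nsubj"
    if has_clause:
        relations["subject clause"] = "csubj"
    if has_pronoun:
        # The pronoun is an expletive only when a fuller subject
        # (NP or clause) with the same reference is also present.
        relations["subject pronoun"] = "expl" if (has_np or has_clause) else "nsubj"
    return relations


# Only a subject pronoun: it is the real subject.
print(subject_relations(has_pronoun=True))
# Subject pronoun plus a subject NP: the pronoun becomes <expl>.
print(subject_relations(has_pronoun=True, has_np=True))
# Subject pronoun plus a subject clause: the clause is <csubj>.
print(subject_relations(has_pronoun=True, has_clause=True))
```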
obl, advmod, advcl
These three are all modifiers of a predicate.
Oblique <obl> is usually a non-core nominal adjunct of the predicate; in Fijian, it is usually preceded by a preposition. The dependent must be a noun.
A predicate usually takes several adverbial modifiers in a clause, and an <advmod> dependent does not have to be an adverb (e.g. deictic verbs can modify a predicate as well).
- Example:
An adverbial clause modifier <advcl> is a clause that modifies a predicate. The dependent is the predicate head of the adverbial clause. The clause is usually a temporal, causal, or conditional clause, etc.
- Example: causal clause
aux
Fijian has separate morphemes that serve as aspect and tense markers; each of them takes the <aux> relation to the predicate head.
- Example: When an <aux> is present, the subject pronoun is not required.
xcomp
An open clause with its own subject. The subject of the <xcomp> usually refers to the subject or object of the matrix clause. The dependent must be a predicate head.
obj, ccomp, iobj
Objects <obj> and clausal complements <ccomp> are not very common in Fijian, since 70% of Fijian clauses are intransitive. No indirect object <iobj> has been found in my corpus yet.
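Claims about how often a relation occurs, such as the absence of <iobj>, can be checked by counting the DEPREL column of a .conllu file. The sketch below does this on an in-memory toy sample; the function name `deprel_counts` and the placeholder tokens are hypothetical, and the sample is not taken from the annotated corpus.

```python
from collections import Counter


def deprel_counts(conllu_text):
    """Count dependency relations in CoNLL-U text, ignoring relation subtypes."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank separator or comment line
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Drop any subtype, e.g. "nsubj:pass" is counted as "nsubj".
        counts[cols[7].split(":")[0]] += 1
    return counts


# Toy sample with placeholder tokens (not real Fijian).
rows = [(1, 2, "expl"), (2, 0, "root"), (3, 2, "advmod"), (4, 2, "nsubj")]
sample = "\n".join(
    "\t".join([str(i), f"w{i}", "_", "_", "_", "_", str(h), rel, "_", "_"])
    for i, h, rel in rows
) + "\n"

counts = deprel_counts(sample)
for rel, n in counts.most_common():
    print(rel, n)
print("iobj occurrences:", counts["iobj"])
```

To run it on a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to the same function.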
Issues and Future Work
- The current annotations are mainly on Boumaa Fijian. Although it is very similar to Standard Fijian in syntax, it is still necessary and worthwhile to annotate more sentences in Standard Fijian.
- The grammar book I am using does not distinguish adverbs from aspect markers very well and puts them all into the "modifier" category. Some words are said to be aspect markers, but their English translations look like adverbs, such as 'continuously'. When developing my transducers, I assigned tags according to the translations (i.e. I gave the adv tag to some words that the grammar book claims are asp markers), but while annotating UD relations I realised that future work may need to distinguish asp markers from true adverbs more carefully.
License and Copyright
The annotation is licensed under an open source license, GNU GPL v3. The original text is protected by copyright: all rights reserved by R.M.W. Dixon, the author of the book A Grammar of Boumaa Fijian.
References and Resources
For more details, see https://github.swarthmore.edu/hwang11/fij-final/tree/master.
The project is also published publicly on GitHub: https://github.com/hwang16/UD_fij.
Churchward, C. M. (1941). A new Fijian grammar. Suva (Fiji): Government of Fiji.
Dixon, R.M.W. (1988). A Grammar of Boumaa Fijian. Chicago: University of Chicago Press.