Hwang11/Final project

From LING073

Overview

This final project is an extension of the last assignment on universal dependencies. I continue working on Fijian, the language I chose at the beginning. For this project, I plan to annotate 232 sentences in a text drawn from the main grammar book, A Grammar of Boumaa Fijian by R.M.W. Dixon (currently only 110 sentences are annotated). In the process, I also expect to enrich my morphological transducer as I encounter more sentences and new words. After annotating the sentences, I will train a parser and evaluate its accuracy on both the training data and other data. By annotating a larger corpus, I hope to develop a more accurate parser than the one from my last assignment and to explore the dependency syntax of Fijian further.

Evaluation

The current evaluation uses the transducer with 260 more stems added and a model trained on a .conllu file containing 90 annotated sentences from the text in the grammar book. The "other data" currently used for testing is fij.annotated.ud.conllu, the file annotated in the previous assignment.
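As a rough illustration of how the sentence and form counts for such a .conllu file could be obtained, here is a minimal Python sketch. It assumes the standard CoNLL-U layout (tab-separated token lines, `#` comment lines, a blank line between sentences). The function name `count_conllu` and the placeholder forms `w1`, `w2`, `w3` are invented for illustration and are not taken from the project files; the counts reported on this page may have been produced differently.

```python
def count_conllu(text):
    """Return (number of sentences, number of forms) in CoNLL-U text."""
    sentences = 0
    forms = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# sent_id = 1".
            continue
        token_id = line.split("\t")[0]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1").
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        forms += 1
    return sentences, forms


# Placeholder data (not real Fijian): two sentences, three forms.
sample = (
    "# sent_id = 1\n"
    "1\tw1\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "2\tw2\t_\t_\t_\t_\t1\tnsubj\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1\tw3\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "\n"
)
print(count_conllu(sample))  # (2, 3)
```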

  • Current coverage over a large corpus: 75.26%
- This rate is expected: although many more stems have been added to the transducer, almost all of them are Boumaa Fijian, a dialect that overlaps 80% with Standard Fijian, the language of the large corpus (the Bible, New Testament).
  • Accuracy for the trained UD-relations parser:
                 fij.withmorph.udpipe        fij.nomorph.udpipe          Number of forms   Number of sentences
Training data    UAS: 96.83%, LAS: 94.96%    UAS: 86.02%, LAS: 82.65%    1481              90
Other data       UAS: 73.05%, LAS: 59.53%    UAS: 67.02%, LAS: 57.80%    282               30

The relatively low accuracy for the models on the other data is expected, since that corpus also includes several sentences in Standard Fijian.
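For readers unfamiliar with the two metrics, the following minimal Python sketch shows the standard definitions: UAS is the percentage of tokens assigned the correct head, and LAS is the percentage assigned both the correct head and the correct relation label. The function name `attachment_scores` and the toy gold/predicted lists are invented for illustration. The scores reported above come from the parser's own evaluation, not from this code.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold and predicted are equal-length lists of (head, deprel) pairs,
    one pair per token, in the same token order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("no tokens to score")
    head_ok = 0  # tokens with the correct head
    both_ok = 0  # tokens with the correct head and the correct label
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, predicted):
        if g_head == p_head:
            head_ok += 1
            if g_rel == p_rel:
                both_ok += 1
    total = len(gold)
    return 100.0 * head_ok / total, 100.0 * both_ok / total


# Toy four-token sentence (hypothetical analysis, not from the corpus).
gold = [(2, "expl"), (0, "root"), (2, "nsubj"), (2, "obl")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj"), (3, "obl")]
uas, las = attachment_scores(gold, pred)
print(f"UAS: {uas:.2f}%, LAS: {las:.2f}%")  # UAS: 75.00%, LAS: 50.00%
```

LAS can never exceed UAS, since a token only counts toward LAS if its head is already correct; this matches the pattern in the table above.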

UD Guidelines

This section lists several frequently used dependencies in Fijian, including a description and one or two examples for each of them.

root

In Fijian, not only verbs but any part of speech except time words can be a predicate head, and can therefore be the root.

  • Example: number as the root
'There was once a king in this land of Boumaa, the Vuunisaa.'





nsubj, expl, csubj

A predicate usually requires a subject pronoun preceding the predicate head (although it can sometimes be omitted, e.g. when a cardinal pronoun follows the predicate head), while a subject NP or subject clause (with the same reference as the subject pronoun) always follows the predicate. When a subject NP or a subject clause is present in a clause, the subject pronoun gets <expl>, and the subject NP gets <nsubj> or the subject clause gets <csubj>. When neither is present and the clause contains only the subject pronoun, the subject pronoun gets <nsubj> instead of <expl>.

For <expl>, the dependent is a pronoun (subj, obj, card); for <nsubj>, the dependent must be a nominal; for <csubj>, the dependent must be a predicate head.
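The labelling rule described above can be sketched as a small decision function. This is only an illustration of the guideline as stated on this page, not part of the annotation tooling; the function name `label_subjects` and its three boolean parameters are invented for this example.

```python
def label_subjects(has_pronoun, has_np, has_clause):
    """Return a dict mapping each subject element present to its UD relation."""
    labels = {}
    if has_np:
        labels["subject NP"] = "nsubj"
    if has_clause:
        labels["subject clause"] = "csubj"
    if has_pronoun:
        # The pronoun is the true subject only when no NP or clause fills
        # that role; otherwise it is treated as an expletive.
        labels["subject pronoun"] = "expl" if (has_np or has_clause) else "nsubj"
    return labels


# Pronoun plus subject NP: the pronoun is <expl>, the NP is <nsubj>.
print(label_subjects(True, True, False))
# Pronoun alone: the pronoun itself is <nsubj>.
print(label_subjects(True, False, False))
```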

  • Example: A clause containing all three dependencies. For the matrix clause, the subject is a clause and within the subject clause, the subject is a noun.
'One of the women said:'






obl, advmod, advcl

These three are all modifiers of a predicate.

Oblique <obl> is usually a non-core nominal adjunct of the predicate, and in Fijian, it is usually preceded by a preposition. The dependent must be a noun.

A predicate usually takes several adverbial modifiers in a clause, and the <advmod> does not have to be an adverb (e.g. deictic verbs can modify a predicate as well).

  • Example:
'The two of them went counter-clockwise, reached Vurevure.'






An adverbial clause modifier <advcl> is a clause that modifies a predicate. The dependent is the predicate head of the adverbial clause. The clause is usually a temporal, causal, or conditional clause, among others.

  • Example: causal clause
'Well, he was known because of the unique length of his feet (lit: because his feet is very long).'






aux

Fijian has separate morphemes as aspect and tense markers, which all get <aux> relation with the predicate head.

  • Example: When an <aux> is present, the subject pronoun is not required.
'The married couple wanted to sleep.'






xcomp

An open clausal complement, i.e. a clause without its own overt subject. The reference of the subject of <xcomp> is usually the subject or object of the matrix clause. The dependent must be a predicate head.

'The married couple wanted to sleep.'






obj, ccomp, iobj

<obj> and <ccomp> are not very common in Fijian, since 70% of Fijian clauses are intransitive. No indirect object has been found in my corpus yet.

Issues and Future Work

  • The current annotations are mainly on Boumaa Fijian. Although it is very similar to Standard Fijian in syntax, it is still necessary and worthwhile to annotate more sentences in Standard Fijian.
  • The grammar book I am using does not distinguish adverbs and aspect markers very well and puts them all into the "modifier" category. Some words are said to be aspect markers, but their English translations read like adverbs, such as 'continuously'. When developing my transducers, I assigned tags according to the translations (i.e. I gave the adv tag to some words that the grammar book claims are asp markers), but while annotating UD relations I realised that future work will need to distinguish asp markers from true adverbs more carefully.

License and Copyright

The annotation is licensed under an open source license, GNU GPL v3. The original text is protected by copyright: all rights reserved by R.M.W. Dixon, the author of the book A Grammar of Boumaa Fijian.

References and Resources

For more details: https://github.swarthmore.edu/hwang11/fij-final/tree/master.

It is also publicly available on GitHub: https://github.com/hwang16/UD_fij.

Churchward, C. M. (1941). A new Fijian grammar. Suva (Fiji): Government of Fiji.

Dixon, R.M.W. (1988). A Grammar of Boumaa Fijian. Chicago: University of Chicago Press.