Kikuyu/Universal Dependencies

From LING073

This page details Universal Dependencies for Kikuyu, and describes UD annotation and the training of several models on corpora.

Evaluation

Here, Corpus 1 is the corpus on which the parsers were trained, and Corpus 2 is another Kikuyu corpus. The withmorph parser was trained on the corpus with morphological information, while the nomorph parser was trained on the corpus without morphological information.

The following summarizes the sizes of the corpora:

            Corpus 1   Corpus 2
Sentences   27         16
Forms       197        110

The performance of the parsers on the corpora is as follows:

Evaluation of Parsers on Corpus 1
            LAS      UAS
withmorph   60.53%   63.16%
nomorph     49.12%   57.02%

On Corpus 1, the withmorph parser outperformed the nomorph parser on both metrics, with a larger margin on LAS (11.41 percentage points) than on UAS (6.14 percentage points).

Evaluation of Parsers on Corpus 2
            LAS      UAS
withmorph   8.86%    22.78%
nomorph     6.33%    30.38%

Both parsers fared much worse on Corpus 2 than on Corpus 1, as expected, given that Corpus 1 was also their training data. Surprisingly, on Corpus 2 the nomorph parser had a better UAS than the withmorph parser (30.38% versus 22.78%), although the withmorph parser still had the slightly better LAS.
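
For reference, LAS (labeled attachment score) counts a token as correct when both its predicted head and its relation label match the gold annotation, while UAS (unlabeled attachment score) requires only the head to match. The sketch below shows the computation on a made-up three-token example. The function name attachment_scores and the toy data are illustrative assumptions, not the evaluation script actually used to produce the tables above.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency relation) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages for two aligned lists of arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS: only the head has to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head and relation label both have to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * las_hits / n, 100.0 * uas_hits / n


# Hypothetical gold and predicted arcs for a three-token phrase.
gold = [(0, "root"), (3, "case"), (1, "nmod")]
pred = [(0, "root"), (3, "case"), (1, "obl")]  # correct head, wrong label on token 3

las, uas = attachment_scores(gold, pred)
print(f"LAS {las:.2f}%  UAS {uas:.2f}%")
```

In this toy case every head is correct but one label is wrong, so UAS is 100% while LAS is lower. Because a labeled match always implies an unlabeled match, LAS can never exceed UAS, which is consistent with all four rows in the tables above.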

Dependency Relations

In this section, five different dependency relations used in annotating the corpora are detailed.

nmod

The nmod relation is, in general, used to mark a nominal modifying another nominal in some way. In Kikuyu, this form of modification appears often; for example, in the noun-associative-noun construction, the second nominal serves as a modifier of the first. There are also numerous constructions in which a noun modifies an adjacent noun. Examples include:

  • The phrase "mbathi thaa inya", meaning 'ten o'clock bus' (literally 'bus hour four'), has the noun "thaa" ('hour') modifying the noun "mbathi" ('bus') with an nmod relationship.
  • The phrase "marwa ma tax", meaning 'tax letter' (literally 'letter ASSOC tax'), has the nominal headed by "tax" ('tax') modifying the noun "marwa" ('letter') with an nmod relationship.

case

The case relation generally marks a case-marking element, such as a preposition, that is attached to the nominal it introduces. In Kikuyu, we have chosen to analyze the relationship between an associative marker and the following nominal as one of case, as the associative plays a role similar to that of a word like "of" in English. Furthermore, the associative is used in possessive constructions, which usually involve a case relationship. The case relationship also appears with prepositions.

  • The phrase "ndarĩkia na gym", meaning 'I finished with the gym' (literally 'I-finished with gym'), has the preposition "na" ('with') in a case relationship with the nominal "gym" ('gym').
  • The phrase "Mũthenya Ũmwe wa Wambũi", meaning 'A Day in the Life of Wambũi' (literally 'day one ASSOC Wambũi'), has the associative "wa" in a case relationship with the nominal "Wambũi".
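
Taken together, the nmod and case analyses above give the phrase "marwa ma tax" the structure sketched below as CoNLL-U-style rows. This is only an illustration: the helper to_conllu is hypothetical, the phrase is treated as if it were a whole sentence (so "marwa" is attached to the root), and the columns that the text does not specify (lemma, part-of-speech tags, features) are left as underscores.

```python
from typing import List, Tuple

# (form, head index, relation); indices are 1-based and 0 is the artificial root.
# Attaching "marwa" to the root is an assumption made for illustration.
PHRASE: List[Tuple[str, int, str]] = [
    ("marwa", 0, "root"),  # 'letter', head of the phrase
    ("ma", 3, "case"),     # associative marker, case dependent of "tax"
    ("tax", 1, "nmod"),    # nominal modifying "marwa"
]


def to_conllu(tokens: List[Tuple[str, int, str]]) -> str:
    """Render tokens as 10-column CoNLL-U rows, using '_' for unspecified fields."""
    rows = []
    for idx, (form, head, deprel) in enumerate(tokens, start=1):
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"]
        rows.append("\t".join(cols))
    return "\n".join(rows)


print(to_conllu(PHRASE))
```

Note that the associative "ma" attaches to "tax", the nominal it introduces, rather than to "marwa"; the link between the two nouns is carried by the nmod arc.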

nummod

The nummod relation is used when a number is modifying a nominal. This construction appears often in Kikuyu, with the number word following the nominal.

  • The phrase "thaa ithatũ kĩroko", meaning '9 in the morning' (literally 'hour three morning'), has the number "ithatũ" modifying the nominal "thaa" ('hour') with a nummod dependency. This is generally how time is expressed in Kikuyu.
  • The phrase "mĩrongo ĩrĩ", meaning 'twenty' (literally 'set-of-ten two'), has the number "ĩrĩ" ('two') modifying "mĩrongo" ('set-of-ten') with a nummod dependency.

obl

The obl relation is used for non-core (oblique) nominal dependents of a predicate, most often a verb. Such dependents are very common in Kikuyu, and often express the place or time of the event or similar circumstantial information.

  • The phrase "ndamũona kinya thaa thita ĩtigairie ndagĩka ikũmi", meaning 'I saw her until 10 minutes to 12', contains the nominal headed by "thaa" ('hour'), which heads the sub-phrase meaning 'until 10 minutes to 12'. This nominal has an obl dependency on the verb "ndamũona" ('I-saw'), as it depends on the verb but is not one of its core dependents.
  • The phrase "gakinya rũũĩ-inĩ", meaning 'when he arrived at the river', has the locative nominal "rũũĩ-inĩ" ('at the river') modifying the verb "gakinya" ('arrive') with an obl dependency, as it is a dependent of the verb but not a core dependent.

xcomp

The xcomp relation describes the dependency between a verb's (or adjective's) subjectless clausal complement and the verb itself. Such constructions appear very often in Kikuyu, as verbs are often followed by other verbs, in constructions analogous to "start to work" in English. One particularly common construction involves the verb translated as 'return', which often takes subjectless verbal complements.

  • The phrase "gathiĩ kuona kĩũra", meaning 'he went to see the frog' (literally 'he-went see frog'), has the second verb, "kuona" ('see'), with an xcomp dependency on the first verb, "gathiĩ" ('went').
  • The phrase "ndacoka ndanyua ũcũrũ", meaning 'I then drank porridge' (literally 'I-return drink porridge'), has the second verb, "ndanyua" ('drink'), with an xcomp dependency on the first verb, "ndacoka" ('return'). This is the common construction with "cok" ('return') alluded to above, where "cok" plus a subjectless verbal complement is used to mean '[subject] then did [verbal complement]'.
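
As a rough sketch of how such verb chains could be pulled out of an annotated corpus, the code below encodes the two examples above and lists each governing verb together with its xcomp complement. The xcomp arcs follow the text; the root attachments and the obj label on the final noun of each phrase are assumptions added only to make the toy trees complete, and the function name xcomp_pairs is hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, int, str]  # (form, 1-based head index, relation); head 0 = root

# The xcomp arcs follow the two examples above. The root attachments and the
# "obj" label on the final noun of each phrase are assumptions for illustration.
GATHII: List[Token] = [
    ("gathiĩ", 0, "root"),
    ("kuona", 1, "xcomp"),
    ("kĩũra", 2, "obj"),
]
NDACOKA: List[Token] = [
    ("ndacoka", 0, "root"),
    ("ndanyua", 1, "xcomp"),
    ("ũcũrũ", 2, "obj"),
]


def xcomp_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (governing verb, complement verb) for every xcomp arc."""
    pairs = []
    for form, head, deprel in tokens:
        if deprel == "xcomp" and head > 0:
            # Head indices are 1-based, so subtract 1 to index the list.
            pairs.append((tokens[head - 1][0], form))
    return pairs


for phrase in (GATHII, NDACOKA):
    print(xcomp_pairs(phrase))
```

Filtering the resulting pairs for governors built on "cok" would be one way to count how often the 'then did' construction occurs in a corpus.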