Lingala and Kikuyu/Lexical Selection

From LING073

This page details the lexical selection process for machine translation between Lingala and Kikuyu for one-to-many mappings.

lin → kik

Solved Ambiguities

We have implemented rules to solve two ambiguous translations from Lingala to Kikuyu.

Mabele

The Lingala noun mabele can be translated as follows:

  • (lin) mabele → (kik) thanga (sand); ria (milk)

To (mostly) solve this, we made generalizations based on the meaning of the preceding word. We chose thanga as the default translation, since this is the only way mabele is used in our parallel corpus. We override the default when mabele is preceded by "bololo" (sour), "kotoka" (pour), or "komela" (drink), since these are the most likely contexts in which mabele would refer to milk. This misses some cases, but it helps. The following sentences can be used to test both rules:

  • na mabele ("from the earth/sand," should be translated thanga; using mabele here is a bit odd, but this example is taken verbatim from our corpus)
  • nabuákí mabele ("I throw sand," should be translated thanga)
  • bololo mabele ("sour milk," should be translated ria)
  • atokí mabele ("He pours milk," should be translated ria)
  • tomelí mabele ("We drink milk," should be translated ria)
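The default-plus-override logic described above can be sketched in a few lines. This is a minimal illustration, not the rule file used for the project: it assumes the input has already been analysed into a list of Lingala lemmas (a hypothetical format), so inflected forms such as atokí and tomelí appear as their lemmas kotoka and komela.

```python
# Minimal sketch of the mabele rule (lin -> kik).
# Assumption: input is a list of Lingala lemmas (hypothetical format);
# the real analyser output and rule syntax may differ.

# Preceding lemmas that signal the "milk" reading: sour, pour, drink.
MILK_TRIGGERS = {"bololo", "kotoka", "komela"}


def translate_mabele(lemmas: list[str], i: int) -> str:
    """Choose a Kikuyu translation for the 'mabele' at position i."""
    if lemmas[i] != "mabele":
        raise ValueError(f"token {i} is not 'mabele'")
    prev = lemmas[i - 1] if i > 0 else None
    if prev in MILK_TRIGGERS:
        return "ria"      # override: milk
    return "thanga"       # default: sand/earth


if __name__ == "__main__":
    print(translate_mabele(["na", "mabele"], 1))      # thanga
    print(translate_mabele(["bololo", "mabele"], 1))  # ria
```

Because the rule only inspects the immediately preceding lemma, a milk context separated from mabele by other words falls back to the default, which is one of the missed cases mentioned above.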

Kobéta

The Lingala verb kobéta can be translated as follows:

  • (lin) kobéta → (kik) hũr (to beat (as in hit)); ur (to rain)

To (mostly) solve this, we made generalizations based on the grammar of the sentence. When kobéta means beat, it is nearly always transitive, referring to someone hitting someone or something else. Hence, we translate kobéta as hũr when it is followed by any noun or pronoun. When it means rain (verb), kobéta is almost always either followed by an adverb or at the end of the sentence. We thus use ur as the default translation when the context for hũr is not present. The following sentences can be used to test the rules:

  • abétí nyama ("He beats the animal," should be translated hũr)
  • nakobéta yo ("I will beat you," should be translated hũr)
  • ebétí lelo ("It rains today," should be translated ur)
  • ekobéta ("It will rain," should be translated ur)
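The kobéta rule depends on the part of speech of the following token rather than on its meaning. The sketch below assumes tokens arrive as (lemma, tag) pairs; the tag names "n", "prn" and "adv" are hypothetical stand-ins for whatever tags the analyser actually emits.

```python
# Minimal sketch of the kobéta rule (lin -> kik).
# Assumption: tokens are (lemma, tag) pairs; the tags "n" (noun) and
# "prn" (pronoun) are hypothetical placeholders for the analyser's tags.
from typing import Optional

Token = tuple[str, str]

# A following noun or pronoun suggests a transitive "beat" reading.
OBJECT_TAGS = {"n", "prn"}


def translate_kobeta(tokens: list[Token], i: int) -> str:
    """Choose a Kikuyu translation for the 'kobéta' at position i."""
    lemma, _tag = tokens[i]
    if lemma != "kobéta":
        raise ValueError(f"token {i} is not 'kobéta'")
    nxt: Optional[Token] = tokens[i + 1] if i + 1 < len(tokens) else None
    if nxt is not None and nxt[1] in OBJECT_TAGS:
        return "hũr"   # override: beat (hit)
    return "ur"        # default: rain (adverb follows or sentence ends)


if __name__ == "__main__":
    print(translate_kobeta([("kobéta", "v"), ("nyama", "n")], 0))  # hũr
    print(translate_kobeta([("kobéta", "v"), ("lelo", "adv")], 0))  # ur
```

Making ur the fallback means any context the rule does not recognise (an adverb, the end of the sentence, or an unexpected tag) yields the rain reading, mirroring the choice of default described above.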

Unsolved Ambiguities

Kokíma

The Lingala verb kokíma can be translated as follows:

  • (lin) kokíma → (kik) hanyũk (to run); ũr (to run away)

This ambiguity is hard to solve because the difference between the two words is rather nuanced. One possible generalization is that running from something should be translated as running away (ũr), but this cannot be implemented because Lingala has a single preposition used for both from and to, among other meanings.

Kofánda

The Lingala verb kofánda can be translated as follows:

  • (lin) kofánda → (kik) ikar (to sit); tũũr (to live (as in reside))

This ambiguity is hard to solve because there is no obvious generalization that can be made. Both can be (and often are) followed by a preposition, so the presence of a following preposition cannot be used in a rule. In general, the contexts are hard to distinguish. One might try to select the translation based on the object of the following preposition, if there is one, but this would end up being fairly complicated, and we had other one-to-many mappings that could be more readily and thoroughly solved by lexical selection.

Kobánza

The Lingala verb kobánza can be translated as follows:

  • (lin) kobánza → (kik) ĩcir (to think); ririkan (to remember)

This ambiguity is hard to solve mostly because we do not have a good understanding of how the syntax of kobánza works in Lingala, since we don't have many example sentences, and it is consequently hard to make any generalizations.

kik → lin

Solved Ambiguities

We have implemented rules to solve one ambiguous translation from Kikuyu to Lingala.

Ria

The Kikuyu noun ria can be translated as follows:

  • (kik) ria → (lin) mabele (milk); etimá (pond)

To (mostly) solve this, we made generalizations based on the meaning of the surrounding words. We chose etimá as the default translation because it is easier to isolate contexts where mabele is used than contexts where etimá is used. We override the default when ria is followed by "mata" (thick/creamy), preceded or followed by "it" (pour/spill), preceded by (drink), or preceded by "na" (with), since these are likely contexts in which ria refers to milk. As with the similar ambiguity in the other direction, this misses quite a few cases. The following phrases can be used to test the rules:

  • iria imata ("thick milk", should be translated mabele)
  • na iria ("with milk", should be translated mabele)
  • nĩnyuire iria ("I drank milk", should be translated mabele)
  • nĩnjitire iria ("I poured the milk", should be translated mabele)
  • iria nĩrĩitĩkire ("The milk spilled", should be translated mabele)
  • iriainĩ (sometimes spelled iria-inĩ) ("in the lake", should be translated etímá)
  • iria inene ("the big lake", should be translated etímá)
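The ria rule checks both neighbours of the word. The sketch below assumes the input is a list of Kikuyu lemmas (a hypothetical format). The page does not name the Kikuyu lemma used for "drink", so DRINK_LEMMAS is left as an empty, hypothetical placeholder rather than guessed.

```python
# Minimal sketch of the ria rule (kik -> lin).
# Assumption: input is a list of Kikuyu lemmas (hypothetical format).

# Hypothetical placeholder: the page does not give the "drink" lemma.
DRINK_LEMMAS: set[str] = set()
POUR_LEMMAS = {"it"}        # pour/spill, may precede or follow ria
FOLLOWING_MILK = {"mata"}   # thick/creamy, follows ria
PRECEDING_MILK = {"na"}     # with, precedes ria


def translate_ria(lemmas: list[str], i: int) -> str:
    """Choose a Lingala translation for the 'ria' at position i."""
    if lemmas[i] != "ria":
        raise ValueError(f"token {i} is not 'ria'")
    prev = lemmas[i - 1] if i > 0 else None
    nxt = lemmas[i + 1] if i + 1 < len(lemmas) else None
    if nxt in FOLLOWING_MILK or nxt in POUR_LEMMAS:
        return "mabele"     # override: milk
    if prev in PRECEDING_MILK or prev in POUR_LEMMAS or prev in DRINK_LEMMAS:
        return "mabele"     # override: milk
    return "etimá"          # default: pond/lake


if __name__ == "__main__":
    print(translate_ria(["na", "ria"], 1))    # mabele
    print(translate_ria(["ria", "nene"], 0))  # etimá
```

Defaulting to etimá keeps the rule set to a short list of milk triggers, which matches the reasoning above that milk contexts are easier to isolate than lake contexts.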

Unsolved Ambiguities

Igu

The Kikuyu verb igu can be translated as follows:

  • (kik) igu → (lin) kotosa (to obey); koyoka (to hear)

This ambiguity is hard to solve because many of the contexts in which the word for hear might appear are also contexts in which obey might appear and vice versa. For example, although one might obey a command, one might also hear a command.

Rut

The Kikuyu verb rut can be translated as follows:

  • (kik) rut → (lin) koyekola (to learn); kokukula (to remove)

This ambiguity is hard to solve because, although the two meanings are far removed from each other, the contextual information that lets a hearer select the correct meaning is much more sophisticated than a small set of lemmas or tags. It is therefore hard to make a generalization that could be implemented as a rule.