Difference between revisions of "Mixe/Disambiguation"

From LING073
(ja'a)
 
(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
Link to Git repository: https://github.swarthmore.edu/Ling073-sp22/ling073-mto
 +
 
=Initial evaluation of ambiguity=
 
=Initial evaluation of ambiguity=
  
Line 8: Line 10:
  
 
unique forms in transducer:106575
 
unique forms in transducer:106575
*most of the repeated forms are various inflected verb forms, I think
+
*There are so many because the transducer outputs all possible spelling variations for each token. I'm not sure how to prevent this.
  
 
= Ambiguous terms =
 
= Ambiguous terms =
  
== ja'a ==
+
== pö’ö ==
 
 
 
 
* jaꞌa (jèꞌè in corpus)
 
** <dem>
 
*** Guzman (221) (p108)
 
***: <code> ^ku/ku<cnjc>$ ^ja/'''jaꞌa<dem>'''/jaꞌa<prn>$ ^pöjxïn/pöx<n>$ ^ti/të<enc><perf>$ ^nyëjkxnï/nëkx<v><dep><cpl><p3>$ ^nëjootm/nëjoot<n><loc>$ </code>
 
** <prn>
 
*** Guzman (224) (p109)
 
***: <code> ^ꞌuk/ꞌuk<cnjc>$ ^nänööx/nööx<adj><aug>$ ^jaꞌa/'''jaꞌa<prn>'''/jaꞌa<dem>/$ </code>
 
 
 
 
 
TO ADD TO TRANSDUCER
 
 
 
*'''<aug> = augmentative (opposite of a diminutive)'''
 
*'''nä = <aug> prefix. attaches to nouns'''
 
*'''nööx = <adj>'''
 
*'''ꞌuk = <cnjc>'''
 
* -nï = "already" verb.suffix
 
* '''-m locative adposition'''
 
* '''nëjoot <n>'''
 
* '''ku = <cnjc> subordinator (translates "because", generally)'''
 
 
 
= For translation stuff later on =
 
I thought this was going to be an ambiguous analysis case, but it wasn't, really.
 
  
== juuꞌ ==
+
Verb stem ''pö’ö'' "break" and noun ''pö’ö'' "sand"
  
'''What I think is happening:'''
+
==="break"===
  
''juuꞌ'' is always a relativizer. When introducing a relative clause, one ''juuꞌ'' is required at the beginning of that relative clause. For emphasis, focalization, or other discourse-y things, you can have another ''juuꞌ'' right before the main noun phrase. This ''juuꞌ'' is mostly (or maybe always; I haven't seen any cases not like this) followed by discourse enclitics like ''=ts'' ASSERTIVE and ''=veꞌe'' FOC
+
I had to make up a sentence because (1) I couldn't find any example sentences with this word, and (2) I had trouble finding any other cases of ambiguity.
  
... ''juuꞌ''=enclitic NP ''juuꞌ'' relative-clause ...
+
I started with an example sentence that includes a verb from the same non-agentive ambitransitive verb paradigm as ''pö’ö''.
 +
*: xi nëëj ’ixtaamp
 +
*: xii nëëj ø=’ëjx-tam-p
 +
*: that water 3S.IND=BASE-spill-IDT.ICPL.INTR
 +
*: "That water spilled." (p75 of 312)
  
Both uses are relativizers, but only the first is required. Furthermore, the second type is more accurately translated as "that" or "which" in English, whereas the first type may or may not be translated.
+
First, I switched out ''nëëj'' "water" for ''pu'u'' "board/table", to try to make the sentence semantically sound. Then, I replaced independent incompletive aspect with independent completive aspect, thus changing the aspect suffix from ''-p'' to ''-ø''. Finally, I switched the verb stem to ''pö’ö''. This gets us, to the best of my knowledge:
  
'''Solution:'''
+
*: '' xi pu'u pö’ö ''
 +
*: '' xii pu'u ø=pö’ö-ø ''
 +
*: that board 3S.IND=break-IDT.CPL.INTR
 +
*: "That board broke."
  
We'll tag both as relativizers, but only the second type will have the lemma "that". We will need to change the juuꞌ morphTest and remove the <dem> possibility from the transducer.
+
The main thing I'm unclear on is whether the verb stem would take a different form. Some verb stems change more than others depending on aspect, transitivity, dependency, and construction type (direct, inverse). I haven't seen any clear pattern that would allow me to predict what shape ''pö’ö'' takes when it is an independent, completive, intransitive verb.
  
=== Sentences ===
+
Transducer analysis for this sentence:
  
* ex. (238) in Guzman (p 114 | 148/312)
+
^xi/xii<det><dem>$ ^pu'u/pu'u<n><sg>$ ^pö’ö/'''pö’ö<v><idt><cpl><p3>'''/pö’ö<v><dep><cpl><p3>/pö’ö<v><dep><cpl><p1>/pö’ö<v><dep><icpl><p1>/pö’ö<v><dep><icpl><p3>/pö’ö<v><idt><cpl><p1>/pö’ö<n>$^./.<sent>$
* ex. (112.b) in Guzman (p 69 | 103/312)
 
* ex. (185) in Guzman (p 99 | 133/312)
 
* ex. (238) in Guzman (p114) -includes both
 
** ꞌäx juuꞌts yë piꞌkꞌöktä juuꞌ tëꞌktsïnipä ꞌöktä
 
**: <code> ^ꞌäx/ꞌäx<cnjcoo> ^juuꞌts/juuꞌ<rlt><asrt> ^yë/yëꞌë<dem> ^piꞌkꞌöktä/ꞌök<n><pl><?> ^juuꞌ/juuꞌ<rlt> ^tëꞌktsïnipä/tëktsëën<v><tv><nmn><asun> ^ꞌöktä/ꞌök<n><pl> </code>
 
***pi'k is a diminutive, but I don't know whether I should tag it, and if I do, what to tag it as.
 
  
cnjcoo = coordinating conjunction
+
==="sand"===
rlt = relativizers
+
*: ''kumöön '''pöꞌö''' yäkpätsïmjä''
asrt = assertive
+
*: ''kumöön pö’ö y=yäk-pä’äv-tsëm-jäy-I''
asum = assumptive mood (indicates a statement is assumed to be true, because it usually is under similar circumstances)
+
*: town sand 3S.DEP=PAS-EDGE-load-APL.R-ICPL.DEP
nmn = nominalizer
+
*: "The sand is carried to the municipality."
*-pa, specifically, is used in a copula-ish sort of way, it seems. Guzman writes, "-pä se sufija a verbos para referirse a la entidad
 
que desarrolla la actividad que predica la raíz" (119).
 
  
 +
Note that I'm treating ''pä’äv-tsëm'' as a verb stem ''pätsïm'', since ''pä’äv'' is a lexical prefix.
  
The examples from the coffee story (which is mainly what we're using for our corpus) are generally more complicated than the examples in Guzman. (We could add some Guzman examples to the corpus, I guess.)
+
Transducer analysis for this sentence:
  
(juu' "which" is referring to earlier mentioned "coffee") -- A relative pronoun here, I think
+
^kumöön/kumöön<n><sg>$ ^pöꞌö/'''pöꞌö<n>'''/pöꞌö<v><idt><cpl><p3>$ ^yäkpätsïmjä/yäkpä’ävtsëm<v><tv><dep><icpl><pas><apl><p3>$^./.<sent>$
  
*'''juuꞌts''' viijnk kajpün jayuda, vèꞌèts, ?oytyunükts jè jyèꞌè du tumpivda. (Suslak p85)
+
===constraints===
 +
If there is another verb in the sentence, ''pöꞌö'' is likely a noun. If there is no other verb in the sentence, ''pöꞌö'' is almost certainly a verb.
  
* ꞌax kutseꞌe yakmujùydat, ꞌakijpxa cheꞌe nmujùꞌyumdat, veꞌem juuꞌ laata tü dü ꞌapivꞌùtsta, nay veꞌem juuꞌ kajha tü dü apivꞌùtstup. (Suslak p87)
+
= Final evaluation of ambiguity =
 +
Our corpus's level of ambiguity hasn't changed: it was 1 once juuꞌ was treated only as a relativizer, and it is still 1. I do think, though, that if we implemented all possible verb affixes and added all the verb stems from our corpus, we would have at least a little ambiguity.
  
[[category:Sp22_Disambiguation]], [[category:Mixe]]
+
[[category:Sp22_Disambiguation]]
 +
[[category:Mixe]]

Latest revision as of 16:06, 12 May 2022

Link to Git repository: https://github.swarthmore.edu/Ling073-sp22/ling073-mto

Initial evaluation of ambiguity

Using the script shown in the Calculating ambiguity section of the Morphological Disambiguator wiki page, the initial level of ambiguity in our corpus is about 1.025.

EDIT: After determining that juuꞌ is only a relativizer, not a demonstrative, corpus ambiguity is apparently 1.
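As a rough illustration of what that figure means, here is a minimal Python sketch (not the script from the wiki page) that assumes ambiguity is the average number of analyses per analysed token in the transducer's stream-format output. The function names are made up for this example, and the test sentence is the made-up "break" sentence analysed further down this page.

```python
import re

# One lexical unit in the analyser output: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_units(stream):
    """Return a list of (surface, [analyses]) pairs from analyser output."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface = match.group(1)
        # Drop empty strings left by a leading or trailing slash.
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((surface, analyses))
    return units


def mean_ambiguity(stream):
    """Average analyses per analysed token; unknown '*' tokens are skipped."""
    counts = [
        len(analyses)
        for _, analyses in parse_units(stream)
        if analyses and not analyses[0].startswith("*")
    ]
    return sum(counts) / len(counts) if counts else 0.0


sentence = (
    "^xi/xii<det><dem>$ ^pu'u/pu'u<n><sg>$ "
    "^pö’ö/pö’ö<v><idt><cpl><p3>/pö’ö<v><dep><cpl><p3>/pö’ö<v><dep><cpl><p1>"
    "/pö’ö<v><dep><icpl><p1>/pö’ö<v><dep><icpl><p3>/pö’ö<v><idt><cpl><p1>"
    "/pö’ö<n>$^./.<sent>$"
)
print(mean_ambiguity(sentence))  # 10 analyses over 4 tokens
```

A value of 1 therefore means that every analysed token in the corpus gets exactly one reading.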

forms in transducer: 204288

unique forms in transducer:106575

  • There are so many because the transducer outputs all possible spelling variations for each token. I'm not sure how to prevent this.

Ambiguous terms

pö’ö

Verb stem pö’ö "break" and noun pö’ö "sand"

"break"

I had to make up a sentence because (1) I couldn't find any example sentences with this word, and (2) I had trouble finding any other cases of ambiguity.

I started with an example sentence that includes a verb from the same non-agentive ambitransitive verb paradigm as pö’ö.

  • xi nëëj ’ixtaamp
    xii nëëj ø=’ëjx-tam-p
    that water 3S.IND=BASE-spill-IDT.ICPL.INTR
    "That water spilled." (p75 of 312)

First, I switched out nëëj "water" for pu'u "board/table", to try to make the sentence semantically sound. Then, I replaced independent incompletive aspect with independent completive aspect, thus changing the aspect suffix from -p to -ø. Finally, I switched the verb stem to pö’ö. This gets us, to the best of my knowledge:

  • xi pu'u pö’ö
    xii pu'u ø=pö’ö-ø
    that board 3S.IND=break-IDT.CPL.INTR
    "That board broke."

The main thing I'm unclear on is whether the verb stem would take a different form. Some verb stems change more than others depending on aspect, transitivity, dependency, and construction type (direct, inverse). I haven't seen any clear pattern that would allow me to predict what shape pö’ö takes when it is an independent, completive, intransitive verb.

Transducer analysis for this sentence:

^xi/xii<det><dem>$ ^pu'u/pu'u<n><sg>$ ^pö’ö/pö’ö<v><idt><cpl><p3>/pö’ö<v><dep><cpl><p3>/pö’ö<v><dep><cpl><p1>/pö’ö<v><dep><icpl><p1>/pö’ö<v><dep><icpl><p3>/pö’ö<v><idt><cpl><p1>/pö’ö<n>$^./.<sent>$

"sand"

  • kumöön pöꞌö yäkpätsïmjä
    kumöön pö’ö y=yäk-pä’äv-tsëm-jäy-I
    town sand 3S.DEP=PAS-EDGE-load-APL.R-ICPL.DEP
    "The sand is carried to the municipality."

Note that I'm treating pä’äv-tsëm as a verb stem pätsïm, since pä’äv is a lexical prefix.

Transducer analysis for this sentence:

^kumöön/kumöön<n><sg>$ ^pöꞌö/pöꞌö<n>/pöꞌö<v><idt><cpl><p3>$ ^yäkpätsïmjä/yäkpä’ävtsëm<v><tv><dep><icpl><pas><apl><p3>$^./.<sent>$

constraints

If there is another verb in the sentence, pöꞌö is likely a noun. If there is no other verb in the sentence, pöꞌö is almost certainly a verb.
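The heuristic above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual disambiguation rules: the function names are hypothetical, "another verb" is taken to mean a different token whose readings are all verbal, and both apostrophe spellings of the word that appear on this page are treated as the same target.

```python
import re

LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")

# Both spellings of the ambiguous word as they appear in the analyses on this page.
TARGETS = {"pö’ö", "pöꞌö"}


def parse_units(stream):
    """Return a list of (surface, [analyses]) pairs from analyser output."""
    return [
        (m.group(1), [a for a in m.group(2).split("/") if a])
        for m in LEXICAL_UNIT.finditer(stream)
    ]


def is_verb(analysis):
    return "<v>" in analysis


def disambiguate(stream):
    """Keep noun readings of the target if another verb is present, else verb readings."""
    units = parse_units(stream)
    # Assumption: a token is "another verb" only if every one of its readings is verbal.
    other_verb = any(
        surface not in TARGETS and analyses and all(is_verb(a) for a in analyses)
        for surface, analyses in units
    )
    result = []
    for surface, analyses in units:
        if surface in TARGETS:
            kept = [a for a in analyses if is_verb(a) != other_verb]
            analyses = kept or analyses  # never remove the last remaining reading
        result.append((surface, analyses))
    return result


broke = (
    "^xi/xii<det><dem>$ ^pu'u/pu'u<n><sg>$ "
    "^pö’ö/pö’ö<v><idt><cpl><p3>/pö’ö<v><dep><cpl><p3>/pö’ö<n>$^./.<sent>$"
)
sand = (
    "^kumöön/kumöön<n><sg>$ ^pöꞌö/pöꞌö<n>/pöꞌö<v><idt><cpl><p3>$ "
    "^yäkpätsïmjä/yäkpä’ävtsëm<v><tv><dep><icpl><pas><apl><p3>$^./.<sent>$"
)
print(dict(disambiguate(broke))["pö’ö"])
print(dict(disambiguate(sand))["pöꞌö"])
```

Note that this only separates the noun reading from the verb readings; choosing among the remaining verb readings (person, dependency, aspect) would need further constraints.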

Final evaluation of ambiguity

Our corpus's level of ambiguity hasn't changed: it was 1 once juuꞌ was treated only as a relativizer, and it is still 1. I do think, though, that if we implemented all possible verb affixes and added all the verb stems from our corpus, we would have at least a little ambiguity.