Difference between revisions of "Morphological disambiguator"

From LING073

Revision as of 03:23, 23 February 2017

Why we need disambiguation

Imagine you have a word that has two different tagsets, e.g.

^houses/house<n><pl>/house<v><tv><p3><sg>$
^this/this<det><dem><sg>/this<pron><dem><sg>$

Normally your analyser will choose one of these arbitrarily. Now imagine a sentence where the wrong analysis is chosen (in this case, twice!):

^The/The<det><def>$ ^motel/motel<n><sg>$ ^houses/house<n><pl>$ ^this/this<pron><dem><sg>$ ^dog/dog<n><sg>$^./.<sent>$

The goal of a disambiguator is to choose the correct analysis based on the surrounding words.

Using Constraint Grammar to disambiguate

Constraint Grammar (CG) is a formalism for writing context-sensitive constraints that select or remove analyses from the list of possible analyses.

The structure of a CG file

At the top you define delimiters. You can then add lists (which are kind of like sets in twol). After that you can have any number of constraints. You can also put lists and constraints in sections: all the constraints in each section are applied simultaneously, but the sections are applied in order and lists are always global.
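
The layout described above can be sketched as follows. This is only an illustrative skeleton: the list names, the tag names (taken from the example analyses earlier on this page) and the two constraints are assumptions, not part of any existing grammar.

```cg3
# Delimiters: the wordforms that end a sentence
DELIMITERS = "<.>" "<!>" "<?>" ;

# Lists: named sets of tags; they are global, so every section can use them
LIST Nominals = n pron ;   # hypothetical list
LIST Det = det ;           # hypothetical list

SECTION
# Constraints in the first section
SELECT Nominals IF (-1 Det) ;

SECTION
# Constraints in the second section, applied after the first
REMOVE (v) IF (-1 Det) ;
```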

Constraints

The two most commonly used kinds of constraint either remove a reading under a certain condition or select a reading under a certain condition. The syntax is, e.g., REMOVE <target> [contextual tests] ;.

Targets can be given as list names (like Nominals) or as individual tags in parentheses, e.g. (n pl). The target identifies the reading to select or remove.
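
A minimal sketch of both constraint types and both kinds of target. The list Nominals and the tag names are assumptions based on the example analyses on this page, and the conditions are deliberately crude. The IF keyword is optional in CG-3 and is written out here only for readability.

```cg3
DELIMITERS = "<.>" "<!>" "<?>" ;

# Hypothetical list of nominal tags
LIST Nominals = n pron ;

SECTION

# Target given as a list name: keep only the nominal readings of a cohort
# when the previous cohort can be a determiner
SELECT Nominals IF (-1 (det)) ;

# Target given as tags in parentheses: drop the plural-noun reading
# when the previous cohort can be a singular noun
REMOVE (n pl) IF (-1 (n sg)) ;
```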

Each contextual test is enclosed in parentheses. All tests must match in order for the constraint to apply. Inside the parentheses, the first element is a position: 0 means the current cohort, -1 means the previous cohort, 1 means the following cohort, etc. The rest of the test is what should be matched. If you want to match any member of the Nominals set in the previous position, the test would be (-1 Nominals). You can match tag sequences by adding an extra set of parentheses, e.g. (1 (n pl)) would match a plural noun in the following position. You can match specific baseforms with quotation marks, e.g. (1 ("house" n)) would match any form of the noun "house" in the following position.
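
As an illustration, the constraints below are one possible way to disambiguate the example sentence "The motel houses this dog." They are a sketch that assumes the tag names shown in the analyses earlier on this page. Real data would need more careful conditions.

```cg3
DELIMITERS = "<.>" "<!>" "<?>" ;

SECTION

# "houses": keep the verb reading when the previous cohort can be a
# singular noun and the following cohort can be a determiner
SELECT (v) IF (-1 (n sg)) (1 (det)) ;

# "this": drop the pronoun reading when the following cohort can be a noun
REMOVE (pron) IF (1 (n)) ;

# Same idea with a baseform test: drop the pronoun reading
# directly before any form of the noun "dog"
REMOVE (pron) IF (1 ("dog" n)) ;
```

With these constraints, "houses" should keep only its verb reading and "this" only its determiner reading.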


Example above

Useful commands

Getting output before disambiguation

Getting output after disambiguation

Calculating ambiguity

Seeing which constraints are doing what

The assignment

  1. Add words
  2. Measure ambiguity
  3. Identify ambiguous form
  4. Write constraint
  5. Measure ambiguity

More resources