Difference between revisions of "Miyako/Transducer"

From LING073
(Evaluation)
 
(3 intermediate revisions by the same user not shown)
Line 1:
  The code for the transducer can be found [https://github.swarthmore.edu/doldham1/ling073-mvi here].
+ ==Analyser Evaluation==
- ==Evaluation==
+ ===Evaluation===
  * There are currently 42 stems in the transducer.
  * The current coverage is 19.12%.
Line 27:
  *: っま
- ==Notes==
+ ===Notes===
- The following tests still do not work:
+ As of submission of the analyser, the following tests did not work:
  *One verb because of an apostrophe problem, and one because I need lexicons for class 2 stem 1 and stem 2 verbs.
  *Most of the final particles, because the form of the tests need to be fixed.
Line 34:
  The first time I ran aq-covtest, 15.19% of my corpus was covered.  I discovered that a number of the top unknown forms were in fact postpositions, which led me to realise that in part of my corpus, the postpositions were separate from the words.  Having resolved that problem, I got up to 16.91%.  By adding {{morphTest|あみ{{tag|n}}{{tag|gen}}|あみぬ}} and {{morphTest|みず{{tag|n}}{{tag|gen}}|みずぬ}} to the transducer, coverage increased to 19.12%.
+
+ ==Generator Evaluation==
+ ===Initial analysis of morphological generation===
+ * 81 morphological analysis tests pass and 11 fail.
+ * The current coverage is 19.12%. (Then I fixed the issues with the corpus and it dropped to 14.57%.)
+ * 81 morphological generation tests pass and 53 fail.
+
+ ===Final analysis of morphological generation===
+ * 88 morphological generation tests pass and 4 fail.
+ * I added 5 twol rules.
+ * The current coverage is 16.02%.
  [[Category:Sp17_Transducers]]
+ [[Category:Miyako]]

Latest revision as of 06:28, 5 March 2017

The code for the transducer can be found at https://github.swarthmore.edu/doldham1/ling073-mvi.

Analyser Evaluation

Evaluation

  • There are currently 42 stems in the transducer.
  • The current coverage is 19.12%.
  • There are currently 81 tests passing in mvi.yaml and 2 passing in commonwords.yaml.
  • The current list of top unknown words is:
    あい
    たるが
    ひー
    ひらいん
    くとぅー
    どぅみっさすたい
    あたい
    みんぬ
    つっち
    あいな
    っふぃ
    いつ
    うでぃー
    かでぃぬ
    みゃーくん
    あみゃあ
    っま
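
The coverage figure and the list of unknown words above are related: coverage is the share of corpus tokens that receive at least one analysis, and the unknown list ranks the tokens that receive none. A minimal Python sketch of that bookkeeping follows. It is not how aq-covtest is implemented; the `analysable` set stands in for the transducer, and the toy corpus, the whitespace tokenisation, and the `coverage_report` helper are assumptions made for illustration.

```python
from collections import Counter

def coverage_report(tokens, analysable, top_n=5):
    """Return (coverage as a percentage, most frequent unknown tokens)."""
    if not tokens:
        return 0.0, []
    # Count every token that the stand-in analyser does not recognise.
    unknown = Counter(t for t in tokens if t not in analysable)
    known = len(tokens) - sum(unknown.values())
    return 100.0 * known / len(tokens), unknown.most_common(top_n)

# Hypothetical toy corpus, pre-split on whitespace.
corpus = "あみぬ あい みずぬ あい っま あみぬ".split()
# Stand-in for the set of forms the transducer can analyse.
analysable = {"あみぬ", "みずぬ"}

pct, unknowns = coverage_report(corpus, analysable)
print(f"coverage: {pct:.2f}%")
for form, count in unknowns:
    print(count, form)
```

Adding a frequent unknown form to the analysable set raises coverage the most, which is why the top of the unknown list is a sensible place to look for the next stems to add.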

Notes

As of submission of the analyser, the following tests did not work:

  • One verb, because of an apostrophe problem, and another, because I need lexicons for class 2 stem 1 and stem 2 verbs.
  • Most of the final particles, because the form of the tests needs to be fixed.
  • All of the parts of speech, because I didn't put them in.

The first time I ran aq-covtest, 15.19% of my corpus was covered. A number of the top unknown forms turned out to be postpositions, which led me to realise that in part of my corpus the postpositions were written separately from the words. Resolving that problem brought coverage up to 16.91%. Adding あみ<n><gen> ↔ あみぬ and みず<n><gen> ↔ みずぬ to the transducer then increased coverage to 19.12%.
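
Each pair above links an analysis (a stem plus tags) to a surface form. As a rough illustration only, the sketch below mimics that two-way mapping for the genitive marker ぬ with plain Python lookups. The real transducer is compiled from lexicon and twol files, not written this way; the `STEMS` and `SUFFIXES` tables and the `generate` and `analyse` helpers are hypothetical names, and only the two noun stems mentioned in the paragraph are included.

```python
# Toy stand-in for the two analysis/surface pairs mentioned above.
STEMS = {"あみ": "<n>", "みず": "<n>"}   # stem -> part-of-speech tag
SUFFIXES = {"<gen>": "ぬ"}               # tag -> surface suffix

def generate(analysis):
    """Map an analysis such as 'あみ<n><gen>' to a surface form, or None."""
    for stem, pos in STEMS.items():
        for tag, suffix in SUFFIXES.items():
            if analysis == stem + pos + tag:
                return stem + suffix
    return None

def analyse(surface):
    """Map a surface form such as 'あみぬ' to its analyses (possibly empty)."""
    return [stem + pos + tag
            for stem, pos in STEMS.items()
            for tag, suffix in SUFFIXES.items()
            if surface == stem + suffix]

print(generate("あみ<n><gen>"))
print(analyse("みずぬ"))
```

A form that matches no stem and suffix combination, such as っま from the unknown list, comes back with an empty analysis list, which is what counts against coverage.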

Generator Evaluation

Initial analysis of morphological generation

  • 81 morphological analysis tests pass and 11 fail.
  • The coverage was 19.12% at this point; after I fixed the issues with the corpus, it dropped to 14.57%.
  • 81 morphological generation tests pass and 53 fail.

Final analysis of morphological generation

  • 88 morphological generation tests pass and 4 fail.
  • I added 5 twol rules.
  • The current coverage is 16.02%.
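
For comparison across the two stages, the pass and fail counts above can be turned into pass rates. The short Python sketch below only does that arithmetic on the generation-test numbers reported in this section; the `pass_rate` helper is a name introduced here for illustration.

```python
def pass_rate(passed, failed):
    """Percentage of tests that pass, given pass and fail counts."""
    total = passed + failed
    return 100.0 * passed / total if total else 0.0

# Generation-test counts as reported above.
initial_generation = pass_rate(81, 53)
final_generation = pass_rate(88, 4)

print(f"initial generation: {initial_generation:.2f}%")
print(f"final generation:   {final_generation:.2f}%")
```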