Khasi/Transducer

From LING073
[[Category:Sp17_Transducers]]

Latest revision as of 16:41, 25 May 2017

Link to code

https://github.swarthmore.edu/nfeldba1/ling073-kha

Analyzer Evaluation

  • At the start of the evaluation, coverage was 43.57%, and the top unknown form was 'bad' ('and'). Other commonly missed forms were 'ruh' ('too') and 'dei' ('to hit', 'to belong to', or, as an adjective, 'proper').
  • At the end of the evaluation, coverage was 47.64%, an increase of 4.07 percentage points.
  • I have 40 roots in my transducer.
  • The full current list of top unknown words is:
    • 22 ba
    • 13 a
    • 11 ruh
    • 8 bha
    • 8 dei
    • 7 bneng
    • 7 ong
    • 7 um
    • 6 ne
    • 6 don
    • 6 halor
    • 6 leh
    • 6 jaid
    • 5 por
    • 5 pat
    • 5 Te
    • 5 kiwei
    • 5 jingshai
    • 5 hapoh
    • 5 khot
  • The commonwords.yaml file has one passing test
  • The main yaml file has 94/106 tests passing, for an 89% success rate. Note, though, that many of the tests cover similar things.
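
The coverage figure and the list of top unknown words above can be reproduced with a short script. The following is a minimal sketch, not the script actually used for this evaluation. It assumes the analyzer output is in the Apertium stream format, where each token appears as ^form/analysis$ and an unanalysed form comes back as ^form/*form$. The function name coverage_report and the sample string (including the <cnjcoo> analysis for 'bad') are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage_report(analyzer_output, top_n=20):
    """Return (coverage_percent, top_unknown_forms) for analyzer output.

    coverage_report is a hypothetical helper, not part of HFST or Apertium.
    """
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analyzer_output):
        total += 1
        # Assumption: an unknown form is echoed back with a leading asterisk.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = 100.0 * known / total if total else 0.0
    return coverage, unknown.most_common(top_n)


# Hypothetical sample: one analysed token and three unknown tokens.
sample = "^bad/bad<cnjcoo>$ ^ruh/*ruh$ ^dei/*dei$ ^ruh/*ruh$"
coverage, top_unknown = coverage_report(sample)
print(f"{coverage:.2f}% coverage")
for form, count in top_unknown:
    print(count, form)
```

The unknown forms are printed as "count form" pairs, in the same layout as the list above.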

Analyzer Notes

  • Two of my abstract noun tests fail
    • I wasn't sure how to make a rule that said that only an adjectival prefix of ka forms abstract nouns
    • Since the above didn't work, I haven't yet created the jing prefix, which goes along with the ka-formed abstract nouns
  • I also haven't worked with the agentive marker yet
    • An agentive noun is formed when the particle 'nong' attaches to a verb
  • Comparatives should fail as well, as should most superlatives
    • There are often more than two words involved in forming the comparative, and I'm not sure how to deal with this.
    • One superlative case works, which is the only two-word case. The rest involve three words, which, again, I'm not sure how to deal with.

Generator Evaluation

Initial evaluation of morphological generation

  • At the moment, there is 48% coverage (of analysis)
  • The main yaml file has 94/106 tests passing, for an 89% success rate. Note, though, that many of the tests cover similar things.
  • Initially for morphological generation, there were 90/174 total passes, or a 52% success rate

Final evaluation of morphological generation

  • 41 out of 53 tests are now passing (77%).
    • I restructured my analysis of the language to be based on prepositions rather than cases, which is why my kha.yaml file changed
  • I added no twol rules, but I did add a twoc rule.
  • After running the coverage test on my corpus again, I now have 58.70% coverage.
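
The pass rates and coverage changes reported on this page can be checked with a few lines of arithmetic. This is only a sketch: the helper names pass_rate and coverage_gain are hypothetical, and the last line assumes that the 47.64% analyzer figure is the baseline for the final 58.70% coverage.

```python
def pass_rate(passed, total):
    """Percentage of tests passing."""
    return 100.0 * passed / total


def coverage_gain(before, after):
    """Change in coverage, in percentage points."""
    return after - before


# Test results quoted on this page, as (passed, total).
print(round(pass_rate(94, 106)))   # analyzer tests in the main yaml file
print(round(pass_rate(90, 174)))   # initial generation tests
print(round(pass_rate(41, 53)))    # final generation tests

# Coverage changes in percentage points.
print(f"{coverage_gain(43.57, 47.64):.2f}")  # analyzer evaluation
print(f"{coverage_gain(47.64, 58.70):.2f}")  # assumed baseline, see above
```

Because the restructuring changed the number of tests in kha.yaml, the initial and final generation pass rates are computed over different test sets and are not directly comparable.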