Jeju/Transducer

From LING073
Code Repository

Analyzer Evaluation

  • Total number of stems: 42
  • Current coverage over combined corpus:
  • Current list of unknown words:
  • Number of tests passing in each yaml file:
    • jje.yaml: 0/20 (I'm working on this one. I've added three tests and lexc changes, but they do not seem to work. Possibly a character issue.)
    • commonwords.yaml: 58/101

First Evaluation (before tokenization bug fix)

The following is the output of aq-covtest. The tokenization of the corpus is suspected to have been faulty: every one of the top unknown words is a single character.

Number of tokenised words in the corpus: 4293

Coverage: 11.32%

Top unknown words in the corpus:

300 ᄒᆞ

285 이

251 다

228 ᆞ

185 주

125 어

91 가

86 도

82 에

77 아

74 을

73 게

71 사

69 은

62 원

53 는

46 준

45 서

44 보

42 하

Translation time: 0.03252243995666504 seconds
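The coverage figure and the unknown-word list above follow a simple pattern: coverage is the share of tokens that receive an analysis, and the unknowns are the remaining tokens ranked by frequency. The sketch below illustrates that calculation only; it is not the aq-covtest implementation. The names `coverage_report`, `toy_lexicon`, and `toy_tokens` are hypothetical, and the toy data stands in for the real corpus and analyser.

```python
from collections import Counter


def coverage_report(tokens, is_known, top_n=20):
    """Return (coverage, most frequent unknown tokens) for a token list.

    `is_known` is any callable that says whether a token gets an analysis.
    """
    unknown = Counter(tok for tok in tokens if not is_known(tok))
    known_count = len(tokens) - sum(unknown.values())
    coverage = known_count / len(tokens) if tokens else 0.0
    return coverage, unknown.most_common(top_n)


# Toy data only: a stand-in lexicon and token list, not the Jeju corpus.
toy_lexicon = {"a", "b"}
toy_tokens = ["a", "b", "x", "x", "y", "a"]

coverage, top_unknown = coverage_report(toy_tokens, toy_lexicon.__contains__)
print(f"Coverage: {coverage:.2%}")
for word, count in top_unknown:
    print(count, word)
```

With a faulty tokenizer, `tokens` would contain fragments rather than words, so both the coverage and the unknown list would describe the fragments, which is consistent with the single-character unknowns listed above.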

Second Evaluation (after tokenization fix)

The results are now more plausible, but the problem of too many single-character tokens remains.

Number of tokenised words in the corpus: 4874

Coverage: 86.25%

Top unknown words in the corpus:

79 ᄒᆞ

43 ᄀᆞ

20 ᄆᆞ

19 어떵ᄒᆞ

18 ᄎᆞ

17 ᄀᆞ튼

17 ᄒᆞᄊᆞ

15 ᄒᆞ는

13 경ᄒᆞ

9 유명ᄒᆞ

8 ᄉᆞ

7 ᄀᆞ치

7 ᄒᆞ메

7 경ᄒᆞ난

6 ᄒᆞ멍

6 ᄒᆞ꼼

5 ᄌᆞ

5 ᄒᆞ곡

5 ᄄᆞ

5 경ᄒᆞ고

Translation time: 0.0666646957397461 seconds

Third Evaluation (after .lexc update)

I have added three tests whose forms I know. However, these tests are not passing yet. This is possibly due to a character issue similar to the tokenization problem.
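One way to investigate a suspected character issue is to print the code points of the form in the test file and of the form in the lexc file and compare them. The sketch below assumes (this is not confirmed by the results above) that the mismatch could come from Unicode normalization: a Hangul syllable can be stored either as one precomposed code point or as a sequence of conjoining jamo, while a syllable written with arae-a (U+119E) has no precomposed form at all. The helper names `code_points` and `same_after_nfc` are hypothetical.

```python
import unicodedata


def code_points(text):
    """List the code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_after_nfc(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Conjoining jamo HIEUH + A compose into one precomposed syllable under NFC.
ha_jamo = "\u1112\u1161"
ha_syllable = "\uD558"

# HIEUH + ARAEA has no precomposed syllable, so NFC keeps two code points.
hao_jamo = "\u1112\u119E"

print(code_points(ha_jamo), code_points(ha_syllable))
print(ha_jamo == ha_syllable, same_after_nfc(ha_jamo, ha_syllable))
print(code_points(unicodedata.normalize("NFC", hao_jamo)))
```

If the two files turn out to store the same word with different code point sequences, normalizing both to one form before compiling and testing would be one thing to try.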

Coverage Increase:

Notes

Currently (02.16.17), 62 out of 101 tests succeed. The 39 failing tests break down as follows:

  • 3 are emphatic suffix examples (not yet added).
  • 18 are number examples (not yet added).
  • 3 are future tense examples (not yet added).
  • 1 is an imperative example (not yet added).
  • 3 are interrogative examples (not yet added).
  • 1 is a present tense example (irregular form).
  • 3 are subordinating conjugate suffix examples (not yet added).
  • 1 is a topic information suffix example (not yet added).
  • 3 are verbal adjective examples (not yet added).
  • 3 are verbal noun examples (not yet added).
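As a quick consistency check, the category counts in this list can be summed and compared against the number of failing tests (101 total minus 62 passing). The sketch below only restates the numbers from the list; the dictionary keys are shortened labels chosen for illustration.

```python
# Failing-test counts per category, copied from the list above.
failing_by_category = {
    "emphatic suffix": 3,
    "number": 18,
    "future tense": 3,
    "imperative": 1,
    "interrogative": 3,
    "present (irregular form)": 1,
    "subordinating suffix": 3,
    "topic suffix": 1,
    "verbal adjective": 3,
    "verbal noun": 3,
}

total_tests = 101
passing = 62
failing = sum(failing_by_category.values())

# The categories should account for every failing test.
assert failing == total_tests - passing

# Rank categories by how many failures each would remove if fixed.
ranked = sorted(failing_by_category.items(), key=lambda kv: kv[1], reverse=True)
for name, count in ranked:
    print(f"{count:2d}  {name}")
```

The tally shows that the number examples alone make up 18 of the 39 failures, so they are the largest single group to address.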

Generator Evaluation

Initial evaluation of morphological generation

Number of passes and fails for the analysis test

jje.yaml (generated from grammar page)

62 out of 101 tests succeed.

39 out 101 tests fail.

Number of passes and fails for the generation test

jje.yaml (generated from grammar page)

47 out of 129 tests succeed.

82 out of 129 tests fail.

Coverage statistics

Top unknowns:

     5 불르는
     5 창조한
     5 놀아난
     5 자랑만
     5 멘든
     4 피해자들을
     4 짐셍덜이
     4 모삼수다
     4 물회에는
     4 집의서만
     4 보이게
     4 이영
     4 헴디
     4 보난
     4 보레
     4 메틀
     4 이신
     4 더꺼
     4 이젠
     4 먹어

total coverage: 731/4790 ~0.15260960334029227557

Final evaluation of morphological generation

Number of passes and fails for the analysis test

68 out of 101 tests succeed.

33 out of 101 tests fail.

Number of passes and fails for the generation test

68 out of 101 tests succeed.

33 out of 101 tests fail.

Number of twol rules added

6 twol rules were added.

Coverage statistics

Top unknowns:

     5 불르는
     5 창조한
     5 놀아난
     5 자랑만
     5 멘든
     4 피해자들을
     4 짐셍덜이
     4 모삼수다
     4 물회에는
     4 집의서만
     4 주말엔
     4 보이게
     4 이영
     4 헴디
     4 보난
     4 보레
     4 메틀
     4 이신
     4 더꺼
     4 이젠

total coverage: 742/4790 ~0.15490605427974947808
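To put the initial and final coverage figures side by side, the short calculation below takes the two reported counts (731 and 742 known tokens out of 4790) and derives the change between them. It uses only numbers reported on this page; the variable names are illustrative.

```python
# Known-token counts reported for the initial and final evaluations.
initial_known, final_known, total = 731, 742, 4790

initial = initial_known / total
final = final_known / total

gain_tokens = final_known - initial_known
gain_points = (final - initial) * 100  # change in percentage points

print(f"initial: {initial:.2%}, final: {final:.2%}")
print(f"gain: {gain_tokens} tokens, {gain_points:.2f} percentage points")
```

In other words, the final evaluation covers 11 more tokens than the initial one, a gain of about 0.23 percentage points.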