Difference between revisions of "Waray/Final Project"
From LING073
* [https://github.swarthmore.edu/Ling073-sp21/ling073-war-corpus Waray Corpus (Github)]
* [https://github.com/rebelin/Waray-Lexical-Tools Waray Lexical Tools (Github)]
[[Category:Sp21_FinalProjects]] [[Category:Waray]]
Revision as of 23:57, 22 May 2021
Developed Resources
Expanding Our Morphological Transducer
Initial Analyser Evaluation
- Coverage: 742 / 1239 (~0.59887005649717514124)
- Remaining Unknown Words: 487
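The coverage figure is the share of corpus tokens that receive at least one analysis. Below is a minimal sketch of that calculation, assuming analyser output in the Apertium stream format seen in the unknown-word lists on this page (`^form/analysis$`, with a leading `*` marking an unknown form). The function name `coverage`, the sample string, and the `<det>` tag are hypothetical illustrations, not the evaluation script actually used for these numbers.

```python
import re
from typing import Tuple

# One lexical unit in stream format: ^surface/analysis1/analysis2...$
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")

def coverage(stream: str) -> Tuple[int, int, float]:
    """Return (known, total, ratio) for analyser output in stream format."""
    known = total = 0
    for _surface, analyses in UNIT.findall(stream):
        total += 1
        # An unknown form is echoed back with a leading asterisk.
        if not analyses.startswith("*"):
            known += 1
    ratio = known / total if total else 0.0
    return known, total, ratio

# Hypothetical sample: one analysed token and one unknown token.
sample = "^an/an<det>$ ^langit/*langit$"
print(coverage(sample))

# The ratio reported above, recomputed from the two counts.
print(round(742 / 1239, 4))
```

Counting lexical units rather than whitespace-separated words keeps punctuation and multi-analysis tokens from being counted twice.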
Lexical Information
- Lexicons: 17
- Lexicon entries: 120
- Patterns: 1
- Pattern entries: 9
Counts for individual lexicons
- NounRoot: 23
- Determiners: 13
- PluralDet: 2
- VerbPrefixes: 5
- VerbStems: 5
- Pronouns: 31
- ProperNouns: 1
- Adverbs: 11
- Auxiliary: 1
- Punctuation: 22
- All anonymous lexicons: 6
Tests
- war.yaml: Total passes: 57, Total fails: 14, Total: 71
- commonWords.yaml: Total passes: 4, Total fails: 16, Total: 20
Current Unknown Words
TOP UNKNOWN WORDS:
15 ^nagkaada/*nagkaada$
12 ^hito/*hito$
11 ^uyon/*uyon$
11 ^linarang/*linarang$
10 ^may/*may$
10 ^langit/*langit$
9 ^klase/*klase$
9 ^hayop/*hayop$
8 ^katubigan/*katubigan$
7 ^tagsa/*tagsa$
7 ^ngatanan/*ngatanan$
7 ^kalamrag/*kalamrag$
7 ^basi/*basi$
6 ^nalupad/*nalupad$
6 ^liso/*liso$
6 ^iton/*iton$
6 ^haluag/*haluag$
6 ^espasyo/*espasyo$
6 ^bawbaw/*bawbaw$
6 ^aga/*aga$
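A frequency list like the one above can be produced by tallying every lexical unit whose analysis begins with `*`. The following is a rough sketch in Python, assuming the same stream format. The helper name `top_unknown` and the sample input are hypothetical, and the list on this page may well have been generated with a different tool.

```python
import re
from collections import Counter

# An unknown lexical unit looks like ^form/*form$.
UNKNOWN = re.compile(r"\^([^/$]*)/\*[^$]*\$")

def top_unknown(stream, n=20):
    """Return the n most frequent unknown surface forms with their counts."""
    counts = Counter(m.group(1) for m in UNKNOWN.finditer(stream))
    return counts.most_common(n)

# Hypothetical sample: "hito" is unknown twice, "may" once, "an" is analysed.
sample = "^hito/*hito$ ^an/an<det>$ ^may/*may$ ^hito/*hito$"
for form, count in top_unknown(sample):
    print(count, f"^{form}/*{form}$")
```

Sorting by frequency shows which missing stems would raise coverage the most if added to the lexicon.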
Notes
- Tests for verbalized nouns not implemented yet
Initial Generator Evaluation
Analyzer:
- Total passes: 57
- Total fails: 14
- Total tests: 71
Generator:
- Total passes: 56
- Total fails: 18
- Total tests: 74
- Currently, we have 4 rules in our twol file to handle verb conjugation.
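To compare the analyzer and generator results on a common scale, the pass counts can be converted to pass rates. This is a small sketch using only the totals listed above; the helper name `pass_rate` is an illustrative assumption.

```python
def pass_rate(passes, fails):
    """Fraction of tests that passed; 0.0 when there are no tests."""
    total = passes + fails
    return passes / total if total else 0.0

# (passes, fails) as reported in the lists above.
results = {"Analyzer": (57, 14), "Generator": (56, 18)}
for name, (passes, fails) in results.items():
    total = passes + fails
    print(f"{name}: {passes}/{total} = {pass_rate(passes, fails):.1%}")
```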
Later Analyser Evaluation
- Coverage: 935 / 1401 (~0.66738044254104211278)
- Remaining Unknown Words: 466
Lexical Information
- Lexicons: 28
- Lexicon entries: 269
- Patterns: 4
- Pattern entries: 18
Counts for individual lexicons
- NounRoot: 62
- Determiners: 13
- PluralDet: 2
- VerbPrefixes: 7
- VerbStems: 24
- Pronouns: 34
- ProperNouns: 5
- Adverbs: 11
- Auxiliary: 1
- Punctuation: 22
- Num-Lex: 23
- Conjunctions: 2
- Adjectives: 18
- Numeral: 10
- Num-SecondLex: 23
- All anonymous lexicons: 12
Current Unknown Words
TOP UNKNOWN WORDS:
15 ^nagkaada/*nagkaada$
12 ^hito/*hito$
11 ^uyon/*uyon$
11 ^linarang/*linarang$
10 ^may/*may$
8 ^katubigan/*katubigan$
7 ^tagsa/*tagsa$
7 ^ngatanan/*ngatanan$
7 ^kalamrag/*kalamrag$
7 ^basi/*basi$
6 ^nalupad/*nalupad$
6 ^liso/*liso$
6 ^haluag/*haluag$
6 ^espasyo/*espasyo$
6 ^bawbaw/*bawbaw$
5 ^kapawa/*kapawa$
5 ^ginlarang/*ginlarang$
5 ^Jehova/*Jehova$
4 ^tanom/*tanom$
4 ^os/*os$
Notes
- Added two more twol rules to handle verb infixes.