Difference between revisions of "Waray/Final Project"
From LING073
==Developed Resources==
* [https://github.swarthmore.edu/Ling073-sp21/ling073-war-corpus Waray Corpus (GitHub)]
* [https://github.com/rebelin/Waray-Lexical-Tools Waray Lexical Tools (GitHub)]
==Expanding Our Morphological Transducer==
===Initial Analyser Evaluation===
* Coverage: 742 / 1239 (~59.9%)
* Remaining unknown forms: 487
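The coverage figure above is the number of corpus tokens that received an analysis divided by the total number of tokens. A minimal Python sketch of that arithmetic follows; the function name <code>coverage</code> is an illustrative choice of ours and is not part of any tool used in the project.

```python
def coverage(known: int, total: int) -> float:
    """Return the fraction of corpus tokens that received at least one analysis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return known / total


# Figures taken from the initial evaluation above.
initial = coverage(742, 1239)
print(f"{initial:.1%}")  # prints 59.9%
```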
====Lexical Information====
* Lexicons: 17
* Lexicon entries: 120
* Patterns: 1
* Pattern entries: 9
=====Counts for individual lexicons=====
* NounRoot: 23
* Determiners: 13
* PluralDet: 2
* VerbPrefixes: 5
* VerbStems: 5
* Pronouns: 31
* ProperNouns: 1
* Adverbs: 11
* Auxiliary: 1
* Punctuation: 22
* All anonymous lexicons: 6

=====Tests=====
* '''war.yaml''': Total passes: 57, Total fails: 14, Total: 71
* '''commonWords.yaml''': Total passes: 4, Total fails: 16, Total: 20
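Summary lines like the ones above can be checked mechanically. The sketch below parses a summary, verifies that passes and fails add up to the total, and reports a pass rate. It assumes the summaries have exactly the wording shown on this page; <code>parse_summary</code> is a hypothetical helper, not part of the test runner.

```python
import re

# Matches summaries such as "Total passes: 57, Total fails: 14, Total: 71".
# Each colon is optional so that slightly malformed lines still parse.
SUMMARY = re.compile(
    r"Total passes:?\s*(\d+),\s*Total fails:?\s*(\d+),\s*Total:?\s*(\d+)"
)


def parse_summary(line: str) -> dict:
    """Extract pass/fail/total counts and check that they add up."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a test summary: {line!r}")
    passes, fails, total = (int(g) for g in match.groups())
    if passes + fails != total:
        raise ValueError(f"counts do not add up: {line!r}")
    return {"passes": passes, "fails": fails, "total": total, "rate": passes / total}


results = {
    "war.yaml": parse_summary("Total passes: 57, Total fails: 14, Total: 71"),
    "commonWords.yaml": parse_summary("Total passes: 4, Total fails: 16, Total: 20"),
}
for name, r in results.items():
    print(f"{name}: {r['passes']}/{r['total']} passed ({r['rate']:.1%})")
```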

=====Current Unknown Words=====
TOP UNKNOWN WORDS:
15 ^nagkaada/*nagkaada$
12 ^hito/*hito$
11 ^uyon/*uyon$
11 ^linarang/*linarang$
10 ^may/*may$
10 ^langit/*langit$
9 ^klase/*klase$
9 ^hayop/*hayop$
8 ^katubigan/*katubigan$
7 ^tagsa/*tagsa$
7 ^ngatanan/*ngatanan$
7 ^kalamrag/*kalamrag$
7 ^basi/*basi$
6 ^nalupad/*nalupad$
6 ^liso/*liso$
6 ^iton/*iton$
6 ^haluag/*haluag$
6 ^espasyo/*espasyo$
6 ^bawbaw/*bawbaw$
6 ^aga/*aga$
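Each entry in the list above has the shape <code>^surface/*surface$</code>, with the asterisk marking a form that received no analysis and an optional leading count. As a rough sketch, such output could be tallied in Python as follows; <code>unknown_counts</code> is a hypothetical helper written for this illustration, and the three sample lines are copied from the list.

```python
import re
from collections import Counter

# An unknown token looks like ^surface/*surface$; an optional leading
# integer is a pre-aggregated frequency for that token.
UNKNOWN = re.compile(r"(?:(\d+)\s+)?\^([^/$]+)/\*([^$]+)\$")


def unknown_counts(text: str) -> Counter:
    """Tally unknown surface forms, honouring pre-aggregated counts if present."""
    counts = Counter()
    for match in UNKNOWN.finditer(text):
        n, surface, _ = match.groups()
        counts[surface] += int(n) if n else 1
    return counts


sample = """
15 ^nagkaada/*nagkaada$
12 ^hito/*hito$
11 ^uyon/*uyon$
"""
for form, n in unknown_counts(sample).most_common():
    print(n, form)
```

The same function also accepts raw analyser output without counts, in which case each occurrence of a token adds one to its tally.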

====Notes====
* Tests for verbalized nouns are not yet implemented.

===Initial Generator Evaluation===
Analyzer:
* Total passes: 57
* Total fails: 14
* Total tests: 71

Generator:
* Total passes: 56
* Total fails: 18
* Total tests: 74

* Currently, we have 4 rules in our twol file to handle verb conjugation.

===Later Analyser Evaluation===
* Coverage: 1025 / 1494 (~68.6%)
* Remaining unknown forms: 469

====Lexical Information====
* Lexicons: 29
* Lexicon entries: 525
* Patterns: 19
* Pattern entries: 19

=====Counts for individual lexicons=====
* NounRoot: 194
* Determiners: 13
* PluralDet: 2
* VerbPrefixes: 7
* VerbStems: 94
* Pronouns: 34
* ProperNouns: 15
* Adverbs: 11
* Auxiliary: 1
* Punctuation: 22
* Num-Lex: 23
* Conjunctions: 2
* Adjectives: 61
* Numeral: 10
* Num-SecondLex: 23
* All anonymous lexicons: 12
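To see where the lexicon grew between the two evaluations, the per-lexicon counts can be compared directly. The following Python sketch does this for the named lexicons only (the anonymous lexicons are left out); the numbers are copied from this page, and <code>growth</code> is an illustrative helper of ours.

```python
# Entry counts per named lexicon, copied from the two evaluations on this page.
initial = {
    "NounRoot": 23, "Determiners": 13, "PluralDet": 2, "VerbPrefixes": 5,
    "VerbStems": 5, "Pronouns": 31, "ProperNouns": 1, "Adverbs": 11,
    "Auxiliary": 1, "Punctuation": 22,
}
later = {
    "NounRoot": 194, "Determiners": 13, "PluralDet": 2, "VerbPrefixes": 7,
    "VerbStems": 94, "Pronouns": 34, "ProperNouns": 15, "Adverbs": 11,
    "Auxiliary": 1, "Punctuation": 22, "Num-Lex": 23, "Conjunctions": 2,
    "Adjectives": 61, "Numeral": 10, "Num-SecondLex": 23,
}


def growth(before: dict, after: dict) -> dict:
    """Return the change in entry count for every lexicon in `after`.

    Lexicons that did not exist in `before` are counted from zero.
    """
    return {name: count - before.get(name, 0) for name, count in after.items()}


changes = growth(initial, later)
# List the lexicons that changed, largest increase first.
for name, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    if delta:
        print(f"{name}: +{delta}")
```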

=====Current Unknown Words=====
TOP UNKNOWN WORDS:
15 ^nagkaada/*nagkaada$
12 ^hito/*hito$
11 ^linarang/*linarang$
10 ^may/*may$
8 ^katubigan/*katubigan$
7 ^tagsa/*tagsa$
7 ^ngatanan/*ngatanan$
7 ^man/*man$
7 ^kalamrag/*kalamrag$
7 ^basi/*basi$
6 ^nalupad/*nalupad$
6 ^liso/*liso$
6 ^haluag/*haluag$
6 ^espasyo/*espasyo$
6 ^bawbaw/*bawbaw$
5 ^kapawa/*kapawa$
5 ^ginlarang/*ginlarang$
5 ^Jehova/*Jehova$
4 ^tanom/*tanom$
4 ^nagkikiwa/*nagkikiwa$

====Notes====
* Added more twol rules
* Added 215 more stems

[[Category:Sp21_FinalProjects]] [[Category:Waray]]
Latest revision as of 03:28, 29 May 2021