Miskito/Final project
Revision as of 14:40, 20 May 2021
Overview
Over the course of the semester, we've worked on developing a rule-based machine translation (RBMT) system for Miskito using Apertium resources.
Implementation/Solution
For our final project, we decided to focus on our monolingual transducer and improve it as much as possible. Specifically, we concentrated on the .lexd and .twol files to encode grammatical generalizations. We implemented morphological analysis for a wide range of phenomena in the language:
- Verb tense morphology
  - Absolute Past
  - Present
  - Present Progressive
  - Absolute Future
  - Future
- Irregular verb morphology
- Negation in the present tense
- Possessive noun morphology
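In lexd, word forms are built by concatenating one entry from each named LEXICON in the order given by a PATTERNS line. The .twol rules then adjust the resulting surface strings. The Python sketch below mimics only the lexd half in miniature, for illustration.

The stems, suffixes, and tag names (`<abs_past>`, `<pres>`, and so on) are hypothetical placeholders. They are not real Miskito morphemes or the project's actual tag set. The functions `expand`, `analyse`, and `generate` are likewise illustrative names, not part of lexd or HFST.

```python
from itertools import product

# Hypothetical stand-in for a lexd "LEXICON VerbStem" section.
# Each entry is (analysis side, surface side). Placeholder stems only,
# not real Miskito verbs.
VERB_STEM = [
    ("stemA<v>", "stema"),
    ("stemB<v>", "stemb"),
]

# Hypothetical stand-in for a "LEXICON VerbTense" section, one entry per
# tense listed above. Tags and suffixes are invented placeholders, not the
# project's tag set or the real Miskito endings.
VERB_TENSE = [
    ("<abs_past>", "-sfx1"),
    ("<pres>", "-sfx2"),
    ("<pres_prog>", "-sfx3"),
    ("<abs_fut>", "-sfx4"),
    ("<fut>", "-sfx5"),
]


def expand(*lexicons):
    """Yield every (analysis, surface) pair a pattern line would produce:
    one entry from each lexicon, concatenated in order."""
    for parts in product(*lexicons):
        analysis = "".join(a for a, _ in parts)
        surface = "".join(s for _, s in parts)
        yield analysis, surface


# Equivalent of a single pattern line: "VerbStem VerbTense".
PAIRS = list(expand(VERB_STEM, VERB_TENSE))


def analyse(surface):
    """Analysis direction: surface form -> list of analyses.
    Unknown forms get '+?' appended, similar to hfst-lookup's output."""
    hits = [a for a, s in PAIRS if s == surface]
    return hits or [surface + "+?"]


def generate(analysis):
    """Generation direction: analysis -> list of surface forms."""
    return [s for a, s in PAIRS if a == analysis]


if __name__ == "__main__":
    print(analyse("stema-sfx1"))      # ['stemA<v><abs_past>']
    print(generate("stemB<v><fut>"))  # ['stemb-sfx5']
    print(analyse("unknownword"))     # ['unknownword+?']
```

In the real pipeline, the compiled lexd transducer would be combined with the .twol rules, which handle alternations at morpheme boundaries. This sketch leaves that step out.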
With a combined lexicon of only about 230 nouns and verbs, we achieved substantial corpus coverage, detailed in the Evaluation section below.
Problems Encountered
Because Miskito has only about 140,000 native speakers, few resources were available to draw on when building these language tools. We made use of a couple of dictionaries and grammar workbooks. However, these sources were inconsistent with one another, which made some components of Miskito difficult to implement.
Areas to Fix
- Full coverage of possessive noun morphology
- Adjective morphology
Developed Resources
- Corpus Repo
- Miskito Keyboard
- Grammar Documentation
- Miskito Transducer
- Disambiguation wiki
- Miskito and English Wiki Page
- Miskito to English Apertium Language Pair Repo
- Contrastive Grammars wiki
- Link to the code for our morphological analyzer
We expanded our morphological analyzer's capabilities by adding in
Evaluation
To evaluate our progress, we used the coverage-hfst tool to measure our overall corpus coverage. Our results are reported below.
Initial Corpus Coverage
- Coverage: 3873 / 7569 (~51.17%)
- Remaining unknown forms: 3696
Final Corpus Coverage
- Coverage: 5987 / 7588 (~78.90%)
- Remaining unknown forms: 1601
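The sketch below shows how figures like these can be computed. It assumes that coverage means the share of corpus tokens that receive at least one analysis, and that "remaining unknown forms" is total minus analysed. The reported numbers are consistent with that reading.

The `analyse` callable is a hypothetical stand-in for running the transducer. `coverage` and `summarise` are illustrative helpers, not how coverage-hfst is implemented internally.

```python
from collections import Counter


def coverage(tokens, analyse):
    """Return (analysed, total, ratio, unknown_counts) for a token list.

    `analyse` is any callable that returns a list of analyses for a token;
    an empty list counts as unknown. This is an assumed stand-in for the
    real transducer, not coverage-hfst's implementation.
    """
    analysed = 0
    unknown = Counter()
    for tok in tokens:
        if analyse(tok):
            analysed += 1
        else:
            unknown[tok] += 1  # frequency of each unknown form
    total = len(tokens)
    ratio = analysed / total if total else 0.0
    return analysed, total, ratio, unknown


def summarise(analysed, total):
    """Turn reported counts into (coverage ratio, remaining unknown forms)."""
    return analysed / total, total - analysed


if __name__ == "__main__":
    # Toy corpus with a toy analyser that only knows "a" and "b".
    known = {"a", "b"}
    print(coverage(["a", "b", "a", "c"], lambda t: [t] if t in known else []))
    # (3, 4, 0.75, Counter({'c': 1}))

    # The counts reported above.
    for name, (hit, total) in {"initial": (3873, 7569),
                               "final": (5987, 7588)}.items():
        ratio, remaining = summarise(hit, total)
        print(f"{name}: {ratio:.2%} coverage, {remaining} unknown")
    # initial: 51.17% coverage, 3696 unknown
    # final: 78.90% coverage, 1601 unknown
```

By this arithmetic, coverage rose by about 27.7 percentage points. The corpus size also differs slightly between the two runs (7569 vs. 7588), so the two ratios are not computed over exactly the same text. Sorting the `unknown` counter by frequency, for example with `unknown.most_common(20)`, is one way to decide which lexicon entries to add next.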