Difference between revisions of "Dhivehi and English"
From LING073
Revision as of 01:10, 8 May 2019
Resources for Machine Translation between Dhivehi and English
External Resources
- Dhivehi-English GitHub
- Dhivehi GitHub
- English GitHub
- Dhivehi-English Parallel Corpus GitHub
- Lexical Selection
- Contrastive Grammar
You can do links to pages on the wiki like this: Lexical Selection —Jwashin1 (talk) 00:48, 9 April 2019 (EDT) (fixed)
Initial div → eng Evaluation
- Coverage of Dhivehi Transducer on div.sentences.txt:
  - Number of tokenised words in the corpus: 627
  - Coverage: 23.92%
- Coverage of Bilingual Transducer on div.sentences.txt:
  - Number of tokenised words in the corpus: 630
  - Coverage: 22.86%
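The script that produced these coverage figures is not shown on this page. As a rough illustration only, naive coverage can be computed from analyser output in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unrecognised form is returned as `^surface/*surface$`. The function name `coverage` and the sample input below are made up for this sketch.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^$]*)?)\$")


def coverage(analysed_text: str) -> tuple[int, int, float]:
    """Return (total tokens, known tokens, coverage in percent).

    Assumption: a token counts as known when its first analysis
    does not start with '*', the marker for an unrecognised form.
    """
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        if analyses and not analyses.startswith("/*"):
            known += 1
    percent = 100.0 * known / total if total else 0.0
    return total, known, percent


# Hypothetical analyser output: two recognised tokens, two unknown ones.
sample = "^cat/cat<n><sg>$ ^blorp/*blorp$ ^./.<sent>$ ^zzz/*zzz$"
total, known, percent = coverage(sample)
print(f"tokens: {total}, known: {known}, coverage: {percent:.2f}%")
```

The two token counts above (627 and 630) differ, so the real tooling probably tokenises the two passes slightly differently; this sketch does not try to reproduce that.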
Sentence | Intended English Translation | Lexical Transfer Output | Full Translation Output |
---|---|---|---|
މަދްރަސާ ގެއަށްވުރެ މާ ބޮޑެވެ. | The school is much bigger than the house. | ^މަދްރަސާ<n><nhum><sg><def><dir>/school<n><sg><def><dir>$ ^ގެ<n><nhum><sg><def><dat>/house<n><sg><def><dat>$ ^ވުރެ<post>/compared to<post>$ ^މާ<adv>/much<adv>$ ^ބޮޑު<adj>/big<adj>$ ^އެވެ<mod>/<mod>$^.<sent>/.<sent>$ | #school #house #compared to much #big . |
މިފަދަ އިސްލާޙެއް ވާނީ ޒަމާނީ ޑިމޮކްރެސީގެ ރޫޙާއި ޚިލާފު އިސްލާޙެއް. | This type of reform would be contrary to the spirit of modern democracy. | ^މިފަދަ<det><dem><deg1>/this type of<det><dem><deg1>$ ^އިސްލާޙް<n><nhum><sg><ind><dir>/reform<n><sg><ind><dir>$ ^ވާނީ<v><tv><act><fut><pot>/be<v><tv><act><fut><pot>$ ^ޒަމާނީ<adj>/modern<adj>$ ^ޑިމޮކްރެސީ<n><nhum><sg><def><gen>/democracy<n><sg><def><gen>$ ^ރޫޙާއި<n><nhum><sg><def><soc>/spirit<n><sg><def><soc>$ ^ޚިލާފު<post>/against<post>$ ^އިސްލާޙް<n><nhum><sg><ind><dir>/reform<n><sg><ind><dir>$^.<sent>/.<sent>$ | #this type of #reform #be modern #democracy #spirit #against #reform. |
ކުރަން ޖެހޭ ހޭދަ އިތުރުވުން. | We hit spending surplus. | ^ކުރަ<v><tv><act><pres><p1>/do<v><tv><act><pres><p1>$ ^ޖެހޭ<v><tv><act><pprs>/hit<v><tv><act><pprs>$ ^ހޭދަ<adj>/spending<adj>$ ^އިތުރުވުން<n><nhum><sg><def><dir>/surplus<n><sg><def><dir>$^.<sent>/.<sent>$ | #do #hit #spending #surplus. |
ތަރައްގީ އަކީ އާއިލާގެ އާމްދަނީ އިތުރުވުން. | Improvement is family income increasing. | ^ތަރައްގީ<n><nhum><sg><def><dir>/improvement<n><sg><def><dir>$ ^އަކީ<mod>/is<mod>$ ^އާއިލާ<n><nhum><sg><def><gen>/family<n><sg><def><gen>$ ^އާމްދަނީ<n><nhum><sg><def><dir>/income<n><sg><def><dir>$ ^އިތުރުވުން<v><iv><act><pres><p3>/increase<v><iv><act><pres><p3>$^.<sent>/.<sent>$ | #improvement #is #family #income #increase. |
މުބާރާތް މިއަދު ނިމޭނެއެވެ. | The competition will end today. | ^މުބާރާތް<n><nhum><sg><def><dir>/competition<n><sg><def><dir>$ ^މިއަދު<adv>/today<adv>$ ^ނިމެ<v><iv><pass><fut><p3>/end<v><iv><pass><fut><p3>$ ^އެވެ<mod>/<mod>$^.<sent>/.<sent>$ | #competition today #end . |
މާދަމާ އަލީ ފާހަނަ ސާފު ކުރާނެ. | Tomorrow Ali will clean the bathroom. | ^މާދަމާ<adv>/tomorrow<adv>$ ^އަލީ<np>/Ali<np>$ ^ފާހަނަ<n><nhum><sg><def><dir>/bathroom<n><sg><def><dir>$ ^ސާފު<adj>/clean<adj>$ ^ކުރަ<v><tv><act><fut><p3>/do<v><tv><act><fut><p3>$^.<sent>/.<sent>$ | tomorrow #Ali #bathroom #clean #do. |
އަހަރެން ފެތެނީ! | I am sinking! | ^އަހަރެން<prn><pers><p1><sg><std><dir>/I<prn><pers><p1><sg><std><dir>$ ^ފެތެ<v><iv><pass><pprs>/sink<v><iv><pass><pprs>$^!<sent>/!<sent>$ | #I #sink! |
މީތި ކިހާވަރަކަހް؟ | How much is this? | ^މީތި<prn><dem><deg1><sg><dir>/this<prn><dem><deg1><sg><dir>/it<prn><dem><deg1><sg><dir>$ ^ކިހާވަރަކަހް<itg>/how much<itg>$^؟<sent>/؟<sent>$ | #this #how much#؟ |
އަހަރެން އެކަނި ދުކޮހް ލާ! | Leave me alone! | ^އަހަރެން<prn><pers><p1><sg><std><dir>/I<prn><pers><p1><sg><std><dir>$ ^އެކަނި<adj>/alone<adj>$ ^ދުކޮހް<v><tv><act><pres><p3><imp>/leave<v><tv><act><pres><p3><imp>$ ^ލާ<mod>/<mod>$^!<sent>/!<sent>$ | #I alone #leave ! |
ކޮބާ ފާހަނަ؟ | Where's the bathroom? | ^ކޮބާ<itg>/where<itg>$ ^ފާހަނަ<n><nhum><sg><def><dir>/bathroom<n><sg><def><dir>$^؟<sent>/؟<sent>$ | #where #bathroom#؟ |
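In Apertium output, a leading `#` marks a word that the target-language generator could not produce, which is why most words in the last column carry one. A small sketch for listing those words from an output line follows; the helper name `ungenerated_words` is hypothetical and not part of any Apertium tool.

```python
import re

# '#' followed by a run of word characters. For a multiword such as
# '#compared to', only the first word ('compared') is captured.
UNGENERATED = re.compile(r"#(\w+)")


def ungenerated_words(output: str) -> list[str]:
    """Return the words that carry the '#' generation-failure marker."""
    return UNGENERATED.findall(output)


# Two output lines copied from the table above.
for line in ("#school #house #compared to much #big .",
             "tomorrow #Ali #bathroom #clean #do."):
    print(ungenerated_words(line))
```

Words without a marker, such as "much" and "tomorrow", passed through generation unchanged. Here that is most likely because adverbs have no inflection to generate.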
Final div → eng Evaluation
Additions
- Bilingual dictionary: added 101 stems
- Expanded Morphology:
  - Digits
  - Digits (ordinal form)
  - Numbers (citation form)
  - Numbers (combining form)
  - Demonstrative Determiners as prefix to nouns
- 2 structural transfer rules (number + noun and determiner + noun)
Coverage Over Large Corpus
- Number of tokenised words in the corpus: 692580
- Coverage: 28.52%
- Number of words in large corpus: 603356
- Number of Unique Entries in transducer: 320
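The coverage percentages on this page can be turned back into approximate token counts, assuming coverage is simply known tokens divided by tokenised words. The helper `known_tokens` is an illustrative name, and the results are estimates because the percentages are rounded to two decimals.

```python
def known_tokens(total_tokens: int, coverage_percent: float) -> int:
    """Estimate how many tokens were recognised, given total and coverage."""
    return round(total_tokens * coverage_percent / 100)


# Figures taken from the evaluations on this page.
figures = {
    "initial, Dhivehi transducer": (627, 23.92),
    "initial, bilingual transducer": (630, 22.86),
    "final, large corpus": (692580, 28.52),
}

for label, (total, percent) in figures.items():
    known = known_tokens(total, percent)
    print(f"{label}: about {known} of {total} tokens recognised")
```

Under that assumption, roughly 197,524 of the 692,580 tokens in the large corpus were recognised, compared with about 150 of 627 in the initial test sentences.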