Chechen and English
From LING073
Latest revision as of 21:53, 12 May 2019
Resources for machine translation between Chechen and English
Resources Developed For Chechen
You can do links to pages on the wiki like this: Chechen. —Jwashin1 (talk) 00:49, 9 April 2019 (EDT)
External resources
- Repository for Chechen transducer: https://github.swarthmore.edu/Ling073-sp19/ling073-che.git
- Repository for Chechen and English language pair: https://github.swarthmore.edu/Ling073-sp19/ling073-che-eng.git
- Repository for corpus: https://github.swarthmore.edu/Ling073-sp19/ling073-che-eng-corpus.git
che → eng Evaluation
Coverage of Chechen Transducer
- Words in corpus: 1874
- Unknown words: 1306
- Known words: 568
- Percentage: ~30.31%
Coverage of che-eng Transducer
- Words in corpus: 1874
- Unknown words: 1306
- Known words: 568
- Percentage: ~30.31%
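The coverage figures above are the share of corpus tokens that receive at least one analysis. A minimal sketch of that calculation follows, assuming the analyser output is in Apertium stream format, where an unknown token appears as `^word/*word$`. The function name `coverage` and the sample string are hypothetical, and this version also counts punctuation tokens, which the course's own coverage script may have handled differently.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return (known, unknown, known / total) for analyser output."""
    known = unknown = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        # An analysis starting with '*' marks a token the transducer does not know.
        if analyses.startswith("*"):
            unknown += 1
        else:
            known += 1
    total = known + unknown
    return known, unknown, (known / total if total else 0.0)


# Hypothetical analyser output: two known tokens and one unknown token.
sample = "^со/со<prn><pers><p1><abs><sg>$ ^бепиг/бепиг<n><abs><sg>$ ^foo/*foo$"
print(coverage(sample))

# The reported percentage follows directly from the counts on this page.
print(f"{568 / 1874:.2%}")  # 30.31%
```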
Ten Sentences Analyzed
1. Ас Хьасанна буьрка яьлла.
- I gave Hasan a ball.
^Ас<prn><pers><p1><erg><sg>/Prpers<prn><pers><p1><erg><sg>$ ^Хьасан<np><ant><dat><sg>/Hasan<np><dat><sg>$ ^буьрка<n><cl_j><abs><sg>/ball<n><abs><sg>$ ^яьлла<v><tv><pres><pf>/give<vblex><tv><pres><pf>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#Prpers #Hasan #ball #give.
2. иза сан ваша ву.
- He is my brother.
^иза<prn><pers><p3><abs><sg>/prpers<prn><pers><p3><abs><sg>$ ^сан<prn><pers><p1><gen><sg>/prpers<det><pos><pers><p1><gen><sg>$ ^ваша<n><cl_v><abs><sg>/brother<n><abs><sg>$ ^ву<v><cpl><cl_b>/be<vbser><cl_b>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#prpers #prpers #brother #be.
3. сан цӀе Иван ю.
- My name is Ivan.
^сан<prn><pers><p1><gen><sg>/prpers<det><pos><pers><p1><gen><sg>$ ^цӀе<n><cl_j><abs><sg>/name<n><abs><sg>$ ^Иван<np><ant><abs><sg>/Ivan<np><abs><sg>$ ^ю<v><cpl><cl_j>/be<vbser><cl_j>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#prpers #name #Ivan #be.
4. Со базара воьду.
- I go to the bazaar.
^Со<prn><pers><p1><abs><sg>/Prpers<prn><pers><p1><abs><sg>$ ^базар<n><cl_j><all><sg>/bazaar<n><all><sg>$ ^воьда<v><iv><pres>/go<vblex><pres>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#Prpers #bazaar go.
5. со бепиг деш ву.
- I am making bread.
^со<prn><pers><p1><abs><sg>/prpers<prn><pers><p1><abs><sg>$ ^бепиг<n><cl_j><abs><sg>/bread<n><abs><sg>$ ^деш<v><tv><prog><ptcp>/make<vblex><tv><prog><ptcp>$ ^ву<v><cpl><cl_b>/be<vbser><cl_b>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#prpers #bread #make #be.
6. Ас бепиг дина.
- I made bread.
^Ас<prn><pers><p1><erg><sg>/Prpers<prn><pers><p1><erg><sg>$ ^бепиг<n><cl_j><abs><sg>/bread<n><abs><sg>$ ^дина<v><tv><pres><pf>/make<vblex><tv><pres><pf>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#Prpers #bread #make.
7. Ахьмада машина эцна.
- Ahwmad bought a car.
^Ахьмад<np><ant><erg><sg>/Ahwmad<np><erg><sg>$ ^машина<n><cl_u><abs><sg>/car<n><abs><sg>$ ^эца<v><tv><pres><pf>/buy<vblex><tv><pres><pf>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#Ahwmad #car #buy.
8. нанас бепиг дан деза.
- Mother has to make bread.
^нанас<n><cl_j><erg><sg>/mother<n><erg><sg>$ ^бепиг<n><cl_j><abs><sg>/bread<n><abs><sg>$ ^дан<v><tv><inf>/do<vbdo><tv><inf>$ ^деза<v><pres>/must<vbmod><pres>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#mother #bread #do must.
9. Ахьмадан машина керла ю.
- Ahwmad's car is new.
^Ахьмад<np><ant><gen><sg>/Ahwmad<np><gen><sg>$ ^машина<n><cl_u><abs><sg>/car<n><abs><sg>$ ^керла<adj>/new<adj>$ ^ю<v><cpl><cl_j>/be<vbser><cl_j>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#Ahwmad #car #new #be.
10. Аслан школе воьду.
- Aslan goes to school.
^Аслан<np><ant><abs><sg>/Aslan<np><abs><sg>$ ^школа<n><cl_j><all><sg>/school<n><all><sg>$ ^воьда<v><iv><pres>/go<vblex><pres>$^.<sent>/.<sent>$^.<sent>/.<sent>$
#Aslan #school go.
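Each analysis line above is a sequence of lexical units of the form `^source<tags>/target<tags>$`, and each output line uses `#` to mark a word the generator could not inflect. The sketch below shows one way to split such lines into lemmas and tags and to list the ungenerated words. The helper names `parse_biltrans` and `ungenerated` are hypothetical, and the parsing assumes that lemmas contain neither `/` nor `<`.

```python
import re

UNIT = re.compile(r"\^([^$]*)\$")   # body of one ^...$ lexical unit
TAG = re.compile(r"<([^>]+)>")      # one <tag>


def split_side(side):
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = side.split("<", 1)[0]
    return lemma, TAG.findall(side)


def parse_biltrans(line):
    """Return a list of ((src_lemma, src_tags), (tgt_lemma, tgt_tags))."""
    units = []
    for body in UNIT.findall(line):
        source, _, target = body.partition("/")
        units.append((split_side(source), split_side(target)))
    return units


def ungenerated(output):
    """Return the output tokens that carry the '#' generation-failure mark."""
    return [token for token in output.split() if token.startswith("#")]


# Shortened form of the analysis of sentence 10 from the list above.
line = "^Аслан<np><ant><abs><sg>/Aslan<np><abs><sg>$ ^воьда<v><iv><pres>/go<vblex><pres>$^.<sent>/.<sent>$"
for source, target in parse_biltrans(line):
    print(source, "->", target)

print(ungenerated("#Aslan #school go."))  # ['#Aslan', '#school']
```

Listing the `#` tokens per sentence gives a quick view of which target-language forms still need generator entries or transfer rules.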
Lexical Selection
Addition
- 128 words
- 12 elements
- 3 transfer rules
Final Evaluation
For Chechen Transducer
- Precision and recall against the annotated.basic corpus:
  - Precision: 76.85714%
  - Recall: 54.78615%
- Coverage over the large corpus: 40.829%
- The number of words in the large corpus: 19017110
- The number of stems in the transducer: 401
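Precision and recall compare the analyses the transducer produces with those in a hand-annotated corpus. The following is a minimal sketch of the standard definitions, assuming each analysis is represented as a (surface form, analysis) pair. The function name and the toy data are hypothetical and are not taken from the annotated.basic corpus or from the course's evaluation script.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of analyses."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of produced analyses that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard analyses that were produced.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): the gold corpus has four analyses, the transducer
# produces three, and two of those three are correct.
gold = {
    ("ву", "ву<v><cpl><cl_b>"),
    ("ю", "ю<v><cpl><cl_j>"),
    ("бепиг", "бепиг<n><cl_j><abs><sg>"),
    ("керла", "керла<adj>"),
}
predicted = {
    ("ву", "ву<v><cpl><cl_b>"),
    ("ю", "ю<v><cpl><cl_j>"),
    ("керла", "керла<n><abs><sg>"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2%} recall={recall:.2%}")
```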
For MT from che to eng
- WER: 100%
- PER: 100%
- Trimmed coverage in large corpus: 21.20%
- Trimmed coverage in longer corpus: 39.00%
- Number of tokens in large corpus: 19017110
- Number of tokens in longer corpus: 1831
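WER (word error rate) and PER (position-independent error rate) both measure how far the system output is from a reference translation. The sketch below uses common textbook formulations, which are assumptions here: WER as word-level edit distance divided by reference length, and PER as one minus the order-free word overlap, with extra hypothesis words penalised. The evaluation tool actually used for the figures above may tokenise or normalise differently.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def position_independent_error_rate(reference, hypothesis):
    """One minus the word overlap ignoring order, penalising surplus words."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    surplus = max(0, len(hyp) - len(ref))
    return 1 - (matches - surplus) / len(ref)


# Sentence 10 from this page: reference translation against system output.
reference = "Aslan goes to school."
hypothesis = "#Aslan #school go."
print(word_error_rate(reference, hypothesis))                   # 1.0
print(position_independent_error_rate(reference, hypothesis))   # 1.0
```

With whitespace tokenisation, no output token in this sentence matches a reference token exactly, since the `#` marks and the attached full stop both block a match, so the sentence scores 100% on both measures.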