Machine translation
Help us develop tools for translating Wikipedia.
The purpose of the Wikipedia Machine Translation Project is to develop ideas, methods and tools that can help translate Wikipedia articles from one language to another (particularly out of English and into languages with small numbers of fluent speakers).
Motivation
TradWiki/WikiTran
License
All code and data should be released under a free licence (GFDL).
Advantages
TradWiki/WikiTran - Translation memory approach
Lexical, syntactic and semantic analysis of Wikipedia content
Information about the most popular sentences and expressions can be used to create a translation database of such expressions so translators don't need to repeat a translation.
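As a rough illustration of the translation memory idea above, here is a minimal Python sketch. It is not TradWiki/WikiTran code: the names `split_sentences`, `most_common_sentences` and `TranslationMemory` are hypothetical, the sentence splitter is deliberately naive, and the sample sentences and their Portuguese rendering are made-up example data.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def most_common_sentences(articles, min_count=2):
    """Return (sentence, count) pairs for sentences seen at least min_count times."""
    counts = Counter(s for article in articles for s in split_sentences(article))
    return [(s, n) for s, n in counts.most_common() if n >= min_count]


class TranslationMemory:
    """Stores source -> target sentence pairs so a translation is done only once."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _normalize(sentence):
        # Assumption: matching ignores case and repeated whitespace only.
        return " ".join(sentence.lower().split())

    def add(self, source, target):
        self._store[self._normalize(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None if the sentence is unknown."""
        return self._store.get(self._normalize(source))


# Hypothetical example data: two short articles sharing one sentence.
articles = [
    "Paris is a city. This article is a stub.",
    "Lyon is a city. This article is a stub.",
]
frequent = most_common_sentences(articles)

tm = TranslationMemory()
tm.add("This article is a stub.", "Este artigo é um esboço.")
reused = tm.lookup("this article is  a stub.")  # found despite case/spacing differences
```

Exact matching after normalization is the simplest possible design; a real translation memory would probably also need fuzzy matching for near-identical sentences.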
Resources:
Links
References:
Discussion
Translate or Write from scratch?
Are you sure the other-language Wikipedias would rather translate text than write it themselves? It seems to me that translating text is almost more effort than writing an article yourself. For instance, I run the Esperanto Wikipedia (eo:), and I think we appreciate the international nature of our articles. I was wondering whether other second-language Wikipedias feel the same way. I mean, do the other-language Wikipedias want it?
T14N, I18N, L10N
It's true that the effort to write articles is almost the same as the effort to translate. But there are some exceptions. If you are not an expert in the topic, it's easier to translate than to write. On the other hand, it's clear to me that the number of contributors to the Portuguese encyclopedia is very small. You must consider the fact that Portuguese is not a second language, but the first language of millions of people. Unfortunately, very few of those millions have access to the internet and/or have an education. A free encyclopedia would be an extraordinary resource for those people, so every effort to speed up the creation of the Portuguese version is welcome. Of course, people who write for a second-language Wikipedia like Vikipedio have different purposes, do it for fun and are not interested in machine translation. PS: You may be interested in knowing that the Traduki project uses Esperanto for the deeper word representation to achieve machine translation. user:joao
Point well made. It would be especially good for the minority languages. I was aware of the Traduki project and it looks interesting, although it looks like nothing has happened on the project lately... maybe I'm wrong. I actually looked at the pages again yesterday. ...and since I'm going to start learning Portuguese soon (I plan to visit Brazil next August), I'll probably take a closer look at it later. I now know Sim, Não and Obrigado. :)
Make Auto-translations available?
Now that I think more about it, I'd like to see auto-translation so I can get rough translations of encyclopedias in non-English, non-Esperanto language Wikipedias. It seems like in a future version we could have a drop-down list on each page that could translate a page for us and also give a link to the article on another-language Wikipedia if it exists. I think there are already free services that do this; does anyone know?
--Chuck Smith
There are some links to such services above under "Free translations on the web". Joao
Has anyone seen Google Translate at http://translate.google.com/translate_t ?
Would an automatic translation script be run only once for each article, multiple times at an interval, immediately when changes are made, or immediately on demand by a reader? If only once or at an interval, how would article conflicts be handled? 24.198.63.192 03:52 Oct 18, 2002 (UTC)
Machine translation can give the best of both worlds:
...or Not
Yeah, I'm not so big on the idea anymore either. I do think it's interesting as an extremely long-term project, though. I've noticed that machine translation is adequate for getting the general meaning across but isn't very pleasing to the eye. If it has to be used, it'd probably best be used to populate a blank page so that native speakers of the language can clean it up in normal wikiwiki style. -- Daniel Thomas
Have any of you heard of Knowledge Based Translation Systems? This is a hybrid of Machine Translation, Translation Memories and Human Translation. It is offered by a company called SDL and shortens the translation process massively, but gives a quality that is indistinguishable from full human translation. It is generally more appropriate for technical writing as opposed to 'flowery' marketing text and literary works. SDL [1] (they're the people behind www.FreeTranslation.com [2])
I only regret that the whole discussion is in English only, and that the participants start from the premise that the English-language Wikipedia is the main source of culture. It would indeed be worthwhile to translate already existing articles, but in all directions (not necessarily only from English). And in my experience so far, the automatic translators give awful results. Arno Lagrange
Fixing Auto-translators
When I use automated translation, I usually observe two problems:
Both seem to be caused by ambiguities. So, my idea is:
This would require each language to add two additional wikis alongside the 'presentable' version: one for disambiguated texts, and one as a collection pool for raw translations. Sloyment 12:47, 22 Oct 2003 (UTC)
Some examples of how the above procedure could work:
The assumption behind this idea is that it would be easier to disambiguate a text than to translate it, and that it is easier to correct an automated translation that has only a few mistakes in it than to correct the rubbish that current translation programs produce. Sloyment 14:59, 22 Oct 2003 (UTC)
There are other problems. Some languages may not have words or phrases for certain technical concepts because no native speaker has ever needed them before. This is particularly true of languages with small numbers of native speakers in rural settings. It may be difficult to automatically translate an article on co-routines, for instance, because ideas like subroutine, co-routine, time-sharing and multi-tasking have never been put into words in that particular language before. A human translator can normally use a bit of imagination to invent a new term or reuse a term previously used for an analogous existing concept, and if the translator is any good, the result will fit into the language reasonably well. However, a machine can do little better than to leave the untranslatable term untranslated and mark it for human attention. -- Derek Ross 16:05, 26 Mar 2004 (UTC)
Other wondering
Three main things I'm wondering about.
So essentially, if I knew any programming language other than HTML (hey, I'm only 14, though I am going to begin taking CC courses in C or some crap like that over the summer) and I were to make MT software, it would incorporate all 3 of these. I think that a lot of the programming behind neural networks is available for free online to plug into whatever you want, so that (afaik) wouldn't be very hard, except maybe the customization part. UNL, at its best, claims a 99% accuracy rate. I have seen UNL at work. The English deconversions are fantastic, though they do leave something to be desired. As far as I can tell from what others have told me, though, the deconversions for languages such as Russian and Italian are totally ungrammatical, though one can get what they say. --Node_ue 03:11, 7 Apr 2004 (UTC)
So What ?
Scuse me folks, I've just read the whole page for the first time and felt like adding my little contribution: I'm French and have written some articles for the French Wikipedia, but since I couldn't help thinking about this translation question I decided to stop writing and to go to the English Wikipedia in search of some kind of an answer (since you English-speaking guys are a lot more numerous than us)... So I came here and... Gee, what a mess! Can't anyone here try to clean up this page a little? (I didn't dare to do it myself.) This isn't a forum; think a little of all these people like me who come here coz they're looking for some ideas or explanations. I don't mean you haven't any ideas, this long discussion is quite full of proposals and ideas, but could someone make it simpler, shorter and more understandable? Instead of filling the whole page with (interesting) conversations, wouldn't it be better to put together in a single part (as a list, for instance) the latest proposals and ways to proceed to solve the problem of translation, with their advantages and disadvantages, for the international stupid ones like me who can't understand such impressive conferences? Thanks everyone for your attention. (I hope this contribution will be deleted soon with the recasting of the page.)
A French friend of Wikipedia, user:persivre