Machine translation
Revision as of 15:33, 8 January 2002
The purpose of the Wikipedia Machine Translation Project is to develop ideas, methods and tools that can help translate Wikipedia into languages other than English.
Motivation
Wikipedias in smaller languages can't produce articles as fast as the English Wikipedia because they have too few contributors. The solution to this problem is to translate the English Wikipedia. However, some languages will not have enough translators, and machine translation can improve the productivity of the community.
TradWiki/WikiTran
TradWiki/WikiTran (WikipediaTranslator/WikiTranslator/BabelWiki) is a planned, not yet coded, wiki that will help Wikipedians translate articles from English into other languages.
- I rather like WikiTran myself. --Stephen Gilbert
License
All code and data should be released under a free licence.
Advantages
- faster translation of Wikipedia
- generation of large amounts of useful data (corpora)
- creation of a useful tool
Lexical, syntactic and semantic analysis of wikipedia content
The first step in translating Wikipedia is to analyze its content. This analysis will determine:
- Number of words and sentences
- Word distribution
- Frequency of the most common sentences and expressions
- Semantic relations between words and between sentences
- Syntactic structure of all sentences
Information about the most common sentences and expressions can be used to build a translation database of those expressions, so that translators don't have to translate the same expression again and again.
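A minimal sketch of the counting part of this analysis, in Python. It assumes plain English article text as input and uses naive regular-expression tokenization; a real analysis of Wikipedia content would need proper sentence and word tokenizers. The function `analyze` and its parameters are hypothetical. Word n-grams stand in for "expressions": the most frequent ones are the candidates for the translation database.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_words(sentence):
    # Lowercased runs of ASCII letters and apostrophes; punctuation is dropped.
    return re.findall(r"[a-z']+", sentence.lower())


def ngrams(words, n):
    # All runs of n consecutive words, joined with spaces.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def analyze(text, n=3, top=5):
    """Count sentences and words, and list the most frequent words and n-word expressions."""
    sentences = split_sentences(text)
    words_per_sentence = [split_words(s) for s in sentences]
    all_words = [w for ws in words_per_sentence for w in ws]
    # n-grams are counted per sentence, so an expression never spans two sentences.
    expressions = Counter(g for ws in words_per_sentence for g in ngrams(ws, n))
    return {
        "sentences": len(sentences),
        "words": len(all_words),
        "word_distribution": Counter(all_words).most_common(top),
        "frequent_expressions": expressions.most_common(top),
    }


if __name__ == "__main__":
    sample = ("Paris is the capital of France. Berlin is the capital of Germany. "
              "Rome is the capital of Italy.")
    print(analyze(sample, n=3, top=2))
```

On the sample, the expressions "is the capital" and "the capital of" each occur three times; translating them once would cover every occurrence.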
Resources:
- Dictionaries
  - Dutch to English Translation Tools (source available)
  - English dictionary
  - Portuguese dictionary
  - English-Portuguese dictionary
  - Ergane (free dictionary, several languages)
- Translation rules
- Code
  - GPLTran (translator under the GPL)
    - http://www.translator.cx
    - Supposed to translate paragraphs or entire web pages
    - Paragraph translation is spotty and buggy
    - Web page translation doesn't seem to work at all
    - Download code at http://www.translator.cx/dist/
  - Traduki
    - Python-based project, uses Esperanto as a metalanguage
    - Website hasn't been updated in a while
    - http://traduki.sourceforge.net (version 0.2 released; translates "The dog eats the apple" into Esperanto as "La hundo mangxas la pomon")
  - http://www.link.cs.cmu.edu/link/ -- Link Grammar
- Databases
  - http://www.cogsci.princeton.edu/~wn/links/ -- WordNet, a lexical database for the English language.
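To illustrate what the Traduki example involves, here is a toy direct-transfer sketch in Python. It is not Traduki's code and does not use Esperanto as an interlingua; the lexicon, the ending constants and `translate_svo` are hypothetical, and it only handles five-word sentences shaped "Det Noun Verb Det Noun". It shows that even this simple case needs syntactic analysis: the translator must know which noun is the direct object in order to add the Esperanto accusative ending -n. Output uses the x-system spelling ("gx" for ĝ), as in the Traduki example.

```python
# Hypothetical toy lexicon: English word -> Esperanto root (x-system: "gx" stands for ĝ).
LEXICON = {
    "the": "la",
    "dog": "hund",
    "apple": "pom",
    "eats": "mangx",
}
NOUN = "o"          # Esperanto nouns end in -o
ACCUSATIVE = "n"    # a direct object also takes the accusative -n
PRESENT = "as"      # present-tense verbs end in -as


def translate_svo(sentence):
    """Translate a sentence assumed (not checked) to be 'Det Noun Verb Det Noun'.

    Only five-word sentences are accepted; words missing from LEXICON raise KeyError.
    """
    words = sentence.rstrip(".").lower().split()
    if len(words) != 5:
        raise ValueError("toy translator only handles 'Det Noun Verb Det Noun'")
    det1, subj, verb, det2, obj = (LEXICON[w] for w in words)
    # Position in the English word order tells us the syntactic role of each noun.
    out = [det1, subj + NOUN, verb + PRESENT, det2, obj + NOUN + ACCUSATIVE]
    text = " ".join(out)
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    print(translate_svo("The dog eats the apple"))
```

Swapping the nouns ("The apple eats the dog") moves the -n ending to "hundon", which is exactly the information a purely word-by-word dictionary lookup would lose.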
TradWiki/WikiTran - Translation memory approach
A translation memory (TM) system is a computer program that uses a database of earlier translations to help a human translator. If this approach is followed, WikipediaTranslator will need the following features:
- display of the original and translated versions together
- splitting the original text into parts that can be translated individually
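A minimal sketch of those two features in Python. It assumes the memory is a plain dictionary from previously translated English segments to their translations, and that segments are sentences. Fuzzy matching uses `difflib.SequenceMatcher(...).ratio()` with a 0.75 threshold; both the scoring method and the threshold are assumptions (real TM tools use their own match scores), and `split_segments`, `best_match` and `pretranslate` are hypothetical names. Each output row pairs an original segment with a suggested translation so the two can be displayed side by side.

```python
import difflib
import re


def split_segments(article):
    """Split an article into sentence-sized segments (naive: ., ! or ? then whitespace)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]


def best_match(segment, memory, threshold=0.75):
    """Return (source, translation, score) for the most similar stored segment, or None."""
    best = None
    for source, translation in memory.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, translation, score)
    return best


def pretranslate(article, memory, threshold=0.75):
    """Pair each original segment with a suggested translation and its match score."""
    rows = []
    for segment in split_segments(article):
        match = best_match(segment, memory, threshold)
        suggestion = match[1] if match else ""  # empty: translate this segment from scratch
        score = match[2] if match else 0.0
        rows.append((segment, suggestion, score))
    return rows


if __name__ == "__main__":
    memory = {"Paris is the capital of France.": "Paris é a capital da França."}
    article = "Paris is the capital of France. It lies on the Seine."
    for original, suggestion, score in pretranslate(article, memory):
        print(f"{score:4.2f} | {original} | {suggestion}")
```

Segments without a match above the threshold come back with an empty suggestion for the translator to fill in; storing each finished translation back into `memory` would let later articles reuse it.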
Links
- General
  - Links on Machine Translation (MT): http://www.ife.dk/url-mt.htm
  - Machine translation (MT), and the future of the translation industry: http://accurapid.com/journal/15mt.htm
  - Machine Translation: an Introductory Guide: http://clwww.essex.ac.uk/MTbook/
  - Visual Interactive Syntax Learning: http://visl.sdu.dk/visl/
- Wikipedia articles
  - Free translations on the web
  - Neural nets
  - Machine translation
  - Translation memories
  - Wired magazine
- Portuguese
  - Processamento Computacional do Português (Computational Processing of Portuguese): http://www.portugues.mct.pt/index.html
- Meta-language
  - http://www.unl.ias.unu.edu -- A United Nations project based on an artificial, machine-readable language (UNL). The idea is to semi-automatically create a UNL text from, say, English, and then have it translated fully automatically into up to 150 languages on the fly.
  - The World Wide Translator (The Tragedy of the Anticommons of translation memories)
Are you sure the other-language Wikipedias would rather translate text than write it themselves? It seems to me that translating text is almost more effort than writing an article yourself. For instance, I run the Esperanto Wikipedia, and I think we appreciate the international nature of our articles. I was wondering whether other second-language Wikipedias would feel the same way. I mean, do the other-language Wikipedias want it? --Chuck Smith
It's true that writing an article takes almost the same effort as translating one, but there are some exceptions: if you are not an expert in the topic, it's easier to translate than to write. Also, it's clear to me that the number of contributors to the Portuguese encyclopedia is very small. Consider that Portuguese is not a second language but the first language of millions of people. Unfortunately, very few of those millions have access to the Internet and/or an education. A free encyclopedia would be an extraordinary resource for those people, so every effort to speed up the creation of the Portuguese version is welcome. Of course, people who write for a second-language Wikipedia like Vikipedio have different purposes: they do it for fun and are not interested in machine translation.
PS: You may be interested to know that the Traduki project uses Esperanto as the deeper word representation for machine translation. user:joao
Point well made. It would be especially good for the minority languages. I was aware of the Traduki project; it looks interesting, although it seems nothing has happened on it lately... maybe I'm wrong. I actually looked at the pages again yesterday. And since I'm going to start learning Portuguese soon (I plan to visit Brazil next August), I'll probably take a closer look at it later. I now know Sim, Não and Obrigado (yes, no and thank you). :) --Chuck Smith