A brief overview of the various types of machine translation and the benefits of using a machine translation solution; includes translation samples and resources.
Machine Translation
1. solutions
Dispelling the myths of
machine translation
It is not surprising that myths, half-truths, and misunderstandings abound regarding machine translation: It seems
as if the experience most players in the translation field have with this technology does not go beyond toying a little
with one of the free online translation tools. Almost every week, I come across an article informing its readers either
that machine translation is and always will be a complete waste of time or that machine translation, while being
a waste of time today, might actually be useful some time in the distant future. In the hope of setting the record
straight, here is a closer look at some of the most common myths about machine translation.
Photo: Vasiliy Koval
22 AUGUST 2008
#5801_tcworld_04-08.indd 22 20.06.2008 14:17:19 Uhr
By Uwe Muegge

Myth: Machine translation simply does not work

With free online translation services available all over the web, anyone can run a text through a machine translation (MT) engine and then share the results with the public as proof of the fact that machine translation is capable of little more than the most rudimentary rough translations (gisting), and, of course, providing nearly endless entertainment.

The main problem with these ‘tests’ is that using any of the free online translation environments gives only a glimpse of the true power of a full-fledged professional machine translation system. For example, the typical online translation service does not allow users to select a subject field or provide user terminology, let alone set stylistic preferences. In fact, many - if not most - of the free text translation tools support no translation parameters other than the specification of the language pair and the source text. No wonder that the translations these machine translation websites produce can be so ridiculously off target.

[Figure: Example of a German>English machine translation from the author’s website www.muegge.cc]

Fact: Machine translation improves the productivity and consistency of human translators

Whenever new source text for a project is created, that text will have to be translated at some point. Even when you work in what is considered a state-of-the-art globalization environment, i.e. an integrated content management/translation workflow system, you will end up with a certain percentage of low match/no match sentences. In a well-planned and well-managed globalization project where writers, as well as software developers, use a comprehensive project glossary, as well as a style guide aimed at easy readability/comprehensibility, the low/no match sentences can be pre-translated in a machine translation system before being edited by human translators.

Benefits of machine-generated pre-translation:
• Translators always have a proposal to work with instead of starting each new translation from scratch. A representative case study recently conducted at Symantec indicates that the productivity of human translators can double when unknown sentences are pre-translated in a machine translation system. [1]
• While most translations will require some editing and many even rewriting, it is fair to expect that a considerable percentage of machine-generated translations turn out to be perfect (this is especially true for short instructions, headings, legends, and the like).
• At a minimum, key terms will be translated correctly and consistently. And not only that, in most cases these terms will also be inflected correctly and appear in the correct singular or plural form (try to do that with your translation memory!)

Fact: Machine translation enables the translation of material that would otherwise not be translated

Very few organizations, if any, currently translate all materials that would benefit from translation into all the languages spoken by all of their current or future customers. The primary reason for this is that for many types of documents, especially in the after-sales domain, the budget is simply not available for large-scale human translation.

A number of organizations are using machine translation solutions for making large volumes of text available to their global customers in their local language without involving any human translators in the process. The Microsoft Knowledge Base, which contains more than 200,000 documents in English, is a well-known example of a text repository where the number of machine-translated documents by far exceeds the number of those translated by humans.

[Figure: German search page for the Microsoft Knowledge Base with machine translation option enabled]

Myth: Machine translation systems can only handle word-for-word translation

The belief that machine translation is basically limited to the sequential substitution of words in the source language with words in the target language is as widely-held as it is wrong. All popular machine translation systems, including the free online translation services such as systransoft.com, translate.google.com, and windowslivetranslator.com employ highly sophisticated algorithms that are the result of years of research and development.

Fact: There is not one but many very different machine translation technologies that are all capable of producing excellent translation results in the right environment

Machine translation has been around for more than 50 years, and during this half century a wide range of MT technologies have evolved, e.g. dictionary-based, rules-based, example-based, statistical - plus countless hybrid forms. Here is a brief discussion of the three machine translation technologies that are most relevant for commercial applications today.
Rules-based Machine Translation

Rules-based machine translation, also known as transfer machine translation, is the dominant MT paradigm today. Systran, Babelfish, promt, to name just a few, are all rules-based systems. Rules-based MT systems use a three-stage translation process:
1. Analysis: Parses the source sentence to create a tree of the syntactic structure of that sentence.
2. Transfer: Converts the syntactic tree for the source language into the corresponding tree for the target language.
3. Generation: Populates the target tree with corresponding words to create a sentence in the target language.

Benefits of rules-based machine translation include:
• Mature, proven technology that can be implemented quickly and at relatively low cost.
• Many commercial systems available covering many language combinations.
• Highly customizable through dictionary and style settings (some systems also support the customization of the rules base).

Rules-based machine translation systems have been in use in commercial settings for many years, e.g. at Autodesk, Daimler, and the European Commission’s Translation Service.

The two primary challenges for rules-based MT are first, that the rules base of any system is by necessity limited, meaning that for best results, authors need to adjust their writing style, and second, that while commercial rules-based machine translation packages are available for dozens of language combinations, many languages are still not covered.

Statistical Machine Translation

Statistical machine translation (SMT) is getting a lot of media attention these days, especially after Microsoft announced that it is using a proprietary SMT system to translate its huge Knowledge Base document repository [2] and Google won a large-scale machine translation evaluation contest with its statistical machine translation engine. [3]

Statistical machine translation systems typically consist of two major components:
• Translation Model: Generates translation proposals based on corresponding word sequences in aligned source and target training data.
• Language Model: Selects the best translation proposal based on training data in the target language only.

The good news about statistical machine translation is that once an SMT system has been trained on customer-specific data, this is the MT technology that typically produces the highest translation quality. On the flip side, that training effort requires a substantial body of existing translations: Language Weaver, the leading vendor of statistical machine translation systems, recommends a bilingual corpus of two million words or more per language pair. Because of the demanding training requirements, combined with the fact that statistical machine translation systems tend to have a higher sticker price than some of the rules-based systems, this MT technology is primarily used by government agencies – the intelligence community in particular – and large corporations.

Direct Machine Translation

In its most primitive form, the only thing a direct machine translation system does is to replace the words in the source language with words in the target language – in the same sequence and without any linguistic analysis or processing. The only resource direct machine translation uses is a bilingual dictionary, which is why this MT technology is also known as dictionary-driven machine translation.

Due to this rather unsophisticated technology, direct machine translation has been considered obsolete for many years, and there are hardly any commercial products available that use direct MT.

Despite its limited capabilities, I strongly believe that direct machine translation still has a place in today’s arsenal of automated translation tools. For a number of common real-world applications, word-for-word or phrase-for-phrase substitution is all that is required for successful translation. Think of domains where both vocabulary and syntax are standardized, as is the case with weather reports, financial profiles, and many e-commerce applications.

In one recent implementation, Medtronic, a large medical device manufacturer, used direct machine translation to translate a large product database into multiple languages. [4] Human translation was not an option for this project because
of cost and, yes, quality concerns (an analysis of previous human translation projects indicated an unacceptably high error rate among numeric values such as product numbers and dimensions). Also, initial tests had shown that both translation memories and rules-based machine translation systems produced poor results with text that has the following characteristics:
– little or no repetition on the sentence level;
– high repetition on the word/phrase level;
– telegraphic/elliptic style, e.g. ‘winds from southerly direction, speed reaching 55 km/h’, ‘American Technology Associates (AMTA) strong buy, Avion (AVIO) market outperform’, or ‘plate 2456dr15 right-angled, slotted, 15 ea’.

This type of translation project is most definitely among those that any self-respecting human translator could easily do without. And since direct machine translation does not require human post-editing in a best case scenario, using MT in this kind of environment might for once be welcomed by translators (who would hate to do these translations themselves) and translation buyers (who would love the idea of almost instant, almost free translations).

Myth: Machine translation is only for large organizations

Yes, it is true: If you read any success stories about machine translation, they typically come from the Caterpillars, Microsofts, and Symantecs of this world. But that is true for many - if not most - emerging technologies. It is also true that some of the most powerful machine translation systems in use today are the result of the multi-million dollar research and development programs only corporate giants can afford. But that does not mean you have to spend big bucks to deploy a machine translation solution.

Fact: Being both affordable and user-friendly, many machine translation packages are available for even the smallest of businesses, including freelancers

If you do a little research, you will find that many commercial machine translation packages are in the same price range as their translation memory counterparts, and that is mostly true for both workstation solutions for single users and client-server solutions for many users. And the secret is out that while corporate and small-business versions may differ in many ways, the core translation engine is typically the same in both products. In other words: In terms of out-of-the-box translation quality, there is generally little if any difference between the 1,000-dollar professional version and the 50,000-dollar corporate version of a given machine translation product.

In addition, the developers of commercial machine translation systems have invested heavily into making their products as intuitive to use as possible. In fact, I would even say that it is easier – and certainly faster – to produce your first translation with a typical MT product than it is with the typical translation memory tool.

A few more facts to consider:
• Many low-priced machine translation products feature a built-in translation memory (TM) module to improve the efficiency of the post-editing process (‘never correct the same mistake twice’), and a few MT systems like promt Expert offer seamless integration with the SDL Trados translation memory system.
• A number of translation tools vendors, such as Across, that cater to small and mid-sized companies, offer TM-MT system bundles and/or MT integration via API.
• User education and MT system customization (e.g. building dictionaries), which are major factors in achieving the best possible translation results, are often easier to accomplish in smaller organizations than in larger ones.

The bottom line

Since its inception, machine translation has been a highly controversial technology, and it will probably continue to be so for some time. Much of this controversy is based on false assumptions about what machine translation can do and who might benefit from using this type of technology.

Let me say it loud and clear: In general, the commercial machine translation systems available today cannot replace human translators, especially when those MT systems are operated by users who have no linguistic background. However, when the goal is to improve the efficiency of the human translation process or to create comprehensible translations in environments where human translation is not an option, and when these systems are operated by trained and motivated translation professionals, then machine translation is and has been a very powerful solution.

Sources:
[1] Systran Software Inc. 2007. Systran Case Study: Symantec. Systran Software Inc. Web site. [Online] 2007. [Cited: June 6, 2008.] www.systransoft.com/download/case-studies/2007.12.Symantec.pdf.
[2] Microsoft Corporation. 2008. Machine Translation - Home. Microsoft Corporation Web site. [Online] 2008. [Cited: June 6, 2008.] http://research.microsoft.com/nlp/projects/mtproj.aspx.
[3] National Institute of Standards and Technology. 2006. NIST 2006 Machine Translation Evaluation Official Results. National Institute of Standards and Technology Web site. [Online] November 1, 2006. [Cited: June 6, 2008.] http://www.nist.gov/speech/tests/mt/2006/doc/mt06eval_official_results.html.
[4] Muegge, Uwe. 2006. Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study. London: The Association of Information Management (Aslib), 2006. Proceedings of the Twenty-eighth International Conference on Translating and the Computer. ISBN 978-0-85142-5.

Contact

Uwe Muegge is the corporate terminologist at Medtronic, a manufacturer of medical technology. He serves in ISO Technical Committee 37 SC3 Computer Applications in Terminology and teaches Terminology Management and Computer-Assisted Translation at the Monterey Institute of International Studies in Monterey, California.

info@muegge.cc
www.muegge.cc
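To make the direct (dictionary-driven) machine translation approach discussed in this article more concrete, here is a minimal Python sketch of phrase-for-phrase substitution applied to the weather-report example quoted above. It is only an illustration under stated assumptions: the English>German dictionary entries and the function name `direct_translate` are invented for this example and do not come from the Medtronic project or from any commercial product.

```python
import re

# Hypothetical English>German phrase dictionary; entries are illustrative only.
# Keys must be lowercase because matches are lowercased before lookup.
PHRASE_DICTIONARY = {
    "winds from": "Wind aus",
    "southerly direction": "südlicher Richtung",
    "speed reaching": "Geschwindigkeit bis",
}


def direct_translate(text, dictionary):
    """Replace known phrases in sequence; leave everything else untouched."""
    if not dictionary:
        return text
    # Try longer phrases first so multi-word entries win over shorter ones.
    phrases = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: dictionary[m.group(0).lower()], text)
```

Because no analysis takes place, anything that is not in the dictionary, such as the figure and unit in ‘55 km/h’, passes through unchanged. That property is what makes this kind of substitution attractive for text full of product numbers and dimensions.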
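The division of labor between the two components of a statistical machine translation system described in this article (a translation model that generates proposals, and a language model that selects among them) can be illustrated with a toy Python sketch. All of the numbers below are invented for the example, and the floor value for unseen word pairs is a crude assumption. Adding the translation-model and language-model log scores is one common way to combine the two, not a claim about how any particular vendor's engine works.

```python
import math

# Hypothetical translation-model output: candidate translations of one source
# sentence, each with a probability learned from aligned bilingual data.
TRANSLATION_PROPOSALS = {
    "the house is small": 0.40,
    "the home is small": 0.35,
    "small is the house": 0.25,
}

# Hypothetical bigram probabilities estimated from target-language text only.
BIGRAM_PROBABILITIES = {
    ("<s>", "the"): 0.5, ("the", "house"): 0.3, ("the", "home"): 0.05,
    ("house", "is"): 0.4, ("home", "is"): 0.4, ("is", "small"): 0.2,
    ("<s>", "small"): 0.01, ("small", "is"): 0.02, ("is", "the"): 0.1,
}
UNSEEN_PROBABILITY = 1e-4  # assumed floor for bigrams missing from the table


def language_model_logprob(sentence):
    """Score fluency as the sum of log bigram probabilities."""
    words = ["<s>"] + sentence.split()
    return sum(
        math.log(BIGRAM_PROBABILITIES.get(pair, UNSEEN_PROBABILITY))
        for pair in zip(words, words[1:])
    )


def select_translation(proposals):
    """Pick the proposal with the best combined translation and fluency score."""
    return max(
        proposals,
        key=lambda s: math.log(proposals[s]) + language_model_logprob(s),
    )
```

With these made-up figures, `select_translation(TRANSLATION_PROPOSALS)` prefers "the house is small": the proposal is both the most likely translation and the most fluent word sequence according to the target-language data.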