2. Outline
Introduction
A brief introduction to Translation technology
Interest in MT
Problems Involved in Machine Translation
Translation Technology
Knowledge-based systems
Statistical machine translation systems
Rule-Based vs. Statistical MT
Current State of Machine Translation in Use
Personal Speech-to-Speech Translators
Machine Translation
3. Introduction
•These factors have increased both the demand for translation services and interest in computerized translation technology.
•Some industry observers say machine translation, a largely experimental technology that has been around since the late 1950s, is now ready to become commercially viable.
4. Definition
NLP: The sub-domain of artificial intelligence concerned with the task of developing programs possessing some capability of "understanding" a natural language in order to achieve some specific goal.
Understanding: A transformation from one representation (the input text) to another (internal representation).
5. Introduction:
Machine Translation: The use of computers to translate from one language to another.
One of the oldest dreams of NLP, AI, and CS (first system in 1954).
6. Why Machine Translation?
•Cheap, universal access to the world's online information regardless of original language. (That's the goal)
7. Interest in MT
Commercial interest
•U.S. has invested in MT
•MT is popular on the web
•EU spends more than $1 billion on translation
•(Semi-)automated translation
Academic interest
•Challenging problems in NLP research
•Requires knowledge from many NLP sub-areas: lexical semantics, parsing, morphological analysis, statistical modeling
•Transferring resources from one language to another
8. Problems Involved in Machine Translation
Ambiguity, syntactic irregularity, multiple word meanings, and the influence of context are the main problems faced by MT systems.
A classic example is illustrated in the following pair of
sentences:
Time flies like an arrow.
Fruit flies like an apple.
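The ambiguity in this pair of sentences can be made concrete with a small sketch. The toy lexicon below is an assumption made for illustration only (it is hand-written and simplified, not taken from any real tagger). It lists the parts of speech each word might take and enumerates every combination an MT system would have to choose between.

```python
from itertools import product

# Hypothetical, simplified lexicon: word -> parts of speech it may take.
TOY_LEXICON = {
    "time": ["NOUN", "VERB"],
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
    "apple": ["NOUN"],
}


def candidate_taggings(sentence):
    """Return every tag sequence the toy lexicon allows for the sentence."""
    words = sentence.lower().rstrip(".").split()
    options = [TOY_LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


if __name__ == "__main__":
    for sentence in ("Time flies like an arrow.", "Fruit flies like an apple."):
        print(sentence, len(candidate_taggings(sentence)), "candidate taggings")
```

Even with this tiny lexicon, "flies" and "like" each admit two readings, so the system needs context to pick the intended one.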
9. How can a machine understand these differences?
Get the cat with the gloves.
10. Outline
Introduction
A brief introduction to Translation technology
Interest in MT
Problems Involved in Machine Translation
Translation Technology
Knowledge-based systems
Statistical machine translation systems
Rule-Based vs. Statistical MT
Current State of Machine Translation in Use
Personal Speech-to-Speech Translators
11. TRANSLATION TECHNOLOGY
•There are two kinds of machine translation:
•Knowledge-based systems
•Statistical machine translation
•Knowledge-based systems
Traditional translation technology takes a knowledge-based approach.
These expert systems, used by vendors such as Fujitsu, Logos, and Systran, translate documents by converting words and grammar directly from one language into another.
12. Knowledge-based systems
How they work.
Knowledge-based systems rely on programmers to enter various languages' vocabulary and syntax information into databases.
The programmers then write lists of rules that describe the possible relationships between a language's parts of speech.
The software, which can run on a high-powered PC, analyzes a document and examines the rules for both the text's language and the target language.
[Cartoon, labeled "Translated documents": "Man, this is so boring." / "Hmm, every time he sees 'banco', he either types 'bank' or 'bench' … but if he sees 'banco de…', he always types 'bank', never 'bench'…"]
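A minimal sketch of the dictionary-plus-rules idea, assuming a hypothetical hand-entered dictionary and one context rule that mirrors the "banco" / "banco de" observation in the cartoon. The names DICTIONARY, CONTEXT_RULES, and translate_words are invented for illustration and do not describe any vendor's system.

```python
# Hypothetical hand-entered dictionary: default translation for each source word.
DICTIONARY = {"el": "the", "banco": "bench", "de": "of"}

# Hypothetical context rules: (word, following word) -> translation that
# overrides the dictionary default when that two-word context is seen.
CONTEXT_RULES = {("banco", "de"): "bank"}


def translate_words(words):
    """Translate word by word, letting a context rule override the dictionary."""
    output = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else None
        if (word, following) in CONTEXT_RULES:
            output.append(CONTEXT_RULES[(word, following)])
        else:
            # Unknown words are passed through unchanged.
            output.append(DICTIONARY.get(word, word))
    return output


if __name__ == "__main__":
    print(translate_words(["el", "banco"]))
    print(translate_words(["el", "banco", "de", "Chile"]))
```

Every new word and every new exception needs another hand-written entry, which is why this approach is so labor-intensive.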
13. Statistical machine translation systems
Rather than using the knowledge-based system's direct word-by-word translation techniques, statistical approaches translate documents by statistically analyzing entire phrases and, over time, "learning" how various languages work.
How it works. Statistical systems start with minimal dictionary and language resources. Users then must train the system before they can work with it on extensive translations.
During the training, researchers feed the system documents for which they already have accurate human translations.
The system then uses its resources to guess at the documents' meanings.
14. Statistical machine translation systems
Statistical systems generally work by dividing documents into N-grams, with N the number of words, usually three, in a phrase. N-grams are statistical translation's building blocks.
Analyzing N-grams helps improve translation accuracy and performance because, while a word by itself may have many definitions, it has far fewer potential meanings when used as part of a phrase.
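As a rough illustration of dividing text into N-grams, the sketch below splits a sentence on whitespace and slides a window of N words across it. The function name ngrams and the whitespace tokenization are simplifying assumptions; real systems use more careful tokenization.

```python
def ngrams(text, n=3):
    """Return all runs of n consecutive words in text, as tuples."""
    words = text.split()
    # If the text has fewer than n words, the range is empty and so is the result.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    for gram in ngrams("time flies like an arrow"):
        print(gram)
```

With N = 3, the word "flies" is always seen together with its neighbors, which is what narrows down its possible meanings.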
15. Statistical machine translation systems
[Figure: books in English and the same books in Farsi are fed into "Machine Learning Magic", which produces a P(F|E) model]
Statistical machine translation (SMT) can be defined as the process of maximizing the probability of a sentence s in the source language matching a sentence t in the target language. We call collections stored in two languages parallel corpora or parallel texts.
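One common way to write this maximization is the noisy-channel form: choose the target sentence t that maximizes P(t | s), which by Bayes' rule is proportional to P(s | t) · P(t). The sketch below assumes that formulation. The two probability tables are made-up numbers for illustration only; in a real system they would be estimated from parallel corpora and target-language text.

```python
# Made-up translation model: P(s | t) for (source, target) sentence pairs.
TRANSLATION_MODEL = {
    ("el banco", "the bank"): 0.5,
    ("el banco", "the bench"): 0.5,
}

# Made-up language model: P(t), how plausible each target sentence is on its own.
LANGUAGE_MODEL = {
    "the bank": 0.03,
    "the bench": 0.01,
}


def best_translation(source):
    """Pick the candidate t that maximizes P(source | t) * P(t)."""
    candidates = [t for (s, t) in TRANSLATION_MODEL if s == source]
    return max(
        candidates,
        key=lambda t: TRANSLATION_MODEL[(source, t)] * LANGUAGE_MODEL[t],
    )


if __name__ == "__main__":
    print(best_translation("el banco"))
```

Here the translation model alone cannot decide between the two candidates, so the language model breaks the tie.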
16. Statistical machine translation systems
Statistical machine translation systems, which statistically analyze entire phrases and "learn" how various languages work, frequently work with other types of systems to improve output quality.
The lexicon system provides translated words and their variations.
The alignment system ensures that phrases from the source language are converted to the proper phrases and presented in the proper order in the target language.
The language system performs a morphological analysis of individual words or a syntactic analysis of sentences and thereby produces translations that read properly.
17. Rule-Based vs. Statistical MT
Rule-based MT:
Rules can be based on lexical or structural transfer
Each program must be customized for each language pair it works with.
Pro: firm grip on complex translation phenomena
Con: very labour-intensive, time-consuming, and expensive; lack of robustness
Statistical MT:
Mainly word- or phrase-based translations
Translations are learned from actual data
In general, the more data is provided for learning, the higher the quality of translation.
Pro: Translations are learned automatically
Con: Difficult to model complex translation phenomena
18. Current State of Machine Translation in Use
Google Translate is a service provided by Google Inc. to translate a section of text, or a webpage, into another language, with limits on the number of paragraphs, or range of technical terms, translated. For some languages, users are asked for alternative translations, such as for technical terms, to be included in future updates to the translation process. Google Translate is based on an approach called statistical machine translation.
19. Current State of Machine Translation in Use (cont.)
SYSTRAN's methodology is a sentence-by-sentence approach, concentrating on individual words and their dictionary data, then on the parse of the sentence unit, followed by the translation of the parsed sentence.
AltaVista's Babel Fish
Babel Fish is a web-based application developed by AltaVista (now part of Yahoo!) which automatically translates text or web pages from one of several languages into another. The translation technology for Babel Fish is provided by SYSTRAN, whose technology also powers a number of other sites and portals.
20. Current State of Machine Translation in Use (cont.)
Language Weaver is a Los Angeles, California-based company that was founded in 2002 by the University of Southern California's Kevin Knight and Daniel Marcu to commercialize a statistical approach to automatic language translation and natural language processing, now known globally as statistical machine translation software (SMTS).
Language Weaver's statistically-based translation software is an instance of a recent advance in automated translation.
Microsoft's translation service is provided as part of its Windows Live services and allows users to translate texts or entire web pages into different languages. Computer-related texts are translated by Microsoft's own statistical machine translation technology for eight supported languages.
21. Personal Speech-to-Speech Translators
•One of the newest research areas in machine translation is the personal speech-to-speech translator. People on business or personal trips could use these devices to translate on the fly.
•Speech-to-speech translation, which is still in the experimental stage, is a complex process requiring speech-recognition technology that converts speech to text, machine translation of the text, and then text-to-speech conversion.
•IBM is working on the handheld multilingual automatic speech-to-speech translator (Mastor), which uses a hybrid statistical/knowledge-based engine to translate the content. Mastor tries to determine the general meaning of a phrase, rather than its exact translation. This approach requires less database capacity, which makes it more suitable for small devices.
22. LOOKING AHEAD
•Because of ongoing demand for better translation systems, research money will continue to flow into the field. In addition, companies are likely to develop and release more commercial products.
The knowledge-based approach is very labour-intensive, time-consuming, and expensive. And even after decades of work, the systems don't generally provide more than the basic idea of a document's meaning. However, until recently, knowledge-based systems were still preferred by many researchers who contended that the statistical approach was too simple to effectively handle a complex task like translation. In addition, statistical systems require fast processors and large amounts of RAM, which were not readily and inexpensively available until several years ago.
It is critical to continue research and development in any field, knowing the current state of the technology, rather than re-inventing the wheel. Existing translation engines will be explained in the following slides.
Even as technology opens up e-commerce opportunities, companies must overcome language barriers to reach new potential customers and business partners. For example, many companies have decided to develop Web sites in the languages of the countries in which their customers and partners live.
Building machines to automate tasks requiring intelligent behaviour. Machine Translation (MT) is a subfield of natural language processing that involves automatic translation of sentences from one natural language to another. NLP is the sub-domain of artificial intelligence concerned with the task of developing programs possessing some capability of "understanding" a natural language in order to achieve some specific goal.
Machine Translation, also known as Automatic Translation, is the process of translating one human language into another. A Machine Translation system can be thought of as a compiler. A compiler translates a high-level programming language such as C++ or Java into a low-level language such as assembly or machine language. The main difference is that the grammar of a natural language like English or Hindi is much more complex than the grammar of a programming language.
Commercial interest:
•U.S. has invested in MT for intelligence purposes
•MT is popular on the web; it is the most used of Google's special features
•EU spends more than $1 billion on translation costs each year
•(Semi-)automated translation could lead to huge savings
Academic interest:
•One of the most challenging problems in NLP research
•Requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling, …
•Being able to establish links between two languages allows for transferring resources from one language to another
The sentence construction is parallel, but the meanings are entirely different: the first is a figure of speech involving a metaphor and the second is a literal description. And the identical words in the sentences, "flies" and "like", belong to different grammatical categories in each.