Statistical machine translation in a few slides
Mikel L. Forcada1,2
1Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant,
E-03071 Alacant (Spain)
2Prompsit Language Engineering, S.L., E-03690 St. Vicent del Raspeig (Spain)
April 14-16, 2009: Free/open-source MT tutorial at the
CNGL
Contents
1 Translation as probability
2 “Decoding”
3 Training
4 “Log-linear”
5 Ain’t got nothin’ but the BLEUs?
6 The SMT lifecycle
The “canonical” model
Translation as probability/1
Instead of saying that
a source-language (SL) sentence s in a SL text
and a target-language (TL) sentence t
as found in a SL–TL bitext are or are not a translation of
each other,
in SMT one says that they are a translation of each other
with a probability p(s, t) = p(t, s) (a joint probability).
We’ll assume we have such a probability model available.
Or at least a reasonable estimate.
The “canonical” model
Translation as probability/2
According to basic probability laws, we can write:
p(s, t) = p(t, s) = p(s|t) p(t) = p(t|s) p(s)    (1)
where p(x|y) is the conditional probability of x given y.
We are interested in translating from SL to TL. That is, we
want to find the most likely translation t∗ given the SL
sentence s:
t∗ = arg max_t p(t|s)    (2)
The “canonical” model
We can rewrite eq. (1) as
p(t|s) = p(s|t) p(t) / p(s)    (3)
and combine it with (2); since the denominator p(s) does not
depend on t, it can be dropped, giving
t∗ = arg max_t p(s|t) p(t)    (4)
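To make the decision rule in (4) concrete, here is a minimal Python sketch (not part of the original slides) that scores a few hypothetical candidate translations with invented toy values for p(s|t) and p(t) and keeps the best one:

```python
# Minimal sketch of the "canonical" SMT decision rule (eq. 4):
# pick the candidate t that maximizes p(s|t) * p(t).
# All probabilities below are invented toy values, not real model estimates.

s = "la casa blanca"

# Hypothetical candidate TL sentences with toy reverse-translation probabilities p(s|t)
p_s_given_t = {
    "the white house": 0.20,
    "the house white": 0.25,   # slightly "easier" to produce s from, but disfluent
    "white house the": 0.25,
}

# Toy target-language model probabilities p(t)
p_t = {
    "the white house": 0.010,
    "the house white": 0.001,
    "white house the": 0.0001,
}

best_t = max(p_s_given_t, key=lambda t: p_s_given_t[t] * p_t[t])
print(best_t)  # -> "the white house": adequacy and fluency are balanced
```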
“Decoding”/1
t∗ = arg max_t p(s|t) p(t)
We have a product of two probability models:
A reverse translation model p(s|t) which tells us how likely
the SL sentence s is a translation of the candidate TL
sentence t, and
a target-language model p(t) which tells us how likely the
sentence t is in the TL side of bitexts.
These may be related (respectively) to the usual notions of
[reverse] adequacy: how much of the meaning of t is
conveyed by s, and
fluency: how fluent the candidate TL sentence t is.
The arg max strikes a balance between the two.
“Decoding”/2
In SMT parlance, the process of finding t∗ is called
decoding.[1]
Obviously, it does not explore all possible translations t in
the search space: there are infinitely many.
The search space is pruned.
Therefore, one just gets a reasonable t instead of the
ideal t∗.
Pruning and search strategies are a very active research
topic.
Free/open-source software: Moses.
[1] Reading SMT articles usually entails deciphering jargon which may be
very obscure to outsiders or newcomers.
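As a rough illustration of pruned search, the following toy sketch (mine, not from the slides, and no substitute for a real decoder such as Moses) performs monotone phrase-based decoding with a beam: hypotheses are extended left to right and only the best few per coverage length are kept. The phrase table and language model values are invented.

```python
import math
from collections import defaultdict

# A toy, monotone phrase-based decoder with beam pruning, to illustrate exploring
# only part of the search space. The phrase table, language model and input sentence
# are all invented; real decoders also handle reordering and far richer models.

phrase_table = {  # SL phrase -> list of (TL phrase, p(SL phrase | TL phrase)), toy values
    ("la",): [("the", 0.8)],
    ("casa",): [("house", 0.6), ("home", 0.3)],
    ("blanca",): [("white", 0.9)],
    ("casa", "blanca"): [("white house", 0.6)],
}

# Crude unigram stand-in for the target-language model p(t), again with toy values.
lm = defaultdict(lambda: 1e-4,
                 {"the": 0.05, "white": 0.01, "house": 0.01, "home": 0.005})

def lm_logprob(tl_phrase):
    return sum(math.log(lm[w]) for w in tl_phrase.split())

def decode(sl_words, beam_size=2, max_phrase_len=2):
    # beams[i] holds hypotheses covering the first i SL words: (log score, TL phrases so far)
    beams = {0: [(0.0, ())]}
    for i in range(len(sl_words) + 1):
        for score, tl_phrases in beams.get(i, []):
            for j in range(i + 1, min(i + max_phrase_len, len(sl_words)) + 1):
                for tl_phrase, p in phrase_table.get(tuple(sl_words[i:j]), []):
                    hyp = (score + math.log(p) + lm_logprob(tl_phrase),
                           tl_phrases + (tl_phrase,))
                    beams.setdefault(j, []).append(hyp)
        # Pruning: keep only the best few hypotheses among those covering i+1 SL words.
        if i + 1 in beams:
            beams[i + 1] = sorted(beams[i + 1], reverse=True)[:beam_size]
    complete = beams.get(len(sl_words), [])
    return max(complete) if complete else None

print(decode("la casa blanca".split()))  # -> (log score, ('the', 'white house'))
```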
Training/1
So where do these probabilities come from?
p(t) may easily be estimated from a large monolingual TL
corpus (free/open-source software: irstlm); see the sketch
below.
The estimation of p(s|t) is more complex. It is usually made
up of
a lexical model describing the probability that the
translation of a certain TL word or sequence of words
(a “phrase”[2]) is a certain SL word or sequence of words, and
an alignment model describing the reordering of words or
“phrases”.
[2] A very unfortunate choice in SMT jargon.
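A minimal sketch of the p(t) estimation mentioned above, assuming a bigram model with add-one smoothing over an invented toy corpus; real toolkits such as irstlm use higher-order n-grams and much better smoothing:

```python
import math
from collections import Counter

# Estimate a toy bigram language model p(t) from a (tiny, invented) monolingual corpus.

corpus = [
    "<s> the white house </s>",
    "<s> the house is white </s>",
    "<s> the house is big </s>",
]

bigrams, histories = Counter(), Counter()
vocab = set()
for sentence in corpus:
    words = sentence.split()
    vocab.update(words)
    histories.update(words[:-1])                 # how often each word occurs as a history
    bigrams.update(zip(words[:-1], words[1:]))   # bigram counts

def p_bigram(w, history):
    # Add-one smoothing so unseen bigrams still get a small probability.
    return (bigrams[(history, w)] + 1) / (histories[history] + len(vocab))

def log_p_t(sentence):
    words = sentence.split()
    return sum(math.log(p_bigram(w, h)) for h, w in zip(words[:-1], words[1:]))

print(log_p_t("<s> the white house </s>"))
print(log_p_t("<s> house the white </s>"))  # less fluent word order gets a lower score
```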
Training/2
The lexical model and the alignment model are estimated
using a large sentence-aligned bilingual corpus through a
complex iterative process.
An initial set of lexical probabilities is obtained by
assuming, for instance, that any word in the TL sentence
aligns with any word in its SL counterpart. And then:
Alignment probabilities are computed in accordance with the
lexical probabilities.
Lexical probabilities are re-estimated in accordance with the
alignment probabilities.
This process (“expectation maximization”) is repeated a
fixed number of times or until some convergence is
observed (free/open-source software: Giza++).
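The following sketch mimics this iterative estimation in the spirit of IBM Model 1 on an invented three-sentence bitext; it only shows the expectation-maximization loop, not what Giza++ actually implements in full:

```python
from collections import defaultdict

# Toy EM estimation of lexical probabilities p(SL word | TL word), IBM Model 1 style.
# The bitext and the fixed number of iterations are for illustration only.

bitext = [
    ("la casa".split(), "the house".split()),
    ("la flor".split(), "the flower".split()),
    ("casa blanca".split(), "white house".split()),
]

# Start from uniform probabilities: "any TL word may align with any SL word".
sl_vocab = {w for s, _ in bitext for w in s}
t_prob = defaultdict(lambda: 1.0 / len(sl_vocab))   # t_prob[(s_word, t_word)] = p(s_word | t_word)

for _ in range(10):                       # repeat the E and M steps a fixed number of times
    count = defaultdict(float)            # expected counts c(s_word, t_word)
    total = defaultdict(float)            # expected counts c(t_word)
    for s_sent, t_sent in bitext:
        for s_w in s_sent:
            # E step: distribute the alignment mass of s_w over the TL words,
            # in proportion to the current lexical probabilities.
            norm = sum(t_prob[(s_w, t_w)] for t_w in t_sent)
            for t_w in t_sent:
                frac = t_prob[(s_w, t_w)] / norm
                count[(s_w, t_w)] += frac
                total[t_w] += frac
    # M step: re-estimate the lexical probabilities from the expected counts.
    for (s_w, t_w), c in count.items():
        t_prob[(s_w, t_w)] = c / total[t_w]

print(round(t_prob[("casa", "house")], 2))  # high: EM learns that "casa" explains "house"
print(round(t_prob[("casa", "white")], 2))  # low: "white" is now explained by "blanca"
```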
Training/3
In “phrase-based” SMT, alignments may be used to extract
(SL-phrase, TL-phrase) pairs of phrases
and their corresponding probabilities
for easier decoding and to avoid “word salad”.
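A small sketch of one standard way to do this: extract all phrase pairs consistent with a given word alignment. The sentence pair and alignment links are invented, and real systems also attach relative-frequency scores to the extracted pairs.

```python
# Extract "phrase" pairs consistent with a word alignment (toy example).

sl = "la casa blanca".split()
tl = "the white house".split()
alignment = {(0, 0), (1, 2), (2, 1)}   # (SL position, TL position) links

def consistent(i1, i2, j1, j2):
    # A phrase pair is consistent if it contains at least one link and no link
    # crosses the box boundary: a link is inside on the SL side iff it is inside on the TL side.
    inside = [(i, j) for (i, j) in alignment if i1 <= i <= i2 and j1 <= j <= j2]
    if not inside:
        return False
    return all((i1 <= i <= i2) == (j1 <= j <= j2) for (i, j) in alignment)

phrase_pairs = set()
for i1 in range(len(sl)):
    for i2 in range(i1, len(sl)):
        for j1 in range(len(tl)):
            for j2 in range(j1, len(tl)):
                if consistent(i1, i2, j1, j2):
                    phrase_pairs.add((" ".join(sl[i1:i2 + 1]), " ".join(tl[j1:j2 + 1])))

for pair in sorted(phrase_pairs):
    print(pair)   # e.g. ('casa blanca', 'white house'), ('la', 'the'), ...
```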
“Log-linear”/1
More SMT jargon!
It’s short for a linear combination of logarithms of
probabilities.
And, sometimes, of features that aren’t logarithms or
probabilities of any kind.
OK, let’s take a look at the maths.
“Log-linear”/2
One can write a more general formula:
p(t|s) = exp( Σ_{k=1..nF} λ_k f_k(t, s) ) / Z    (5)
with nF feature functions f_k(t, s), which can depend on s, t,
or both.
Setting nF = 2, f_1(s, t) = log p(s|t), f_2(s, t) = log p(t),
λ_1 = λ_2 = 1, and Z = p(s), one recovers the canonical
formula (3).
The best translation is then
t∗ = arg max_t Σ_{k=1..nF} λ_k f_k(t, s)    (6)
Most of the f_k(t, s) are logarithms, hence “log-linear”.
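A few lines of Python make the connection explicit: with two log-probability features and unit weights, the log-linear score ranks candidates exactly as the canonical product p(s|t) p(t) does. All numbers below are invented.

```python
import math

# Rank candidates by the weighted sum of feature functions in eq. (6).
# With f1 = log p(s|t), f2 = log p(t) and unit weights, this reproduces
# the canonical arg max over p(s|t) * p(t).

candidates = {          # t -> (p(s|t), p(t)), toy values
    "the white house": (0.20, 0.010),
    "the house white": (0.25, 0.001),
}
weights = (1.0, 1.0)    # lambda_1, lambda_2

def loglinear_score(p_s_given_t, p_t, lambdas=weights):
    features = (math.log(p_s_given_t), math.log(p_t))
    return sum(l * f for l, f in zip(lambdas, features))

best = max(candidates, key=lambda t: loglinear_score(*candidates[t]))
print(best)   # same winner as maximizing p(s|t) * p(t) directly
```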
“Log-linear”/3
“Feature selection is a very open problem in SMT” (Lopez
2008)
Other possible functions include length penalties
(discouraging unreasonably short or long translations),
“inverted” versions of p(s|t), etc.
Where do we get the λ_k’s from?
They are usually tuned so as to optimize the results on a
tuning set, according to a certain objective function that
is taken to be an indicator correlating with translation
quality and that can be computed automatically from the
output of the SMT system and the reference translations in
the corpus.
This is sometimes called MERT (minimum error rate training)
(free/open-source software: the Moses suite).
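The sketch below is a deliberately crude stand-in for MERT: it tries random weight vectors and keeps the one that maximizes a simple automatic quality proxy on a tiny invented tuning set. Real MERT, as in the Moses suite, searches the weight space far more cleverly.

```python
import math
import random

# A crude stand-in for weight tuning: random search over lambda vectors, keeping the
# vector that maximizes an automatic quality proxy on a toy tuning set. All values invented.

random.seed(0)

# Each tuning item has candidate translations with feature vectors (log p(s|t), log p(t))
# and a reference translation.
tuning_set = [
    {"candidates": {"the white house": (math.log(0.2), math.log(0.01)),
                    "the house white": (math.log(0.3), math.log(0.001))},
     "reference": "the white house"},
    {"candidates": {"a red flower": (math.log(0.1), math.log(0.02)),
                    "a flower red": (math.log(0.2), math.log(0.002))},
     "reference": "a red flower"},
]

def quality(weights):
    # Proxy metric: fraction of tuning sentences whose best-scoring candidate is the reference.
    hits = 0
    for item in tuning_set:
        best = max(item["candidates"],
                   key=lambda t: sum(w * f for w, f in zip(weights, item["candidates"][t])))
        hits += (best == item["reference"])
    return hits / len(tuning_set)

best_weights = max(((random.uniform(0, 2), random.uniform(0, 2)) for _ in range(200)),
                   key=quality)
print(best_weights, quality(best_weights))  # typically reaches 1.0 on this toy set
```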
Ain’t got nothin’ but the BLEUs?
The most famous “quality indicator” is called BLEU, but
there are many others.
BLEU measures what fraction of the 1-word,
2-word, ..., n-word sequences in the output also appear in
the reference translation.
Correlation with subjective assessments of quality is still an
open question.
A lot of SMT research is currently BLEU-driven and makes
little contact with real applications of MT.
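A toy sentence-level version of this idea, using clipped n-gram precisions for n = 1..4 plus a brevity penalty; real BLEU is computed over a whole test set and usually with smoothing:

```python
import math
from collections import Counter

# A simplified, sentence-level BLEU-like score: clipped n-gram precision against one
# reference, combined over n = 1..4 with a brevity penalty. For illustration only.

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        log_precisions.append(math.log(max(matches, 1e-9) / total))
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))   # penalize short candidates
    return brevity * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the white house is big", "the white house is big"), 3))  # 1.0
print(round(bleu("the house white is big", "the white house is big"), 3))  # much lower
```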
The SMT lifecycle
Development:
Training: monolingual and sentence-aligned
bilingual corpora are used to estimate
probability models (features)
Tuning: a held-out portion of the
sentence-aligned bilingual corpus is
used to tune the coefficients λ_k
Decoding: sentences s are fed into the SMT system and
“decoded” into their translations t.
Evaluation: the system is evaluated against a reference
corpus.
License
This work may be distributed under the terms of
the Creative Commons Attribution–Share Alike license:
http://creativecommons.org/licenses/by-sa/3.0/
the GNU GPL v. 3.0 License:
http://www.gnu.org/licenses/gpl.html
Dual license! E-mail me to get the sources: mlf@ua.es