Statistical Machine Translation
Part II: Decoding
Trevor Cohn, U. Sheffield
EXPERT winter school
November 2013

Some figures taken from Koehn 2009
Recap

 You’ve seen several models of translation
 word-based models: IBM 1-5
 phrase-based models
 grammar-based models
 Methods for
 learning translation rules from bitexts
 learning rule weights
 learning several other features: language models, reordering, etc.
Decoding
 Central challenge is to predict a good translation
 Given text in the input language (f )
 Generate translation in the output language (e)
 Formally
e* = argmax_e p(e | f)
 where our model scores each candidate translation e using a translation model and a language model
 A decoder is a search algorithm for finding e*
 caveat: few modern systems use actual probabilities
Outline

 Decoding phrase-based models
 linear model
 dynamic programming approach
 approximate beam search
 Decoding grammar-based models
 synchronous grammars
 string-to-string decoding
Decoding objective
 Objective
 where the model score, f, incorporates
 translation frequencies for phrases
 distortion cost based on (re)ordering
 language model cost of m-grams in e
 ...
 Problem of ambiguity
 may be many different sequences of translation decisions mapping f to e
 e.g. could translate word by word, or use larger units
Decoding for derivations
 A derivation is a sequence of translation decisions
 can “read off” the input string f and output e
 Define model over derivations not translations
 aka Viterbi approximation
 should sum over all derivations within the maximisation
 instead we maximise for tractability
 But see Blunsom, Cohn and Osborne (2008)
 sum out derivational ambiguity (during training)
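As a toy illustration of the Viterbi approximation (the two log scores below are made up): we score a translation by its single best derivation, rather than summing over all derivations that yield it.

```python
from math import log, exp

# Two hypothetical derivations of the SAME output string, with log scores.
derivation_scores = [-2.3, -2.9]

# Viterbi approximation: score the translation by its best derivation ...
viterbi = max(derivation_scores)

# ... instead of summing over all of its derivations (log-sum-exp),
# which is what the true translation probability requires.
true_log_prob = log(sum(exp(s) for s in derivation_scores))
```

The true log probability is always at least as large as the Viterbi score; maximising over derivations is used purely for tractability.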
Decoding

 Includes a coverage constraint
 all input words must be translated exactly once
 preserves input information
 Cf. ‘fertility’ in IBM word-based models
 phrases licence one-to-many mappings (insertions) and many-to-one mappings (deletions)
 but limited to contiguous spans
 These constraints affect the tractability of decoding
Translation process
 Translate this sentence

 translate input words and “phrases”
 reorder output to form target string
 Derivation = sequence of phrases
 1. er – he; 2. ja nicht – does not; 3. geht – go; 4. nach hause – home

Figure from Machine Translation Koehn 2009
Generating process
Consider the translation decisions in a derivation
[Figure: the input "er geht ja nicht nach hause", annotated with the three steps: 1: segment, 2: translate, 3: order]
Generating process
[Figure: step 1 segments the input into the phrases "er", "geht", "ja nicht", "nach hause"]
Generating process
[Figure: step 2 translates each phrase: er → he, geht → go, ja nicht → does not, nach hause → home]
Generating process
[Figure: the same pipeline annotated with model scores: 1: segment – uniform cost (ignored); 2: translate – TM probability; 3: order – distortion cost & LM probability]
Generating process
[Figure: the complete derivation he / does not / go / home, scored as:]
f = 0
  + φ(er → he) + φ(geht → go) + φ(ja nicht → does not)
  + φ(nach hause → home)
  + ψ(he | <S>) + d(0) + ψ(does | he) + ψ(not | does) + d(1)
  + ψ(go | not) + d(-3) + ψ(home | go) + d(2) + ψ(</S> | home)
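As a sketch of how such a derivation score is assembled from its φ (translation), ψ (language model) and d (distortion) components; the log-probability tables below are made up purely for illustration:

```python
# Hypothetical (made-up) log-probability tables for illustration only.
phi = {("er", "he"): -0.4, ("geht", "go"): -0.7,
       ("ja nicht", "does not"): -0.9, ("nach hause", "home"): -0.8}
psi = {("<S>", "he"): -1.2, ("he", "does"): -1.5, ("does", "not"): -0.3,
       ("not", "go"): -1.8, ("go", "home"): -1.1, ("home", "</S>"): -0.6}

def distortion(jump):
    # A common linear distortion cost: penalise |jump| words of reordering.
    return -abs(jump)

def derivation_score(phrase_pairs, bigrams, jumps):
    """Sum the translation, language-model and distortion log scores."""
    f = 0.0
    f += sum(phi[p] for p in phrase_pairs)       # translation model
    f += sum(psi[b] for b in bigrams)            # bigram language model
    f += sum(distortion(j) for j in jumps)       # reordering
    return f

score = derivation_score(
    phrase_pairs=[("er", "he"), ("ja nicht", "does not"),
                  ("geht", "go"), ("nach hause", "home")],
    bigrams=[("<S>", "he"), ("he", "does"), ("does", "not"),
             ("not", "go"), ("go", "home"), ("home", "</S>")],
    jumps=[0, 1, -3, 2])
```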
Linear Model
 Assume a linear model
f(d) = α_φ Σ_k φ(r_k) + α_d Σ_k d(start_k − end_{k−1} − 1) + α_ψ Σ_i ψ(e_i | e_{i−m+1} ... e_{i−1})
 d is a derivation: a sequence of translation rules r_1 ... r_K
 φ(r_k) is the log conditional frequency of a phrase pair
 d(·) is the distortion cost for two consecutive phrases
 ψ is the log language model probability
 each component is scaled by a separate weight
 Often mistakenly referred to as log-linear
Model components
 Typically:
 language model and word count

 translation model (s)

 distortion cost
 Values of α learned by discriminative training (not covered today)
Search problem
 Given options

 1000s of possible output strings
 he does not go home
 it is not in house
 yes he goes not to home …
Figure from Machine Translation Koehn 2009
Search Complexity

 Search space
 Number of segmentations: 2^5 = 32
 Number of permutations: 6! = 720
 Number of translation options: 4^6 = 4096
 Multiplying gives 94,371,840 derivations
(the calculation is naïve, giving a loose upper bound)
 How can we possibly search this space?
 especially for longer input sentences
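The naïve count for the 6-word example can be reproduced directly, and shows how quickly the bound explodes with sentence length:

```python
from math import factorial

def naive_derivation_count(n_words, options_per_word=4):
    """Loose upper bound: segmentations x permutations x translation choices."""
    segmentations = 2 ** (n_words - 1)   # each of the n-1 gaps: split or not
    permutations = factorial(n_words)    # all orderings of the n units
    translations = options_per_word ** n_words
    return segmentations * permutations * translations

print(naive_derivation_count(6))   # the 94,371,840 from the slide
```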
Search insight
 Consider the sorted list of all derivations
 …
 he does not go after home
 he does not go after house
 he does not go home
 he does not go to home
 he does not go to house
 he does not goes home
 …
Many similar derivations, each with highly similar scores
Search insight #1
 he / does not / go / home
f = φ(er → he) + φ(geht → go) + φ(ja nicht → does not)
  + φ(nach hause → home) + ψ(he | <S>) + d(0)
  + ψ(does | he) + ψ(not | does) + d(1) + ψ(go | not)
  + d(-3) + ψ(home | go) + d(2) + ψ(</S> | home)

 he / does not / go / to home
f = φ(er → he) + φ(geht → go) + φ(ja nicht → does not)
  + φ(nach hause → to home) + ψ(he | <S>) + d(0)
  + ψ(does | he) + ψ(not | does) + d(1) + ψ(go | not)
  + d(-3) + ψ(to | go) + ψ(home | to) + d(2)
  + ψ(</S> | home)
Search insight #1
Consider all possible ways to finish the translation
Search insight #1
The score f factorises, with shared components across all options.
Can find the best completion by maximising f.
Search insight #2
Several partial translations can be finished the same way
Search insight #2
Several partial translations can be finished the same way

Only need to consider maximal scoring partial translation
Dynamic Programming Solution
 Key ideas behind dynamic programming
 factor out repeated computation
 efficiently solve the maximisation problem
 What are the key components for “sharing”?
 derivations don't have to be exactly identical; they need only share the same:
 set of untranslated words
 right-most output words
 end position of the last translated input phrase
 The decoding algorithm aims to exploit this
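A minimal sketch of the recombination state (the field names and scores below are illustrative, not from any real decoder): two partial derivations that agree on coverage, LM context and last input position collapse to the better-scoring one.

```python
def state(hypothesis, lm_order=2):
    """Key under which partial derivations can be recombined.

    Two hypotheses with the same coverage set, the same last (m-1)
    output words, and the same end position of the last translated
    input phrase score identically from here on, so only the best
    of them needs to be kept.
    """
    coverage = frozenset(hypothesis["covered"])        # translated input positions
    lm_context = tuple(hypothesis["output"][-(lm_order - 1):])
    return (coverage, lm_context, hypothesis["last_end"])

best = {}
for hyp in [{"covered": {0, 2, 3}, "output": ["he", "does", "not"],
             "last_end": 4, "score": -3.2},
            {"covered": {0, 2, 3}, "output": ["it", "does", "not"],
             "last_end": 4, "score": -4.1}]:
    key = state(hyp)                   # both hypotheses share this key
    if key not in best or hyp["score"] > best[key]["score"]:
        best[key] = hyp                # keep only the maximal-scoring one
```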
More formally
 Considering the decoding maximisation

 where d ranges over all derivations covering f
 We can split max_d into max_{d1} max_{d2} ...
 move some 'maxes' inside the expression, over elements not affected by that rule
 bracket independent parts of the expression
 Akin to Viterbi algorithm in HMMs, PCFGs
Phrase-based Decoding

Start with empty state
Figure from Machine Translation Koehn 2009
Phrase-based Decoding

Expand by choosing an input span and generating its translation
Figure from Machine Translation Koehn 2009
Phrase-based Decoding

Consider all possible options to start the translation
Figure from Machine Translation Koehn 2009
Phrase-based Decoding
Continue to expand states, visiting uncovered words, generating the output left to right.

Figure from Machine Translation Koehn 2009
Phrase-based Decoding
Read off the translation from the best complete derivation by back-tracking

Figure from Machine Translation Koehn 2009
Dynamic Programming
 Recall that shared structure can be exploited
 vertices with the same coverage, last output word, and input position are identical for subsequent scoring
 Maximise over these paths

⇒
 aka "recombination" in the MT literature (but really just dynamic programming)

Figure from Machine Translation Koehn 2009
Complexity
 Even with DP, search is still intractable
 word-based and phrase-based decoding is NP-complete
 Knight, 1999; Zaslavskiy, Dymetman and Cancedda, 2009
 whereas SCFG decoding is polynomial
 Complexity arises from
 the reordering model allowing all permutations (limit to no more than 6 uncovered words)
 many translation options (limit to no more than 20 translations per phrase)
 coverage constraints, i.e., all words to be translated exactly once
Pruning

Limit the size of the search graph by eliminating bad paths early
 Pharaoh / Moses
 divide partial derivations into stacks, based on the number of input words translated
 limit the number of derivations in each stack
 limit the score difference in each stack
Stack based pruning

The algorithm iteratively "grows" from one stack to the next larger ones, while pruning the entries in each stack.
Figure from Machine Translation Koehn 2009
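A minimal stack-decoding sketch over the running example. The phrase options and scores are made up, and the LM and distortion terms are omitted for brevity; the state key keeps only the coverage set and last output word.

```python
# Hypothetical phrase options over "er geht ja nicht nach hause":
# input span (i, j) -> list of (output phrase, log score).
OPTIONS = {(0, 1): [("he", -0.4)],
           (1, 2): [("go", -0.7), ("goes", -1.3)],
           (2, 4): [("does not", -0.9)],
           (4, 6): [("home", -0.8)]}
N, STACK_LIMIT = 6, 10

# stacks[k]: best partial derivations covering k input words,
# keyed by (coverage, last output word) for recombination.
stacks = [dict() for _ in range(N + 1)]
stacks[0][(frozenset(), "<S>")] = ("", 0.0)

for k in range(N):
    # Histogram pruning: keep only the best STACK_LIMIT entries per stack.
    entries = sorted(stacks[k].items(), key=lambda kv: -kv[1][1])[:STACK_LIMIT]
    for (covered, _), (output, score) in entries:
        for (i, j), phrases in OPTIONS.items():
            if any(p in covered for p in range(i, j)):
                continue  # coverage constraint: each word translated once
            new_cov = frozenset(covered | set(range(i, j)))
            for out, s in phrases:
                cand = ((output + " " + out).strip(), score + s)
                key = (new_cov, out.split()[-1])
                stack = stacks[len(new_cov)]
                if key not in stack or cand[1] > stack[key][1]:
                    stack[key] = cand  # recombination: keep the best

best_output, best_score = max(stacks[N].values(), key=lambda v: v[1])
```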
Future cost estimate
 Higher scores for translating easy parts first
 language model prefers common words
 Early pruning will eliminate derivations starting with the difficult words
 pruning must incorporate an estimate of the cost of translating the remaining words
 the "future cost estimate" assumes a unigram LM and monotone translation
 Related to A* search and admissible heuristics
 but incurs search error (see Chang & Collins, 2011)
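One way to precompute such estimates is a dynamic program over input spans: the best cost of a span is either a direct phrase translation or the best split into two cheaper sub-spans. The per-span costs below are made up for illustration.

```python
# Hypothetical best per-span log scores from the phrase table + unigram LM.
SPAN_COST = {(0, 1): -1.0, (1, 2): -1.5, (2, 3): -2.5, (3, 4): -2.0,
             (2, 4): -2.2, (4, 5): -2.4, (5, 6): -2.1, (4, 6): -1.8}
N = 6

# future[i][j]: best achievable score for translating words i..j,
# combining any segmentation into known spans (monotone, unigram LM).
future = [[float("-inf")] * (N + 1) for _ in range(N + 1)]
for length in range(1, N + 1):
    for i in range(N - length + 1):
        j = i + length
        best = SPAN_COST.get((i, j), float("-inf"))  # direct translation
        for k in range(i + 1, j):                    # or best binary split
            best = max(best, future[i][k] + future[k][j])
        future[i][j] = best
```

During decoding, a hypothesis is compared against others by its score plus the future cost of its uncovered spans, so derivations that start with the difficult words are not unfairly pruned.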
Beam search complexity
 Limit the number of translation options per phrase to constant (often 20)
 # translations proportional to input sentence length
 Stack pruning
 number of entries & score ratio
 Reordering limits
 finite number of uncovered words (typically 6); but see Lopez, EACL 2009
 Resulting complexity
 O( stack size x sentence length )
k-best outputs

 Can recover not just the best solution
 but also 2nd, 3rd etc best derivations
 a straightforward extension of beam search
 Useful in discriminative training of feature weights, and other applications
Alternatives for PBMT decoding
 FST composition (Kumar & Byrne, 2005)
 each process encoded in WFST or WFSA
 simply compose automata, minimise and solve
 A* search (Och, Ueffing & Ney, 2001)
 Sampling (Arun et al, 2009)
 Integer linear programming
 Germann et al, 2001
 Riedel & Clarke, 2009
 Lagrangian relaxation
 Chang & Collins, 2011
Outline
 Decoding phrase-based models
 linear model
 dynamic programming approach
 approximate beam search
 Decoding grammar-based models
 tree-to-string decoding
 string-to-string decoding
 cube pruning
Grammar-based decoding
 Reordering in PBMT is weak and must be limited
 otherwise too many bad choices are available
 and inference is intractable
 better if reordering decisions were driven by context
 (Moses supports a simple form of lexicalised reordering)
 Grammar based translation
 consider hierarchical phrases with gaps (Chiang 05)
 (re)ordering constrained by lexical context
 inform process by generating syntax tree
(Venugopal & Zollmann, 06; Galley et al, 06)
 exploit input syntax (Mi, Huang & Liu, 08)
Hierarchical phrase-based MT
Standard PBMT
[Figure: word alignment between "yu Aozhou you bangjiao" and "have diplomatic relations with Australia"]
Must 'jump' back and forth to obtain the correct ordering, guided primarily by the language model.

Hierarchical PBMT
The grammar rule encodes this common reordering:
yu X1 you X2 → have X2 with X1
It also correlates yu ... you with have ... with.

Example from Chiang, CL 2007
SCFG recap
 Rules of the form
X → <yu X1 you X2, have X2 with X1>
 can include aligned gaps
 can include informative non-terminal categories (NN, NP, VP etc)
SCFG generation
 Synchronous grammars generate parallel texts
[Figure: paired derivations generating "yu Aozhou you bangjiao" and "have diplomatic relations with Australia" in parallel]
Further:
 applied to one text, can generate the other text
 leverage efficient monolingual parsing algorithms
SCFG extraction from bitexts
Step 1: identify aligned phrase-pairs
Step 2: "subtract" out subsumed phrase-pairs
Example grammar
X → <yu X1 you X2, have X2 with X1>
X → <Aozhou, Australia>
X → <bangjiao, diplomatic relations>
S → <S X, S X>
Decoding as parsing
 Consider only the input (foreign) side of the grammar
Step 1: parse the input text
[Figure: parse of "yu Aozhou you bangjiao" using the input-side rules, building cells up to S over the full sentence]
Step 2: Translate
Traverse the tree, replacing each input production with its highest scoring output side
[Figure: the input parse tree rewritten production-by-production into the output tree, yielding the English translation]
Chart parsing
Chart cells, by span length:
1. length = 1: X → Aozhou, X → bangjiao
2. length = 2: X → yu X, X → you X, S → X
4. length = 4: S → S X, X → yu X you X
[Figure: chart over "yu Aozhou you bangjiao" with cells X1,2, X3,4, X0,2, X2,4, S0,2, X0,4, S0,4]
Two derivations yield S0,4: take the one with maximum score
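A naïve chart parser over the input side of this toy grammar confirms the two derivations of S over the full sentence. This is an illustrative sketch, not an efficient CYK implementation: it tries every split of every rule's right-hand side.

```python
from functools import lru_cache

WORDS = ["yu", "Aozhou", "you", "bangjiao"]
# Input-side projection of the example grammar (non-terminals in caps).
RULES = {"X": [("Aozhou",), ("bangjiao",), ("yu", "X"), ("you", "X"),
               ("yu", "X", "you", "X")],
         "S": [("X",), ("S", "X")]}

@lru_cache(maxsize=None)
def parses(symbol, i, j):
    """Number of derivations of WORDS[i:j] from `symbol`."""
    return sum(match(rhs, i, j) for rhs in RULES[symbol])

def match(rhs, i, j):
    """Number of ways the sequence `rhs` can cover WORDS[i:j]."""
    if not rhs:
        return 1 if i == j else 0
    head, rest = rhs[0], rhs[1:]
    if head in RULES:  # non-terminal: try every non-empty split point
        total = 0
        for k in range(i + 1, j + 1):
            rest_ways = match(rest, k, j)
            if rest_ways:  # also avoids unbounded left recursion via S -> S X
                total += parses(head, i, k) * rest_ways
        return total
    # terminal: must match the next input word exactly
    return match(rest, i + 1, j) if i < j and WORDS[i] == head else 0

print(parses("S", 0, 4))  # the two derivations of S0,4
```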
Chart parsing for decoding
[Figure: the same chart, traversed top-down from S0,4]
• start at the full-sentence cell S0,J
• traverse down to find the maximum scoring derivation
• translate each rule using its maximum scoring right-hand side
• emit the output string
LM intersection
 Very efficient
 cost of parsing, i.e., O(n^3)
 reduces to linear if we impose a maximum span limit
 translation is a simple O(n) post-processing step
 But what about the language model?
 CYK assumes model scores decompose with the tree
structure
 but the language model must span constituents
Problem: LM doesn’t factorise!
LM intersection via lexicalised NTs
 Encode LM context in NT categories
(Bar-Hillel et al, 1964)

X → <yu X1 you X2, have X2 with X1>
becomes
haveXb → <yu aXb1 you cXd2, have cXd2 with aXb1>
where the annotations a, b, c, d denote the left & right m-1 words of each non-terminal's output translation
 When used in parent rule, LM can access boundary words
 score now factorises with tree
LM intersection via lexicalised NTs
[Figure: the derivation with lexicalised non-terminals, e.g. AustraliaXAustralia and diplomaticXrelations. Each production contributes its translation score φTM; combining child spans exposes the boundary words to the LM, adding terms such as φTM + ψ(with → c) + ψ(d → has) + ψ(has → a), and at the root φTM + ψ(<S> → a) + ψ(b → </S>)]
+LM Decoding

 Same algorithm as before
 Viterbi parse with input side grammar (CYK)
 for each production, find best scoring output side
 read off output string
 But the input grammar has blown up
 number of non-terminals is O(T^2)
 overall translation complexity of O(n^3 T^{4(m-1)})
 Terrible!
Beam search and pruning
 Resort to beam search
 prune poor entries from chart cells during CYK parsing
 histogram, threshold as in phrase-based MT
 rarely have sufficient context for LM evaluation
 Cube pruning
 uses a lower-order LM estimate as a search heuristic
 follows an approximate 'best first' order for incorporating child spans into the parent rule
 stops once the beam is full
 For more details, see
 Chiang, "Hierarchical phrase-based translation". 2007. Computational Linguistics 33(2):201–228.
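A sketch of the core cube-pruning loop: lazy best-first enumeration of (rule, left child, right child) combinations using a heap, stopping once the beam is full. The scores below are made up; without LM re-scoring the additive scores make the order exact, whereas real cube pruning re-scores each popped item with the LM, so its order is only approximately best-first.

```python
import heapq

def cube_prune(rule_scores, left_scores, right_scores, beam=5):
    """Pop the best (rule, left child, right child) score combinations.

    Each input list is assumed sorted best (highest log score) first;
    we start from the corner (0, 0, 0) of the "cube" and only expand
    neighbours of popped items, so most combinations are never scored.
    """
    seen = {(0, 0, 0)}
    # max-heap via negated scores
    heap = [(-(rule_scores[0] + left_scores[0] + right_scores[0]), 0, 0, 0)]
    out = []
    while heap and len(out) < beam:
        neg, r, l, c = heapq.heappop(heap)
        out.append(-neg)
        for dr, dl, dc in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            nr, nl, nc = r + dr, l + dl, c + dc
            if (nr, nl, nc) in seen or nr >= len(rule_scores) \
                    or nl >= len(left_scores) or nc >= len(right_scores):
                continue
            seen.add((nr, nl, nc))
            heapq.heappush(heap, (-(rule_scores[nr] + left_scores[nl]
                                    + right_scores[nc]), nr, nl, nc))
    return out

best = cube_prune([-0.1, -0.9], [-0.2, -0.5, -1.0], [-0.3, -0.4], beam=4)
```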
Further work
 Synchronous grammar systems
 SAMT (Venugopal & Zollmann, 2006)
 ISI’s syntax system (Marcu et al.,2006)
 HRGG (Chiang et al., 2013)
 Tree to string (Liu, Liu & Lin, 2006)
 Probabilistic grammar induction
 Blunsom & Cohn (2009)
 Decoding and pruning
 cube growing (Huang & Chiang, 2007)
 left to right decoding (Huang & Mi, 2010)
Summary
 What we covered
 word based translation and alignment
 linear phrase-based and grammar-based models
 phrase-based (finite state) decoding
 synchronous grammar decoding
 What we didn’t cover
 rule extraction process
 discriminative training
 tree based models
 domain adaptation
 OOV translation
 …

 

Más de RIILP

Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD RIILP
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic RIILP
 
Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones RIILP
 
Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones RIILP
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO RIILP
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic RIILP
 
Tony O'Dowd - KantanMT
Tony O'Dowd -  KantanMT Tony O'Dowd -  KantanMT
Tony O'Dowd - KantanMT RIILP
 
Santanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARSantanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARRIILP
 
Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU RIILP
 
Anna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMAAnna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMARIILP
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD RIILP
 
Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW RIILP
 
Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA RIILP
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU RIILP
 
Liling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARLiling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARRIILP
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - AcclaroRIILP
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015RIILP
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015RIILP
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015RIILP
 
ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015
ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015
ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015RIILP
 

Más de RIILP (20)

Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic
 
Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones
 
Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
 
Tony O'Dowd - KantanMT
Tony O'Dowd -  KantanMT Tony O'Dowd -  KantanMT
Tony O'Dowd - KantanMT
 
Santanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARSantanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAAR
 
Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU
 
Anna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMAAnna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMA
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD
 
Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW
 
Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU
 
Liling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARLiling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAAR
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - Acclaro
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
 
ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015
ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015
ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015
 

Último

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

7. Trevor Cohn (usfd) Statistical Machine Translation

  • 1. Statistical Machine Translation
 Part II: Decoding
 Trevor Cohn, U. Sheffield
 EXPERT winter school, November 2013
 Some figures taken from Koehn 2009
  • 2. Recap
 You’ve seen several models of translation
 word-based models: IBM 1–5
 phrase-based models
 grammar-based models
 Methods for
 learning translation rules from bitexts
 learning rule weights
 learning several other features: language models, reordering etc.
  • 3. Decoding
 The central challenge is to predict a good translation
 given text in the input language (f)
 generate a translation in the output language (e)
 Formally, e* = argmax_e score(e, f)
 where the model scores each candidate translation e using a translation model and a language model
 A decoder is a search algorithm for finding e*
 caveat: few modern systems use actual probabilities
  • 4. Outline
 Decoding phrase-based models
 linear model
 dynamic programming approach
 approximate beam search
 Decoding grammar-based models
 synchronous grammars
 string-to-string decoding
  • 5. Decoding objective
 Objective: maximise the model score f
 the model incorporates
 translation frequencies for phrases
 distortion cost based on (re)ordering
 language model cost of m-grams in e
 ...
 Problem of ambiguity
 there may be many different sequences of translation decisions mapping f to e
 e.g. could translate word by word, or use larger units
  • 6. Decoding for derivations
 A derivation is a sequence of translation decisions
 can “read off” the input string f and the output e
 Define the model over derivations, not translations
 aka the Viterbi approximation
 should sum over all derivations within the maximisation
 instead we maximise for tractability
 But see Blunsom, Cohn and Osborne (2008)
 sum out derivational ambiguity (during training)
  • 7. Decoding
 Includes a coverage constraint
 all input words must be translated exactly once
 preserves input information
 Cf. ‘fertility’ in IBM word-based models
 phrases licence one-to-many mappings (insertions) and many-to-one (deletions)
 but limited to contiguous spans
 Tractability effects on decoding
  • 8. Translation process
 Translate this sentence
 translate input words and “phrases”
 reorder the output to form the target string
 Derivation = sequence of phrases
 1. er – he; 2. ja nicht – does not; 3. geht – go; 4. nach hause – home
 Figure from Machine Translation, Koehn 2009
  • 9. Generating process
 Consider the translation decisions in a derivation
 (figure: input er geht ja nicht nach hause, with steps 1: segment, 2: translate, 3: order)
  • 10. Generating process
 (figure: step 1 segments the input into the phrases er / geht / ja nicht / nach hause)
  • 11. Generating process
 (figure: step 2 translates each phrase: er → he, geht → go, ja nicht → does not, nach hause → home)
  • 12. Generating process
 (figure: the three steps with their costs)
 1: segment — uniform cost (ignored)
 2: translate — translation model probability
 3: order — distortion cost & language model probability
  • 13. Generating process
 (figure: step 3 reorders he / go / does not / home into he does not go home)
 f = 0 + φ(er → he) + φ(geht → go) + φ(ja nicht → does not) + φ(nach hause → home) + ψ(he | <S>) + d(0) + ψ(does | he) + ψ(not | does) + d(1) + ψ(go | not) + d(-3) + ψ(home | go) + d(2) + ψ(</S> | home)
  • 14. Linear Model
 Assume a linear model over derivations
 d is a derivation
 φ(r_k) is the log conditional frequency of a phrase pair
 d is the distortion cost for two consecutive phrases
 ψ is the log language model probability
 each component is scaled by a separate weight
 Often mistakenly referred to as log-linear
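The linear scoring shown on slides 13–14 can be sketched in code. A minimal illustration, not the slides' exact decoder: all names, weights and the bigram LM are hypothetical, and `phi` stands in for the log phrase-translation frequencies.

```python
# A minimal sketch of the slides' linear scoring function.
# phi: log phrase-translation frequency, distortion: (re)ordering cost,
# lm: log bigram probabilities; each component has its own weight.

def distortion(prev_end, start):
    # zero cost when the next phrase continues monotonically, else pay the jump
    return -abs(start - prev_end - 1)

def score_derivation(phrases, phi, lm, weights):
    """phrases: list of (src_start, src_end, output_words) in output order."""
    total, prev_end, prev_word = 0.0, -1, "<S>"
    for start, end, words in phrases:
        total += weights["tm"] * phi[(start, end, words)]
        total += weights["d"] * distortion(prev_end, start)
        for w in words:  # bigram LM over the growing output string
            total += weights["lm"] * lm.get((prev_word, w), -10.0)
            prev_word = w
        prev_end = end
    return total + weights["lm"] * lm.get((prev_word, "</S>"), -10.0)
```

With the four phrases from the running example (he / does not / go / home), the function accumulates exactly the φ, d and ψ terms written out on slide 13.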
  • 15. Model components
 Typically:
 language model and word count
 translation model(s)
 distortion cost
 Values of the weights α are learned by discriminative training (not covered today)
  • 16. Search problem
 Given the translation options, there are 1000s of possible output strings
 he does not go home
 it is not in house
 yes he goes not to home
 …
 Figure from Machine Translation, Koehn 2009
  • 17. Search Complexity
 Search space for a 6-word input
 number of segmentations: 2^5 = 32
 number of permutations: 6! = 720
 number of translation options: 4^6 = 4096
 multiplying gives 94,371,840 derivations (the calculation is naïve, giving a loose upper bound)
 How can we possibly search this space?
 especially for longer input sentences
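The slide's upper bound can be checked directly; the counts assume 6 input words with roughly 4 translation options per unit.

```python
import math

# Reproducing the slide's naive upper bound for a 6-word input sentence:
# every segmentation x every permutation x every choice of translation option.
segmentations = 2 ** 5            # 5 gaps between 6 words, each split or not
permutations = math.factorial(6)  # orderings of the translated units
options = 4 ** 6                  # ~4 translation options per unit
derivations = segmentations * permutations * options
print(derivations)  # 94371840
```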
  • 18. Search insight
 Consider the sorted list of all derivations
 …
 he does not go after home
 he does not go after house
 he does not go home
 he does not go to home
 he does not go to house
 he does not goes home
 …
 Many similar derivations, each with highly similar scores
  • 19. Search insight #1
 he / does not / go / home
 f = φ(er → he) + φ(geht → go) + φ(ja nicht → does not) + φ(nach hause → home) + ψ(he | <S>) + d(0) + ψ(does | he) + ψ(not | does) + d(1) + ψ(go | not) + d(-3) + ψ(home | go) + d(2) + ψ(</S> | home)
 he / does not / go / to home
 f = φ(er → he) + φ(geht → go) + φ(ja nicht → does not) + φ(nach hause → to home) + ψ(he | <S>) + d(0) + ψ(does | he) + ψ(not | does) + d(1) + ψ(go | not) + d(-3) + ψ(to | go) + ψ(home | to) + d(2) + ψ(</S> | home)
  • 20. Search insight #1
 Consider all possible ways to finish the translation
  • 21. Search insight #1
 The score f factorises, with shared components across all options
 can find the best completion by maximising f
  • 22. Search insight #2
 Several partial translations can be finished the same way
  • 23. Search insight #2
 Several partial translations can be finished the same way
 only need to consider the maximal-scoring partial translation
  • 24. Dynamic Programming Solution
 Key ideas behind dynamic programming
 factor out repeated computation
 efficiently solve the maximisation problem
 What are the key components for “sharing”?
 derivations don’t have to be exactly identical; they need the same:
 set of untranslated words
 right-most output words
 last translated input word location
 The decoding algorithm aims to exploit this
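The three "sharing" components listed above form the dynamic-programming state. A sketch (field names are hypothetical; shown for a bigram LM, where one right-most output word suffices — an m-gram LM needs the last m−1 words):

```python
# Hypotheses agreeing on this key score identically from here on,
# so only the best-scoring one needs to be kept.

def recomb_key(hyp):
    return (frozenset(hyp["uncovered"]),  # untranslated input positions
            hyp["output"][-1],            # right-most output word (LM state)
            hyp["last_end"])              # last translated input position

def recombine(hypotheses):
    best = {}
    for h in hypotheses:
        k = recomb_key(h)
        if k not in best or h["score"] > best[k]["score"]:
            best[k] = h
    return list(best.values())
```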
  • 25. More formally
 Considering the decoding maximisation
 where d ranges over all derivations covering f
 We can split max_d into max_d1 max_d2 …
 move some ‘maxes’ inside the expression, over elements not affected by that rule
 bracket independent parts of the expression
 Akin to the Viterbi algorithm in HMMs and PCFGs
  • 26. Phrase-based Decoding
 Start with the empty state
 Figure from Machine Translation, Koehn 2009
  • 27. Phrase-based Decoding
 Expand by choosing an input span and generating its translation
 Figure from Machine Translation, Koehn 2009
  • 28. Phrase-based Decoding
 Consider all possible options to start the translation
 Figure from Machine Translation, Koehn 2009
  • 29. Phrase-based Decoding
 Continue to expand states, visiting uncovered words, generating outputs left to right
 Figure from Machine Translation, Koehn 2009
  • 30. Phrase-based Decoding
 Read off the translation from the best complete derivation by back-tracking
 Figure from Machine Translation, Koehn 2009
  • 31. Dynamic Programming
 Recall that shared structure can be exploited
 vertices with the same coverage, last output word, and input position are identical for subsequent scoring
 Maximise over these paths
 aka “recombination” in the MT literature (but really just dynamic programming)
 Figure from Machine Translation, Koehn 2009
  • 32. Complexity
 Even with DP, search is still intractable
 word-based and phrase-based decoding is NP-complete
 Knight 99; Zaslavskiy, Dymetman and Cancedda, 2009
 whereas SCFG decoding is polynomial
 Complexity arises from
 the reordering model allowing all permutations (limit: no more than 6 uncovered words)
 many translation options (limit: no more than 20 translations per phrase)
 coverage constraints, i.e., all words to be translated exactly once
  • 33. Pruning
 Limit the size of the search graph by eliminating bad paths early
 Pharaoh / Moses
 divide partial derivations into stacks, based on the number of input words translated
 limit the number of derivations in each stack
 limit the score difference in each stack
  • 34. Stack based pruning
 The algorithm iteratively “grows” from one stack to the next larger ones, while pruning the entries in each stack
 Figure from Machine Translation, Koehn 2009
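The stack-growing loop above can be sketched as a toy decoder. Deliberately simplified (assumed): scoring uses phrase log-probabilities only, with no LM or distortion, and pruning is histogram-only.

```python
import heapq
from collections import namedtuple

# Stacks are indexed by the number of source words covered; each stack is
# pruned to the best `beam` entries before its hypotheses are expanded.

Hyp = namedtuple("Hyp", "score covered output")

def stack_decode(n, phrase_table, beam=10):
    """phrase_table: {(i, j): [(logprob, target_words), ...]} over spans [i, j)."""
    stacks = [[] for _ in range(n + 1)]
    stacks[0].append(Hyp(0.0, frozenset(), ()))
    for c in range(n):
        stacks[c] = heapq.nlargest(beam, stacks[c], key=lambda h: h.score)
        for hyp in stacks[c]:
            for (i, j), options in phrase_table.items():
                if any(p in hyp.covered for p in range(i, j)):
                    continue  # coverage constraint: each word translated once
                for lp, words in options:
                    new = Hyp(hyp.score + lp,
                              hyp.covered | frozenset(range(i, j)),
                              hyp.output + tuple(words))
                    stacks[len(new.covered)].append(new)
    return max(stacks[n], key=lambda h: h.score)
```

A real decoder would additionally recombine hypotheses within each stack and fold LM and distortion scores into the expansion step.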
  • 35. Future cost estimate
 Higher scores for translating easy parts first
 the language model prefers common words
 Early pruning will eliminate derivations starting with the difficult words
 pruning must incorporate an estimate of the cost of translating the remaining words
 the “future cost estimate” assumes a unigram LM and monotone translation
 Related to A* search and admissible heuristics
 but incurs search error (see Chang & Collins, 2011)
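The future cost table itself is a small dynamic program over source spans: the cheapest way to translate each remaining span is either a single phrase or the best split into two pieces. A sketch under the slide's simplification (monotone order, span scores with at most a unigram LM folded in):

```python
def future_cost_table(n, span_cost):
    """span_cost: {(i, j): best log score for translating span [i, j) as one phrase}."""
    fc = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = span_cost.get((i, j), float("-inf"))
            for k in range(i + 1, j):  # or translate the span in two pieces
                best = max(best, fc[(i, k)] + fc[(k, j)])
            fc[(i, j)] = best
    return fc
```

During decoding, a hypothesis is ranked by its actual score plus the future cost of its uncovered spans, so "easy words first" derivations lose their unfair head start.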
  • 36. Beam search complexity
 Limit the number of translation options per phrase to a constant (often 20)
 # translations proportional to the input sentence length
 Stack pruning
 number of entries & score ratio
 Reordering limits
 finite number of uncovered words (typically 6), but see Lopez, EACL 2009
 Resulting complexity
 O(stack size × sentence length)
  • 37. k-best outputs
 Can recover not just the best solution
 but also the 2nd, 3rd etc. best derivations
 a straightforward extension of beam search
 Useful in discriminative training of feature weights, and other applications
  • 38. Alternatives for PBMT decoding
 FST composition (Kumar & Byrne, 2005)
 each process encoded as a WFST or WFSA
 simply compose the automata, minimise and solve
 A* search (Och, Ueffing & Ney, 2001)
 Sampling (Arun et al., 2009)
 Integer linear programming
 Germann et al., 2001
 Riedel & Clarke, 2009
 Lagrangian relaxation
 Chang & Collins, 2011
  • 39. Outline
 Decoding phrase-based models
 linear model
 dynamic programming approach
 approximate beam search
 Decoding grammar-based models
 tree-to-string decoding
 string-to-string decoding
 cube pruning
  • 40. Grammar-based decoding
 Reordering in PBMT is poor and must be limited
 otherwise too many bad choices are available
 and inference is intractable
 better if reordering decisions were driven by context
 simple form of lexicalised reordering in Moses
 Grammar-based translation
 consider hierarchical phrases with gaps (Chiang 05)
 (re)ordering constrained by lexical context
 inform the process by generating a syntax tree (Venugopal & Zollmann, 06; Galley et al., 06)
 exploit input syntax (Mi, Huang & Liu, 08)
  • 41. Hierarchical phrase-based MT
 Standard PBMT must ‘jump’ back and forth to obtain the correct ordering of yu Aozhou you bangjiao → have diplomatic relations with Australia, guided primarily by the language model
 Hierarchical PBMT: a grammar rule encodes this common reordering
 yu X1 you X2 → have X2 with X1
 the rule also correlates yu … you with have … with
 Example from Chiang, CL 2007
  • 42. SCFG recap
 Rules of the form X → ⟨yu X1 you X2, have X2 with X1⟩
 can include aligned gaps
 can include informative non-terminal categories (NN, NP, VP etc.)
  • 43. SCFG generation
 Synchronous grammars generate parallel texts
 (figure: X rewrites in both languages at once, deriving yu Aozhou you bangjiao and have diplomatic relations with Australia)
 Further:
 applied to one text, can generate the other text
 leverage efficient monolingual parsing algorithms
  • 44. SCFG extraction from bitexts
 Step 1: identify aligned phrase-pairs
 Step 2: “subtract” out subsumed phrase-pairs
  • 45. Example grammar
 X → ⟨yu X1 you X2, have X2 with X1⟩
 X → ⟨Aozhou, Australia⟩
 X → ⟨bangjiao, diplomatic relations⟩
 S → ⟨X, X⟩
 S → ⟨S X, S X⟩
  • 46. Decoding as parsing
 Consider only the foreign side of the grammar
 Step 1: parse the input text yu Aozhou you bangjiao
 using rules X → Aozhou, X → bangjiao, X → yu X you X, S → X
  • 47. Step 2: Translate
 Traverse the tree, replacing each input production with its highest scoring output side
 (figure: yu X(Aozhou) you X(bangjiao) → have diplomatic relations with Australia)
  • 48. Chart parsing
 1. length = 1: X → Aozhou, X → bangjiao
 2. length = 2: X → yu X, X → you X, S → X
 4. length = 4: S → S X, X → yu X you X
 (figure: chart cells S0,4, X0,4, S0,2, X0,2, X1,2, X2,4, X3,4 over yu Aozhou you bangjiao)
 Two derivations yield S0,4; take the one with the maximum score
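The parse-then-translate idea can be made concrete with a deliberately simplified grammar: lexical rules X → ⟨source phrase, target phrase⟩ plus monotone and inverted binary combination. This is an assumed ITG-style reduction for illustration, not the slides' lexicalised grammar.

```python
# CYK over source spans: chart[(i, j)] holds the best (score, translation)
# for source words[i:j]; binary cells try both output orders.

def cyk_decode(words, lexicon):
    """lexicon: {source_tuple: (log score, target string)}."""
    n = len(words)
    chart = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            src = tuple(words[i:j])
            if src in lexicon:
                chart[(i, j)] = lexicon[src]
            for k in range(i + 1, j):  # binary combinations of sub-spans
                if (i, k) in chart and (k, j) in chart:
                    s1, t1 = chart[(i, k)]
                    s2, t2 = chart[(k, j)]
                    for t in (t1 + " " + t2, t2 + " " + t1):  # monotone / inverted
                        if (i, j) not in chart or s1 + s2 > chart[(i, j)][0]:
                            chart[(i, j)] = (s1 + s2, t)
    return chart.get((0, n))
```

As on the slides, parsing fills the chart bottom-up by span length, and the translation is read off the best entry in the full-sentence cell.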
  • 49. Chart parsing for decoding
 Starting at the full-sentence cell S0,J
 traverse down to find the maximum-scoring derivation
 translate each rule using the maximum-scoring right-hand side
 emit the output string
  • 50. LM intersection
 Very efficient
 cost of parsing, i.e., O(n^3)
 reduces to linear if we impose a maximum span limit
 the translation step is a simple O(n) post-processing step
 But what about the language model?
 CYK assumes model scores decompose with the tree structure
 but the language model must span constituents
 Problem: the LM doesn’t factorise!
  • 51. LM intersection via lexicalised NTs
 Encode the LM context in the NT categories (Bar-Hillel et al., 1964)
 X → ⟨yu X1 you X2, have X2 with X1⟩ becomes haveXb → ⟨yu aXb1 you cXd2, have aXb2 with cXd1⟩
 the decorations record the left & right m−1 words of the non-terminal’s output translation
 When used in a parent rule, the LM can access the boundary words
 the score now factorises with the tree
  • 52. LM intersection via lexicalised NTs
 (figure: the example derivation with lexicalised non-terminals AustraliaXAustralia and diplomaticXrelations; each rule contributes its TM score φTM, and LM scores ψ are added where boundary words meet, e.g. ψ(with → c), ψ(<S> → a) and ψ(b → </S>) at the sentence boundaries)
  • 53. +LM Decoding
 Same algorithm as before
 Viterbi parse with the input-side grammar (CYK)
 for each production, find the best scoring output side
 read off the output string
 But the input grammar has blown up
 the number of non-terminals is O(T^2)
 overall translation complexity of O(n^3 T^{4(m−1)})
 Terrible!
  • 54. Beam search and pruning
 Resort to beam search
 prune poor entries from chart cells during CYK parsing
 histogram and threshold pruning, as in phrase-based MT
 rarely have sufficient context for LM evaluation
 Cube pruning
 uses a lower-order LM estimate as a search heuristic
 follows an approximate ‘best first’ order for incorporating child spans into the parent rule
 stops once the beam is full
 For more details, see
 Chiang, “Hierarchical phrase-based translation”. 2007. Computational Linguistics 33(2):201–228.
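The heart of cube pruning is enumerating combinations of sorted candidate lists in near best-first order with a heap. A stripped-down sketch: with purely additive scores, as here, the order is exact; real LM scores break this monotonicity, which is why cube pruning is approximate.

```python
import heapq

def cube_top_k(a, b, k):
    """a, b: candidate scores sorted descending; return the k best sums
    without enumerating the full |a| x |b| grid."""
    heap = [(-(a[0] + b[0]), 0, 0)]   # start from the best corner
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        for ni, nj in ((i + 1, j), (i, j + 1)):  # push the grid neighbours
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    return out
```

In a decoder, `a` and `b` would be the ranked translation options of a rule's child spans, and popping stops once the cell's beam is full.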
  • 55. Further work
 Synchronous grammar systems
 SAMT (Venugopal & Zollmann, 2006)
 ISI’s syntax system (Marcu et al., 2006)
 HRGG (Chiang et al., 2013)
 tree-to-string (Liu, Liu & Lin, 2006)
 Probabilistic grammar induction
 Blunsom & Cohn (2009)
 Decoding and pruning
 cube growing (Huang & Chiang, 2007)
 left-to-right decoding (Huang & Mi, 2010)
  • 56. Summary
 What we covered
 word-based translation and alignment
 linear phrase-based and grammar-based models
 phrase-based (finite state) decoding
 synchronous grammar decoding
 What we didn’t cover
 rule extraction process
 discriminative training
 tree-based models
 domain adaptation
 OOV translation
 …