Machine Learning for Language Technology
Uppsala University
Department of Linguistics and Philology
Structured Prediction
October 2013
Slides borrowed from previous courses.
Thanks to Ryan McDonald (Google Research)
and Prof. Joakim Nivre
Machine Learning for Language Technology 1(36)
Structured Prediction
Outline
Last time:
Preliminaries: input/output, features, etc.
Linear classifiers
Perceptron
Large-margin classifiers (SVMs, MIRA)
Logistic regression
Today:
Structured prediction with linear classifiers
Structured perceptron
Structured large-margin classifiers (SVMs, MIRA)
Conditional random fields
Case study: Dependency parsing
Machine Learning for Language Technology 2(36)
Structured Prediction
Structured Prediction (i)
Sometimes our output space Y does not consist of simple
atomic classes
Examples:
Parsing: for a sentence x, Y is the set of possible parse trees
Sequence tagging: for a sentence x, Y is the set of possible
tag sequences, e.g., part-of-speech tags, named-entity tags
Machine translation: for a source sentence x, Y is the set of
possible target language sentences
Machine Learning for Language Technology 3(36)
Structured Prediction
Hidden Markov Models
Generative model – maximizes the joint likelihood P(x, y)
We are looking at discriminative versions of this
Not just sequences, though that will be the running example
Machine Learning for Language Technology 4(36)
Structured Prediction
Structured Prediction (ii)
Can’t we just use our multiclass learning algorithms?
In all the cases, the size of the set Y is exponential in the
length of the input x
It is non-trivial to apply our learning algorithms in such cases
Machine Learning for Language Technology 5(36)
Structured Prediction
Perceptron
Training data: T = {(xt, yt)}, t = 1 .. |T|
1. w(0) = 0; i = 0
2. for n : 1..N
3. for t : 1..T
4. Let y′ = arg maxy w(i) · f(xt, y) (**)
5. if y′ ≠ yt
6. w(i+1) = w(i) + f(xt, yt) − f(xt, y′)
7. i = i + 1
8. return w(i)
(**) Solving the argmax requires a search over an exponential
space of outputs!
Machine Learning for Language Technology 6(36)
Structured Prediction
Large-Margin Classifiers
Batch (SVMs):
min (1/2) ||w||²
such that:
w · f(xt, yt) − w · f(xt, y′) ≥ 1
∀(xt, yt) ∈ T and y′ ∈ Ȳt (**)
Online (MIRA):
Training data: T = {(xt, yt)}, t = 1 .. |T|
1. w(0) = 0; i = 0
2. for n : 1..N
3. for t : 1..T
4. w(i+1) = arg minw* ||w* − w(i)||
such that:
w · f(xt, yt) − w · f(xt, y′) ≥ 1
∀y′ ∈ Ȳt (**)
5. i = i + 1
6. return w(i)
(**) There are exponentially many constraints in the size of each input!!
Machine Learning for Language Technology 7(36)
Structured Prediction
Factor the Feature Representations
We can make an assumption that our feature representations
factor relative to the output
Example:
Context-free parsing:
f(x, y) = Σ_{A→BC ∈ y} f(x, A → BC)
Sequence analysis – Markov assumptions:
f(x, y) = Σ_{i=1}^{|y|} f(x, yi−1, yi)
These kinds of factorizations allow us to run algorithms like
CKY and Viterbi to compute the argmax function
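To make the factorization concrete, the following is a minimal Python sketch (not the course code) of a first-order factored model: features of the input and two adjacent labels are summed over positions, so the global score decomposes exactly as above. The feature template and the toy weights are invented for the example.

def local_features(x, y_prev, y_cur, i):
    # Features of the input and two adjacent labels (first-order factorization).
    return {
        f"word={x[i]},tag={y_cur}": 1.0,
        f"prev={y_prev},tag={y_cur}": 1.0,
    }

def score(w, x, y):
    # w . f(x, y) = sum over i of w . f(x, y[i-1], y[i])
    total = 0.0
    for i in range(len(y)):
        y_prev = y[i - 1] if i > 0 else "<START>"
        for feat, val in local_features(x, y_prev, y[i], i).items():
            total += w.get(feat, 0.0) * val
    return total

x = ["John", "saw", "Mary"]
y = ["noun", "verb", "noun"]
w = {"word=saw,tag=verb": 2.0, "prev=noun,tag=verb": 1.0}
print(score(w, x, y))  # 3.0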
Machine Learning for Language Technology 8(36)
Structured Prediction
Example – Sequence Labeling
Many NLP problems can be cast in this light
Part-of-speech tagging
Named-entity extraction
Semantic role labeling
...
Input: x = x0x1 . . . xn
Output: y = y0y1 . . . yn
Each yi ∈ Yatom – which is small
Each y ∈ Y = Yatom^n – which is large
Example: part-of-speech tagging – Yatom is set of tags
x = John saw Mary with the telescope
y = noun verb noun preposition article noun
Machine Learning for Language Technology 9(36)
Structured Prediction
Sequence Labeling – Output Interaction
x = John saw Mary with the telescope
y = noun verb noun preposition article noun
Why not just break up sequence into a set of multi-class
predictions?
Because there are interactions between neighbouring tags
What tag does “saw” have?
What if I told you the previous tag was article?
What if it was noun?
Machine Learning for Language Technology 10(36)
Structured Prediction
Sequence Labeling – Markov Factorization
x = John saw Mary with the telescope
y = noun verb noun preposition article noun
Markov factorization – factor by adjacent labels
First-order (like HMMs)
f(x, y) = Σ_{i=1}^{|y|} f(x, yi−1, yi)
kth-order
f(x, y) = Σ_{i=k}^{|y|} f(x, yi−k, . . . , yi−1, yi)
Machine Learning for Language Technology 11(36)
Structured Prediction
Sequence Labeling – Features
x = John saw Mary with the telescope
y = noun verb noun preposition article noun
First-order
f(x, y) = Σ_{i=1}^{|y|} f(x, yi−1, yi)
f(x, yi−1, yi) is any feature of the input & two adjacent labels
fj(x, yi−1, yi) = 1 if xi = “saw” and yi−1 = noun and yi = verb; 0 otherwise
fj′(x, yi−1, yi) = 1 if xi = “saw” and yi−1 = article and yi = verb; 0 otherwise
wj should get high weight and wj′ should get low weight
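As a code sketch of these two indicator features (the helper names fj and fj_prime are hypothetical, chosen only to mirror the slide):

def fj(x, i, y_prev, y_cur):
    # Fires when "saw" is tagged verb after a noun -- wj should get a high weight.
    return 1.0 if x[i] == "saw" and y_prev == "noun" and y_cur == "verb" else 0.0

def fj_prime(x, i, y_prev, y_cur):
    # Fires when "saw" is tagged verb after an article -- wj' should get a low weight.
    return 1.0 if x[i] == "saw" and y_prev == "article" and y_cur == "verb" else 0.0

x = ["John", "saw", "Mary", "with", "the", "telescope"]
print(fj(x, 1, "noun", "verb"), fj_prime(x, 1, "noun", "verb"))  # 1.0 0.0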
Machine Learning for Language Technology 12(36)
Structured Prediction
Sequence Labeling - Inference
How does factorization affect inference?
y′ = arg maxy w · f(x, y)
= arg maxy w · Σ_{i=1}^{|y|} f(x, yi−1, yi)
= arg maxy Σ_{i=1}^{|y|} w · f(x, yi−1, yi)
= arg maxy Σ_{i=1}^{|y|} Σ_{j=1}^{m} wj · fj(x, yi−1, yi)
Can use the Viterbi algorithm
Machine Learning for Language Technology 13(36)
Structured Prediction
Sequence Labeling – Viterbi Algorithm
Let αy,i be the score of the best labeling
Of the sequence x0x1 . . . xi
Where yi = y
Let’s say we know α, then
maxy αy,n is the score of the best labeling of the sequence
αy,i can be calculated with the following recursion
αy,0 = 0.0 ∀y ∈ Yatom
αy,i = maxy∗ [ αy∗,i−1 + w · f(x, y∗, y) ]
Machine Learning for Language Technology 14(36)
Structured Prediction
Sequence Labeling - Back-Pointers
But that only tells us what the best score is
Let βy,i be the (i−1)th label in the best labeling
Of the sequence x0x1 . . . xi
Where yi = y
βy,i can be calculated with the following recursion
βy,0 = nil ∀y ∈ Yatom
βy,i = arg maxy∗ [ αy∗,i−1 + w · f(x, y∗, y) ]
Thus:
The last label in the best sequence is yn = arg maxy αy,n
The second-to-last label is yn−1 = βyn,n
...
The first label is y0 = βy1,1
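The two recursions fit in a few lines of Python. The sketch below is illustrative, not the course code, and assumes a helper score(x, i, y_prev, y_cur) that returns w · f(x, y_prev, y_cur) at position i; the toy score function at the end is invented.

def viterbi(x, labels, score):
    # Returns the highest-scoring label sequence for x under the factored model.
    n = len(x)
    alpha = [{y: 0.0 for y in labels}]   # alpha[i][y]: best score of x[0..i] ending in y
    back = [{y: None for y in labels}]   # back[i][y]: previous label in that best labeling
    for i in range(1, n):
        alpha.append({})
        back.append({})
        for y in labels:
            best_prev, best = None, float("-inf")
            for y_star in labels:
                s = alpha[i - 1][y_star] + score(x, i, y_star, y)
                if s > best:
                    best_prev, best = y_star, s
            alpha[i][y] = best
            back[i][y] = best_prev
    # Follow the back-pointers from the best final label.
    y_last = max(labels, key=lambda y: alpha[n - 1][y])
    path = [y_last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

labels = ["noun", "verb"]
def toy_score(x, i, y_prev, y_cur):
    return 1.0 if x[i] == "saw" and y_prev == "noun" and y_cur == "verb" else 0.0
print(viterbi(["John", "saw", "Mary"], labels, toy_score))  # ['noun', 'verb', 'noun']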
Machine Learning for Language Technology 15(36)
Structured Prediction
Structured Learning
We know we can solve the inference problem
At least for sequence labeling
But also for many other problems where one can factor
features appropriately (context-free parsing, dependency
parsing, semantic role labeling, . . . )
How does this change learning?
for the perceptron algorithm?
for SVMs?
for logistic regression?
Machine Learning for Language Technology 16(36)
Structured Prediction
Structured Perceptron
Exactly like original perceptron
Except now the argmax function uses factored features
Which we can solve with algorithms like the Viterbi algorithm
All of the original analysis carries over!!
1. w(0) = 0; i = 0
2. for n : 1..N
3. for t : 1..T
4. Let y′ = arg maxy w(i) · f(xt, y) (**)
5. if y′ ≠ yt
6. w(i+1) = w(i) + f(xt, yt) − f(xt, y′)
7. i = i + 1
8. return w(i)
(**) Solve the argmax with Viterbi for sequence problems!
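A minimal sketch of the whole training loop, assuming a feature extractor f(x, y) that returns a dict of counts and a decoder argmax(w, x) (e.g., the Viterbi sketch above); both helpers are stand-ins, not the slides' code.

def structured_perceptron(data, f, argmax, epochs=5):
    # data: list of (x, y_gold); weights are kept in a sparse dict.
    w = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_pred = argmax(w, x)
            if y_pred != y_gold:
                # w <- w + f(x, y_gold) - f(x, y_pred)
                for feat, val in f(x, y_gold).items():
                    w[feat] = w.get(feat, 0.0) + val
                for feat, val in f(x, y_pred).items():
                    w[feat] = w.get(feat, 0.0) - val
    return w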
Machine Learning for Language Technology 17(36)
Structured Prediction
Online Structured SVMs (or Online MIRA)
1. w(0) = 0; i = 0
2. for n : 1..N
3. for t : 1..T
4. w(i+1) = arg minw* ||w* − w(i)||
such that:
w · f(xt, yt) − w · f(xt, y′) ≥ L(yt, y′)
∀y′ ∈ Ȳt and y′ ∈ k-best(xt, w(i))
5. i = i + 1
6. return w(i)
k-best(xt) is the set of k outputs with the highest scores using w(i)
Simple solution – only consider the highest-scoring output y′ ∈ Ȳt
Note: Old fixed margin of 1 is now a fixed loss L(yt, y′) between two structured outputs
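For the "simple solution" above (a single constraint from the highest-scoring output), the quadratic program has a well-known closed-form answer. The sketch below shows that 1-best variant under the same assumed helpers (f returning a feature dict, loss(y_gold, y_pred) a structured loss such as Hamming distance); it is an illustration, not the course implementation.

def mira_1best_update(w, x, y_gold, y_pred, f, loss):
    # w <- w + tau * (f(x, y_gold) - f(x, y_pred)), with the smallest tau >= 0
    # that makes the score margin at least loss(y_gold, y_pred).
    if y_pred == y_gold:
        return w
    delta = dict(f(x, y_gold))
    for feat, val in f(x, y_pred).items():
        delta[feat] = delta.get(feat, 0.0) - val
    margin = sum(w.get(feat, 0.0) * val for feat, val in delta.items())
    norm_sq = sum(val * val for val in delta.values())
    if norm_sq == 0.0:
        return w
    tau = max(0.0, (loss(y_gold, y_pred) - margin) / norm_sq)
    for feat, val in delta.items():
        w[feat] = w.get(feat, 0.0) + tau * val
    return w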
Machine Learning for Language Technology 18(36)
Structured Prediction
Structured SVMs
min (1/2) ||w||²
such that:
w · f(xt, yt) − w · f(xt, y′) ≥ L(yt, y′)
∀(xt, yt) ∈ T and y′ ∈ Ȳt
Still have an exponential number of constraints
Feature factorizations permit solutions (max-margin Markov
networks, structured SVMs)
Note: Old fixed margin of 1 is now a fixed loss L(yt, y′) between two structured outputs
Machine Learning for Language Technology 19(36)
Structured Prediction
Conditional Random Fields (i)
What about structured logistic regression?
Such a thing exists – Conditional Random Fields (CRFs)
Consider again the sequential case with 1st order factorization
Inference is identical to the structured perceptron – use Viterbi
arg maxy P(y|x) = arg maxy e^{w·f(x,y)} / Zx
= arg maxy e^{w·f(x,y)}
= arg maxy w · f(x, y)
= arg maxy Σ_{i=1}^{|y|} w · f(x, yi−1, yi)
Machine Learning for Language Technology 20(36)
Structured Prediction
Conditional Random Fields (ii)
However, learning does change
Reminder: pick w to maximize log-likelihood of training data:
w = arg maxw Σt log P(yt|xt)
Take gradient and use gradient ascent
∂F(w)/∂wi = Σt fi(xt, yt) − Σt Σ_{y′∈Y} P(y′|xt) fi(xt, y′)
And the gradient is:
∇F(w) = ( ∂F(w)/∂w0, ∂F(w)/∂w1, . . . , ∂F(w)/∂wm )
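Schematically, training is then plain gradient ascent. The skeleton below assumes two helpers, empirical_counts(x, y) and expected_counts(w, x) (the latter is exactly what forward-backward delivers, as the next slides show); both names are placeholders, not an actual API.

def crf_gradient_ascent(data, feature_names, empirical_counts, expected_counts,
                        eta=0.1, epochs=10):
    # w <- w + eta * (sum_t f(x_t, y_t) - sum_t E_{y'~P(.|x_t)}[f(x_t, y')])
    w = {name: 0.0 for name in feature_names}
    for _ in range(epochs):
        grad = {name: 0.0 for name in feature_names}
        for x, y in data:
            emp = empirical_counts(x, y)   # f_i(x_t, y_t)
            exp = expected_counts(w, x)    # sum over y' of P(y'|x_t) f_i(x_t, y')
            for name in feature_names:
                grad[name] += emp.get(name, 0.0) - exp.get(name, 0.0)
        for name in feature_names:
            w[name] += eta * grad[name]
    return w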
Machine Learning for Language Technology 21(36)
Structured Prediction
Conditional Random Fields (iii)
Problem: sum over output space Y
∂F(w)/∂wi = Σt fi(xt, yt) − Σt Σ_{y′∈Y} P(y′|xt) fi(xt, y′)
= Σt Σ_{j=1}^{|yt|} fi(xt, yt,j−1, yt,j) − Σt Σ_{y′∈Y} Σ_{j=1}^{|y′|} P(y′|xt) fi(xt, y′j−1, y′j)
Can easily calculate first term – just empirical counts
What about the second term?
Machine Learning for Language Technology 22(36)
Structured Prediction
Conditional Random Fields (iv)
Problem: sum over output space Y
Σt Σ_{y′∈Y} Σ_{j=1}^{|y′|} P(y′|xt) fi(xt, y′j−1, y′j)
We need to show we can compute it for arbitrary xt:
Σ_{y′∈Y} Σ_{j=1}^{|y′|} P(y′|xt) fi(xt, y′j−1, y′j)
Solution: the forward-backward algorithm
Machine Learning for Language Technology 23(36)
Structured Prediction
Forward Algorithm (i)
Let α^m_u be the forward scores
Let |xt| = n
α^m_u is the sum over all labelings of x0 . . . xm such that ym = u
α^m_u = Σ_{y′: |y′|=m, y′m=u} e^{w·f(xt, y′)}
= Σ_{y′: |y′|=m, y′m=u} e^{Σj w·f(xt, y′j−1, y′j)}
i.e., the sum of all labelings of length m, ending at position m
with label u
Note then that
Zxt = Σy′ e^{w·f(xt, y′)} = Σu α^n_u
Machine Learning for Language Technology 24(36)
Structured Prediction
Forward Algorithm (ii)
We can fill in α as follows:
α^0_u = 1.0 ∀u
α^m_u = Σv α^{m−1}_v × e^{w·f(xt, v, u)}
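A minimal sketch of this recursion, reusing the assumed helper score(x, i, y_prev, y_cur) = w · f(x, y_prev, y_cur) from the Viterbi sketch; a real implementation would work in log space to avoid underflow.

import math

def forward(x, labels, score):
    # alpha[m][u] = sum over labelings of x[0..m] ending in u of exp(total score)
    n = len(x)
    alpha = [{u: 1.0 for u in labels}]              # alpha^0_u = 1.0 for all u
    for m in range(1, n):
        alpha.append({})
        for u in labels:
            alpha[m][u] = sum(alpha[m - 1][v] * math.exp(score(x, m, v, u))
                              for v in labels)
    Z = sum(alpha[n - 1][u] for u in labels)        # Z_x = sum_u alpha^n_u
    return alpha, Z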
Machine Learning for Language Technology 25(36)
Structured Prediction
Backward Algorithm
Let β^m_u be the symmetric backward scores
i.e., the sum over all labelings of xm . . . xn such that ym = u
We can fill in β as follows:
β^n_u = 1.0 ∀u
β^m_u = Σv β^{m+1}_v × e^{w·f(xt, u, v)}
Note: β is overloaded – different from back-pointers
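The backward table mirrors the forward one; the same caveats apply (score is the assumed helper from the earlier sketches, and a real implementation would use log space).

import math

def backward(x, labels, score):
    # beta[m][u] = sum over labelings of x[m..n-1] starting in u of exp(total score)
    n = len(x)
    beta = [dict() for _ in range(n)]
    beta[n - 1] = {u: 1.0 for u in labels}          # beta^n_u = 1.0 for all u
    for m in range(n - 2, -1, -1):
        for u in labels:
            beta[m][u] = sum(math.exp(score(x, m + 1, u, v)) * beta[m + 1][v]
                             for v in labels)
    return beta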
Machine Learning for Language Technology 26(36)
Structured Prediction
Conditional Random Fields - Final
Let’s show we can compute it for arbitrary xt:
Σ_{y′∈Y} Σ_{j=1}^{|y′|} P(y′|xt) fi(xt, y′j−1, y′j)
So we can re-write it as:
Σ_{j=1}^{n} Σ_{yj−1, yj} ( α^{j−1}_{yj−1} × e^{w·f(xt, yj−1, yj)} × β^j_{yj} / Zxt ) fi(xt, yj−1, yj)
Forward-backward can calculate partial derivatives efficiently
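Putting the pieces together, the expected feature counts can be read off the alpha/beta tables; the sketch below assumes the forward, backward, score and local_features helpers from the earlier sketches and is only an illustration of the formula above.

import math

def expected_counts(x, labels, score, local_features, forward, backward):
    # Sum over positions j and label pairs of P(y_{j-1}=u, y_j=v | x) * f(x, u, v).
    alpha, Z = forward(x, labels, score)
    beta = backward(x, labels, score)
    counts = {}
    for j in range(1, len(x)):
        for u in labels:        # y_{j-1}
            for v in labels:    # y_j
                # alpha^{j-1}_u * e^{w.f(x, u, v)} * beta^j_v / Z_x
                p = alpha[j - 1][u] * math.exp(score(x, j, u, v)) * beta[j][v] / Z
                for feat, val in local_features(x, u, v, j).items():
                    counts[feat] = counts.get(feat, 0.0) + p * val
    return counts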
Machine Learning for Language Technology 27(36)
Structured Prediction
Conditional Random Fields Summary
Inference: Viterbi
Learning: Use the forward-backward algorithm
What about non-sequential problems?
Context-free parsing – can use inside-outside algorithm
General problems – message passing & belief propagation
Machine Learning for Language Technology 28(36)
Structured Prediction
Case Study: Dependency Parsing
Given an input sentence x, predict syntactic dependencies y
Machine Learning for Language Technology 29(36)
Structured Prediction
Model 1: Arc-Factored Graph-Based Parsing
y′ = arg maxy w · f(x, y)
= arg maxy Σ_{(i,j)∈y} w · f(i, j)
(i, j) ∈ y means xi → xj , i.e., a dependency from xi to xj
Solving the argmax
w · f(i, j) is weight of arc
A dependency tree is a spanning tree of a dense graph over x
Use max spanning tree algorithms for inference
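A sketch of the arc-factored scoring step: every candidate arc (i, j) gets a score w · f(i, j), and the best tree is then the maximum spanning tree over these scores (Chu-Liu/Edmonds). The feature template below is invented, and the greedy head selection at the end is only a stand-in for a real MST decoder (it does not guarantee a well-formed tree).

def arc_score(w, x, i, j):
    # w . f(i, j) for a candidate dependency from head x[i] to dependent x[j].
    feats = {f"head={x[i]},dep={x[j]}": 1.0,
             f"dist={abs(i - j)}": 1.0}
    return sum(w.get(feat, 0.0) * val for feat, val in feats.items())

def score_all_arcs(w, x):
    # Dense arc-score graph over the sentence; position 0 is an artificial root.
    n = len(x)
    return {(i, j): arc_score(w, x, i, j)
            for i in range(n) for j in range(1, n) if i != j}

def greedy_heads(scores, n):
    # Stand-in decoder: pick the best head for each word independently.
    # A real arc-factored parser replaces this with Chu-Liu/Edmonds MST.
    return {j: max((i for i in range(n) if i != j), key=lambda i: scores[(i, j)])
            for j in range(1, n)}

x = ["<ROOT>", "John", "saw", "Mary"]
w = {"head=saw,dep=John": 2.0, "head=<ROOT>,dep=saw": 2.0, "head=saw,dep=Mary": 2.0}
print(greedy_heads(score_all_arcs(w, x), len(x)))  # {1: 2, 2: 0, 3: 2}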
Machine Learning for Language Technology 30(36)
Structured Prediction
Defining f(i, j)
Can contain any feature over arc or the input sentence
Some example features
Identities of xi and xj
Their part-of-speech tags
The part-of-speech of surrounding words
The distance between xi and xj
...
Machine Learning for Language Technology 31(36)
Structured Prediction
Empirical Results
Spanning tree dependency parsing results (McDonald 2006)
Trained using MIRA (online SVMs)
       English               Czech                Chinese
Accuracy  Complete    Accuracy  Complete    Accuracy  Complete
90.7      36.7        84.1      32.2        79.7      27.2
Simple structured linear classifier
Near state-of-the-art performance for many languages
Higher-order models give higher accuracy
Machine Learning for Language Technology 32(36)
Structured Prediction
Model 2: Transition-Based Parsing
y′ = arg maxy w · f(x, y)
= arg maxy Σ_{t(s)∈T(y)} w · f(s, t)
t(s) ∈ T(y) means that the derivation of y includes the
application of transition t to state s
Solving the argmax
w · f(s, t) is score of transition t in state s
Use beam search to find best derivation from start state s0
Machine Learning for Language Technology 33(36)
Structured Prediction
Defining f(s, t)
Can contain any feature over parser states
Some example features
Identities of words in s (e.g., top of stack, head of queue)
Their part-of-speech tags
Their head and dependents (and their part-of-speech tags)
The number of dependents of words in s
...
Machine Learning for Language Technology 34(36)
Structured Prediction
Empirical Results
Transition-based dependency parsing with beam search (**)
(Zhang and Nivre 2011)
Trained using perceptron
       English               Chinese
Accuracy  Complete    Accuracy  Complete
92.9      48.0        86.0      36.9
Simple structured linear classifier
State-of-the-art performance with rich non-local features
(**) Beam search is a heuristic search algorithm that explores a
graph by expanding the most promising node in a limited set.
Beam search is an optimization of best-first search that reduces its
memory requirements. It only finds an approximate solution.
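A generic sketch of beam search over transitions, with the transition system abstracted behind assumed helpers (legal_transitions, apply_t, is_final, transition_score); only the beam_size best partial derivations are kept at each step, which is what makes the search approximate.

def beam_search(s0, beam_size, legal_transitions, apply_t, is_final, transition_score):
    # Each beam item is (score of the derivation so far, parser state).
    beam = [(0.0, s0)]
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for score, s in beam:
            if is_final(s):
                candidates.append((score, s))
                continue
            for t in legal_transitions(s):
                candidates.append((score + transition_score(s, t), apply_t(s, t)))
        # Keep only the beam_size most promising partial derivations.
        beam = sorted(candidates, key=lambda item: item[0], reverse=True)[:beam_size]
    return max(beam, key=lambda item: item[0])[1]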
Machine Learning for Language Technology 35(36)
Structured Prediction
Structured Prediction Summary
Can’t use multiclass algorithms – search space too large
Solution: factor representations
Can allow for efficient inference and learning
Showed for sequence learning: Viterbi + forward-backward
But also true for other structures
CFG parsing: CKY + inside-outside
Dependency parsing: spanning tree algorithms or beam search
General graphs: message passing and belief propagation
Machine Learning for Language Technology 36(36)
