Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
John Lafferty, Andrew McCallum, Fernando Pereira
Speaker: Shu-Ying Li

Outline
Introduction
Conditional Random Fields
Parameter Estimation for CRFs
Experiments
Conclusions

Introduction
Sequence Segmenting and Labeling
Generative Models
Assign a joint probability to paired observation and label sequences.
The parameters are typically trained to maximize the joint likelihood of the training examples.
[Figure: chain of states S_{t-1}, S_t, S_{t+1} with observations O_t, O_{t+1}]
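
To make the "joint probability" point concrete, here is a minimal sketch of how an HMM-style generative model scores a paired label and observation sequence. The states, symbols, and all probabilities are made-up toy values, not taken from the slides or the paper.

```python
from itertools import product

# Hypothetical toy HMM; every number below is invented for illustration.
STATES = ["N", "V"]
SYMBOLS = ["a", "b"]
INITIAL = {"N": 0.6, "V": 0.4}
TRANSITION = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMISSION = {"N": {"a": 0.9, "b": 0.1}, "V": {"a": 0.2, "b": 0.8}}


def joint_probability(labels, observations):
    """p(labels, observations) = prod_t p(s_t | s_{t-1}) * p(o_t | s_t)."""
    prob = 1.0
    previous = None
    for state, symbol in zip(labels, observations):
        step = INITIAL[state] if previous is None else TRANSITION[previous][state]
        prob *= step * EMISSION[state][symbol]
        previous = state
    return prob


def total_mass(length):
    """Sum the joint probability over every (labels, observations) pair."""
    return sum(
        joint_probability(labels, obs)
        for labels in product(STATES, repeat=length)
        for obs in product(SYMBOLS, repeat=length)
    )
```

Because the model is generative, the probabilities of all label and observation pairs of a given length sum to one, which `total_mass` checks by enumeration.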

Introduction (cont.)
Conditional Model
Allow arbitrary, non-independent features of the observation sequence X.
The probability of a transition between labels may depend on past and future observations.
Maximum Entropy Markov Models (MEMMs)
[Figure: chain of states S_{t-1}, S_t, S_{t+1} with observations O_{t-1}, O_t, O_{t+1}, ...]

Introduction (cont.)
The Label Bias Problem:
$\Pr(1, 2 \mid ro) = \Pr(2 \mid 1, ro)\,\Pr(1 \mid ro) = \Pr(2 \mid 1, o)\,\Pr(1 \mid r)$
$\Pr(1, 2 \mid ri) = \Pr(2 \mid 1, ri)\,\Pr(1 \mid ri) = \Pr(2 \mid 1, i)\,\Pr(1 \mid r)$
State 1 has a single outgoing transition, so $\Pr(2 \mid 1, o) = \Pr(2 \mid 1, i) = 1$.
Hence $\Pr(1, 2 \mid ro) = \Pr(1, 2 \mid ri)$.
But it should be $\Pr(1, 2 \mid ro) < \Pr(1, 2 \mid ri)$!
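
The effect can be reproduced with a small sketch of per-state (local) normalization as used in an MEMM. The automaton layout, state numbering, and scores are assumptions chosen to mirror the "rib"/"rob" style example; they are not values from the slides.

```python
import math

# Hypothetical automaton: 0 -r-> 1 -i-> 2 -b-> 3 and 0 -r-> 4 -o-> 5 -b-> 3.
SUCCESSORS = {0: [1, 4], 1: [2], 4: [5], 2: [3], 5: [3]}

# Made-up scores: state 1 strongly "prefers" observing 'i' over 'o'.
SCORES = {(1, 2, "i"): 5.0, (1, 2, "o"): -5.0, (4, 5, "o"): 5.0, (4, 5, "i"): -5.0}


def memm_transition(current, nxt, observation):
    """Locally normalized P(next | current, observation)."""
    weights = {
        s: math.exp(SCORES.get((current, s, observation), 0.0))
        for s in SUCCESSORS[current]
    }
    return weights[nxt] / sum(weights.values())


def path_probability(path, observations):
    """Product of local transition probabilities along a state path."""
    prob = 1.0
    for current, nxt, obs in zip(path, path[1:], observations):
        prob *= memm_transition(current, nxt, obs)
    return prob


# State 1 has one successor, so normalization discards its score entirely.
p_ri = path_probability([0, 1, 2], "ri")
p_ro = path_probability([0, 1, 2], "ro")
```

Even though the score for seeing 'o' in state 1 is very low, `p_ro` equals `p_ri`, because a state with a single outgoing transition must pass all of its probability mass along that transition.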

Introduction (cont.)
Solve the Label Bias Problem
Change the state-transition structure of the model.
Start with a fully connected model and let the training procedure figure out a good structure.

Conditional Random Fields
Random Field
Example:

Conditional Random Fields
Suppose $P(Y_v \mid X, \text{all other } Y) = P(Y_v \mid X, \text{neighbors}(Y_v))$; then X with Y is a conditional random field.
Example: $P(Y_3 \mid X, \text{all other } Y) = P(Y_3 \mid X, Y_2, Y_4)$, where $X = X_1, \ldots, X_{n-1}, X_n$.

Conditional Random Fields
Conditional Distribution [2]
$s_k(y_i, x, i)$ is a state feature function of the label at position i and the observation sequence.
$\lambda_k$ and $\mu_k$ are parameters to be estimated from training data.
Conditional Distribution [1]
y: label sequence
v: vertex from vertex set V
e: edge from edge set E over V
$f_k$: Boolean edge feature; $g_k$: Boolean vertex feature
k: index over the features
$\lambda_k$ and $\mu_k$ are parameters to be estimated
$y|_e$ is the set of components of y defined by edge e
$y|_v$ is the set of components of y defined by vertex v
[Figure: chain of labels Y_{t-1}, Y_t, Y_{t+1} with observations X_{t-1}, X_t, X_{t+1}, ...]

Conditional Random Fields
Conditional Distribution
Z(x) is a normalization factor over the data sequence x.
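
For reference, the symbol definitions on these slides appear to describe the conditional distribution given in Lafferty et al. [1]. The block below is a sketch of that form, written with the paper's convention that $f_k$ is attached to edges and $g_k$ to vertices; the explicit $Z(x)$ is the normalizer that the paper leaves implicit behind a proportionality sign.

```latex
p_\theta(y \mid x) = \frac{1}{Z(x)}
  \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x)
          + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y'|_e, x)
          + \sum_{v \in V,\, k} \mu_k g_k(v, y'|_v, x) \Big)
```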

Conditional Random Fields

  • 27. [1]:
  • 28. [2]: where each $f_j(y_{i-1}, y_i, x, i)$ is either a state function $s(y_{i-1}, y_i, x, i)$ or a transition function $t(y_{i-1}, y_i, x, i)$.
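  • 30. y' and y are labels drawn from this alphabet.
  • 31. Define a set of n+1 matrices $\{M_i(x) \mid i = 1, \ldots, n+1\}$, where each $M_i(x)$ is a matrix with elements of the form $M_i(y', y \mid x) = \exp\big(\sum_j \lambda_j f_j(y', y, x, i)\big)$.
  • 32. Conditional Random Fields: The normalization function Z(x) is the (start, end) entry of the product of these matrices. The conditional probability of label sequence y is [1][2]: $p(y \mid x) = \frac{\prod_{i=1}^{n+1} M_i(y_{i-1}, y_i \mid x)}{\big(\prod_{i=1}^{n+1} M_i(x)\big)_{\text{start},\,\text{end}}}$, where $y_0 = \text{start}$ and $y_{n+1} = \text{end}$.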
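
The matrix form can be checked numerically. The sketch below builds the n+1 matrices for a toy chain, takes the (start, end) entry of their product as Z(x), and evaluates p(y | x) as a product of matrix entries. The label alphabet and the `score` function (standing in for the weighted feature sum inside the exponent) are invented for illustration.

```python
import math
from itertools import product

LABELS = ["A", "B"]                    # hypothetical label alphabet
STATES = ["start"] + LABELS + ["end"]  # alphabet augmented with start and end


def score(prev, cur, x, i):
    """Made-up stand-in for the weighted feature sum at 1-based position i."""
    if cur == "end":
        return 0.0
    match = 1.0 if cur == x[i - 1] else -0.5  # state-style feature
    stay = 0.3 if prev == cur else 0.0        # transition-style feature
    return match + stay


def matrix(x, i):
    """M_i(x): entry [prev][cur] is exp(score), or 0 for impossible moves."""
    n = len(x)
    m = {p: {c: 0.0 for c in STATES} for p in STATES}
    prevs = ["start"] if i == 1 else LABELS
    curs = ["end"] if i == n + 1 else LABELS
    for p in prevs:
        for c in curs:
            m[p][c] = math.exp(score(p, c, x, i))
    return m


def matmul(a, b):
    """Plain matrix product over the augmented state set."""
    return {
        p: {c: sum(a[p][k] * b[k][c] for k in STATES) for c in STATES}
        for p in STATES
    }


def partition(x):
    """Z(x) as the (start, end) entry of M_1(x) M_2(x) ... M_{n+1}(x)."""
    result = matrix(x, 1)
    for i in range(2, len(x) + 2):
        result = matmul(result, matrix(x, i))
    return result["start"]["end"]


def conditional(y, x):
    """p(y | x) as a product of matrix entries divided by Z(x)."""
    path = ["start"] + list(y) + ["end"]
    numerator = 1.0
    for i in range(1, len(path)):
        numerator *= matrix(x, i)[path[i - 1]][path[i]]
    return numerator / partition(x)
```

Summing `conditional(y, x)` over every label sequence of the right length should give 1, which is a quick sanity check that the (start, end) entry really is the normalizer.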
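  • 33. Parameter Estimation for CRFs: Problem definition: determine the parameters $\theta = (\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots)$. Goal: maximize the log-likelihood objective function [1]: $O(\theta) = \sum_{i=1}^{N} \log p_\theta(y^{(i)} \mid x^{(i)}) \propto \sum_{x, y} \tilde{p}(x, y) \log p_\theta(y \mid x)$, where $\tilde{p}(x, y)$ is the empirical distribution of the training data. This function is concave, guaranteeing convergence to the global maximum. In [2], $E_p[\cdot]$ denotes expectation with respect to distribution p.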
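  • 35. $\delta\lambda_k$ for edge feature $f_k$ is the solution of $\tilde{E}[f_k] = \sum_{x, y} \tilde{p}(x)\, p(y \mid x) \sum_{i=1}^{n+1} f_k(e_i, y|_{e_i}, x)\, e^{\delta\lambda_k T(x, y)}$, where T(x, y) is the total feature count [1].
  • 36. Efficiently computing the exponential sums on the right-hand sides of these equations is problematic, because T(x, y) is a global property of (x, y) and dynamic programming will sum over sequences with potentially varying T. Dynamic Programming [2]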
  • 37. Parameter Estimation for CRFs: For each index i = 0, …, n+1, we define forward vectors $\alpha_i(x)$ and backward vectors $\beta_i(x)$ [1][2].
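
A minimal sketch of the forward and backward recurrences, assuming the usual definitions from [1]: $\alpha_0$ is the indicator of the start state and $\alpha_i = \alpha_{i-1} M_i$, while $\beta_{n+1}$ is the indicator of the end state and $\beta_i = M_{i+1} \beta_{i+1}$. The matrices below are invented toy values with state order [start, A, B, end].

```python
def forward_backward(matrices):
    """Return (alphas, betas) for M_1..M_{n+1}; state 0 is start, last is end."""
    size = len(matrices[0])

    # alpha_0 puts all mass on the start state; alpha_i = alpha_{i-1} M_i.
    alpha = [1.0 if s == 0 else 0.0 for s in range(size)]
    alphas = [alpha]
    for m in matrices:
        alpha = [sum(alpha[p] * m[p][c] for p in range(size)) for c in range(size)]
        alphas.append(alpha)

    # beta_{n+1} puts all mass on the end state; beta_i = M_{i+1} beta_{i+1}.
    beta = [1.0 if s == size - 1 else 0.0 for s in range(size)]
    betas = [beta]
    for m in reversed(matrices):
        beta = [sum(m[p][c] * beta[c] for c in range(size)) for p in range(size)]
        betas.insert(0, beta)

    return alphas, betas


# Toy matrices for a sequence of length n = 2 (all values made up).
M1 = [[0, 2, 1, 0], [0] * 4, [0] * 4, [0] * 4]
M2 = [[0] * 4, [0, 1, 3, 0], [0, 2, 1, 0], [0] * 4]
M3 = [[0] * 4, [0, 0, 0, 1.0], [0, 0, 0, 0.5], [0] * 4]

alphas, betas = forward_backward([M1, M2, M3])
z_from_alpha = alphas[-1][-1]  # end entry of alpha_{n+1}
z_from_beta = betas[0][0]      # start entry of beta_0
```

Both `z_from_alpha` and `z_from_beta` recover the normalizer Z(x), and the dot product of $\alpha_i$ and $\beta_i$ gives the same value at every position i, which is what makes the per-position feature expectations cheap to compute.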
  • 40. where S is a constant chosen so that $s(x^{(i)}, y) \ge 0$ for all y and all observation vectors $x^{(i)}$ in the training set.
  • 42. Feature s is "global": it does not correspond to any particular edge or vertex.
  • 43. Parameter Estimation for CRFs: Algorithm S [1]. [Equations: updates for $\delta\lambda_k$]
  • 44. Parameter Estimation for CRFs: Algorithm S [1]. The constant S in Algorithm S can be quite large, since in practice it is proportional to the length of the longest training observation sequence. The algorithm may converge slowly, taking very small steps toward the maximum in each iteration.
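
For reference, a sketch of the Algorithm S updates as stated in Lafferty et al. [1] (worth checking against the paper, since the equations did not survive on the slide). Here $\tilde{E}$ denotes the expectation under the empirical distribution and $E$ the expectation under the current model.

```latex
\delta\lambda_k = \frac{1}{S} \log \frac{\tilde{E} f_k}{E f_k},
\qquad
\delta\mu_k = \frac{1}{S} \log \frac{\tilde{E} g_k}{E g_k}
```

Each step is scaled by 1/S, so a large S (long training sequences) directly shrinks every update, which is the slow-convergence behavior described above.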
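  • 46. Use forward-backward recurrences to compute the expectations $a_{k,t}$ of feature $f_k$ and $b_{k,t}$ of feature $g_k$ given that T(x) = t. $\beta_k$ and $\gamma_k$ are the unique positive roots of the polynomial equations $\sum_{t=0}^{T_{\max}} a_{k,t}\, \beta_k^{t} = \tilde{E} f_k$ and $\sum_{t=0}^{T_{\max}} b_{k,t}\, \gamma_k^{t} = \tilde{E} g_k$ [1], which can be easily computed by Newton's method.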
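
As an illustration of the root-finding step, here is a minimal Newton's method sketch for a polynomial equation of the form sum_t a_t * beta**t = target, which is the shape given in [1]. The function name, the coefficients, and the target are hypothetical, and the sketch assumes nonnegative coefficients with at least one nonzero term of degree 1 or higher.

```python
def solve_positive_root(coefficients, target, guess=1.0, tolerance=1e-12,
                        max_iterations=100):
    """Find beta > 0 with sum_t coefficients[t] * beta**t == target."""
    beta = guess
    for _ in range(max_iterations):
        # Residual of the polynomial equation at the current estimate.
        value = sum(a * beta ** t for t, a in enumerate(coefficients)) - target
        # Derivative of the polynomial with respect to beta.
        slope = sum(t * a * beta ** (t - 1)
                    for t, a in enumerate(coefficients) if t > 0)
        step = value / slope
        beta -= step
        if abs(step) < tolerance:
            break
    return beta


# Toy instance: 0.25 * beta**2 + 0.5 * beta = 2.0 has positive root beta = 2.
root = solve_positive_root([0.0, 0.5, 0.25], 2.0)
```

With nonnegative coefficients the polynomial is increasing and convex for positive beta, so Newton's method started from a positive guess typically converges in a handful of iterations.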
  • 50. CRFs solve the label bias problem.
  • 52. MEMMs converge in 100 iterations. [Figure: MEMMs vs. HMM]
  • 55. When the data is mostly second order ($\alpha \ge 1/2$), the discriminatively trained CRF usually outperforms the MEMM.
  • 57. Data set: Penn Treebank
  • 59. Use the optimal MEMM parameter vector as a starting point for training the corresponding CRF to accelerate convergence.
  • 60. Conclusions: Discriminatively trained models for sequence segmentation and labeling. Combination of arbitrary, overlapping and agglomerative observation features from both the past and future. Efficient training and decoding based on dynamic programming. Parameter estimation guaranteed to find the global optimum.
  • 61. References
      [1] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, 2001.
      [2] Hanna M. Wallach. Conditional Random Fields: An Introduction. University of Pennsylvania CIS Technical Report MS-CIS-04-21.
      Reference slides (by Rongkun Shen)