CS229 Lecture notes 
Andrew Ng 
Mixtures of Gaussians and the EM algorithm 
In this set of notes, we discuss the EM (Expectation-Maximization) algorithm for density estimation.
Suppose that we are given a training set $\{x^{(1)}, \ldots, x^{(m)}\}$ as usual. Since
we are in the unsupervised learning setting, these points do not come with
any labels.
We wish to model the data by specifying a joint distribution $p(x^{(i)}, z^{(i)}) =
p(x^{(i)} \mid z^{(i)})\, p(z^{(i)})$. Here, $z^{(i)} \sim \text{Multinomial}(\phi)$ (where $\phi_j \ge 0$, $\sum_{j=1}^{k} \phi_j = 1$,
and the parameter $\phi_j$ gives $p(z^{(i)} = j)$), and $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. We
let $k$ denote the number of values that the $z^{(i)}$’s can take on. Thus, our
model posits that each $x^{(i)}$ was generated by randomly choosing $z^{(i)}$ from
$\{1, \ldots, k\}$, and then $x^{(i)}$ was drawn from one of $k$ Gaussians depending on
$z^{(i)}$. This is called the mixture of Gaussians model. Also, note that the
$z^{(i)}$’s are latent random variables, meaning that they’re hidden/unobserved.
This is what will make our estimation problem difficult.
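To make the generative story concrete, here is a minimal numpy sketch that draws a sample from such a mixture; the particular values of $\phi$, $\mu_j$ and $\Sigma_j$ below are illustrative choices of mine, not anything from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a k = 2 mixture over R^2 (assumed values).
phi = np.array([0.3, 0.7])                       # phi_j = p(z = j); sums to 1
mus = np.array([[0.0, 0.0], [4.0, 4.0]])         # mu_j, one row per Gaussian
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])  # Sigma_j, one per Gaussian

m = 500
# First choose z^(i) ~ Multinomial(phi), then x^(i) | z^(i) = j ~ N(mu_j, Sigma_j).
z = rng.choice(len(phi), size=m, p=phi)
X = np.array([rng.multivariate_normal(mus[j], Sigmas[j]) for j in z])
```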
The parameters of our model are thus $\phi$, $\mu$ and $\Sigma$. To estimate them, we
can write down the likelihood of our data:
\[
\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p(x^{(i)}; \phi, \mu, \Sigma)
= \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi).
\]
However, if we set to zero the derivatives of this formula with respect to 
the parameters and try to solve, we’ll find that it is not possible to find the 
maximum likelihood estimates of the parameters in closed form. (Try this 
yourself at home.) 
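Although no closed-form maximizer exists, the log-likelihood itself is straightforward to evaluate. A sketch, assuming the arrays `phi`, `mus` and `Sigmas` from the sampling snippet above, and using a log-sum-exp over the inner sum for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, phi, mus, Sigmas):
    """l(phi, mu, Sigma) = sum_i log sum_j p(x^(i) | z^(i) = j) p(z^(i) = j)."""
    m, k = X.shape[0], len(phi)
    log_joint = np.empty((m, k))  # log p(x^(i), z^(i) = j) for every i, j
    for j in range(k):
        log_joint[:, j] = (multivariate_normal.logpdf(X, mus[j], Sigmas[j])
                           + np.log(phi[j]))
    return logsumexp(log_joint, axis=1).sum()
```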
The random variables $z^{(i)}$ indicate which of the $k$ Gaussians each $x^{(i)}$
came from. Note that if we knew what the $z^{(i)}$’s were, the maximum
likelihood problem would have been easy. Specifically, we could then write 
down the likelihood as 
\[
\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \left[ \log p(x^{(i)} \mid z^{(i)}; \mu, \Sigma) + \log p(z^{(i)}; \phi) \right].
\]
Maximizing this with respect to $\phi$, $\mu$ and $\Sigma$ gives the parameters:
\[
\phi_j = \frac{1}{m} \sum_{i=1}^{m} 1\{z^{(i)} = j\},
\]
\[
\mu_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}},
\]
\[
\Sigma_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}\, (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}}.
\]
Indeed, we see that if the $z^{(i)}$’s were known, then maximum likelihood
estimation becomes nearly identical to what we had when estimating the
parameters of the Gaussian discriminant analysis model, except that here
the $z^{(i)}$’s play the role of the class labels.¹
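If the $z^{(i)}$’s were observed, the three formulas above translate directly into code. A sketch, where `z` is assumed to be an integer array of labels in $\{0, \ldots, k-1\}$ (zero-indexed, unlike the notes):

```python
import numpy as np

def mle_given_labels(X, z, k):
    # Closed-form maximum likelihood estimates when the z^(i)'s are known.
    phi = np.array([np.mean(z == j) for j in range(k)])
    mus = np.array([X[z == j].mean(axis=0) for j in range(k)])
    Sigmas = []
    for j in range(k):
        D = X[z == j] - mus[j]                   # centered points from Gaussian j
        Sigmas.append(D.T @ D / (z == j).sum())  # average of (x - mu)(x - mu)^T
    return phi, mus, np.array(Sigmas)
```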
However, in our density estimation problem, the $z^{(i)}$’s are not known.
What can we do?
The EM algorithm is an iterative algorithm that has two main steps.
Applied to our problem, in the E-step, it tries to “guess” the values of the
$z^{(i)}$’s. In the M-step, it updates the parameters of our model based on our
guesses. Since in the M-step we are pretending that the guesses in the first
part were correct, the maximization becomes easy. Here’s the algorithm:
Repeat until convergence: {

(E-step) For each $i, j$, set
\[
w_j^{(i)} := p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma)
\]

(M-step) Update the parameters:
\[
\phi_j := \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)},
\]
\[
\mu_j := \frac{\sum_{i=1}^{m} w_j^{(i)}\, x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}},
\]
\[
\Sigma_j := \frac{\sum_{i=1}^{m} w_j^{(i)}\, (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{m} w_j^{(i)}}
\]

}

¹There are other minor differences in the formulas here from what we’d obtained in
PS1 with Gaussian discriminant analysis, first because we’ve generalized the $z^{(i)}$’s to be
multinomial rather than Bernoulli, and second because here we are using a different $\Sigma_j$
for each Gaussian.
In the E-step, we calculate the posterior probability of the $z^{(i)}$’s, given
$x^{(i)}$ and using the current setting of our parameters. I.e., using Bayes’ rule,
we obtain:
\[
p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) = \frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)}{\sum_{l=1}^{k} p(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma)\, p(z^{(i)} = l; \phi)}
\]
Here, $p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)$ is given by evaluating the density of a Gaussian
with mean $\mu_j$ and covariance $\Sigma_j$ at $x^{(i)}$; $p(z^{(i)} = j; \phi)$ is given by $\phi_j$; and so
on. The values $w_j^{(i)}$ calculated in the E-step represent our “soft” guesses² for
the values of $z^{(i)}$.
Also, you should contrast the updates in the M-step with the formulas we
had when the $z^{(i)}$’s were known exactly. They are identical, except that instead
of the indicator functions “$1\{z^{(i)} = j\}$” indicating from which Gaussian
each datapoint had come, we now have the $w_j^{(i)}$’s.
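Putting the two steps together gives a complete, if unoptimized, EM loop. A minimal sketch under a few assumptions of mine: means are initialized at random data points, covariances at the identity, and iteration stops once the log-likelihood improves by less than an arbitrary tolerance.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def em_gmm(X, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    phi = np.full(k, 1.0 / k)                      # uniform mixing weights
    mus = X[rng.choice(m, size=k, replace=False)]  # k random points as means
    Sigmas = np.array([np.eye(n) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: w[i, j] = p(z^(i) = j | x^(i); phi, mu, Sigma) via Bayes' rule.
        log_joint = np.column_stack([
            multivariate_normal.logpdf(X, mus[j], Sigmas[j]) + np.log(phi[j])
            for j in range(k)
        ])
        log_evidence = logsumexp(log_joint, axis=1, keepdims=True)
        w = np.exp(log_joint - log_evidence)
        # M-step: the known-z formulas with indicators replaced by the w's.
        Nj = w.sum(axis=0)                # "effective" number of points per Gaussian
        phi = Nj / m
        mus = (w.T @ X) / Nj[:, None]
        for j in range(k):
            D = X - mus[j]
            Sigmas[j] = (w[:, j, None] * D).T @ D / Nj[j]
        ll = log_evidence.sum()           # log-likelihood before this M-step
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return phi, mus, Sigmas, ll
```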
The EM algorithm is also reminiscent of the K-means clustering algorithm,
except that instead of the “hard” cluster assignments $c^{(i)}$, we have the
“soft” assignments $w_j^{(i)}$. Like K-means, EM is susceptible to local optima,
so running it several times from different initial parameters may be a good idea.
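For example, one might run the `em_gmm` sketch above from several seeds and keep the run with the highest final log-likelihood:

```python
# Run EM from 10 random initializations and keep the best local optimum found.
runs = [em_gmm(X, k=2, seed=s) for s in range(10)]
phi, mus, Sigmas, ll = max(runs, key=lambda r: r[3])
```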
It’s clear that the EM algorithm has a very natural interpretation of
repeatedly trying to guess the unknown $z^{(i)}$’s; but how did it come about,
and can we make any guarantees about it, such as regarding its convergence?
In the next set of notes, we will describe a more general view of EM, one
that will allow us to easily apply it to other estimation problems in which
there are also latent variables, and which will allow us to give a convergence
guarantee.

²The term “soft” refers to our guesses being probabilities and taking values in $[0, 1]$; in
contrast, a “hard” guess is one that represents a single best guess (such as taking values
in $\{0, 1\}$ or $\{1, \ldots, k\}$).
