Course Calendar (revised 2012 Dec. 27)
Class DATE Contents
1 Sep. 26 Course information & Course overview
2 Oct. 4 Bayes Estimation
3 〃 11 Classical Bayes Estimation - Kalman Filter -
4 〃 18 Simulation-based Bayesian Methods
5 〃 25 Modern Bayesian Estimation: Particle Filter
6 Nov. 1 HMM(Hidden Markov Model)
Nov. 8 No Class
7 〃 15 Bayesian Decision
8 〃 29 Non-parametric Approaches
9 Dec. 6 PCA(Principal Component Analysis)
10 〃 13 ICA(Independent Component Analysis)
11 〃 20 Applications of PCA and ICA
12 〃 27 Clustering: k-means, Mixtures of Gaussians and EM
13 Jan. 17 Support Vector Machine
14 〃 22(Tue) No Class
Lecture Plan
Clustering:
K-means, Mixtures of Gaussians and EM
1. Introduction
2. K-means Algorithm
3. Mixtures of Gaussians
4. Reformulation of Mixtures of Gaussians
5. EM algorithm
1. Introduction
Unsupervised Learning and the Clustering Problem
Given a set of feature vectors without category labels, we attempt to find
groups or clusters of the data samples in a multi-dimensional space.
We focus on the following two methods:
- K-means algorithm
A simple non-parametric technique
- (Gaussian) Mixture models and EM (Expectation-Maximization)
/Use a mixture of parametric densities such as Gaussians.
/The optimal model parameters are not given in closed form
because the equations are highly non-linear and coupled.
/The expectation-maximization algorithm is effective for
determining the optimal parameters.
2. K-means Algorithm
The K-means algorithm is a non-statistical approach to clustering data
points in a multi-dimensional feature space.

x: a D-dimensional random vector
Dataset of N points: X := \{x_1, x_2, \ldots, x_N\}
Cluster: a group of data points whose inter-point distances are small
compared with the distances to the points outside of the cluster
Prototype of cluster k: \mu_k, \; k = 1, \ldots, K
Aim: find a set of vectors \{\mu_k\}, \; k = 1, \ldots, K, such that the sum of the
squared distances of each point to its closest vector \mu_k is minimized.

Problem: partition the dataset into some number K of clusters (K is known).
Fig. 1 [Bishop book [1] and its web site]
Algorithm
- Assignment indicator: introduce the variable r_{nk} denoting the
assignment of data point x_n:

    r_{nk} = 1 if x_n is assigned to the k-th cluster, and 0 otherwise.   (1)

- Objective function (distortion measure):

    J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2,   (2)

the squared distance of each point x_n to its assigned vector \mu_k,
summed over all data points.
- Find both \{r_{nk}\} and \{\mu_k\} which minimize J.
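As a minimal NumPy sketch of Eq. (2), assuming hypothetical array names: X holds the N data points as an (N, D) array, R the one-hot assignments r_{nk} as an (N, K) array, and Mu the prototypes as a (K, D) array:

```python
import numpy as np

def distortion(X, R, Mu):
    """Distortion measure J of Eq. (2): the squared distance of each
    point to the prototype it is assigned to, summed over all points."""
    # ||x_n - mu_k||^2 for every pair (n, k); shape (N, K)
    sq_dists = ((X[:, None, :] - Mu[None, :, :]) ** 2).sum(axis=2)
    # r_nk zeroes out every distance except the assigned one
    return float((R * sq_dists).sum())
```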
Two-stage optimization
- \mu_k^{(0)}: initial values for the \mu_k
- First stage: minimize J with respect to r_{nk} for fixed \mu_k
- Second stage: minimize J with respect to \mu_k for fixed r_{nk}

First stage: determination of r_{nk} for given \mu_k^{(i)}, \; k = 1, \ldots, K:

    r_{nk} = 1 if k = \arg\min_j \| x_n - \mu_j \|^2, and 0 otherwise.   (3)

That is, we assign x_n to the closest cluster center.
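A sketch of Eq. (3) under the same hypothetical array conventions as above; the argmin over the K squared distances picks the closest center:

```python
def assign_step(X, Mu):
    """First stage, Eq. (3): r_nk = 1 iff k = argmin_j ||x_n - mu_j||^2."""
    sq_dists = ((X[:, None, :] - Mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    nearest = sq_dists.argmin(axis=1)            # index of closest center per point
    R = np.zeros_like(sq_dists)
    R[np.arange(X.shape[0]), nearest] = 1.0      # one-hot assignment rows
    return R
```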
Second stage: optimization of \mu_k. Setting

    \partial J / \partial \mu_k = 2 \sum_{n} r_{nk} (x_n - \mu_k) = 0

gives

    \mu_k = \frac{\sum_n r_{nk} \, x_n}{\sum_n r_{nk}}.   (4)

The numerator is the sum of the x_n assigned to cluster k and the
denominator is the number of points assigned to cluster k, so Eq. (4)
gives the mean vector of all data points assigned to cluster k.
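Eq. (4) and the alternation of the two stages translate directly into a sketch, reusing assign_step from above; the guard against empty clusters is an implementation choice, not something the slides specify:

```python
def update_step(X, R):
    """Second stage, Eq. (4): mu_k = (sum_n r_nk x_n) / (sum_n r_nk)."""
    counts = R.sum(axis=0)                 # number of points per cluster, (K,)
    counts = np.maximum(counts, 1e-12)     # guard against empty clusters
    return (R.T @ X) / counts[:, None]     # cluster means, (K, D)

def k_means(X, Mu0, n_iters=100):
    """Alternate Eqs. (3) and (4) starting from initial prototypes Mu0."""
    Mu = Mu0.copy()
    for _ in range(n_iters):
        R = assign_step(X, Mu)             # first stage
        Mu = update_step(X, R)             # second stage
    return Mu, R
```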
Example 1 [Bishop book [1] and its web site]
Fig. 2 (k-means iterations from the initial prototypes \mu_1^{(0)}, \mu_2^{(0)}).
Fig. 3 [1]: application of the k-means algorithm to color-based image
segmentation [Bishop book [1] and its web site].
K-means clustering is applied to the color vectors of the pixels in RGB
color space.
3. Mixtures of Gaussians
- Limitations of the single-Gaussian pdf model
Examples [Bishop [1]]: a single Gaussian model does not capture
multi-modal features (Fig. 4).
Mixture distribution approach: use a linear combination of basic
distributions such as Gaussians.

[Mixture of Gaussians]
Consider a superposition of K Gaussians (normal distributions):

    p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k),   (5)

where the \pi_k are the mixing coefficients and the N(x \mid \mu_k, \Sigma_k)
are the mixture components.
(Fig.: single Gaussian vs. mixture of Gaussians.)
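A minimal sketch of the density (5); gauss_pdf is a hypothetical helper implementing the multivariate normal, and pis, mus, Sigmas are assumed parameter sequences of length K:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    D = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_pdf(x, pis, mus, Sigmas):
    """Eq. (5): p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(pi * gauss_pdf(x, mu, S) for pi, mu, S in zip(pis, mus, Sigmas))
```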
The mixing coefficients \pi_k (k = 1, \ldots, K) satisfy the requirements of a
discrete probability distribution:

    0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1,

and, since each component integrates to one, \int p(x)\,dx = 1.
\pi_k = p(k): the prior probability of selecting the k-th mixture component.
N(x \mid \mu_k, \Sigma_k) = p(x \mid k): the probability of x conditioned on k.
From Eq. (5),

    p(x) = \sum_{k=1}^{K} p(k) \, p(x \mid k).   (6)

- Define the responsibilities \gamma_k(x) by the posterior distribution:

    \gamma_k(x) := p(k \mid x)
                 = \frac{p(k) \, p(x \mid k)}{p(x)}
                 = \frac{\pi_k \, N(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \, N(x \mid \mu_l, \Sigma_l)}.   (7)
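Eq. (7) in the same hypothetical conventions, reusing gauss_pdf from the earlier sketch; the normalizer is exactly p(x) from Eq. (5):

```python
def responsibilities(x, pis, mus, Sigmas):
    """Eq. (7): gamma_k(x) = pi_k N(x|mu_k, Sigma_k) / sum_l pi_l N(x|mu_l, Sigma_l)."""
    weighted = np.array([pi * gauss_pdf(x, mu, S)
                         for pi, mu, S in zip(pis, mus, Sigmas)])
    return weighted / weighted.sum()   # posterior p(k | x), sums to 1
```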
- Parameters of the Gaussian mixture (5):

    \pi := \{\pi_1, \pi_2, \ldots, \pi_K\}, \quad \mu := \{\mu_1, \mu_2, \ldots, \mu_K\}, \quad \Sigma := \{\Sigma_1, \Sigma_2, \ldots, \Sigma_K\}

- Observed data X := \{x_1, x_2, \ldots, x_N\}: estimate \pi, \mu, \Sigma.
- Apply the maximum-likelihood method
(* see the Lecture 2 slides for the single-Gaussian distribution case).
- Maximize the log-likelihood function

    \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, N(x_n \mid \mu_k, \Sigma_k).   (8)

This is too complex to give a closed-form solution
→ go to the EM (Expectation-Maximization) algorithm.
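A direct sketch of the log-likelihood (8), reusing mixture_pdf from above (a numerically careful version would sum in the log domain, but that detail is beyond the slides):

```python
def log_likelihood(X, pis, mus, Sigmas):
    """Eq. (8): sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    return float(sum(np.log(mixture_pdf(x, pis, mus, Sigmas)) for x in X))
```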
4. Reformulation of Mixtures of Gaussians
Formulation of the mixture of Gaussians in terms of a discrete latent
random variable:
- Introduce the K-dimensional random variable z.
- 1-of-K representation model of \pi_k:

    z := (z_1, z_2, \ldots, z_K)^T, \quad z_k \in \{0, 1\}, \quad \sum_{k=1}^{K} z_k = 1,

    p(z_k = 1) = \pi_k.

Equivalent formulation of the Gaussian mixture with the explicit latent
variable z:

    p(x) = \sum_{z} p(z) \, p(x \mid z).   (9)
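The latent-variable form (9) also reads as a generative recipe: draw z with p(z_k = 1) = π_k, then draw x from the selected component. A sketch, with the component index standing in for the 1-of-K vector z:

```python
rng = np.random.default_rng(seed=0)

def sample_mixture(pis, mus, Sigmas, n_samples):
    """Ancestral sampling from Eq. (9): draw z first, then x | z."""
    ks = rng.choice(len(pis), size=n_samples, p=pis)   # latent component choices
    return np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
```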
- The conditional probability of z for given x:

    \gamma(z_k) := p(z_k = 1 \mid x)
                 = \frac{p(z_k = 1) \, p(x \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1) \, p(x \mid z_j = 1)}
                 = \frac{\pi_k \, N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x \mid \mu_j, \Sigma_j)}.   (10)

\pi_k is the prior probability of z_k = 1, and \gamma(z_k) is the corresponding
posterior probability for the observed x: the responsibility that
component k takes for explaining the observation x.
- Modeling a data set X := \{x_1, x_2, \ldots, x_N\} using a mixture of Gaussians:
assuming x_1, \ldots, x_N are drawn independently from p(x),
the log-likelihood function is given by Eq. (8).
5. EM Algorithm
- With respect to \mu_k and \Sigma_k, the conditions that must be satisfied
at a maximum of the likelihood function are

    \frac{\partial \ln p(X \mid \pi, \mu, \Sigma)}{\partial \mu_k} = 0, \qquad \frac{\partial \ln p(X \mid \pi, \mu, \Sigma)}{\partial \Sigma_k} = 0.

- The maximization of \ln p(X \mid \pi, \mu, \Sigma) with respect to \pi_k subject to
the constraint \sum_k \pi_k = 1 is also solved.
- The solutions are given by

    \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n,   (11)

    \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T,   (12)

    \pi_k = \frac{N_k}{N},   (13)

where

    N_k := \sum_{n=1}^{N} \gamma(z_{nk}),   (14)

and \gamma(z_{nk}) (Eq. (10) evaluated at x_n) is the responsibility of x_n with
respect to the k-th cluster.
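Eqs. (11)-(14) in NumPy form; Gamma is a hypothetical (N, K) array holding Gamma[n, k] = γ(z_nk):

```python
def m_step(X, Gamma):
    """Eqs. (11)-(14): re-estimate (pi, mu, Sigma) from fixed responsibilities."""
    N, D = X.shape
    Nk = Gamma.sum(axis=0)               # Eq. (14): effective counts per component
    mus = (Gamma.T @ X) / Nk[:, None]    # Eq. (11): responsibility-weighted means
    Sigmas = []
    for k in range(Gamma.shape[1]):
        diff = X - mus[k]                # (N, D)
        Sigmas.append((Gamma[:, k, None] * diff).T @ diff / Nk[k])  # Eq. (12)
    pis = Nk / N                         # Eq. (13): mixing coefficients
    return pis, mus, np.array(Sigmas)
```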
The three equations (11)-(13) do not give the solutions directly, because
\gamma(z_{nk}) and N_k contain the unknowns \pi, \mu, and \Sigma in complex ways.
[EM algorithm for the Gaussian mixture model]
A simple iterative scheme which alternates the E (Expectation) and
M (Maximization) steps:
- E step: evaluate the posterior probabilities (responsibilities) \gamma(z_{nk})
using the current parameters.
- M step: re-estimate the parameters \pi, \mu, and \Sigma using the
evaluated \gamma(z_{nk}).
(Fig.: color illustration of \gamma(z_{nk}) in the two-category case.)
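Putting the steps together, reusing the responsibilities and m_step sketches above; a convergence test on the log-likelihood (8) would normally replace the fixed iteration count:

```python
def em_gmm(X, pis, mus, Sigmas, n_iters=100):
    """Alternate E and M steps from given initial parameters."""
    for _ in range(n_iters):
        # E step: evaluate gamma(z_nk) with the current parameters, Eq. (10)
        Gamma = np.array([responsibilities(x, pis, mus, Sigmas) for x in X])
        # M step: re-estimate pi, mu, Sigma from the responsibilities, Eqs. (11)-(13)
        pis, mus, Sigmas = m_step(X, Gamma)
    return pis, mus, Sigmas
```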
Example 2: EM algorithm [Bishop book [1] and its web site]
(Fig.: EM iterations from the initial parameters \mu_1^{(0)}, \mu_2^{(0)}.)
(Fig.: the k-means algorithm vs. the EM algorithm.)
References:
[1] C. M. Bishop, “Pattern Recognition and Machine Learning”,
Springer, 2006.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, “Pattern Classification”,
John Wiley & Sons, 2nd edition, 2004.
Appendix
Proof for the 1-dimensional case:

    \ln p(X \mid \pi, \mu, \sigma) = \sum_{n=1}^{N} \ln \sum_{j=1}^{K} \pi_j \, N(x_n \mid \mu_j, \sigma_j^2).   (A.1)

- When \partial \ln p(X \mid \pi, \mu, \sigma) / \partial \mu_k = 0:

    \sum_{n=1}^{N} \frac{\pi_k \, N(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, N(x_n \mid \mu_j, \sigma_j^2)} \cdot \frac{x_n - \mu_k}{\sigma_k^2} = 0.   (A.2)

Writing \gamma(z_{nk}) for the ratio in Eq. (A.2), this derives Eq. (11):

    \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n.   (A.3)
- When \partial \ln p(X \mid \pi, \mu, \sigma) / \partial \sigma_k = 0: calculate
\partial N(x_n \mid \mu_k, \sigma_k^2) / \partial \sigma_k and substitute it into the stationarity
condition as in Eq. (A.2); this derives Eq. (12):

    \sigma_k^2 = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)^2.   (A.4)

For the maximization of \ln p(X \mid \pi, \mu, \sigma) with respect to \pi_k subject to
\sum_k \pi_k = 1, the Lagrange-multiplier method provides an elegant solution.
- Introduce the Lagrangian function

    L(\pi, \lambda) := \ln p(X \mid \pi, \mu, \sigma) + \lambda \left( \sum_k \pi_k - 1 \right).   (A.5)
- Stationarity conditions:

    \frac{\partial L}{\partial \pi_k} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0.   (A.6)

The first condition gives

    \sum_{n=1}^{N} \frac{N(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, N(x_n \mid \mu_j, \sigma_j^2)} + \lambda = 0.   (A.7)

Multiplying both sides by \pi_k, we have

    \sum_{n=1}^{N} \frac{\pi_k \, N(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, N(x_n \mid \mu_j, \sigma_j^2)} + \pi_k \lambda = N_k + \pi_k \lambda = 0,   (A.8)

and the summation over k gives N + \lambda = 0, i.e. \lambda = -N.
We then have, from (A.7) multiplied by \pi_k with \lambda = -N,

    \pi_k = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi_k \, N(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, N(x_n \mid \mu_j, \sigma_j^2)} = \frac{N_k}{N},   (A.9)

which is Eq. (13).