SlideShare una empresa de Scribd logo
1 de 43
Descargar para leer sin conexión
MCMC for mixtures of Gaussians, and model 
selection 
Aaron McDaid, aaronmcdaid@gmail.com 
October 30, 2014 
1 / 36
Six models 
l 
l 
l 
l 
ll 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
ll l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
lll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
ll 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
−20 0 20 40 60 
0 20 40 60 80 100 
V1 
V2 
(a) 1. vvv 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
−10 −5 0 5 10 
−30 −20 −10 0 10 20 30 
V1 
V2 
(b) 2. eee 
2 / 36
Six models 
l 
l l 
l l 
l 
l 
l l 
l 
l 
l 
l 
ll 
l l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
ll l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l l l l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l ll 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
−10 0 10 20 30 
−20 −10 0 10 
V1 
V2 
(a) 3. vvi 
l 
l 
l 
ll 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
llll 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l l 
l l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
ll 
l 
ll 
ll 
l 
l l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
ll 
l 
l l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll l 
ll 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
−30 −20 −10 0 10 20 30 
−30 −20 −10 0 10 20 30 
V1 
V2 
(b) 4. eei 
3 / 36
Six models 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l l 
l 
l 
l l 
l 
l l 
l ll l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l ll 
l 
l 
l 
l 
l 
ll 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
ll 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
lll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
ll 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
lll 
l 
l 
l 
l l 
ll 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
ll 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l ll 
ll 
l 
l 
−20 −10 0 10 20 
−20 −10 0 10 20 
V1 
V2 
(a) 5. vii 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
ll 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
−100 0 100 200 
−200 −100 0 100 200 
V1 
V2 
(b) 6. eii 
4 / 36
Old Faithful N=272 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 
50 60 70 80 90 
eruptions 
waiting 
Old Faithful - Yellowstone National 
Park 
5 / 36
Old Faithful N=272 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
ll 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
l 
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 
50 60 70 80 90 
V1 
V2 
Old Faithful - Yellowstone National 
Park 
6 / 36
Overview 
Goals 
De
ne the mclust model 
Bayes Factor and BIC - connection between mclust and 
MCMC 
Priors 
Integration (analytical and numerical) 
MCMC algorithm1 
Selecting from the six models via MCMC 
Evaluation (on synthetic data) 
One application 
1Mahlet G. Tadesse, Naijun Sha, and Marina Vannucci. Bayesian Variable 
Selection in Clustering High-Dimensional Data". In: Journal of the American 
Statistical Association 100.470 (June 2005), pp. 602{617. issn: 0162-1459. 
doi: 10.1198/016214504000001565. url: 
http://www.stat.rice.edu/~{}marina/papers/jasa05.pdf. 
7 / 36
Goals 
Not a `shootout' with mclust 
See what MCMC can do 
Calculate the Bayes Factor more precisely - is it better than 
BIC? 
Push to larger numbers of clusters 
8 / 36
Basic model 
N data points in a p-dimensional space. 
m 2 (fvvv; eee; vvi; eei; vii; eiig) 
K number of clusters 
k covariance of clusterk 
k mean Pof cluster k 
 
K 
k=1 k = 1 
zi P(zi = k) = k 
xi jzi=k  Normal(k ;k ): 
Mixture models 
P(xi jzi=k) = N(xi jk ;k ) 
P(xi ) = 
XK 
k=1 
kN(xi jk ;k ) 
9 / 36
mclust 
MLE (Maximum Likelihood Estimate) 
R package mclust2 
Given (K;m), use Expectation-Maximization (EM) algorithm 
to estimate (;;). 
P(Xjk ;k ;;m;K) 
Requires running EM for each possible combination of (K;m). 
Hundreds of runs may be required. f(K = 2;m = VVI); (K = 
3;m = EEI ); (K = 50;m = EEI ); : : : g 
Then use BIC to select among the models. 
2Chris Fraley and Adrian E. Raftery. MCLUST: Software for model-based 
cluster analysis. In: Journal of Classi
cation 16.2 (1999), pp. 297{306. 
10 / 36
mclust 
Why do we need model selection? 
vvv vvi vii 
eee eei eii 
De
ne  = (;;). 
P(Xj=^ eee;K;m=vvv;K) = P(Xj=^ eee;K;m=eee;K) 
Cannot maximize P(Xj;m;K) 
Count the degrees-of-freedom f , in order to penalize the more 
complex model. 
AIC = 2 log P(XjMLE 
m;K ;m;K)  2f 
BIC = 2 log P(XjMLE 
m;K ;m;K)  log(N)f 
11 / 36
Bayes Factor 
(BIC) Bayesian Information Criterion 
BIC  2 log 
Bayes Factor z }| { 
(P(Xjm;K)) 
P(X = Xobs jm;K) 
Informally, the average P(Xj;m;K) over all . 
Can we compute this (weighted) average more accurately? 
12 / 36
Bayes Factor 
(BIC) Bayesian Information Criterion 
BIC  2 log 
Bayes Factor z }| { 
(P(Xjm;K)) 
P(X = Xobs jm;K) 
Informally, the average P(Xj;m;K) over all . 
Can we compute this (weighted) average more accurately? 
P(Xjm=vvv;K) 
P(Xjm=eee;K) = 
R 
R P(X;jm=vvv;K) d 
P(X;jm=eee;K) d 
12 / 36
Full model 
N data points in a p-dimensional space. 
dependence distribution 
m  Uniform(fvvv; eee; vvi; eei; vii; eiig) 
K jK0  Poisson(1) 
 jK  Dirichlet(0): 
zi j;K P(zi = kj;K) = k 
k jm;K  Wishart1(V0; g0): 
k jk ;m;K  Normal(0; 1 
n0 
k ): 
xi jzi=k; k ;;m;K  Normal(k ;k ): 
0 = 
 
1 
2 
; 
1 
2 
; :::; 
1 
2 
 
0 = X 
n0 = 0:001 
g0 = (p+1)+n0(p+1) 
1n0 
 p + 1 +  
V0 = Cov(X)(g0  p  1) 
13 / 36
Dirichlet 
( 1K 
; 1K 
; :::; 1K 
). 
PK 
k=1 k = 1 
Dirichlet gives us random vectors 
Dirichlet(1; 2; :::; K) 
K = 4,  = (0:01; 0:09; 0:80; 0:10) 
May lead to empty clusters in the prior, and therefore in 
posterior too 
KjX  KTRUE 
Solution3. K  Poisson(1)jK  1 
3Agostino Nobile. Bayesian
nite mixtures: a note on prior speci
cation 
and posterior computation. In: arXiv preprint arXiv:0711.0458 (2007). 
14 / 36
Integration 
Joint probability 
P(X;;; z;;K;m) =P(Xj;; z;;K;m) 
 P(j;z;;K;m)0;n0 
 P(jz;;K;m)V0;g0 
 P(zj;K;m) 
 P(jK;m)0 
 P(K jm) 
 P(m) 
15 / 36
Integration 
In general, 
P(ajb) = 
X 
c 
P(a; cjb) 
P(ajb) = 
Z 
P(a; ejb) de 
P(ajb) = 
X 
c 
P(ajc; b)P(cjb) 
P(ajb) = 
Z 
P(aje; b)P(ejb) de 
16 / 36
Integration 
P(mjX) = 
1X 
K=1 
X 
z 
Z Z Z 
P(;; z;;K;mjX) d d d 
P(KjX) = 
X 
m 
X 
z 
Z Z Z 
P(;; z;;K;mjX) d d d 
P(K;mjX) = 
X 
z 
Z Z Z 
P(;; z;;K;mjX) d d d 
P(zjX) = 
1X 
K=1 
X 
m 
Z Z Z 
P(;; z;;K;mjX) d d d 
17 / 36
Integration 
P(z;K;mjX) = 
Z Z Z 
P(;; z;;K;mjX) d d d 
P(z;K;mjX) = 
Z Z Z 
1 
P(X) 
P(;; z;;K;m;X) d d d 
P(z;K;mjX) = 
1 
P(X) 
Z Z Z 
P(;;jz;K;m;X)P(z;K;m;X) d d P(z;K;m;X) 
P(z;K;mjX) = 
P(X) 
Z Z Z 
P(;;jz;K;m;X) d d d 
18 / 36
Mini-overview 
We speci
ed the model, with all our priors, earlier. How do we get 
our estimates? 
RJMCMC4 would give many estimates of 
P(;; z;;K;mjX). 
Want faster MCMC. 
Solve P(X; z;K;m) analytically. 
Use that to sample z;K;mjX. 
4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation 
and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), 
pp. 711{732. doi: 10.1093/biomet/82.4.711. url: 
http://dx.doi.org/10.1093/biomet/82.4.711. 
19 / 36
Mini-overview 
We speci
ed the model, with all our priors, earlier. How do we get 
our estimates? 
RJMCMC4 would give many estimates of 
P(;; z;;K;mjX). 
Want faster MCMC. 
Solve P(X; z;K;m) analytically. 
Use that to sample z;K;mjX. 
Count popular (m), or (K), or (m;K) in sample. P(m;KjX). 
(Proven identical to RJMCMC - dierent MCMC algorithms 
(usually) don't change results, just speed.) 
4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation 
and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), 
pp. 711{732. doi: 10.1093/biomet/82.4.711. url: 
http://dx.doi.org/10.1093/biomet/82.4.711. 
19 / 36
Mini-overview 
We speci
ed the model, with all our priors, earlier. How do we get 
our estimates? 
RJMCMC4 would give many estimates of 
P(;; z;;K;mjX). 
Want faster MCMC. 
Solve P(X; z;K;m) analytically. 
Use that to sample z;K;mjX. 
Count popular (m), or (K), or (m;K) in sample. P(m;KjX). 
(Proven identical to RJMCMC - dierent MCMC algorithms 
(usually) don't change results, just speed.) 
If desired, ;;jX; z;K;m is easily generated. 
4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation 
and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), 
pp. 711{732. doi: 10.1093/biomet/82.4.711. url: 
http://dx.doi.org/10.1093/biomet/82.4.711. 
19 / 36
Analytical integration 
P(X; z;K;m) = 
Z Z Z 
P(X;;; z;;K;m) d d d 
P(;;;X; z;K;m) = P(;;;X; z;K;m) 
P(;;jX; z;K;m)P(X; z;K;m) = P(X; z;K;mj;;)P(;;) 
P(X; z;K;m) = P(X;z;K;mj;;)P(;;) 
P(;;jX;z;K;m) 
20 / 36
Numerical integration (MCMC) 
Markov Chain Monte Carlo (MCMC) 
Begin with an initial estimate (z1;m1;K1) 
At each iteration, propose to perturb 
(zi ;mi ;Ki ) ) (zi;mi;Ki) 
Similar to current state, to enable a gradual `climb' towards 
the good estimates. 
21 / 36
Numerical integration (MCMC) 
Markov Chain Monte Carlo (MCMC) 
Begin with an initial estimate (z1;m1;K1) 
At each iteration, propose to perturb 
(zi ;mi ;Ki ) ) (zi;mi;Ki) 
Similar to current state, to enable a gradual `climb' towards 
the good estimates. 
h 
De
ne ai = min 
1; P(X;zi;mi;Ki) 
P(X;zi ;mi ;Ki ) 
q(zi ;mi ;Ki jzi;mi;Ki) 
q(zi;mi;Kijzi ;mi ;Ki ) 
i 
21 / 36
Numerical integration (MCMC) 
Markov Chain Monte Carlo (MCMC) 
Begin with an initial estimate (z1;m1;K1) 
At each iteration, propose to perturb 
(zi ;mi ;Ki ) ) (zi;mi;Ki) 
Similar to current state, to enable a gradual `climb' towards 
the good estimates. 
h 
De
ne ai = min 
1; P(X;zi;mi;Ki) 
P(X;zi ;mi ;Ki ) 
q(zi ;mi ;Ki jzi;mi;Ki) 
q(zi;mi;Kijzi ;mi ;Ki ) 
i 
(zi+1;mi+1;Ki+1) = (zi;mi;Ki) with probability ai . 
(zi+1;mi+1;Ki+1) = (zi ;mi ;Ki ) with probability 1ai . 
Resulting estimates will be drawn as z;m;KjX 
21 / 36
Numerical integration (MCMC) 
Markov Chain Monte Carlo (MCMC) 
Begin with an initial estimate (z1;m1;K1) 
At each iteration, propose to perturb 
(zi ;mi ;Ki ) ) (zi;mi;Ki) 
Similar to current state, to enable a gradual `climb' towards 
the good estimates. 
h 
De
ne ai = min 
1; P(X;zi;mi;Ki) 
P(X;zi ;mi ;Ki ) 
q(zi ;mi ;Ki jzi;mi;Ki) 
q(zi;mi;Kijzi ;mi ;Ki ) 
i 
(zi+1;mi+1;Ki+1) = (zi;mi;Ki) with probability ai . 
(zi+1;mi+1;Ki+1) = (zi ;mi ;Ki ) with probability 1ai . 
Resulting estimates will be drawn as z;m;KjX 
`Good' proposals don't aect the distribution, but they do 
improve speed 
21 / 36
The above is too slow. Still too much correlation, slowing the progress. 
So I run six chains, 
(z;KjX;m = vvv) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) 
(z;KjX;m = eee) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) 
(z;KjX;m = vvi) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) 
(z;KjX;m = eei) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) 
(z;KjX;m = vii) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) 
(z;KjX;m = eii) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) 
and combine the results. 
Results should be combined in proportion to P(mjX). 
22 / 36
*VVV* .. iteration: 0/10000 nonEmpty: 1 K: 1 nmi: 0 entropy: 0.000000 
*VVV* .. iteration: 50/10000 nonEmpty: 6 K: 6 nmi: 52.5095 entropy: 1.477233 
*VVV* .. iteration: 100/10000 nonEmpty: 9 K: 9 nmi: 72.713 entropy: 2.045612 
*VVV* .. iteration: 150/10000 nonEmpty: 9 K: 9 nmi: 75.5046 entropy: 2.124148 
*VVV* .. iteration: 200/10000 nonEmpty: 9 K: 9 nmi: 74.8402 entropy: 2.105455 
*VVV* .. iteration: 250/10000 nonEmpty: 10 K: 11 nmi: 77.8969 entropy: 2.191450 
*VVV* .. iteration: 300/10000 nonEmpty: 11 K: 11 nmi: 79.2266 entropy: 2.228856 
*VVV* .. iteration: 350/10000 nonEmpty: 12 K: 12 nmi: 82.1832 entropy: 2.312034 
*VVV* .. iteration: 400/10000 nonEmpty: 12 K: 12 nmi: 82.1832 entropy: 2.312034 
*VVV* .. iteration: 450/10000 nonEmpty: 11 K: 11 nmi: 81.1627 entropy: 2.283326 
*VVV* .. iteration: 500/10000 nonEmpty: 13 K: 13 nmi: 84.8982 entropy: 2.388416 
*VVV* .. iteration: 550/10000 nonEmpty: 13 K: 13 nmi: 84.8982 entropy: 2.388416 
*VVV* .. iteration: 600/10000 nonEmpty: 14 K: 14 nmi: 88.896 entropy: 2.500883 
*VVV* .. iteration: 650/10000 nonEmpty: 14 K: 14 nmi: 88.896 entropy: 2.500883 
*VVV* .. iteration: 700/10000 nonEmpty: 14 K: 15 nmi: 91.0987 entropy: 2.562850 
*VVV* .. iteration: 750/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 
*VVV* .. iteration: 800/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 
*VVV* .. iteration: 850/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 
*VVV* .. iteration: 900/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 
*VVV* .. iteration: 950/10000 nonEmpty: 16 K: 16 nmi: 94.6821 entropy: 2.663661 
*VVV* .. iteration: 1000/10000 nonEmpty: 15 K: 15 nmi: 93.7927 entropy: 2.638641 
*VVV* .. iteration: 1050/10000 nonEmpty: 14 K: 14 nmi: 91.2693 entropy: 2.567652 
*VVV* .. iteration: 1100/10000 nonEmpty: 14 K: 14 nmi: 91.0987 entropy: 2.562850 
*VVV* .. iteration: 1150/10000 nonEmpty: 14 K: 14 nmi: 91.2693 entropy: 2.567652 
*VVV* .. iteration: 1200/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 
*VVV* .. iteration: 1250/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 
*VVV* .. iteration: 1300/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 
*VVV* .. iteration: 1350/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 
*VVV* .. iteration: 1400/10000 nonEmpty: 16 K: 16 nmi: 96.4608 entropy: 2.713701 
*VVV* .. iteration: 1450/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 
23 / 36
(High level) description of complete algorithm:5 
I run six chains (z;KjX;m). In parallel, independently of each 
other. 
There is a variable M which is the `current'/`best' model. 
At iteration i , based on P(X; zi ;m;Ki ) (and other quantities). 
Can be proven that M will be distributed proportional to 
P(Xjm=M). 
5Bradley P Carlin and Siddhartha Chib. Bayesian model choice via Markov 
chain Monte Carlo methods. In: Journal of the Royal Statistical 
Society-Series B Methodological 57.3 (1995), pp. 473{484. 
24 / 36
(High level) description of complete algorithm:5 
I run six chains (z;KjX;m). In parallel, independently of each 
other. 
There is a variable M which is the `current'/`best' model. 
At iteration i , based on P(X; zi ;m;Ki ) (and other quantities). 
Can be proven that M will be distributed proportional to 
P(Xjm=M). 
to work well, need to train good pseudopriors in advance. 
5Bradley P Carlin and Siddhartha Chib. Bayesian model choice via Markov 
chain Monte Carlo methods. In: Journal of the Royal Statistical 
Society-Series B Methodological 57.3 (1995), pp. 473{484. 
24 / 36
Application 
Wine dataset N=178 p=27 
3 regions of Italy 
mclust (3,VVI) 
1 2 3 
1 58 1 
2 5 65 1 
3 48 
MCMC (3,EEI) 
1 2 3 
1 58 1 
2 2 66 3 
3 48 
25 / 36

Más contenido relacionado

Destacado

Usa las vegas_4_hotels_luxueuses
Usa las vegas_4_hotels_luxueusesUsa las vegas_4_hotels_luxueuses
Usa las vegas_4_hotels_luxueusesfilipj2000
 
Haiti pour eve
Haiti pour eveHaiti pour eve
Haiti pour eveourbothy
 
Feudal System (short)
Feudal System (short)Feudal System (short)
Feudal System (short)benstory
 
Tribus de l'omo
Tribus de l'omoTribus de l'omo
Tribus de l'omofilipj2000
 
Cominfo11
Cominfo11Cominfo11
Cominfo11ATD13
 
Présentation d'Open Data Paris au Mobile 2.0 2011
Présentation d'Open Data Paris au Mobile 2.0 2011Présentation d'Open Data Paris au Mobile 2.0 2011
Présentation d'Open Data Paris au Mobile 2.0 2011Mairie de Paris
 
CRFCB AMU evolutions_catalogage_091213_enjeux_1
CRFCB AMU evolutions_catalogage_091213_enjeux_1CRFCB AMU evolutions_catalogage_091213_enjeux_1
CRFCB AMU evolutions_catalogage_091213_enjeux_1nonue12
 
Jean paris 1900-(cons)
Jean paris 1900-(cons)Jean paris 1900-(cons)
Jean paris 1900-(cons)filipj2000
 
EDOMA Présentation 2009
EDOMA Présentation 2009EDOMA Présentation 2009
EDOMA Présentation 2009huntziger
 

Destacado (20)

Usa las vegas_4_hotels_luxueuses
Usa las vegas_4_hotels_luxueusesUsa las vegas_4_hotels_luxueuses
Usa las vegas_4_hotels_luxueuses
 
Consejos en Trabajos con Goma Eva
Consejos en Trabajos con Goma EvaConsejos en Trabajos con Goma Eva
Consejos en Trabajos con Goma Eva
 
Haiti pour eve
Haiti pour eveHaiti pour eve
Haiti pour eve
 
Feudal System (short)
Feudal System (short)Feudal System (short)
Feudal System (short)
 
Tribus de l'omo
Tribus de l'omoTribus de l'omo
Tribus de l'omo
 
hjjjj
hjjjjhjjjj
hjjjj
 
Cominfo11
Cominfo11Cominfo11
Cominfo11
 
Info sacu
Info sacuInfo sacu
Info sacu
 
Présentation d'Open Data Paris au Mobile 2.0 2011
Présentation d'Open Data Paris au Mobile 2.0 2011Présentation d'Open Data Paris au Mobile 2.0 2011
Présentation d'Open Data Paris au Mobile 2.0 2011
 
¡Confianza alegre al trabajar!
¡Confianza alegre al trabajar!¡Confianza alegre al trabajar!
¡Confianza alegre al trabajar!
 
CRFCB AMU evolutions_catalogage_091213_enjeux_1
CRFCB AMU evolutions_catalogage_091213_enjeux_1CRFCB AMU evolutions_catalogage_091213_enjeux_1
CRFCB AMU evolutions_catalogage_091213_enjeux_1
 
Abraham
AbrahamAbraham
Abraham
 
Olympiades Mondiales 2009 Jour4
Olympiades Mondiales 2009  Jour4Olympiades Mondiales 2009  Jour4
Olympiades Mondiales 2009 Jour4
 
Zotero 3.0 - Doctorado Formación en la Sociedad del Conocimiento
Zotero 3.0 - Doctorado Formación en la Sociedad del ConocimientoZotero 3.0 - Doctorado Formación en la Sociedad del Conocimiento
Zotero 3.0 - Doctorado Formación en la Sociedad del Conocimiento
 
Jean paris 1900-(cons)
Jean paris 1900-(cons)Jean paris 1900-(cons)
Jean paris 1900-(cons)
 
EDOMA Présentation 2009
EDOMA Présentation 2009EDOMA Présentation 2009
EDOMA Présentation 2009
 
Info sacu
Info sacuInfo sacu
Info sacu
 
Biodiversidad
BiodiversidadBiodiversidad
Biodiversidad
 
Archivo 2
Archivo 2Archivo 2
Archivo 2
 
Archivo 1
Archivo 1Archivo 1
Archivo 1
 

Similar a MCMC for clustering of multivariate-Normal data

Trabajo
TrabajoTrabajo
Trabajoyucai
 
Book*
Book*Book*
Book*LPCO
 
Ejercicios De Mecanografía
Ejercicios De MecanografíaEjercicios De Mecanografía
Ejercicios De Mecanografíawongaa
 
Ejercicios De MecanografíA
Ejercicios De MecanografíAEjercicios De MecanografíA
Ejercicios De MecanografíAwongaa
 
Huruf jawi bersambung
Huruf jawi bersambungHuruf jawi bersambung
Huruf jawi bersambungHasimah Muda
 
Mapa de un diseño para el PAFSu de una UMF
Mapa de un diseño para el PAFSu de una UMFMapa de un diseño para el PAFSu de una UMF
Mapa de un diseño para el PAFSu de una UMFBryan Bone
 

Similar a MCMC for clustering of multivariate-Normal data (13)

Trabajo mecanet
Trabajo mecanetTrabajo mecanet
Trabajo mecanet
 
Mecanografia.
Mecanografia.Mecanografia.
Mecanografia.
 
Trabajo de Mecanet
Trabajo de Mecanet Trabajo de Mecanet
Trabajo de Mecanet
 
Trabajo
TrabajoTrabajo
Trabajo
 
Book*
Book*Book*
Book*
 
Alfabeto Cursiva
Alfabeto CursivaAlfabeto Cursiva
Alfabeto Cursiva
 
mecanografia
mecanografiamecanografia
mecanografia
 
Ejercicios De Mecanografía
Ejercicios De MecanografíaEjercicios De Mecanografía
Ejercicios De Mecanografía
 
Ejercicios De MecanografíA
Ejercicios De MecanografíAEjercicios De MecanografíA
Ejercicios De MecanografíA
 
Ejercicio mecanografia
Ejercicio mecanografiaEjercicio mecanografia
Ejercicio mecanografia
 
Huruf jawi bersambung
Huruf jawi bersambungHuruf jawi bersambung
Huruf jawi bersambung
 
DHV13
DHV13DHV13
DHV13
 
Mapa de un diseño para el PAFSu de una UMF
Mapa de un diseño para el PAFSu de una UMFMapa de un diseño para el PAFSu de una UMF
Mapa de un diseño para el PAFSu de una UMF
 

Último

9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxBhagirath Gogikar
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 

Último (20)

9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

MCMC for clustering of multivariate-Normal data

  • 1. MCMC for mixtures of Gaussians, and model selection Aaron McDaid, aaronmcdaid@gmail.com October 30, 2014 1 / 36
  • 2. Six models l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l ll l ll l l l l l l l l l lll l l l l l l l l l l l l l ll l l l l l l ll l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l ll l l l ll l l l l l l l l l l l l ll l l l l l ll ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l −20 0 20 40 60 0 20 40 60 80 100 V1 V2 (a) 1. vvv l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l −10 −5 0 5 10 −30 −20 −10 0 10 20 30 V1 V2 (b) 2. eee 2 / 36
  • 3. Six models l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l −10 0 10 20 30 −20 −10 0 10 V1 V2 (a) 3. vvi l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l ll l l llll l l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l ll ll l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l ll l l l l l l l l l l l ll l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l ll l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l −30 −20 −10 0 10 20 30 −30 −20 −10 0 10 20 30 V1 V2 (b) 4. eei 3 / 36
  • 4. Six models l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll ll l l ll l l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l lll l l l l l l l l l l l l l l l l l l l l l l ll l ll l l l l l l l l l l l l l l l l l l l ll l l l l l l l lll l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l ll l l l l l l l l l l l l l l ll ll l l −20 −10 0 10 20 −20 −10 0 10 20 V1 V2 (a) 5. vii l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l −100 0 100 200 −200 −100 0 100 200 V1 V2 (b) 6. eii 4 / 36
  • 5. Old Faithful N=272 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 50 60 70 80 90 eruptions waiting Old Faithful - Yellowstone National Park 5 / 36
  • 6. Old Faithful N=272 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 50 60 70 80 90 V1 V2 Old Faithful - Yellowstone National Park 6 / 36
  • 8. ne the mclust model Bayes Factor and BIC - connection between mclust and MCMC Priors Integration (analytical and numerical) MCMC algorithm1 Selecting from the six models via MCMC Evaluation (on synthetic data) One application 1Mahlet G. Tadesse, Naijun Sha, and Marina Vannucci. Bayesian Variable Selection in Clustering High-Dimensional Data". In: Journal of the American Statistical Association 100.470 (June 2005), pp. 602{617. issn: 0162-1459. doi: 10.1198/016214504000001565. url: http://www.stat.rice.edu/~{}marina/papers/jasa05.pdf. 7 / 36
  • 9. Goals Not a `shootout' with mclust See what MCMC can do Calculate the Bayes Factor more precisely - is it better than BIC? Push to larger numbers of clusters 8 / 36
  • 10. Basic model N data points in a p-dimensional space. m 2 (fvvv; eee; vvi; eei; vii; eiig) K number of clusters k covariance of clusterk k mean Pof cluster k K k=1 k = 1 zi P(zi = k) = k xi jzi=k Normal(k ;k ): Mixture models P(xi jzi=k) = N(xi jk ;k ) P(xi ) = XK k=1 kN(xi jk ;k ) 9 / 36
  • 11. mclust MLE (Maximum Likelihood Estimate) R package mclust2 Given (K;m), use Expectation-Maximization (EM) algorithm to estimate (;;). P(Xjk ;k ;;m;K) Requires running EM for each possible combination of (K;m). Hundreds of runs may be required. f(K = 2;m = VVI); (K = 3;m = EEI ); (K = 50;m = EEI ); : : : g Then use BIC to select among the models. 2Chris Fraley and Adrian E. Raftery. MCLUST: Software for model-based cluster analysis. In: Journal of Classi
  • 12. cation 16.2 (1999), pp. 297{306. 10 / 36
  • 13. mclust Why do we need model selection? vvv vvi vii eee eei eii De
  • 14. ne = (;;). P(Xj=^ eee;K;m=vvv;K) = P(Xj=^ eee;K;m=eee;K) Cannot maximize P(Xj;m;K) Count the degrees-of-freedom f , in order to penalize the more complex model. AIC = 2 log P(XjMLE m;K ;m;K) 2f BIC = 2 log P(XjMLE m;K ;m;K) log(N)f 11 / 36
  • 15. Bayes Factor (BIC) Bayesian Information Criterion BIC 2 log Bayes Factor z }| { (P(Xjm;K)) P(X = Xobs jm;K) Informally, the average P(Xj;m;K) over all . Can we compute this (weighted) average more accurately? 12 / 36
  • 16. Bayes Factor (BIC) Bayesian Information Criterion BIC 2 log Bayes Factor z }| { (P(Xjm;K)) P(X = Xobs jm;K) Informally, the average P(Xj;m;K) over all . Can we compute this (weighted) average more accurately? P(Xjm=vvv;K) P(Xjm=eee;K) = R R P(X;jm=vvv;K) d P(X;jm=eee;K) d 12 / 36
  • 17. Full model N data points in a p-dimensional space. dependence distribution m Uniform(fvvv; eee; vvi; eei; vii; eiig) K jK0 Poisson(1) jK Dirichlet(0): zi j;K P(zi = kj;K) = k k jm;K Wishart1(V0; g0): k jk ;m;K Normal(0; 1 n0 k ): xi jzi=k; k ;;m;K Normal(k ;k ): 0 = 1 2 ; 1 2 ; :::; 1 2 0 = X n0 = 0:001 g0 = (p+1)+n0(p+1) 1n0 p + 1 + V0 = Cov(X)(g0 p 1) 13 / 36
  • 18. Dirichlet ( 1K ; 1K ; :::; 1K ). PK k=1 k = 1 Dirichlet gives us random vectors Dirichlet(1; 2; :::; K) K = 4, = (0:01; 0:09; 0:80; 0:10) May lead to empty clusters in the prior, and therefore in posterior too KjX KTRUE Solution3. K Poisson(1)jK 1 3Agostino Nobile. Bayesian
  • 19. nite mixtures: a note on prior speci
  • 20. cation and posterior computation. In: arXiv preprint arXiv:0711.0458 (2007). 14 / 36
  • 21. Integration Joint probability P(X;;; z;;K;m) =P(Xj;; z;;K;m) P(j;z;;K;m)0;n0 P(jz;;K;m)V0;g0 P(zj;K;m) P(jK;m)0 P(K jm) P(m) 15 / 36
  • 22. Integration In general, P(ajb) = X c P(a; cjb) P(ajb) = Z P(a; ejb) de P(ajb) = X c P(ajc; b)P(cjb) P(ajb) = Z P(aje; b)P(ejb) de 16 / 36
  • 23. Integration P(mjX) = 1X K=1 X z Z Z Z P(;; z;;K;mjX) d d d P(KjX) = X m X z Z Z Z P(;; z;;K;mjX) d d d P(K;mjX) = X z Z Z Z P(;; z;;K;mjX) d d d P(zjX) = 1X K=1 X m Z Z Z P(;; z;;K;mjX) d d d 17 / 36
  • 24. Integration P(z;K;mjX) = Z Z Z P(;; z;;K;mjX) d d d P(z;K;mjX) = Z Z Z 1 P(X) P(;; z;;K;m;X) d d d P(z;K;mjX) = 1 P(X) Z Z Z P(;;jz;K;m;X)P(z;K;m;X) d d P(z;K;m;X) P(z;K;mjX) = P(X) Z Z Z P(;;jz;K;m;X) d d d 18 / 36
  • 26. ed the model, with all our priors, earlier. How do we get our estimates? RJMCMC4 would give many estimates of P(;; z;;K;mjX). Want faster MCMC. Solve P(X; z;K;m) analytically. Use that to sample z;K;mjX. 4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), pp. 711{732. doi: 10.1093/biomet/82.4.711. url: http://dx.doi.org/10.1093/biomet/82.4.711. 19 / 36
  • 28. ed the model, with all our priors, earlier. How do we get our estimates? RJMCMC4 would give many estimates of P(;; z;;K;mjX). Want faster MCMC. Solve P(X; z;K;m) analytically. Use that to sample z;K;mjX. Count popular (m), or (K), or (m;K) in sample. P(m;KjX). (Proven identical to RJMCMC - dierent MCMC algorithms (usually) don't change results, just speed.) 4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), pp. 711{732. doi: 10.1093/biomet/82.4.711. url: http://dx.doi.org/10.1093/biomet/82.4.711. 19 / 36
  • 30. ed the model, with all our priors, earlier. How do we get our estimates? RJMCMC4 would give many estimates of P(;; z;;K;mjX). Want faster MCMC. Solve P(X; z;K;m) analytically. Use that to sample z;K;mjX. Count popular (m), or (K), or (m;K) in sample. P(m;KjX). (Proven identical to RJMCMC - dierent MCMC algorithms (usually) don't change results, just speed.) If desired, ;;jX; z;K;m is easily generated. 4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), pp. 711{732. doi: 10.1093/biomet/82.4.711. url: http://dx.doi.org/10.1093/biomet/82.4.711. 19 / 36
  • 31. Analytical integration P(X; z;K;m) = Z Z Z P(X;;; z;;K;m) d d d P(;;;X; z;K;m) = P(;;;X; z;K;m) P(;;jX; z;K;m)P(X; z;K;m) = P(X; z;K;mj;;)P(;;) P(X; z;K;m) = P(X;z;K;mj;;)P(;;) P(;;jX;z;K;m) 20 / 36
  • 32. Numerical integration (MCMC) Markov Chain Monte Carlo (MCMC) Begin with an initial estimate (z1;m1;K1) At each iteration, propose to perturb (zi ;mi ;Ki ) ) (zi;mi;Ki) Similar to current state, to enable a gradual `climb' towards the good estimates. 21 / 36
  • 33. Numerical integration (MCMC) Markov Chain Monte Carlo (MCMC) Begin with an initial estimate (z1;m1;K1) At each iteration, propose to perturb (zi ;mi ;Ki ) ) (zi;mi;Ki) Similar to current state, to enable a gradual `climb' towards the good estimates. h De
  • 34. ne ai = min 1; P(X;zi;mi;Ki) P(X;zi ;mi ;Ki ) q(zi ;mi ;Ki jzi;mi;Ki) q(zi;mi;Kijzi ;mi ;Ki ) i 21 / 36
  • 35. Numerical integration (MCMC) Markov Chain Monte Carlo (MCMC) Begin with an initial estimate (z1;m1;K1) At each iteration, propose to perturb (zi ;mi ;Ki ) ) (zi;mi;Ki) Similar to current state, to enable a gradual `climb' towards the good estimates. h De
  • 36. ne ai = min 1; P(X;zi;mi;Ki) P(X;zi ;mi ;Ki ) q(zi ;mi ;Ki jzi;mi;Ki) q(zi;mi;Kijzi ;mi ;Ki ) i (zi+1;mi+1;Ki+1) = (zi;mi;Ki) with probability ai . (zi+1;mi+1;Ki+1) = (zi ;mi ;Ki ) with probability 1ai . Resulting estimates will be drawn as z;m;KjX 21 / 36
  • 37. Numerical integration (MCMC) Markov Chain Monte Carlo (MCMC) Begin with an initial estimate (z1;m1;K1) At each iteration, propose to perturb (zi ;mi ;Ki ) ) (zi;mi;Ki) Similar to current state, to enable a gradual `climb' towards the good estimates. h De
  • 38. ne ai = min 1; P(X;zi;mi;Ki) P(X;zi ;mi ;Ki ) q(zi ;mi ;Ki jzi;mi;Ki) q(zi;mi;Kijzi ;mi ;Ki ) i (zi+1;mi+1;Ki+1) = (zi;mi;Ki) with probability ai . (zi+1;mi+1;Ki+1) = (zi ;mi ;Ki ) with probability 1ai . Resulting estimates will be drawn as z;m;KjX `Good' proposals don't aect the distribution, but they do improve speed 21 / 36
  • 39. The above is too slow. Still too much correlation, slowing the progress. So I run six chains, (z;KjX;m = vvv) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) (z;KjX;m = eee) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) (z;KjX;m = vvi) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) (z;KjX;m = eei) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) (z;KjX;m = vii) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) (z;KjX;m = eii) { (zi ;mi = vvv;Ki ) ) (zi;mi = vvv;Ki) and combine the results. Results should be combined in proportion to P(mjX). 22 / 36
  • 40. *VVV* .. iteration: 0/10000 nonEmpty: 1 K: 1 nmi: 0 entropy: 0.000000 *VVV* .. iteration: 50/10000 nonEmpty: 6 K: 6 nmi: 52.5095 entropy: 1.477233 *VVV* .. iteration: 100/10000 nonEmpty: 9 K: 9 nmi: 72.713 entropy: 2.045612 *VVV* .. iteration: 150/10000 nonEmpty: 9 K: 9 nmi: 75.5046 entropy: 2.124148 *VVV* .. iteration: 200/10000 nonEmpty: 9 K: 9 nmi: 74.8402 entropy: 2.105455 *VVV* .. iteration: 250/10000 nonEmpty: 10 K: 11 nmi: 77.8969 entropy: 2.191450 *VVV* .. iteration: 300/10000 nonEmpty: 11 K: 11 nmi: 79.2266 entropy: 2.228856 *VVV* .. iteration: 350/10000 nonEmpty: 12 K: 12 nmi: 82.1832 entropy: 2.312034 *VVV* .. iteration: 400/10000 nonEmpty: 12 K: 12 nmi: 82.1832 entropy: 2.312034 *VVV* .. iteration: 450/10000 nonEmpty: 11 K: 11 nmi: 81.1627 entropy: 2.283326 *VVV* .. iteration: 500/10000 nonEmpty: 13 K: 13 nmi: 84.8982 entropy: 2.388416 *VVV* .. iteration: 550/10000 nonEmpty: 13 K: 13 nmi: 84.8982 entropy: 2.388416 *VVV* .. iteration: 600/10000 nonEmpty: 14 K: 14 nmi: 88.896 entropy: 2.500883 *VVV* .. iteration: 650/10000 nonEmpty: 14 K: 14 nmi: 88.896 entropy: 2.500883 *VVV* .. iteration: 700/10000 nonEmpty: 14 K: 15 nmi: 91.0987 entropy: 2.562850 *VVV* .. iteration: 750/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 *VVV* .. iteration: 800/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 *VVV* .. iteration: 850/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 *VVV* .. iteration: 900/10000 nonEmpty: 15 K: 15 nmi: 92.2898 entropy: 2.596360 *VVV* .. iteration: 950/10000 nonEmpty: 16 K: 16 nmi: 94.6821 entropy: 2.663661 *VVV* .. iteration: 1000/10000 nonEmpty: 15 K: 15 nmi: 93.7927 entropy: 2.638641 *VVV* .. iteration: 1050/10000 nonEmpty: 14 K: 14 nmi: 91.2693 entropy: 2.567652 *VVV* .. iteration: 1100/10000 nonEmpty: 14 K: 14 nmi: 91.0987 entropy: 2.562850 *VVV* .. iteration: 1150/10000 nonEmpty: 14 K: 14 nmi: 91.2693 entropy: 2.567652 *VVV* .. iteration: 1200/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 *VVV* .. iteration: 1250/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 *VVV* .. iteration: 1300/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 *VVV* .. iteration: 1350/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 *VVV* .. iteration: 1400/10000 nonEmpty: 16 K: 16 nmi: 96.4608 entropy: 2.713701 *VVV* .. iteration: 1450/10000 nonEmpty: 17 K: 17 nmi: 97.9391 entropy: 2.755290 23 / 36
  • 41. (High level) description of complete algorithm:5 I run six chains (z;KjX;m). In parallel, independently of each other. There is a variable M which is the `current'/`best' model. At iteration i , based on P(X; zi ;m;Ki ) (and other quantities). Can be proven that M will be distributed proportional to P(Xjm=M). 5Bradley P Carlin and Siddhartha Chib. Bayesian model choice via Markov chain Monte Carlo methods. In: Journal of the Royal Statistical Society-Series B Methodological 57.3 (1995), pp. 473{484. 24 / 36
  • 42. (High level) description of complete algorithm:5 I run six chains (z;KjX;m). In parallel, independently of each other. There is a variable M which is the `current'/`best' model. At iteration i , based on P(X; zi ;m;Ki ) (and other quantities). Can be proven that M will be distributed proportional to P(Xjm=M). to work well, need to train good pseudopriors in advance. 5Bradley P Carlin and Siddhartha Chib. Bayesian model choice via Markov chain Monte Carlo methods. In: Journal of the Royal Statistical Society-Series B Methodological 57.3 (1995), pp. 473{484. 24 / 36
  • 43. Application Wine dataset N=178 p=27 3 regions of Italy mclust (3,VVI) 1 2 3 1 58 1 2 5 65 1 3 48 MCMC (3,EEI) 1 2 3 1 58 1 2 2 66 3 3 48 25 / 36
  • 44. Synthetic data N = 400 K 2 f5; 10; 20g p 2 f16; 4g g0 2 fp; p + 1; p + 2g n0 2 f0:001; 0:01; 0:1g m 2 fvvv; eee; vvi; eei; vii; eiig 324 kinds of dataset. 5 realizations of each. A total of 1620 datasets. Ran mclust and MCMC algorithm on each 26 / 36
  • 45. N=400 K=5 m = vvv;K = 5; p = 16 mclust ^K ^m 36 5 VVV 4 3 VVV 3 6 VVV 1 4 VVV 1 2 VVV MCMC ^K ^m 39 VVV 5 3 VVV 4 1 VVI 8 1 VVI 11 1 EEE 10 m = vvv;K = 5; p = 4 ^m ^K 40 5 VVV 2 6 VVV 1 8 VVV 1 7 VVV 1 4 VVV ^m ^K 43 5 VVV 1 8 EEE 1 4 VVV 27 / 36
  • 46. N=400 K=20 m = eee;K = 20; p = 16 mclust ^K ^m 28 20 EEE 4 23 EEE 3 27 EEE 2 24 EEE 2 22 EEE 2 21 EEE ... MCMC ^K ^m 45 20 EEE m = eee;K = 20; p = 4 ^m ^K 13 20 EEE 8 19 EEE 4 23 EEE 3 18 EEE 3 15 EEE ... ^m ^K 28 20 EEE 4 17 EEE 3 16 EEE 3 13 EEE 3 12 EEE ... 28 / 36
  • 47. Synthetic data N=100 K=20 V1 −40 0 20 40 l l ll l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l lll l ll l l l ll l ll l l l l ll l l l l l l l l l l l l l l ll ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l l l l ll l l l l l l l l l l l l l −40 0 20 40 l l l l l l l ll l l l l l l ll l l l l l l l ll ll l l l l l ll l l lll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l ll l l l l l l l l l l l ll l l ll l l l ll l l l lll l l l ll l l l l l l l l l l l l l l l l l l −40 0 20 40 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l ll l l l ll l l ll l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l ll l l l l lll l l l l l l l l l l l l l l l l l l l l l l l l l l −40 0 20 40 −40 0 20 l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l l l l l ll l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l −40 0 20 l l l l l l ll l l ll l l l l l l l ll l l l l l l l l l l l l l l l ll l l ll l l ll l l ll l l l l l l l l l l l l l l l l l l l l V2 l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l ll l l l l ll l l l l l l ll l l l l l l l l l l l l l ll l l l ll l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l ll l ll ll l ll l l l l l l l ll l l l l ll l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l ll l l ll l l l l l l l l l l l l l l l l l l ll l ll l l l l l l ll l ll l l l l l ll l l l l l l l l l l l l l ll l l ll l l l l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l ll l ll ll l l l l l l l l l l l l l l l l l l l l l l l l ll l l ll l l l l ll ll ll ll l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l ll l l l l l ll l l l ll l lll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l ll l ll l l ll l l l l l l ll l lll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l V3 l l ll l l l l l ll ll ll l l l l l l l ll l l l l l l l l ll l l l l l l ll l l l ll l lll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l l l l l lll l l l l l l l ll l l llll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l ll l lll l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l ll l l l ll l ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l −40 0 20 l l l l l l l l l l l l l l l ll l l l l l l l l l ll l l l l l l lll l l ll l l l ll l ll l l l l l l l l l l ll ll l l l ll l l l l l l l l l l −40 0 20 ll l l l l l l ll l l l l l l l l l l ll ll l ll l l l l l l l ll l l l l l l ll l l l l l l ll l l l l l l l l l l ll l l l lll l ll l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l ll l lll l ll l l l l ll l l l l l l l l l ll l l l l l l l l l l l ll ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l ll l l l ll l l l l l l l l l l l l l l l l ll l l l V4 ll l l l l l l l l l l l l l l l l l ll l l ll l l l l l l l l ll l l l l ll l l l l l l l l l l l ll ll l l l l l ll l l l l l l l l ll l l lll l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l ll lll l l l l l l l l l l l ll l l ll l l lll l l l l l l l l l l l l l l l l l l l ll l l l l l ll l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l l ll l ll l l l lll l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l ll l lll l ll l l l l l l l l l l l l l ll l l l l l l ll l l l l l l l l ll l l l l ll l l l l l l l l l l l l l l l l l l l l lll l l ll l ll l l l l l ll l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l ll l l l l l l l l l ll l l l l l l ll l l l l l l l l l l l l l l l l l l l ll ll ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l ll l lll l l l l l l l l l l l l l l ll l l l l l ll l l l lll ll ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l l lll l l l ll l l ll l l l l l l l l l l l l l l l l l l l l l V5 l l l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l −40 0 20 l l l l l l l l l l l l l l l ll l ll l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l −40 0 20 l l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l l ll l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l ll l l l l l l ll l l l l l l l l ll l l ll l l lll l ll l l l l ll l l l l l l l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l ll l ll l ll l l l l l l l l ll l l l l l l l l l l ll l l l lll l l l l l ll l l l l l l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l ll l V6 lll l l l ll ll l l l ll l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l ll l l l l ll l l ll l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l ll l l ll l l ll l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l ll l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l V7 −40 0 20 l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l −40 0 20 40 −40 0 20 l l ll l l l l l lll l l l l l l l l l l l ll l l l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l ll l lll l l l l l l l l lll l ll l l l lll ll ll l l l l l l l l l ll l l l ll l l l l l ll l l l l l l l ll ll l l l l l −40 0 20 40 ll l l l l lll l l l l l l l l l ll l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l lll l l l l l ll l l l l ll ll l l l ll ll l l l l l l l l lll l l l l l l l l ll l l l l lll l l l l l ll l ll l l l l l l l l l l l l l l ll l l l l l l l l l l l −40 0 20 40 ll l l l lllll l l l ll l l l l ll l l l l l l l l l l l l ll l ll l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l lll l l l l l l l l l l l l l lll l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l −40 0 20 40 ll l l l l lll ll l l l ll l l l l l l l l l ll l ll l l l l lll l l l l l l l l l l l l l l l l l l l l ll l l l l l V8 29 / 36
  • 48. Synthetic data N=100 K=20 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l −40 −20 0 20 40 −40 −20 0 20 40 V1 V2 30 / 36
  • 49. Synthetic data N=100 K=20 l l l l ll l l l l l l l l l l l l l l l l l −40 −20 0 20 40 −40 −20 0 20 40 V1 V2 31 / 36
  • 50. N = 100;K = 20; p = 8;m = vvv . 15 such datasets mclust ^K ^m 5 VVV 7 VVV 16 VII 19 VII 22 EII 23 EEE 23 EEE 25 EII 25 EII 26 EII 27 EEE 29 EEE 29 EEI 29 EII 34 EEE MCMC ^K ^m 17 VVV 17 VVV 17 VVV 17 VVV 18 VVV 18 VVV 18 VVV 18 VVV 19 VVV 19 VVV 19 VVV 19 VVV 20 VVV 20 VVV 20 VVV 32 / 36
  • 51. N = 100;K = 20; p = 8;m = eee . 15 such datasets mclust ^K ^m 19 EEE 19 EEE 19 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 21 EEE 22 EEE MCMC ^K ^m 18 EEE 19 EEE 19 EEE 19 EEE 19 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 20 EEE 33 / 36
  • 52. N = 100;K = 20; p = 8;m = vvi . 15 such datasets mclust ^K ^m 11 VVI 13 VVI 14 VVI 14 VVI 15 VVI 15 VVI 16 VVI 17 VVI 18 VVI 19 EEI 19 VVI 20 VII 20 VVI 21 EEI 24 EII MCMC ^K ^m 17 VVI 18 VVI 19 VVI 19 VVI 19 VVI 20 VVI 20 VVI 20 VVI 20 VVI 20 VVI 20 VVI 20 VVI 20 VVI 20 VVI 20 VVI 34 / 36
  • 53. N = 100;K = 20; p = 8;m = vii . 15 such datasets mclust ^K ^m 5 VVV 8 VII 12 VII 13 VII 17 VII 17 VII 18 VII 18 VII 18 VII 19 VII 19 VII 19 VII 20 VII 21 EII 32 EEE MCMC ^K ^m 14 VII 16 VVV 17 VVV 19 VII 19 VII 19 VII 19 VII 19 VII 19 VII 19 VII 20 VII 20 VII 20 VII 20 VII 20 VII 35 / 36
  • 54. Concluding remarks V** more dicult that E** VVV more dicult that VVI and VII Large K, small N, most dicult MCMC excels here (At
  • 55. rst, I expected dierently) Should repeat N = 100;K 2 f10; 20g across more p, more n0, et cetera. 36 / 36