This is a discussion of the presentations of John Geweke and of Sylvia Frühwirth-Schnatter, given during the ICMS conference on March 3-5, 2010, in Edinburgh.
ICMS Discussion, March 2010
1. Does MCMC converge?
Postprocessing MCMC output
Multimodality and label switching: a discussion
Christian P. Robert
Université Paris-Dauphine and CREST, INSEE
http://www.ceremade.dauphine.fr/~xian
Workshop on mixtures, ICMS
February 28, 2010
Christian P. Robert Multimodality and label switching: a discussion
Outline
1 Does MCMC converge?
2 Postprocessing MCMC output
Monte Carlo perspective
When, given a target π, an MCMC sampler never visits more than
50% of the support of π, it can be argued that the sampler does
not converge!
Two-component normal mixture and Gibbs sampler
Case when both means µi are the only unknowns, with different weights and
same variance: identifiable model
[Figure: plot in the (µ1, µ2) plane, both axes ranging from −1 to 4]
(C.) Simple MCMC does not work
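
A minimal sketch of a Gibbs sampler for the two-mean mixture above, for readers who want to reproduce the experiment. It assumes known weights (p = 0.7), unit variance and independent N(0, τ²) priors on both means; the prior, the simulated data and the name `gibbs_two_means` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gibbs_two_means(x, p=0.7, sigma=1.0, tau=10.0, n_iter=2000,
                    init=(0.0, 0.0), seed=0):
    """Gibbs sampler for p N(mu1, sigma^2) + (1 - p) N(mu2, sigma^2).

    Only the two means are unknown; the N(0, tau^2) prior is an assumption.
    """
    rng = np.random.default_rng(seed)
    mu = np.array(init, dtype=float)
    weights = np.array([p, 1.0 - p])
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Step 1: sample the allocations z_i given the current means.
        logw = np.log(weights) - 0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        logw -= logw.max(axis=1, keepdims=True)
        prob1 = np.exp(logw[:, 0]) / np.exp(logw).sum(axis=1)
        z = (rng.random(x.size) > prob1).astype(int)  # 0: component 1, 1: component 2
        # Step 2: sample each mean from its conjugate normal full conditional.
        for j in range(2):
            prec = np.sum(z == j) / sigma**2 + 1.0 / tau**2
            mean = x[z == j].sum() / sigma**2 / prec
            mu[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        chain[t] = mu
    return chain

# Simulated data (assumed): 70% from N(0, 1), 30% from N(2.5, 1).
rng = np.random.default_rng(1)
n = 500
from_first = rng.random(n) < 0.7
x = np.where(from_first, rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))

chain = gibbs_two_means(x, init=(0.0, 2.5))
```

Changing `init` to a point near the other local mode of the posterior is one way to explore whether the chain ever leaves the region in which it starts.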
Imposed permutations may miss the mark
While duplicating the MCMC sampler according to all
permutations ρ in Sk produces perfect exchangeability [nice!],
it does not bring additional energy to the MCMC sampler,
it does not identify other modes (under- or over-fitting),
it does not apply in nearly-but-not-exactly exchangeable settings.
Illustrations
Example (Two-mean Gaussian mixture)
Case of $p\,\mathcal{N}(\mu_1, 1) + (1 - p)\,\mathcal{N}(\mu_2, 1)$ with p = 0.5
[Figure: plot in the (µ2, µ1) plane, both axes ranging from −1 to 4]
Illustrations (2)
Example (Two-mean Gaussian mixture and outliers)
Same model, but data from 5-component mixture
Illustrations (3)
Example (Outlier Gaussian mixture)
Case of $p\,\mathcal{N}(0, 1) + (1 - p)\,\mathcal{N}(\mu, \sigma^2)$ with p known
Postprocessing issues
When assessing the number k of components via the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k\,,$$
aka the marginal likelihood, label switching is a liability and an
uninteresting phenomenon
Indeed,
$$\int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k = k! \int_{\Theta_k / S_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k$$
means that integrating over the restricted space is [more than] ok!
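
The k! identity can be checked numerically in the simplest case k = 2. The sketch below assumes an equal-weight two-mean Gaussian mixture with unit variances and independent N(0, τ²) priors on the means (an exchangeable setting chosen for illustration, with simulated data), and compares a grid approximation of the integral over the whole plane with the one over the half-plane µ1 < µ2.

```python
import numpy as np

def log_prior_times_lik(mu1, mu2, x, p=0.5, tau=5.0):
    """log of pi_k(theta) L_k(theta) on a grid of (mu1, mu2) values.

    Assumed model: p N(mu1, 1) + (1 - p) N(mu2, 1), N(0, tau^2) priors.
    """
    log_prior = -0.5 * (mu1**2 + mu2**2) / tau**2 - np.log(2 * np.pi * tau**2)
    d1 = np.exp(-0.5 * (x - mu1[..., None]) ** 2)
    d2 = np.exp(-0.5 * (x - mu2[..., None]) ** 2)
    log_lik = np.log((p * d1 + (1 - p) * d2) / np.sqrt(2 * np.pi)).sum(axis=-1)
    return log_prior + log_lik

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 10), rng.normal(3.0, 1.0, 10)])

grid = np.linspace(-6.0, 8.0, 281)
h = grid[1] - grid[0]
M1, M2 = np.meshgrid(grid, grid, indexing="ij")
G = np.exp(log_prior_times_lik(M1, M2, x))

# Integral over the full parameter space (Riemann sum).
Z_full = G.sum() * h**2

# Integral over the restricted space mu1 < mu2; diagonal cells get weight 1/2.
mask = np.triu(np.ones_like(G), k=1) + 0.5 * np.eye(G.shape[0])
Z_restricted = (G * mask).sum() * h**2

ratio = Z_full / Z_restricted  # should equal 2! = 2
```

The ratio equals 2 here only because the prior and the likelihood are both invariant under the swap of µ1 and µ2; with unequal known weights the integrand is no longer symmetric and the identity does not apply.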
Chib’s representation
Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
θk ∼ πk (θk ),
$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$
Use of an approximation to the posterior
$$\hat{Z}_k = \hat{m}_k(x) = \frac{f_k(x|\theta_k^*)\, \pi_k(\theta_k^*)}{\hat{\pi}_k(\theta_k^*|x)}\,.$$
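
Chib's identity holds for every value of θk, which is easy to verify in a toy conjugate model where the posterior is available in closed form. The sketch below assumes x_i ∼ N(θ, 1) with θ ∼ N(0, τ²); this model and the function names are illustrative assumptions, not the mixture setting of the slides, where the denominator has to be estimated.

```python
import numpy as np

def norm_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def chib_log_evidence(x, theta_star, tau2=4.0):
    """log m(x) = log f(x|theta*) + log pi(theta*) - log pi(theta*|x)."""
    n = x.size
    post_var = 1.0 / (n + 1.0 / tau2)   # conjugate posterior variance
    post_mean = post_var * x.sum()      # conjugate posterior mean
    log_lik = norm_logpdf(x, theta_star, 1.0).sum()
    log_prior = norm_logpdf(theta_star, 0.0, tau2)
    log_post = norm_logpdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post

def exact_log_evidence(x, tau2=4.0):
    """Closed-form log marginal: x ~ N(0, I + tau2 * 1 1^T)."""
    n = x.size
    quad = (x**2).sum() - tau2 * x.sum() ** 2 / (1.0 + n * tau2)
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log(1.0 + n * tau2)
            - 0.5 * quad)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, 30)

at_zero = chib_log_evidence(x, theta_star=0.3)
at_mean = chib_log_evidence(x, theta_star=x.mean())
exact = exact_log_evidence(x)
```

Both choices of θ* return the same value as the closed form; in practice a high-posterior-density point such as the MAP is preferred because the estimated denominator is most stable there.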
Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell
estimate
$$\hat{\pi}_k(\theta_k^*|x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)})\,,$$
where the $z_k^{(t)}$’s are Gibbs sampled latent variables
But convergence impaired by lack of label switching
(C.) Simple MCMC does not work
Compensation for label switching
For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a
balanced way, despite the symmetry predicted by the theory
$$\pi_k(\theta_k|x) = \pi_k(\rho(\theta_k)|x) = \frac{1}{k!} \sum_{\rho \in S_k} \pi_k(\rho(\theta_k)|x)$$
for all ρ’s in Sk , set of all permutations of {1, . . . , k}.
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
$$\tilde{\pi}_k(\theta_k^*|x) = \frac{1}{T\, k!} \sum_{\rho \in S_k} \sum_{t=1}^{T} \pi_k(\rho(\theta_k^*)|x, z_k^{(t)})\,.$$
[Berkhof, Mechelen, & Gelman, 2003]
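
The effect of the symmetrisation can be sketched on a well-separated two-component example in which the Gibbs sampler never switches labels. The code below assumes equal weights, unit variances and N(0, τ²) priors on the means, uses the posterior mean of the chain as a stand-in for the MAP, and relies on hypothetical helper names (`gibbs`, `rao_blackwell`); it compares the plain and the symmetrised Rao-Blackwell estimates of the posterior density at θ*.

```python
import itertools
import numpy as np

def log_cond_post(theta, x, z, tau2):
    """log pi_k(theta | x, z): product of conjugate normal conditionals."""
    out = 0.0
    for j, th in enumerate(theta):
        prec = np.sum(z == j) + 1.0 / tau2
        mean = x[z == j].sum() / prec
        out += 0.5 * np.log(prec / (2 * np.pi)) - 0.5 * prec * (th - mean) ** 2
    return out

def log_mean_exp(v):
    v = np.asarray(v)
    return v.max() + np.log(np.mean(np.exp(v - v.max())))

def rao_blackwell(theta_star, x, z_draws, tau2, symmetrise):
    """Log of the (optionally symmetrised) Rao-Blackwell density estimate."""
    k = len(theta_star)
    perms = (list(itertools.permutations(range(k))) if symmetrise
             else [tuple(range(k))])
    terms = [log_cond_post(theta_star[list(rho)], x, z, tau2)
             for rho in perms for z in z_draws]
    return log_mean_exp(terms)

def gibbs(x, k, tau2, n_iter, rng):
    """Equal-weight, unit-variance Gibbs sampler (assumed setting)."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread-out starting values
    z_draws, mu_draws = [], []
    for _ in range(n_iter):
        logw = -0.5 * (x[:, None] - mu[None, :]) ** 2
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        u = rng.random(x.size)[:, None]
        z = np.minimum((u > np.cumsum(w, axis=1)).sum(axis=1), k - 1)
        for j in range(k):
            prec = np.sum(z == j) + 1.0 / tau2
            mu[j] = rng.normal(x[z == j].sum() / prec, prec ** -0.5)
        z_draws.append(z)
        mu_draws.append(mu.copy())
    return np.array(z_draws), np.array(mu_draws)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(4.0, 1.0, 50)])
tau2 = 25.0
z_draws, mu_draws = gibbs(x, k=2, tau2=tau2, n_iter=500, rng=rng)
theta_star = mu_draws[100:].mean(axis=0)

log_plain = rao_blackwell(theta_star, x, z_draws[100:], tau2, symmetrise=False)
log_sym = rao_blackwell(theta_star, x, z_draws[100:], tau2, symmetrise=True)
gap = log_plain - log_sym  # close to log(2!) when no label switching occurs
```

When the chain stays in a single labelling, the plain estimate overstates the posterior density at θ* by a factor close to k!, so the resulting evidence estimate is too small by about log(k!) on the log scale.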
Galaxy dataset (k)
Using only the original estimate, with $\theta_k^*$ as the MAP estimator,
$$\log(\hat{m}_k(x)) = -105.1396$$
for k = 3 (based on $10^3$ simulations), while introducing the
permutations leads to
$$\log(\hat{m}_k(x)) = -103.3479 = -105.1396 + \log(3!)$$

k                       2        3        4        5        6        7        8
$\log \hat{m}_k(x)$  -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44

Estimations of the marginal likelihoods by the symmetrised Chib’s
approximation (based on $10^5$ Gibbs iterations and, for k > 5, 100
permutations selected at random in Sk ).
[Lee, Marin, Mengersen & Robert, 2008]
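
The arithmetic behind the correction and the table can be replayed in a few lines. The sketch below reads the table entries as log-evidences, an assumption that is consistent with the k = 3 entry, and uses only the numbers reported above.

```python
import math

# Unsymmetrised Chib estimate for k = 3, as reported above.
log_m_plain = -105.1396
# Adding log(k!) recovers the symmetrised value.
log_m_sym = log_m_plain + math.log(math.factorial(3))

# Table entries, read as log-evidences (assumption).
log_evidence = {2: -115.68, 3: -103.35, 4: -102.66, 5: -101.93,
                6: -102.88, 7: -105.48, 8: -108.44}

# Number of components with the largest estimated evidence.
best_k = max(log_evidence, key=log_evidence.get)
```

The recomputed value agrees with the reported −103.3479 up to rounding of the last digit.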
Comparison between evidence approximations
1 Nested sampling: M = 1000 points, with 10 random walk moves at each step, simulations from the constrained prior and a stopping rule at 95% of the observed maximum likelihood
2 $T = 10^4$ MCMC (=Gibbs) simulations producing non-parametric estimates ϕ
3 Monte Carlo estimates Z1 , Z2 , Z3 using product of two Gaussian kernels
4 numerical integration based on 850 × 950 grid [reference value, confirmed by Chib’s]
Comparison (cont’d)
Graph based on a sample of 10 observations for µ = 2 and
σ = 3/2 (150 replicas): V1 = nested sampling, V2 = importance
sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]
Comparison (cont’d)
Graph based on a sample of 50 observations for µ = 2 and
σ = 3/2 (150 replicas): V1 = nested sampling, V2 = importance
sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]
Comparison (cont’d)
Graph based on a sample of 100 observations for µ = 2 and
σ = 3/2 (150 replicas): V1 = nested sampling, V2 = importance
sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]
Comparison (cont’d)
[Lee, Marin, Mengersen & Robert, 2010]