ICMS Discussion, March 2010

Does MCMC converge?
Postprocessing MCMC output

Multimodality and label switching: a discussion

Christian P. Robert

Université Paris-Dauphine and CREST, INSEE
http://www.ceremade.dauphine.fr/~xian

Workshop on mixtures, ICMS
February 28, 2010


Outline

1 Does MCMC converge?

2 Postprocessing MCMC output


Does MCMC converge?

Monte Carlo perspective

When, given a target π, an MCMC sampler never visits more than
50% of the support of π, it can be argued that the sampler does
not converge!

Two-component normal mixture and Gibbs sampler

Case when both means $\mu_i$ are the only unknowns, with different
weights and same variance: identifiable model

[Figure: plot in the $(\mu_1, \mu_2)$ plane, both axes running from −1 to 4]

(C.) Simple MCMC does not work
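
One way to probe this behaviour numerically is to run a plain Gibbs sampler from two different starting points and compare where each chain settles. The sketch below is only an illustration under stated assumptions: the simulated data (weights 0.7/0.3, true means 0 and 2.5, n = 500), the prior $\mu_j \sim N(0, \tau^2)$ with $\tau = 10$, and the function name `gibbs_two_means` are all hypothetical choices, not taken from the slides.

```python
import numpy as np

def gibbs_two_means(x, p, n_iter, mu_init, tau=10.0, seed=0):
    """Gibbs sampler for p*N(mu1, 1) + (1-p)*N(mu2, 1) with unknown means.

    Assumed prior (not from the slides): mu_j ~ N(0, tau^2), independent.
    Returns an (n_iter, 2) array of sampled (mu1, mu2).
    """
    rng = np.random.default_rng(seed)
    mu = np.array(mu_init, dtype=float)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Allocation step: P(z_i = component 1 | x_i, mu), on the log scale.
        log_w1 = np.log(p) - 0.5 * (x - mu[0]) ** 2
        log_w2 = np.log1p(-p) - 0.5 * (x - mu[1]) ** 2
        prob1 = 1.0 / (1.0 + np.exp(log_w2 - log_w1))
        in_first = rng.random(x.size) < prob1
        # Mean step: conjugate normal update for each component.
        for j, mask in enumerate((in_first, ~in_first)):
            prec = mask.sum() + 1.0 / tau**2
            mu[j] = rng.normal(x[mask].sum() / prec, 1.0 / np.sqrt(prec))
        chain[t] = mu
    return chain

# Hypothetical simulated data: 0.7 N(0, 1) + 0.3 N(2.5, 1).
data_rng = np.random.default_rng(1)
n, p = 500, 0.7
from_first = data_rng.random(n) < p
x = np.where(from_first, data_rng.normal(0.0, 1.0, n), data_rng.normal(2.5, 1.0, n))

# Same sampler, two starting points: near the true means and with means swapped.
chain_a = gibbs_two_means(x, p, 2000, mu_init=(0.0, 2.5))
chain_b = gibbs_two_means(x, p, 2000, mu_init=(2.5, 0.0))
print("start (0, 2.5):", chain_a[500:].mean(axis=0))
print("start (2.5, 0):", chain_b[500:].mean(axis=0))
```

If the two printed averages differ markedly, the chains have explored different regions of the posterior, which is the failure the slide points at.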


Imposed permutations may miss the mark

While duplicating the MCMC sampler according to all
permutations ρ in $S_k$ produces perfect exchangeability [nice!],
- it does not bring additional energy to the MCMC sampler
- it does not identify other modes (under- or over-fitting)
- it does not apply in nearly-but-not-exchangeable settings



Illustrations

Example (Two-mean Gaussian mixture)

Case of $p\,N(\mu_1, 1) + (1 - p)\,N(\mu_2, 1)$ ($p = 0.5$)

[Figure: plot in the $(\mu_2, \mu_1)$ plane, both axes running from −1 to 4]



Illustrations (2)

Example (Two-mean Gaussian mixture and outliers)
Same model, but data from 5-component mixture



Illustrations (3)

Example (Outlier Gaussian mixture)
Case of $p\,N(0, 1) + (1 - p)\,N(\mu, \sigma^2)$ with $p$ known


Postprocessing MCMC output

Postprocessing issues

When assessing the number $k$ of components via the evidence

$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k,$$

aka the marginal likelihood, label switching is a liability and an
uninteresting phenomenon.

Indeed,

$$\int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k = k! \int_{\Theta_k / S_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k$$

means that integrating over the restricted space is [more than] ok!
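
The factor $k!$ can be recovered in one line. This is a sketch that assumes both the prior and the likelihood are invariant under relabelling of the components, and that reads $\Theta_k / S_k$ as one fundamental region (for instance, an ordering constraint on the means):

```latex
\Theta_k = \bigcup_{\rho \in S_k} \rho\left(\Theta_k / S_k\right)
\quad \text{(disjoint up to a null set)},
\qquad
\int_{\Theta_k} \pi_k(\theta_k) L_k(\theta_k)\,\mathrm{d}\theta_k
= \sum_{\rho \in S_k} \int_{\rho(\Theta_k / S_k)} \pi_k(\theta_k) L_k(\theta_k)\,\mathrm{d}\theta_k
= k! \int_{\Theta_k / S_k} \pi_k(\theta_k) L_k(\theta_k)\,\mathrm{d}\theta_k .
```

Each of the $k!$ pieces contributes the same value because the integrand is unchanged by a permutation of the labels.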



Chib’s representation

Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and
$\theta_k \sim \pi_k(\theta_k)$,

$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$

Use of an approximation to the posterior

$$\hat{Z}_k = \hat{m}_k(x) = \frac{f_k(x|\theta_k^*)\, \pi_k(\theta_k^*)}{\hat{\pi}_k(\theta_k^*|x)}.$$
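
As a sanity check of the identity, the sketch below evaluates it in a toy conjugate model where the posterior is known exactly. The model is not a mixture: $x_i \sim N(\theta, 1)$ with $\theta \sim N(0, \tau^2)$ is an assumption made purely for illustration, as are the function names. The point is that the right-hand side gives the same value whatever $\theta^*$ is plugged in.

```python
import numpy as np

def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def chib_log_evidence(x, theta_star, tau2=1.0):
    """log m(x) from Chib's identity for x_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    n = x.size
    post_var = 1.0 / (n + 1.0 / tau2)      # exact posterior variance
    post_mean = post_var * x.sum()          # exact posterior mean
    log_lik = log_normal_pdf(x, theta_star, 1.0).sum()
    log_prior = log_normal_pdf(theta_star, 0.0, tau2)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post

def exact_log_evidence(x, tau2=1.0):
    """Closed form: marginally, x ~ N(0, I + tau2 * 1 1^T)."""
    n = x.size
    cov = np.eye(n) + tau2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

x_toy = np.random.default_rng(0).normal(1.0, 1.0, 20)
for theta_star in (x_toy.mean(), 0.3, -2.0):
    print(theta_star, chib_log_evidence(x_toy, theta_star), exact_log_evidence(x_toy))
```

In practice $\theta^*$ is taken in a high-density region (such as the MAP estimate) because the posterior density must be estimated there, and that estimate is more stable where the sampler has spent time.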



Case of latent variables

For missing variable $z$ as in mixture models, natural Rao-Blackwell
estimate

$$\hat{\pi}_k(\theta_k^*|x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)}),$$

where the $z_k^{(t)}$’s are Gibbs sampled latent variables

But convergence impaired by lack of label switching

(C.) Simple MCMC does not work




Compensation for label switching

For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a
balanced way, despite the symmetry predicted by the theory

$$\pi_k(\theta_k|x) = \pi_k(\rho(\theta_k)|x) = \frac{1}{k!} \sum_{\rho \in S_k} \pi_k(\rho(\theta_k)|x)$$

for all ρ’s in $S_k$, set of all permutations of {1, . . . , k}.
Consequences on numerical approximation, biased by an order $k!$
Recover the theoretical symmetry by using

$$\tilde{\pi}_k(\theta_k^*|x) = \frac{1}{T\, k!} \sum_{\rho \in S_k} \sum_{t=1}^{T} \pi_k(\rho(\theta_k^*)|x, z_k^{(t)}).$$

[Berkhof, Mechelen, & Gelman, 2003]
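
The symmetrised estimate can be sketched generically as an average over both the stored allocations and all permutations of the components of $\theta_k^*$. In the code below, the function names, the callable `log_cond` (standing for $\log \pi_k(\theta|x, z)$) and the two-mean example with prior variance `tau2` are hypothetical, introduced only for illustration. Enumerating all of $S_k$ is only practical for small $k$.

```python
import itertools
import math
import numpy as np

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    values = np.asarray(values, dtype=float)
    top = values.max()
    return top + math.log(np.mean(np.exp(values - top)))

def rao_blackwell_log_density(theta_star, z_draws, log_cond, symmetrise=True):
    """Log of the (optionally symmetrised) Rao-Blackwell estimate at theta_star.

    theta_star : array with one entry (or row) per mixture component
    z_draws    : list of sampled allocation vectors z^(t)
    log_cond   : user-supplied function (theta, z) -> log pi(theta | x, z)
    """
    theta_star = np.asarray(theta_star, dtype=float)
    k = theta_star.shape[0]
    perms = itertools.permutations(range(k)) if symmetrise else [tuple(range(k))]
    terms = [log_cond(theta_star[list(rho)], z) for rho in perms for z in z_draws]
    return log_mean_exp(terms)

def make_log_cond(x, tau2=100.0):
    """Conditional of the means given allocations, for unit-variance components
    and an assumed N(0, tau2) prior on each mean."""
    def log_cond(theta, z):
        total = 0.0
        for j, mu_j in enumerate(theta):
            mask = z == j
            prec = mask.sum() + 1.0 / tau2
            mean = x[mask].sum() / prec
            total += 0.5 * math.log(prec / (2 * math.pi)) - 0.5 * prec * (mu_j - mean) ** 2
        return total
    return log_cond

# Hypothetical well-separated data and allocations that never switch labels.
rng = np.random.default_rng(0)
x_sep = np.concatenate([rng.normal(-5.0, 1.0, 30), rng.normal(5.0, 1.0, 30)])
z_fixed = [np.repeat([0, 1], 30) for _ in range(50)]
log_cond = make_log_cond(x_sep)
plain = rao_blackwell_log_density([-5.0, 5.0], z_fixed, log_cond, symmetrise=False)
sym = rao_blackwell_log_density([-5.0, 5.0], z_fixed, log_cond, symmetrise=True)
print(plain, sym, plain - sym)
```

Unlike the unsymmetrised version, the symmetrised value does not depend on the order in which the components of $\theta_k^*$ are listed.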




Galaxy dataset (k)

Using only the original estimate, with $\theta_k^*$ as the MAP estimator,

$$\log(\hat{m}_k(x)) = -105.1396$$

for $k = 3$ (based on $10^3$ simulations), while introducing the
permutations leads to

$$\log(\hat{m}_k(x)) = -103.3479 = -105.1396 + \log(3!)$$

k                      2        3        4        5        6        7        8
$\log \hat{m}_k(x)$   -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44

Estimations of the marginal likelihoods by the symmetrised Chib’s
approximation (based on $10^5$ Gibbs iterations and, for $k > 5$, 100
permutations selected at random in $S_k$).
[Lee, Marin, Mengersen & Robert, 2008]
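
The correction reported for $k = 3$ can be checked by hand. Assuming the Gibbs output shows no label switching at all, averaging over the $3!$ permutations divides the posterior density estimate at $\theta_k^*$ by $3!$, and so adds $\log(3!)$ to the log evidence:

```latex
\log(3!) = \log 6 \approx 1.7918,
\qquad
-105.1396 + 1.7918 = -103.3478 \approx -103.3479 ,
```

which agrees with the reported value up to rounding of the last digit.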


Comparison between evidence approximations

1 Nested sampling: M = 1000 points, with 10 random walk
moves at each step, simulations from the constrained prior and a
stopping rule at 95% of the observed maximum likelihood
2 $T = 10^4$ MCMC (=Gibbs) simulations producing
non-parametric estimates ϕ
3 Monte Carlo estimates Z1, Z2, Z3 using product of two
Gaussian kernels
4 Numerical integration based on 850 × 950 grid [reference
value, confirmed by Chib’s]
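
To make the first item concrete, here is a minimal nested sampling sketch on a toy problem where the answer is known. It differs from the setup described above in ways that should be read as assumptions: the model is a uniform prior on $(-a, a)$ with likelihood $\exp(-\theta^2/2)$, the constrained prior is an interval and is sampled exactly (no random walk moves), and the run stops after a fixed number of iterations rather than at a likelihood-based rule. All function names and defaults are hypothetical.

```python
import math
import numpy as np

def nested_sampling_log_evidence(n_live=200, n_iter=2000, half_width=10.0, seed=0):
    """Toy nested sampling: prior U(-a, a), likelihood L(theta) = exp(-theta^2 / 2).

    The constrained prior {theta : L(theta) > l} is the interval |theta| < r,
    so new points can be drawn from it exactly.
    """
    rng = np.random.default_rng(seed)
    live = rng.uniform(-half_width, half_width, n_live)
    evidence = 0.0
    x_prev = 1.0  # prior mass still enclosed by the likelihood constraint
    for i in range(1, n_iter + 1):
        worst = np.argmax(np.abs(live))       # lowest likelihood = largest |theta|
        radius = abs(live[worst])
        x_curr = math.exp(-i / n_live)        # deterministic approximation of the shrinkage
        evidence += math.exp(-0.5 * radius**2) * (x_prev - x_curr)
        live[worst] = rng.uniform(-radius, radius)  # exact draw from the constrained prior
        x_prev = x_curr
    # The remaining live points share the leftover prior mass equally.
    evidence += x_prev * np.mean(np.exp(-0.5 * live**2))
    return math.log(evidence)

def exact_log_evidence(half_width=10.0):
    """Closed form: (1 / 2a) * integral of exp(-theta^2 / 2) over (-a, a)."""
    a = half_width
    return math.log(math.sqrt(2 * math.pi) * math.erf(a / math.sqrt(2)) / (2 * a))

print(nested_sampling_log_evidence(), exact_log_evidence())
```

With more live points the estimate becomes less variable, at a proportional increase in the number of iterations needed to shrink the enclosed prior mass to the same level.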



Comparison (cont’d)

Graph based on a sample of 10 observations for µ = 2 and
σ = 3/2 (150 replicas): V1 = nested sampling, V2 = importance
sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]




[Lee, Marin, Mengersen & Robert, 2010]

