My life as a mixture 
Christian P. Robert 
Université Paris-Dauphine, Paris & University of Warwick, Coventry 
BAYSM'14, Wien, Austria 
September 17, 2014 
bayesianstatistics@gmail.com
Your next Valencia meeting: 
I Objective Bayes section of ISBA major meeting: 
I O-Bayes 2015 in Valencia, Spain, June 1-4(+1), 2015 
I in memory of our friend Susie Bayarri 
I objective Bayes, limited information, partly defined and approximate models, etc. 
I all flavours of Bayesian analysis welcomed! 
I Spain in June, what else...?!
Outline 
Gibbs sampling 
weakly informative priors 
imperfect sampling 
SAME algorithm 
perfect sampling 
Bayes factor 
less informative prior 
no Bayes factor
birthdate: May 1989, Ottawa Civic Hospital 
Repartition of grey levels in an unprocessed chest radiograph 
[X, 1994]
Mixture models 
Structure of mixtures of distributions: 
$x \sim f_j$ with probability $p_j$, 
for $j = 1, 2, \ldots, k$, with overall density 
$p_1 f_1(x) + \cdots + p_k f_k(x)$. 
Usual case: parameterised components 
$\sum_{i=1}^k p_i f(x \mid \theta_i)$, with $\sum_{i=1}^k p_i = 1$, 
where the weights $p_i$ are distinguished from the other parameters
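A minimal illustrative sketch (Python; not part of the original slides) of simulating from and evaluating such a mixture, with arbitrary two-component Gaussian choices for the weights and means:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# illustrative two-component Gaussian mixture: p1 N(0,1) + p2 N(2.5,1)
p = np.array([0.3, 0.7])          # weights, summing to one
mu = np.array([0.0, 2.5])         # component means (unit variances)

# simulate n draws: pick a component j with probability p_j, then x ~ f_j
n = 500
z = rng.choice(len(p), size=n, p=p)       # latent component labels
x = rng.normal(loc=mu[z], scale=1.0)      # observations

# overall density p1 f1(x) + ... + pk fk(x), evaluated on a grid
grid = np.linspace(-4, 7, 200)
density = sum(p_j * norm.pdf(grid, m_j, 1.0) for p_j, m_j in zip(p, mu))
```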
Motivations 
I Dataset made of several latent/missing/unobserved groups/strata/subpopulations. Mixture structure due to the missing origin/allocation of each observation to a specific subpopulation/stratum. Inference on either the allocations (clustering) or on the parameters $(\theta_i, p_i)$ or on the number of groups 
I Semiparametric perspective where mixtures are functional basis approximations of unknown distributions
License 
Dataset derived from [my] license plate image 
Grey levels concentrated on 256 values [later jittered] 
[Marin & X, 2007]
Likelihood 
For a sample of independent random variables $(x_1, \ldots, x_n)$, likelihood 
$\prod_{i=1}^n \{p_1 f_1(x_i) + \cdots + p_k f_k(x_i)\}$. 
Expanding this product involves $k^n$ elementary terms: prohibitive to compute in large samples. 
But likelihood still computable [pointwise] in $O(kn)$ time.
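A sketch of the pointwise $O(kn)$ evaluation, assuming Gaussian components and using log-sum-exp for numerical stability (illustrative, not from the slides):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_loglik(x, p, mu, sigma=1.0):
    """Log-likelihood of a k-component Gaussian mixture, O(kn) per evaluation."""
    # log of p_j f_j(x_i) for every observation i and component j: an (n, k) array
    log_terms = np.log(p) + norm.logpdf(np.asarray(x)[:, None], loc=mu[None, :], scale=sigma)
    # log sum_j p_j f_j(x_i), summed over the observations
    return logsumexp(log_terms, axis=1).sum()
```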
Normal mean benchmark 
Normal mixture 
$p\,\mathcal{N}(\mu_1, 1) + (1 - p)\,\mathcal{N}(\mu_2, 1)$ 
with only means unknown (2-D representation possible) 
Identifiability 
Parameters $\mu_1$ and $\mu_2$ identifiable: $\mu_1$ cannot be confused with $\mu_2$ when $p$ is different from 0.5. 
Presence of a spurious mode, understood by letting $p$ go to 0.5
Bayesian inference on mixtures 
For any prior $\pi(\theta, p)$, posterior distribution of $(\theta, p)$ available up to a multiplicative constant 
$\pi(\theta, p \mid x) \propto \left[\prod_{i=1}^n \sum_{j=1}^k p_j f(x_i \mid \theta_j)\right] \pi(\theta, p)$ 
at a cost of order $O(kn)$ 
Difficulty 
Despite this, derivation of posterior characteristics like posterior expectations only possible in an exponential time of order $O(k^n)$!
Missing variable representation 
Associate to each $x_i$ a missing/latent variable $z_i$ that indicates its component: 
$z_i \mid p \sim \mathcal{M}_k(p_1, \ldots, p_k)$ 
and 
$x_i \mid z_i, \theta \sim f(\cdot \mid \theta_{z_i})$. 
Completed likelihood 
$\ell(\theta, p \mid x, z) = \prod_{i=1}^n p_{z_i} f(x_i \mid \theta_{z_i})$, 
and 
$\pi(\theta, p \mid x, z) \propto \left[\prod_{i=1}^n p_{z_i} f(x_i \mid \theta_{z_i})\right] \pi(\theta, p)$ 
where $z = (z_1, \ldots, z_n)$
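Given the allocations z, the completed likelihood is a plain product over observations; a one-function sketch under the same illustrative Gaussian-component assumption:

```python
import numpy as np
from scipy.stats import norm

def completed_loglik(x, z, p, mu, sigma=1.0):
    """log l(theta, p | x, z) = sum_i [ log p_{z_i} + log f(x_i | theta_{z_i}) ]."""
    return np.sum(np.log(p[z]) + norm.logpdf(x, loc=mu[z], scale=sigma))
```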
Gibbs sampling for mixture models 
Take advantage of the missing data structure: 
Algorithm 
I Initialization: choose $p^{(0)}$ and $\theta^{(0)}$ arbitrarily 
I Step t. For $t = 1, \ldots$ 
1. Generate $z_i^{(t)}$ $(i = 1, \ldots, n)$ from $(j = 1, \ldots, k)$ 
$P\bigl(z_i^{(t)} = j \mid p_j^{(t-1)}, \theta_j^{(t-1)}, x_i\bigr) \propto p_j^{(t-1)} f\bigl(x_i \mid \theta_j^{(t-1)}\bigr)$ 
2. Generate $p^{(t)}$ from $\pi(p \mid z^{(t)})$, 
3. Generate $\theta^{(t)}$ from $\pi(\theta \mid z^{(t)}, x)$. 
[Brooks & Gelman, 1990; Diebolt & X, 1990, 1994; Escobar & West, 1991]
Normal mean example (cont'd) 
Algorithm 
I Initialization. Choose $\mu_1^{(0)}$ and $\mu_2^{(0)}$, 
I Step t. For $t = 1, \ldots$ 
1. Generate $z_i^{(t)}$ $(i = 1, \ldots, n)$ from 
$P\bigl(z_i^{(t)} = 1\bigr) = 1 - P\bigl(z_i^{(t)} = 2\bigr) \propto p \exp\Bigl(-\tfrac{1}{2}\bigl(x_i - \mu_1^{(t-1)}\bigr)^2\Bigr)$ 
2. Compute $n_j^{(t)} = \sum_{i=1}^n \mathbb{I}_{z_i^{(t)} = j}$ and $(s_j^x)^{(t)} = \sum_{i=1}^n \mathbb{I}_{z_i^{(t)} = j}\, x_i$ 
3. Generate $\mu_j^{(t)}$ $(j = 1, 2)$ from $\mathcal{N}\left(\dfrac{\lambda\delta + (s_j^x)^{(t)}}{\lambda + n_j^{(t)}},\ \dfrac{1}{\lambda + n_j^{(t)}}\right)$, 
where $\delta$ and $\lambda$ denote the prior mean and precision of the $\mu_j$'s.
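A sketch of this two-step Gibbs sampler for the normal mean benchmark (Python; the simulated data, prior mean δ, prior precision λ and number of sweeps are illustrative choices, and p is taken as known):

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated data from p N(mu1,1) + (1-p) N(mu2,1)
p, mu_true = 0.7, np.array([0.0, 2.5])
n = 200
x = rng.normal(mu_true[(rng.random(n) > p).astype(int)], 1.0)

delta, lam = 0.0, 1.0       # prior mu_j ~ N(delta, 1/lam)
T = 5000
mu = np.array([-1.0, 1.0])  # arbitrary initialisation
chain = np.empty((T, 2))

for t in range(T):
    # 1. allocations: P(z_i = 1) proportional to p exp(-(x_i - mu_1)^2 / 2), etc.
    w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
    w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
    z = (rng.random(n) >= w1 / (w1 + w2)).astype(int)   # 0 -> component 1, 1 -> component 2
    # 2.-3. sufficient statistics and conjugate updates of the means
    for j in range(2):
        nj = np.sum(z == j)
        sj = np.sum(x[z == j])
        mu[j] = rng.normal((lam * delta + sj) / (lam + nj), 1.0 / np.sqrt(lam + nj))
    chain[t] = mu
```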
Normal mean example (cont'd) 
[Figure: Gibbs sample paths of $(\mu_1, \mu_2)$: (a) initialised at random, (b) initialised close to the lower mode] 
[X & Casella, 2009]
License 
Consider $k = 3$ components, a $\mathcal{D}_3(1/2, 1/2, 1/2)$ prior for the weights, a $\mathcal{N}(\bar{x}, \hat\sigma^2/3)$ prior on the means $\mu_i$ and a $\mathcal{G}a(10, \hat\sigma^2)$ prior on the precisions $\sigma_i^{-2}$, where $\bar{x}$ and $\hat\sigma^2$ are the empirical mean and variance of License 
[Empirical Bayes] 
[Marin & X, 2007]
Outline 
Gibbs sampling 
weakly informative priors 
imperfect sampling 
SAME algorithm 
perfect sampling 
Bayes factor 
less informative prior 
no Bayes factor
weakly informative priors 
I possible symmetric empirical Bayes priors 
$p \sim \mathcal{D}(\gamma, \ldots, \gamma)$, $\mu_i \sim \mathcal{N}(\hat\mu, \hat\omega_i^2)$, $\sigma_i^{-2} \sim \mathcal{G}a(\nu, \hat\sigma^2)$ 
which can be replaced with hierarchical priors 
[Diebolt & X, 1990; Richardson & Green, 1997] 
I independent improper priors on the $\theta_j$'s prohibited, thus standard flat and Jeffreys priors impossible to use (except with the exclude-empty-component trick) 
[Diebolt & X, 1990; Wasserman, 1999]
weakly informative priors 
I Reparameterization to compact set for use of uniform priors 
$\mu_i \to \dfrac{e^{\mu_i}}{1 + e^{\mu_i}}$, $\sigma_i \to \dfrac{\sigma_i}{1 + \sigma_i}$ 
[Chopin, 2000] 
I dependent weakly informative priors 
$p \sim \mathcal{D}(k, \ldots, 1)$, $\mu_i \sim \mathcal{N}(\mu_{i-1}, \sigma_{i-1}^2)$, $\sigma_i \sim \mathcal{U}([0, \sigma_{i-1}])$ 
[Mengersen & X, 1996; X & Titterington, 1998] 
I reference priors 
$p \sim \mathcal{D}(1, \ldots, 1)$, $\mu_i \sim \mathcal{N}(\mu_0, (\sigma_i^2 + \sigma_0^2)/2)$, $\sigma_i^2 \sim \mathcal{C}^+(0, \sigma_0^2)$ 
[Moreno & Liseo, 1999]
Re-ban on improper priors 
Difficult to use improper priors in the setting of mixtures because independent improper priors, 
$\pi(\theta) = \prod_{i=1}^k \pi_i(\theta_i)$, with $\int \pi_i(\theta_i)\, \mathrm{d}\theta_i = \infty$, 
end up, for all $n$'s, with the property 
$\int \pi(\theta, p \mid x)\, \mathrm{d}\theta\, \mathrm{d}p = \infty$ 
Reason 
There are $(k - 1)^n$ terms among the $k^n$ terms in the expansion that allocate no observation at all to the $i$-th component
Outline 
Gibbs sampling 
weakly informative priors 
imperfect sampling 
SAME algorithm 
perfect sampling 
Bayes factor 
less informative prior 
no Bayes factor
Connected difficulties 
1. Number of modes of the likelihood of order O(k!): 
⇒ Maximization and even [MCMC] exploration of the posterior surface harder 
2. Under exchangeable priors on $(\theta, p)$ [prior invariant under permutation of the indices], all posterior marginals are identical: 
⇒ Posterior expectation of $\theta_1$ equal to posterior expectation of $\theta_2$
License 
When the Gibbs output does not (re)produce exchangeability, the Gibbs sampler has failed to explore the whole parameter space: not enough energy to switch enough component allocations simultaneously 
[Marin & X, 2007]
Label switching paradox 
I We should observe the exchangeability of the components 
[label switching] to conclude about convergence of the Gibbs 
sampler. 
I If we observe it, then we do not know how to estimate the 
parameters. 
I If we do not, then we are uncertain about the convergence!!! 
[Celeux, Hurn & X, 2000] 
[Frühwirth-Schnatter, 2001, 2004] 
[Holmes, Jasra & Stephens, 2005]
Constraints 
Usual reply to lack of identifiability: impose constraints like 
$\mu_1 \leq \ldots \leq \mu_k$ 
in the prior 
Mostly incompatible with the topology of the posterior surface: posterior expectations then depend on the choice of the constraints. 
Computational detail 
The constraint need not be imposed during the simulation but can instead be imposed after simulation, by reordering the MCMC output according to the constraint. [This avoids possible negative effects on convergence]
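A minimal sketch of the post-processing option, reordering each stored draw so that the means are increasing and applying the same permutation to the weights (assumes the MCMC output is stored as (T, k) arrays mus and ps; names are illustrative):

```python
import numpy as np

def reorder_by_constraint(mus, ps):
    """Impose mu_1 <= ... <= mu_k on each MCMC draw after simulation."""
    order = np.argsort(mus, axis=1)                 # permutation per iteration
    idx = np.arange(mus.shape[0])[:, None]
    return mus[idx, order], ps[idx, order]          # same relabelling applied to the weights
```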
Relabeling towards the mode 
Selection of one of the k! modal regions of the posterior, post-simulation, by computing the approximate MAP 
$(\theta, p)^{(i^*)}$ with $i^* = \arg\max_{i=1,\ldots,M} \pi\bigl((\theta, p)^{(i)} \mid x\bigr)$ 
Pivotal Reordering 
At iteration $i \in \{1, \ldots, M\}$, 
1. Compute the optimal permutation 
$\tau_i = \arg\min_{\tau \in \mathfrak{S}_k} d\bigl(\tau\bigl((\theta^{(i)}, p^{(i)})\bigr),\, (\theta^{(i^*)}, p^{(i^*)})\bigr)$ 
where $d(\cdot, \cdot)$ is a distance in the parameter space. 
2. Set $(\theta^{(i)}, p^{(i)}) = \tau_i\bigl((\theta^{(i)}, p^{(i)})\bigr)$. 
[Celeux, 1998; Stephens, 2000; Celeux, Hurn & X, 2000]
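A sketch of pivotal reordering for small k, with Euclidean distance on the concatenated (μ, p) vector and brute-force search over the k! permutations; pivot_mu, pivot_p stand for the approximate MAP draw (illustrative names):

```python
import itertools
import numpy as np

def pivotal_reorder(mus, ps, pivot_mu, pivot_p):
    """Relabel each draw by the permutation closest to the (approximate MAP) pivot."""
    k = mus.shape[1]
    perms = list(itertools.permutations(range(k)))
    out_mu, out_p = mus.copy(), ps.copy()
    for t in range(mus.shape[0]):
        # choose the permutation minimising the distance to the pivot
        best = min(perms, key=lambda s: np.sum((mus[t, list(s)] - pivot_mu) ** 2)
                                        + np.sum((ps[t, list(s)] - pivot_p) ** 2))
        out_mu[t], out_p[t] = mus[t, list(best)], ps[t, list(best)]
    return out_mu, out_p
```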
Loss functions for mixture estimation 
Global loss function that considers distance between predictives 
$L(\xi, \hat\xi) = \int_{\mathcal{X}} f_\xi(x) \log\bigl\{ f_\xi(x) / f_{\hat\xi}(x) \bigr\}\, \mathrm{d}x$ 
eliminates the labelling effect 
Similar solution for estimating clusters through allocation variables 
$L(z, \hat z) = \sum_{i<j} \Bigl( \mathbb{I}_{[z_i = z_j]}\bigl(1 - \mathbb{I}_{[\hat z_i = \hat z_j]}\bigr) + \mathbb{I}_{[\hat z_i = \hat z_j]}\bigl(1 - \mathbb{I}_{[z_i = z_j]}\bigr) \Bigr)\,.$ 
[Celeux, Hurn & X, 2000]
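The allocation loss counts the pairs (i, j) that are clustered together under one allocation but not under the other; a direct sketch:

```python
import numpy as np

def allocation_loss(z, z_hat):
    """L(z, z_hat): number of pairs grouped together in one allocation but not the other."""
    z, z_hat = np.asarray(z), np.asarray(z_hat)
    same = (z[:, None] == z[None, :])
    same_hat = (z_hat[:, None] == z_hat[None, :])
    disagree = same != same_hat
    return int(np.triu(disagree, k=1).sum())   # each unordered pair counted once
```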
Outline 
Gibbs sampling 
weakly informative priors 
imperfect sampling 
SAME algorithm 
perfect sampling 
Bayes factor 
less informative prior 
no Bayes factor
MAP estimation 
For high-dimensional parameter spaces, difficulty with marginal MAP (MMAP) estimates because nuisance parameters must be integrated out 
$\theta_1^{\mathrm{MMAP}} = \arg\max_{\theta_1} p(\theta_1 \mid y)$ 
where 
$p(\theta_1 \mid y) = \int_{\Theta_2} p(\theta_1, \theta_2 \mid y)\, \mathrm{d}\theta_2$
MAP estimation 
SAME stands for State Augmentation for Marginal Estimation 
[Doucet, Godsill & X, 2001] 
Artificially augmented probability model whose marginal distribution is 
$p_\nu(\theta_1 \mid y) \propto p(\theta_1 \mid y)^\nu$ 
via replications of the nuisance parameters: 
I Replace $\theta_2$ with $\nu$ artificial replications, $\theta_2(1), \ldots, \theta_2(\nu)$ 
I Treat the $\theta_2(j)$'s as distinct random variables: 
$q_\nu\bigl(\theta_1, \theta_2(1), \ldots, \theta_2(\nu) \mid y\bigr) \propto \prod_{k=1}^{\nu} p\bigl(\theta_1, \theta_2(k) \mid y\bigr)$ 
I Use the corresponding marginal for $\theta_1$: 
$q_\nu(\theta_1 \mid y) = \int q_\nu\bigl(\theta_1, \theta_2(1), \ldots, \theta_2(\nu) \mid y\bigr)\, \mathrm{d}\theta_2(1) \cdots \mathrm{d}\theta_2(\nu) \propto \int \prod_{k=1}^{\nu} p\bigl(\theta_1, \theta_2(k) \mid y\bigr)\, \mathrm{d}\theta_2(1) \cdots \mathrm{d}\theta_2(\nu) = p_\nu(\theta_1 \mid y)$ 
I Build an MCMC algorithm in the augmented space, with invariant distribution $q_\nu\bigl(\theta_1, \theta_2(1), \ldots, \theta_2(\nu) \mid y\bigr)$ 
I Use the simulated subsequence $\bigl\{\theta_1^{(i)};\ i \in \mathbb{N}\bigr\}$ as drawn from the marginal posterior $p_\nu(\theta_1 \mid y)$
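A sketch of SAME on the normal mean benchmark, with the allocations z as the replicated nuisance: each sweep draws ν independent allocation vectors given the current means and then updates the means from the product of the ν completed conditionals, so that prior precision and sufficient statistics are both accumulated over the replications. The data, prior, and the fixed ν are illustrative choices (the original algorithm gradually increases ν along the iterations):

```python
import numpy as np

rng = np.random.default_rng(2)

# data from p N(mu1,1) + (1-p) N(mu2,1), p known
p, mu_true, n = 0.7, np.array([0.0, 2.5]), 200
x = rng.normal(mu_true[(rng.random(n) > p).astype(int)], 1.0)

delta, lam = 0.0, 1.0        # prior mu_j ~ N(delta, 1/lam)
nu, T = 20, 2000             # number of replications of z, number of sweeps
mu = np.array([-1.0, 1.0])

for t in range(T):
    # draw nu independent allocation vectors given the current means
    w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
    prob1 = w1 / (w1 + (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2))
    zs = (rng.random((nu, n)) >= prob1).astype(int)      # 0 -> component 1, 1 -> component 2
    # mu_j | z(1..nu) ~ N( (nu*lam*delta + sum_k s_j^(k)) / (nu*lam + sum_k n_j^(k)), 1/(nu*lam + sum_k n_j^(k)) )
    for j in range(2):
        nj = np.sum(zs == j)                  # total count over the nu replications
        sj = np.sum(x[None, :] * (zs == j))   # total sum over the nu replications
        prec = nu * lam + nj
        mu[j] = rng.normal((nu * lam * delta + sj) / prec, 1.0 / np.sqrt(prec))
# the marginal of mu now targets p(mu | x)^nu, concentrating near its mode
```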
example: Galaxy dataset benchmark 
82 observations of galaxy velocities from 3 (?) groups 

Algorithm                  EM      MCEM    SAME 
Mean log-posterior         65.47   60.73   66.22 
Std dev of log-posterior    2.31    4.48    0.02 

[Doucet & X, 2002]
Really the SAME?! 
SAME algorithm re-invented in many guises: 
I Gaetan & Yao, 2003, Biometrika 
I Jacquier, Johannes & Polson, 2007, J. Econometrics 
I Lele, Dennis & Lutscher, 2007, Ecology Letters [data cloning] 
I ...
Outline 
Gibbs sampling 
weakly informative priors 
imperfect sampling 
SAME algorithm 
perfect sampling 
Bayes factor 
less informative prior 
no Bayes factor
Propp and Wilson's perfect sampler 
Difficulty devising MCMC stopping rules: 
when should one stop an MCMC algorithm?! 
Principle: Coupling from the past 
rather than start at $t = 0$ and wait till $t = +\infty$, start at $t = -\infty$ and wait till $t = 0$ 
[Propp & Wilson, 1996] 
⇒ Outcome at time $t = 0$ is stationary
CFTP Algorithm 
Algorithm (Coupling from the past) 
1. Start from the m possible values at time -t 
2. Run the m chains till time 0 (coupling allowed) 
3. Check if the chains are equal at time 0 
4. If not, start further back: $t \leftarrow 2 \times t$, using the same random numbers at times already simulated 
I requires a finite state space 
I probability of merging chains must be high enough 
I hard to implement w/o a monotonicity in both state space and transition
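A generic sketch of coupling from the past on a small finite state space, writing the update as a deterministic function of a stored uniform so that the same random numbers are reused when the start is pushed further into the past (the transition matrix is an arbitrary example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# arbitrary ergodic transition matrix on m = 3 states (illustrative)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
cum = P.cumsum(axis=1)

def update(state, u):
    """Deterministic update: next state as a function of current state and a U(0,1) draw."""
    return int(np.searchsorted(cum[state], u))

def cftp():
    T = 1
    us = {}                                   # random numbers indexed by time, reused forever
    while True:
        for s in range(-T, 0):
            us.setdefault(s, rng.random())
        # run every possible starting value from time -T to 0 with the SAME randomness
        states = list(range(P.shape[0]))
        for s in range(-T, 0):
            states = [update(st, us[s]) for st in states]
        if len(set(states)) == 1:             # all chains have coalesced by time 0
            return states[0]
        T *= 2                                # otherwise start further back in the past

sample = cftp()   # an exact draw from the stationary distribution of P
```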
Mixture models 
Simplest possible mixture structure 
$p\, f_0(x) + (1 - p)\, f_1(x)$, 
with uniform prior on p. 
Algorithm (Data Augmentation Gibbs sampler) 
At iteration t: 
1. Generate n iid $\mathcal{U}(0, 1)$ rv's $u_1^{(t)}, \ldots, u_n^{(t)}$. 
2. Derive the indicator variables $z_i^{(t)}$ as $z_i^{(t)} = 0$ iff 
$u_i^{(t)} \leq q_i^{(t-1)} = \dfrac{p^{(t-1)} f_0(x_i)}{p^{(t-1)} f_0(x_i) + (1 - p^{(t-1)}) f_1(x_i)}$ 
and compute $m^{(t)} = \sum_{i=1}^n z_i^{(t)}$. 
3. Simulate $p^{(t)} \sim \mathcal{B}e(n + 1 - m^{(t)}, 1 + m^{(t)})$.
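A sketch of this data-augmentation sampler with f0 and f1 taken as two fixed, known normal densities (illustrative choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# known component densities (illustrative): f0 = N(0,1), f1 = N(2.5,1)
f0 = lambda x: norm.pdf(x, 0.0, 1.0)
f1 = lambda x: norm.pdf(x, 2.5, 1.0)

# data simulated with true weight 0.3 on f0
n = 100
x = np.where(rng.random(n) < 0.3, rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))

T = 2000
p = 0.5                         # initial value; prior on p is U(0,1)
p_chain = np.empty(T)

for t in range(T):
    u = rng.random(n)                                    # step 1: n iid U(0,1)
    q = p * f0(x) / (p * f0(x) + (1 - p) * f1(x))        # q_i = P(z_i = 0 | p, x_i)
    z = (u > q).astype(int)                              # step 2: z_i = 0 iff u_i <= q_i
    m = z.sum()
    p = rng.beta(n + 1 - m, 1 + m)                       # step 3: p | z ~ Be(n+1-m, 1+m)
    p_chain[t] = p
```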
Mixture models 
Algorithm (CFTP Gibbs sampler) 
At iteration -t: 
1. Generate n iid uniform rv's $u_1^{(-t)}, \ldots, u_n^{(-t)}$. 
2. Partition $[0, 1)$ into intervals $[q_{[j]}, q_{[j+1]})$. 
3. For each $[q_{[j]}^{(-t)}, q_{[j+1]}^{(-t)})$, generate $p_j^{(-t)} \sim \mathcal{B}e(n - j + 1, j + 1)$. 
4. For each $j = 0, 1, \ldots, n$, set $r_j^{(-t)} \leftarrow p_j^{(-t)}$ 
5. For $(\ell = 1,\ \ell \leq T,\ \ell{+}{+})$: $r_j^{(-t+\ell)} \leftarrow p_k^{(-t+\ell)}$ with k such that 
$r_j^{(-t+\ell-1)} \in [q_{[k]}^{(-t+\ell)}, q_{[k+1]}^{(-t+\ell)}]$ 
6. Stop if the $r_j^{(0)}$'s $(0 \leq j \leq n)$ are all equal. Otherwise, $t \leftarrow 2 \times t$. 
[Hobert et al., 1999]
Mixture models 
Extension to the case k = 3: 
Sample of n = 35 observations from 
$0.23\,\mathcal{N}(2.2, 1.44) + 0.62\,\mathcal{N}(1.4, 0.49) + 0.15\,\mathcal{N}(0.6, 0.64)$ 
[Hobert et al., 1999]
Outline 
Gibbs sampling 
weakly informative priors 
imperfect sampling 
SAME algorithm 
perfect sampling 
Bayes factor 
less informative prior 
no Bayes factor
Bayesian model choice 
Comparison of models $M_i$ by Bayesian means: 
probabilise the entire model/parameter space 
I allocate probabilities $p_i$ to all models $M_i$ 
I define priors $\pi_i(\theta_i)$ for each parameter space $\Theta_i$ 
I compute 
$\pi(M_i \mid x) = \dfrac{p_i \int_{\Theta_i} f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, \mathrm{d}\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, \mathrm{d}\theta_j}$
Bayesian model choice 
Comparison of models $M_i$ by Bayesian means: 
Relies on a central notion: the evidence 
$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k$, 
aka the marginal likelihood.
Chib's representation 
Direct application of Bayes' theorem: given $x \sim f_k(x \mid \theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$, 
$Z_k = m_k(x) = \dfrac{f_k(x \mid \theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k \mid x)}$ 
Replace with an approximation to the posterior 
$\widehat{Z}_k = \widehat{m}_k(x) = \dfrac{f_k(x \mid \theta_k^*)\, \pi_k(\theta_k^*)}{\widehat\pi_k(\theta_k^* \mid x)}\,.$ 
[Chib, 1995]
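Chib's identity is exact at any $\theta^*$; a self-contained sketch on a conjugate toy model (Beta-Binomial rather than a mixture, chosen only because both sides are available in closed form there) just to illustrate the representation:

```python
import numpy as np
from scipy.stats import binom, beta
from scipy.special import betaln, comb

# toy conjugate model with known evidence: x ~ Binomial(n, theta), theta ~ Beta(a, b)
n, x, a, b = 20, 13, 2.0, 2.0
theta_star = 0.5                      # any value with positive posterior density works

log_lik = binom.logpmf(x, n, theta_star)
log_prior = beta.logpdf(theta_star, a, b)
log_post = beta.logpdf(theta_star, a + x, b + n - x)   # posterior is exact here

chib_log_evidence = log_lik + log_prior - log_post
exact_log_evidence = np.log(comb(n, x)) + betaln(a + x, b + n - x) - betaln(a, b)
# chib_log_evidence equals exact_log_evidence up to floating-point error
```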
Case of latent variables 
For missing variables z as in mixture models, natural Rao-Blackwell estimate 
$\widehat\pi_k(\theta_k^* \mid x) = \dfrac{1}{T} \sum_{t=1}^T \pi_k\bigl(\theta_k^* \mid x, z_k^{(t)}\bigr)\,,$ 
where the $z_k^{(t)}$'s are Gibbs sampled latent variables 
[Diebolt & Robert, 1990; Chib, 1995]
Compensation for label switching 
For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory 
Consequences on the numerical approximation, biased by an order $k!$ 
Recover the theoretical symmetry by using 
$\tilde\pi_k(\theta_k^* \mid x) = \dfrac{1}{T\, k!} \sum_{\sigma \in \mathfrak{S}_k} \sum_{t=1}^T \pi_k\bigl(\sigma(\theta_k^*) \mid x, z_k^{(t)}\bigr)$ 
for all $\sigma$'s in $\mathfrak{S}_k$, set of all permutations of $\{1, \ldots, k\}$ 
[Berkhof, Mechelen, & Gelman, 2003]
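A sketch of the symmetrised Rao-Blackwell estimate for the normal mean benchmark, where the full conditional of the means given an allocation is the normal of the Gibbs step; assumes stored allocations zs of shape (T, n), data x, and prior mean δ / precision λ as in the earlier illustrative code:

```python
import itertools
import math
import numpy as np
from scipy.stats import norm

def symmetrised_chib_density(mu_star, zs, x, delta=0.0, lam=1.0):
    """(1 / (T k!)) sum over permutations sigma and draws t of pi(sigma(mu_star) | x, z^(t))."""
    mu_star = np.asarray(mu_star)
    T, k = zs.shape[0], mu_star.size
    total = 0.0
    for z in zs:                                    # loop over the stored Gibbs allocations
        nj = np.array([np.sum(z == j) for j in range(k)])
        sj = np.array([np.sum(x[z == j]) for j in range(k)])
        post_mean = (lam * delta + sj) / (lam + nj)
        post_sd = 1.0 / np.sqrt(lam + nj)
        for sigma in itertools.permutations(range(k)):     # symmetrise over the k! labellings
            total += np.prod(norm.pdf(mu_star[list(sigma)], post_mean, post_sd))
    return total / (T * math.factorial(k))
```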
Galaxy dataset (k) 
Using Chib's estimate, with $\theta_k^*$ as MAP estimator, 
$\log \widehat Z_k(x) = -105.1396$ 
for k = 3, while introducing the permutations leads to 
$\log \widehat Z_k(x) = -103.3479$ 
Note that 
$-105.1396 + \log(3!) = -103.3479$ 

k          2        3        4        5        6        7        8 
$Z_k(x)$   -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44 

Estimates of the marginal likelihoods by the symmetrised Chib's approximation (based on $10^5$ Gibbs iterations and, for $k \geq 5$, 100 permutations selected at random in $\mathfrak{S}_k$). 
[Lee et al., 2008]
More efficient sampling 
Difficulty with the explosive number of terms in 
$\tilde\pi_k(\theta_k^* \mid x) = \dfrac{1}{T\, k!} \sum_{\sigma \in \mathfrak{S}_k} \sum_{t=1}^T \pi_k\bigl(\sigma(\theta_k^*) \mid x, z_k^{(t)}\bigr)$ 
when most terms are equal to zero... 
Iterative bridge sampling: 
$\widehat{E}^{(t)}(k) = \widehat{E}^{(t-1)}(k)\; \dfrac{M_1^{-1} \sum_{l=1}^{M_1} \widehat\pi(\tilde\theta_l \mid x) \big/ \{M_1 q(\tilde\theta_l) + M_2 \widehat\pi(\tilde\theta_l \mid x)\}}{M_2^{-1} \sum_{m=1}^{M_2} q(\hat\theta_m) \big/ \{M_1 q(\hat\theta_m) + M_2 \widehat\pi(\hat\theta_m \mid x)\}}$ 
[Frühwirth-Schnatter, 2004]
More efficient sampling 
Same iterative bridge sampling identity [Frühwirth-Schnatter, 2004], where the importance function $q(\theta)$ is built from the Gibbs full conditionals: either an average over $J_1$ stored draws of the allocations, 
$q(\theta) = \dfrac{1}{J_1} \sum_{j=1}^{J_1} p(p \mid z^{(j)}) \prod_{i=1}^{k} p(\theta_i \mid \cdot\,, z^{(j)}, x)\,,$ 
or the same construction symmetrised over the $k!$ relabellings of a single allocation $z^{o}$, 
$q(\theta) = \dfrac{1}{k!} \sum_{\sigma \in \mathfrak{S}(k)} p(p \mid \sigma(z^{o})) \prod_{i=1}^{k} p(\theta_i \mid \cdot\,, \sigma(z^{o}), x)$

Más contenido relacionado

La actualidad más candente

k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsFrank Nielsen
 
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Katsuya Ito
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsChristian Robert
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdfHyungjoo Cho
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsChristian Robert
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problemBigMC
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes FactorsChristian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
Conditional neural processes
Conditional neural processesConditional neural processes
Conditional neural processesKazuki Fujikawa
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
 

La actualidad más candente (20)

k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture models
 
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Nested sampling
Nested samplingNested sampling
Nested sampling
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problem
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes Factors
 
Ch7
Ch7Ch7
Ch7
 
masters-presentation
masters-presentationmasters-presentation
masters-presentation
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Conditional neural processes
Conditional neural processesConditional neural processes
Conditional neural processes
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli... Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
 

Destacado

"reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli..."reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli...Christian Robert
 
testing as a mixture estimation problem
testing as a mixture estimation problemtesting as a mixture estimation problem
testing as a mixture estimation problemChristian Robert
 
Statistics symposium talk, Harvard University
Statistics symposium talk, Harvard UniversityStatistics symposium talk, Harvard University
Statistics symposium talk, Harvard UniversityChristian Robert
 
folding Markov chains: the origaMCMC
folding Markov chains: the origaMCMCfolding Markov chains: the origaMCMC
folding Markov chains: the origaMCMCChristian Robert
 
Simulation (AMSI Public Lecture)
Simulation (AMSI Public Lecture)Simulation (AMSI Public Lecture)
Simulation (AMSI Public Lecture)Christian Robert
 
Bayesian model choice in cosmology
Bayesian model choice in cosmologyBayesian model choice in cosmology
Bayesian model choice in cosmologyChristian Robert
 
Bayesian Data Analysis for Ecology
Bayesian Data Analysis for EcologyBayesian Data Analysis for Ecology
Bayesian Data Analysis for EcologyChristian Robert
 
Semi-automatic ABC: a discussion
Semi-automatic ABC: a discussionSemi-automatic ABC: a discussion
Semi-automatic ABC: a discussionChristian Robert
 
Introducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with RIntroducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with RChristian Robert
 
Testing as estimation: the demise of the Bayes factor
Testing as estimation: the demise of the Bayes factorTesting as estimation: the demise of the Bayes factor
Testing as estimation: the demise of the Bayes factorChristian Robert
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forestsChristian Robert
 
Computational methods for Bayesian model choice
Computational methods for Bayesian model choiceComputational methods for Bayesian model choice
Computational methods for Bayesian model choiceChristian Robert
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsChristian Robert
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math ColloquiumChristian Robert
 

Destacado (20)

Boston talk
Boston talkBoston talk
Boston talk
 
"reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli..."reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli...
 
testing as a mixture estimation problem
testing as a mixture estimation problemtesting as a mixture estimation problem
testing as a mixture estimation problem
 
Intractable likelihoods
Intractable likelihoodsIntractable likelihoods
Intractable likelihoods
 
Statistics symposium talk, Harvard University
Statistics symposium talk, Harvard UniversityStatistics symposium talk, Harvard University
Statistics symposium talk, Harvard University
 
folding Markov chains: the origaMCMC
folding Markov chains: the origaMCMCfolding Markov chains: the origaMCMC
folding Markov chains: the origaMCMC
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
 
Simulation (AMSI Public Lecture)
Simulation (AMSI Public Lecture)Simulation (AMSI Public Lecture)
Simulation (AMSI Public Lecture)
 
Bayesian model choice in cosmology
Bayesian model choice in cosmologyBayesian model choice in cosmology
Bayesian model choice in cosmology
 
Shanghai tutorial
Shanghai tutorialShanghai tutorial
Shanghai tutorial
 
Bayesian Data Analysis for Ecology
Bayesian Data Analysis for EcologyBayesian Data Analysis for Ecology
Bayesian Data Analysis for Ecology
 
Semi-automatic ABC: a discussion
Semi-automatic ABC: a discussionSemi-automatic ABC: a discussion
Semi-automatic ABC: a discussion
 
Savage-Dickey paradox
Savage-Dickey paradoxSavage-Dickey paradox
Savage-Dickey paradox
 
Introducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with RIntroducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with R
 
Testing as estimation: the demise of the Bayes factor
Testing as estimation: the demise of the Bayes factorTesting as estimation: the demise of the Bayes factor
Testing as estimation: the demise of the Bayes factor
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
Computational methods for Bayesian model choice
Computational methods for Bayesian model choiceComputational methods for Bayesian model choice
Computational methods for Bayesian model choice
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: Models
 
Ab cancun
Ab cancunAb cancun
Ab cancun
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
 

Similar a BAYSM'14, Wien, Austria

Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesPierre Jacob
 
Multivriada ppt ms
Multivriada   ppt msMultivriada   ppt ms
Multivriada ppt msFaeco Bot
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixturesChristian Robert
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012Zheng Mengdi
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 
11.generalized and subset integrated autoregressive moving average bilinear t...
11.generalized and subset integrated autoregressive moving average bilinear t...11.generalized and subset integrated autoregressive moving average bilinear t...
11.generalized and subset integrated autoregressive moving average bilinear t...Alexander Decker
 
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMINGChristian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filternozomuhamada
 

Similar a BAYSM'14, Wien, Austria (20)

Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
Vancouver18
Vancouver18Vancouver18
Vancouver18
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
Unit II PPT.pptx
Unit II PPT.pptxUnit II PPT.pptx
Unit II PPT.pptx
 
Multivriada ppt ms
Multivriada   ppt msMultivriada   ppt ms
Multivriada ppt ms
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
11.generalized and subset integrated autoregressive moving average bilinear t...
11.generalized and subset integrated autoregressive moving average bilinear t...11.generalized and subset integrated autoregressive moving average bilinear t...
11.generalized and subset integrated autoregressive moving average bilinear t...
 
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
PTSP PPT.pdf
PTSP PPT.pdfPTSP PPT.pdf
PTSP PPT.pdf
 
pattern recognition
pattern recognition pattern recognition
pattern recognition
 
Thesis defense
Thesis defenseThesis defense
Thesis defense
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter
 

Más de Christian Robert

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceChristian Robert
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsChristian Robert
 

Más de Christian Robert (19)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified models
 

Último

ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 

Último (20)

Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 

BAYSM'14, Wien, Austria

  • 21. Bayesian inference on mixtures. For any prior π(θ, p), the posterior distribution of (θ, p) is available up to a multiplicative constant, π(θ, p | x) ∝ [ ∏_{i=1}^n ∑_{j=1}^k p_j f(x_i | θ_j) ] π(θ, p), at a cost of order O(kn). Difficulty: despite this, the derivation of posterior characteristics such as posterior expectations is only possible in exponential time, of order O(k^n)!
  • 23. Missing variable representation. Associate with each x_i a missing/latent variable z_i that indicates its component: z_i | p ∼ M_k(p_1, . . . , p_k) and x_i | z_i, θ ∼ f(· | θ_{z_i}). Completed likelihood ℓ(θ, p | x, z) = ∏_{i=1}^n p_{z_i} f(x_i | θ_{z_i}), and π(θ, p | x, z) ∝ [ ∏_{i=1}^n p_{z_i} f(x_i | θ_{z_i}) ] π(θ, p), where z = (z_1, . . . , z_n).
  • 25. Gibbs sampling for mixture models. Take advantage of the missing data structure. Algorithm – Initialization: choose p^(0) and θ^(0) arbitrarily. – Step t, for t = 1, . . .: 1. generate z_i^(t) (i = 1, . . . , n) from P( z_i^(t) = j | p_j^(t−1), θ_j^(t−1), x_i ) ∝ p_j^(t−1) f( x_i | θ_j^(t−1) ), j = 1, . . . , k; 2. generate p^(t) from π(p | z^(t)); 3. generate θ^(t) from π(θ | z^(t), x). [Brooks & Gelman, 1990; Diebolt & X, 1990, 1994; Escobar & West, 1991]
  • 26. Normal mean example (cont'd). Algorithm – Initialization: choose μ_1^(0) and μ_2^(0). – Step t, for t = 1, . . .: 1. generate z_i^(t) (i = 1, . . . , n) from P( z_i^(t) = 1 ) = 1 − P( z_i^(t) = 2 ) ∝ p exp( −(1/2) (x_i − μ_1^(t−1))² ); 2. compute n_j^(t) = ∑_{i=1}^n I_{z_i^(t)=j} and (s_x^j)^(t) = ∑_{i=1}^n I_{z_i^(t)=j} x_i; 3. generate μ_j^(t) (j = 1, 2) from N( (λδ + (s_x^j)^(t)) / (λ + n_j^(t)), 1/(λ + n_j^(t)) ), under a N(δ, 1/λ) prior on the means.
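A minimal sketch of this two-mean Gibbs sampler in Python, assuming a known weight p, unit component variances and a N(δ, 1/λ) prior on each mean; the values δ = 0 and λ = 1 below are placeholder hyperparameters, not the talk's settings.

```python
import numpy as np

def gibbs_two_means(x, p=0.5, delta=0.0, lam=1.0, n_iter=5000, rng=None):
    """Gibbs sampler for p N(mu1, 1) + (1-p) N(mu2, 1), means unknown."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    mu = rng.normal(size=2)                    # arbitrary initialisation
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # 1. allocations: P(z_i = 1) proportional to p * phi(x_i; mu1, 1)
        w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        z2 = rng.random(n) < w2 / (w1 + w2)    # False -> component 1, True -> component 2
        for j, idx in enumerate([~z2, z2]):
            nj, sj = idx.sum(), x[idx].sum()
            # 2.-3. conjugate update of mu_j under the N(delta, 1/lam) prior
            mu[j] = rng.normal((lam * delta + sj) / (lam + nj),
                               1.0 / np.sqrt(lam + nj))
        draws[t] = mu
    return draws

# usage: simulate data and run the sampler
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 70), rng.normal(2.5, 1, 30)])
out = gibbs_two_means(x, p=0.7, rng=rng)
print(out[-1000:].mean(axis=0))
```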
  • 27. Normal mean example (cont'd). [Figure: Gibbs sample in the (μ1, μ2) plane, (a) initialised at random] [X & Casella, 2009]
  • 28. Normal mean example (cont'd). [Figure: Gibbs samples in the (μ1, μ2) plane, (a) initialised at random, (b) initialised close to the lower mode] [X & Casella, 2009]
  • 29. License. Consider k = 3 components, a D_3(1/2, 1/2, 1/2) prior for the weights, a N(x̄, σ̂²/3) prior on the means μ_i and a Ga(10, σ̂²) prior on the precisions σ_i^{−2}, where x̄ and σ̂² are the empirical mean and variance of License [Empirical Bayes]. [Marin & X, 2007]
  • 30. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 31. Weakly informative priors. – Possible symmetric empirical Bayes priors: a Dirichlet D(γ, . . . , γ) prior on p, normal priors on the means centred at an empirical estimate, and Gamma priors on the precisions σ_i^{−2}; these can be replaced with hierarchical priors. [Diebolt & X, 1990; Richardson & Green, 1997] – Independent improper priors on the θ_j's are prohibited, thus standard flat and Jeffreys priors are impossible to use (except with the exclude-empty-component trick). [Diebolt & X, 1990; Wasserman, 1999]
  • 32. Weakly informative priors. – Reparameterisation to a compact set for use of uniform priors: μ_i → e^{μ_i}/(1 + e^{μ_i}), σ_i → σ_i/(1 + σ_i). [Chopin, 2000] – Dependent weakly informative priors: p ∼ D(k, . . . , 1), μ_i ∼ N(μ_{i−1}, σ_{i−1}²), σ_i ∼ U([0, σ_{i−1}]). [Mengersen & X, 1996; X & Titterington, 1998] – Reference priors: p ∼ D(1, . . . , 1), μ_i ∼ N(0, (σ_i² + σ_0²)/2), σ_i² ∼ C⁺(0, σ_0²). [Moreno & Liseo, 1999]
  • 33. Re-ban on improper priors. Difficult to use improper priors in the setting of mixtures because independent improper priors, π(θ) = ∏_{i=1}^k π_i(θ_i) with ∫ π_i(θ_i) dθ_i = ∞, end up, for all n's, with the property ∫ π(θ, p | x) dθ dp = ∞. Reason: there are (k − 1)^n terms among the k^n terms in the expansion that allocate no observation at all to the i-th component.
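A one-line way to see this, written here as a sketch of the standard argument rather than the slides' own derivation: expand the likelihood over allocations and isolate the terms that leave component i empty.

```latex
% Sketch of the standard argument (not taken from the slides):
% expand over allocations z and isolate the (k-1)^n terms with an empty component i,
\int \pi(\theta,p\mid x)\,d\theta\,dp
  \;\propto\; \sum_{z\in\{1,\dots,k\}^n} \int \prod_{i=1}^{n} p_{z_i}\,f(x_i\mid\theta_{z_i})
  \;\prod_{j=1}^{k}\pi_j(\theta_j)\,\pi(p)\,d\theta\,dp .
% Any term whose allocation z never uses component i does not involve theta_i,
% so it factorises as (finite integral over the other parameters) x
% \int \pi_i(\theta_i)\,d\theta_i = \infty, and the whole sum diverges.
```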
  • 35. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 36. Connected difficulties. 1. The number of modes of the likelihood is of order O(k!): maximization and even [MCMC] exploration of the posterior surface are harder. 2. Under exchangeable priors on (θ, p) [priors invariant under permutation of the indices], all posterior marginals are identical: the posterior expectation of θ_1 equals the posterior expectation of θ_2.
  • 38. License. When the Gibbs output does not (re)produce exchangeability, the Gibbs sampler has failed to explore the whole parameter space: not enough energy to switch enough component allocations at once. [Marin & X, 2007]
  • 39. Label switching paradox. – We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler. – If we observe it, then we do not know how to estimate the parameters. – If we do not, then we are uncertain about the convergence!!! [Celeux, Hurn & X, 2000] [Frühwirth-Schnatter, 2001, 2004] [Holmes, Jasra & Stephens, 2005]
  • 42. Constraints. Usual reply to lack of identifiability: impose constraints like μ_1 ≤ . . . ≤ μ_k in the prior. Mostly incompatible with the topology of the posterior surface: posterior expectations then depend on the choice of the constraints. Computational detail: the constraint need not be imposed during the simulation but can instead be imposed after simulation, by reordering the MCMC output according to the constraints. [This avoids possible negative effects on convergence.]
  • 48. Relabeling towards the mode. Selection of one of the k! modal regions of the posterior, post-simulation, by computing the approximate MAP (θ, p)^(i*) with i* = arg max_{i=1,...,M} π( (θ, p)^(i) | x ). Pivotal Reordering: at iteration i ∈ {1, . . . , M}, 1. compute the optimal permutation τ_i = arg min_{τ∈S_k} d( τ((θ^(i), p^(i))), (θ^(i*), p^(i*)) ), where d(·, ·) is a distance in the parameter space; 2. set (θ^(i), p^(i)) = τ_i((θ^(i), p^(i))). [Celeux, 1998; Stephens, 2000; Celeux, Hurn & X, 2000]
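A minimal sketch of pivotal reordering in Python, assuming the MCMC output is stored as an array of mean vectors and taking squared Euclidean distance to the approximate MAP draw as the distance d(·,·); weights are omitted for brevity.

```python
import numpy as np
from itertools import permutations

def pivotal_reorder(mu_draws, log_post):
    """Relabel each draw by the permutation closest to the approximate MAP draw."""
    pivot = mu_draws[np.argmax(log_post)]          # approximate MAP draw
    k = mu_draws.shape[1]
    perms = list(permutations(range(k)))
    out = np.empty_like(mu_draws)
    for t, mu in enumerate(mu_draws):
        # choose the permutation minimising the distance to the pivot
        best = min(perms, key=lambda s: np.sum((mu[list(s)] - pivot) ** 2))
        out[t] = mu[list(best)]
    return out

# usage: mu_draws is a (T, k) array, log_post the (unnormalised) log-posterior per draw
# relabelled = pivotal_reorder(mu_draws, log_post)
```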
  • 50. Loss functions for mixture estimation. A global loss function that considers the distance between predictives, L(ξ, ξ̂) = ∫_X f_ξ(x) log( f_ξ(x) / f_ξ̂(x) ) dx, eliminates the labelling effect. Similar solution for estimating clusters through the allocation variables: L(z, ẑ) = ∑_{i<j} [ I_{z_i=z_j}(1 − I_{ẑ_i=ẑ_j}) + I_{ẑ_i=ẑ_j}(1 − I_{z_i=z_j}) ]. [Celeux, Hurn & X, 2000]
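The allocation loss is invariant to relabelling, which a short Python check makes concrete; this is a sketch, with the pairwise comparison written naively for clarity.

```python
import numpy as np

def allocation_loss(z, z_hat):
    """Count the pairs (i, j) on which the two clusterings disagree."""
    z, z_hat = np.asarray(z), np.asarray(z_hat)
    same = z[:, None] == z[None, :]              # pairs clustered together under z
    same_hat = z_hat[:, None] == z_hat[None, :]  # pairs clustered together under z_hat
    disagree = same != same_hat
    return int(np.triu(disagree, k=1).sum())     # each unordered pair counted once

z = [1, 1, 2, 2, 3]
print(allocation_loss(z, [3, 3, 1, 1, 2]))   # 0: same clustering, labels permuted
print(allocation_loss(z, [1, 2, 2, 2, 3]))   # > 0: genuinely different clustering
```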
  • 51. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 52. MAP estimation. For a high-dimensional parameter space, difficulty with marginal MAP (MMAP) estimates because the nuisance parameters must be integrated out: θ_1^MMAP = arg max_{θ_1} p(θ_1 | y), where p(θ_1 | y) = ∫_{Θ_2} p(θ_1, θ_2 | y) dθ_2.
  • 53. MAP estimation. SAME stands for State Augmentation for Marginal Estimation [Doucet, Godsill & X, 2001]: an artificially augmented probability model whose marginal distribution is p_γ(θ_1 | y) ∝ p(θ_1 | y)^γ, via replications of the nuisance parameters:
  – Replace θ_2 with artificial replications θ_2(1), . . . , θ_2(γ).
  – Treat the θ_2(j)'s as distinct random variables: q_γ(θ_1, θ_2(1), . . . , θ_2(γ) | y) ∝ ∏_{k=1}^{γ} p(θ_1, θ_2(k) | y).
  – Use the corresponding marginal for θ_1: q_γ(θ_1 | y) = ∫ q_γ(θ_1, θ_2(1), . . . , θ_2(γ) | y) dθ_2(1) . . . dθ_2(γ) ∝ ∫ ∏_{k=1}^{γ} p(θ_1, θ_2(k) | y) dθ_2(1) . . . dθ_2(γ) = p(θ_1 | y)^γ.
  – Build an MCMC algorithm in the augmented space, with invariant distribution q_γ(θ_1, θ_2(1), . . . , θ_2(γ) | y).
  – Use the simulated subsequence {θ_1^(i); i ∈ N} as drawn from the marginal posterior p_γ(θ_1 | y), which concentrates on the marginal MAP as γ grows.
  • 64. Example: Galaxy dataset benchmark. 82 observations of galaxy velocities from 3 (?) groups.

                                 EM      MCEM    SAME
    Mean log-posterior         65.47    60.73   66.22
    Std dev of log-posterior    2.31     4.48    0.02

  [Doucet & X, 2002]
  • 65. Really the SAME?! SAME algorithm re-invented in many guises: – Gaetan & Yao, 2003, Biometrika – Jacquier, Johannes & Polson, 2007, J. Econometrics – Lele, Dennis & Lutscher, 2007, Ecology Letters [data cloning] – ...
  • 66. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 67. Propp and Wilson's perfect sampler. Difficulty devising MCMC stopping rules: when should one stop an MCMC algorithm?! Principle: coupling from the past. Rather than start at t = 0 and wait till t = +∞, start at t = −∞ and wait till t = 0. [Propp & Wilson, 1996] The outcome at time t = 0 is stationary.
  • 69. CFTP Algorithm. Algorithm (Coupling from the past): 1. Start from the m possible values at time −t. 2. Run the m chains till time 0 (coupling allowed). 3. Check if the chains are equal at time 0. 4. If not, start further back, t ← 2t, using the same random numbers at the times already simulated. – Requires a finite state space. – The probability of merging chains must be high enough. – Hard to implement without a monotonicity structure on both the state space and the transition.
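A minimal generic illustration of monotone CFTP in Python (not the mixture-specific sampler of Hobert et al. discussed below): a reflected random walk on {0, . . . , N}, for which it suffices to couple the maximal and minimal chains while reusing the same random numbers each time the starting point is pushed further into the past.

```python
import numpy as np

def step(x, u, N):
    """Monotone update: move up if u >= 0.5, down otherwise, reflected at 0 and N."""
    return min(x + 1, N) if u >= 0.5 else max(x - 1, 0)

def cftp(N=10, seed=0):
    rng = np.random.default_rng(seed)
    u = {}                                  # random numbers indexed by (negative) time
    t = 1
    while True:
        for s in range(-t, 0):
            if s not in u:
                u[s] = rng.random()         # reuse old draws, only add earlier ones
        lo, hi = 0, N                       # minimal and maximal starting states
        for s in range(-t, 0):
            lo, hi = step(lo, u[s], N), step(hi, u[s], N)
        if lo == hi:                        # coalescence at time 0: exact draw
            return lo
        t *= 2                              # otherwise restart further back in the past

print([cftp(N=10, seed=s) for s in range(5)])
```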
  • 73. Mixture models. Simplest possible mixture structure, p f_0(x) + (1 − p) f_1(x), with a uniform prior on p. Algorithm (Data Augmentation Gibbs sampler), at iteration t: 1. Generate n iid U(0, 1) rv's u_1^(t), . . . , u_n^(t). 2. Derive the indicator variables z_i^(t) as z_i^(t) = 0 iff u_i^(t) ≤ q_i^(t−1) = p^(t−1) f_0(x_i) / ( p^(t−1) f_0(x_i) + (1 − p^(t−1)) f_1(x_i) ), and compute m^(t) = ∑_{i=1}^n z_i^(t). 3. Simulate p^(t) ∼ Be(n + 1 − m^(t), 1 + m^(t)).
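A minimal sketch of this data-augmentation Gibbs sampler in Python, with two fully known component densities chosen as placeholders (f_0 = N(0, 1), f_1 = N(2, 1)).

```python
import numpy as np
from scipy.stats import norm

def da_gibbs_weight(x, f0, f1, n_iter=5000, rng=None):
    """Gibbs sampler for the weight p in p*f0(x) + (1-p)*f1(x), uniform prior on p."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = len(x), 0.5
    p_draws = np.empty(n_iter)
    for t in range(n_iter):
        q = p * f0(x) / (p * f0(x) + (1 - p) * f1(x))
        z = (rng.random(n) > q).astype(int)     # z_i = 0 iff u_i <= q_i
        m = z.sum()
        p = rng.beta(n + 1 - m, 1 + m)
        p_draws[t] = p
    return p_draws

# placeholder components f0 = N(0,1), f1 = N(2,1)
rng = np.random.default_rng(1)
x = np.where(rng.random(200) < 0.3, rng.normal(0, 1, 200), rng.normal(2, 1, 200))
draws = da_gibbs_weight(x, lambda t: norm.pdf(t, 0, 1), lambda t: norm.pdf(t, 2, 1), rng=rng)
print(draws[1000:].mean())   # posterior mean of p, expected near 0.3
```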
  • 74. Mixture models. Algorithm (CFTP Gibbs sampler), at iteration −t: 1. Generate n iid uniform rv's u_1^(−t), . . . , u_n^(−t). 2. Partition [0, 1) into the intervals [q_[j], q_[j+1]). 3. For each [q_[j]^(−t), q_[j+1]^(−t)), generate p_j^(−t) ∼ Be(n − j + 1, j + 1). 4. For each j = 0, 1, . . . , n, set r_j^(−t) = p_j^(−t). 5. For ℓ = 1, . . . , t, set r_j^(−t+ℓ) = p_k^(−t+ℓ), with k such that r_j^(−t+ℓ−1) ∈ [q_[k]^(−t+ℓ), q_[k+1]^(−t+ℓ)]. 6. Stop if the r_j^(0)'s (0 ≤ j ≤ n) are all equal; otherwise, t ← 2t. [Hobert et al., 1999]
  • 75. Mixture models Extension to the case k = 3: Sample of n = 35 observations from .23N(2.2, 1.44) + .62N(1.4, 0.49) + .15N(0.6, 0.64) [Hobert et al., 1999]
  • 76. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 77. Bayesian model choice. Comparison of models M_i by Bayesian means: probabilise the entire model/parameter space – allocate probabilities p_i to all models M_i – define priors π_i(θ_i) for each parameter space Θ_i – compute π(M_i | x) = p_i ∫_{Θ_i} f_i(x | θ_i) π_i(θ_i) dθ_i / ∑_j p_j ∫_{Θ_j} f_j(x | θ_j) π_j(θ_j) dθ_j.
  • 79. Bayesian model choice. Comparison of models M_i by Bayesian means relies on a central notion: the evidence Z_k = ∫_{Θ_k} π_k(θ_k) L_k(θ_k) dθ_k, aka the marginal likelihood.
  • 80. Chib's representation. Direct application of Bayes' theorem: given x ∼ f_k(x | θ_k) and θ_k ∼ π_k(θ_k), Z_k = m_k(x) = f_k(x | θ_k*) π_k(θ_k*) / π_k(θ_k* | x), for any θ_k*. Replace the posterior with an approximation: Ẑ_k = m̂_k(x) = f_k(x | θ_k*) π_k(θ_k*) / π̂_k(θ_k* | x). [Chib, 1995]
  • 82. Case of latent variables. For a missing variable z, as in mixture models, natural Rao-Blackwell estimate π̂_k(θ_k* | x) = (1/T) ∑_{t=1}^{T} π_k(θ_k* | x, z_k^(t)), where the z_k^(t)'s are Gibbs-sampled latent variables. [Diebolt & Robert, 1990; Chib, 1995]
  • 83. Compensation for label switching. For mixture models, z_k^(t) usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory. Consequences on the numerical approximation, biased by an order k!. Recover the theoretical symmetry by using π̃_k(θ_k* | x) = 1/(T k!) ∑_{σ∈S_k} ∑_{t=1}^{T} π_k(σ(θ_k*) | x, z_k^(t)), for all σ's in S_k, the set of all permutations of {1, . . . , k}. [Berkhof, Mechelen & Gelman, 2003]
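A sketch in Python of Chib's estimator with the label-switching correction, assuming a hypothetical helper `log_complete_post(theta_star, z, x)` that returns log π_k(θ* | x, z) for a given allocation vector, and averaging over every permutation σ applied to θ*.

```python
import numpy as np
from itertools import permutations
from scipy.special import logsumexp

def log_chib_evidence(theta_star, z_draws, x, log_lik, log_prior, log_complete_post, k):
    """log Z_k = log f(x|theta*) + log pi(theta*) - log pi_hat(theta*|x),
    with pi_hat symmetrised over the k! permutations of the labels."""
    terms = []
    for z in z_draws:                                 # Gibbs-sampled allocation vectors
        for sigma in permutations(range(k)):
            theta_perm = [theta_star[j] for j in sigma]   # sigma applied to theta*
            terms.append(log_complete_post(theta_perm, z, x))
    # average of T * k! conditional densities, computed on the log scale
    log_post_at_star = logsumexp(terms) - np.log(len(terms))
    return log_lik(x, theta_star) + log_prior(theta_star) - log_post_at_star
```

The −105.1396 + log(3!) = −103.3479 identity on the next slide corresponds to the extreme case where the Gibbs output stays in a single labelling mode, so that only one of the k! permutations contributes to the symmetrised average.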
  • 87. Galaxy dataset (k). Using Chib's estimate, with θ_k* as the MAP estimator, log(Ẑ_k(x)) = −105.1396 for k = 3, while introducing the permutations leads to log(Ẑ_k(x)) = −103.3479. Note that −105.1396 + log(3!) = −103.3479.

    k          2         3         4         5         6         7         8
    Z_k(x)  −115.68   −103.35   −102.66   −101.93   −102.88   −105.48   −108.44

  Estimates of the marginal likelihoods by the symmetrised Chib's approximation (based on 10^5 Gibbs iterations and, for k ≥ 5, 100 permutations selected at random in S_k). [Lee et al., 2008]
  • 90. More efficient sampling. Difficulty with the explosive number of terms in π̃_k(θ_k* | x) = 1/(T k!) ∑_{σ∈S_k} ∑_{t=1}^{T} π_k(σ(θ_k*) | x, z_k^(t)) when most terms are equal to zero... Iterative bridge sampling:
  Ê^(t)(k) = Ê^(t−1)(k) × [ M_1^{−1} ∑_{l=1}^{M_1} π̂(θ̃_l | x) / ( M_1 q(θ̃_l) + M_2 π̂(θ̃_l | x) ) ] / [ M_2^{−1} ∑_{m=1}^{M_2} q(θ̂_m) / ( M_1 q(θ̂_m) + M_2 π̂(θ̂_m | x) ) ]
  [Frühwirth-Schnatter, 2004]
  where the importance function q(θ) is either a Rao–Blackwell average of the Gibbs full conditionals over J_1 stored draws, q(θ) = (1/J_1) ∑_{j=1}^{J_1} p(p | z^(j)) ∏_{i=1}^{k} p(θ_i | z^(j), x, ·), or its symmetrised version based on a pivotal allocation z^o, q(θ) = (1/k!) ∑_{σ∈S_k} p(p | σ(z^o)) ∏_{i=1}^{k} p(θ_i | σ(z^o), x, ·), the dot standing for the remaining current-state parameters in each full conditional.
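A minimal generic sketch of the iterative bridge-sampling update in Python, in the spirit of the iteration above: it takes the unnormalised posterior evaluated at M_1 draws from q and at M_2 posterior draws, plus q evaluated at both sets; the starting value and tolerance are placeholder choices.

```python
import numpy as np

def bridge_sampling(post_at_qdraws, q_at_qdraws, post_at_postdraws, q_at_postdraws,
                    tol=1e-10, max_iter=1000):
    """Iterative bridge estimate of the normalising constant of an unnormalised posterior."""
    m1, m2 = len(q_at_qdraws), len(q_at_postdraws)
    e = 1.0                                            # initial guess for the evidence
    for _ in range(max_iter):
        # numerator: average over draws from q; denominator: average over posterior draws
        num = np.mean(post_at_qdraws / (m1 * q_at_qdraws + m2 * post_at_qdraws / e))
        den = np.mean(q_at_postdraws / (m1 * q_at_postdraws + m2 * post_at_postdraws / e))
        e_new = num / den
        if abs(e_new - e) < tol * abs(e):
            break
        e = e_new
    return e
```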
  • 93. Further efficiency. After de-switching (un-switching?), representation of the importance function as q(θ) = 1/(J k!) ∑_{j=1}^{J} ∑_{σ∈S_k} π(θ | σ(φ^(j)), x) = (1/k!) ∑_{σ∈S_k} h_σ(θ), where each h_σ is associated with a particular mode of q. Assuming generations (θ^(1), . . . , θ^(T)) ∼ h_c(θ), how many of the h_σ(θ^(t)) are non-zero?
  • 94. Sparsity for the sum. Contribution of each term relative to q(θ): ρ_σ(θ) = h_σ(θ) / (k! q(θ)) = h_σ(θ) / ∑_{σ'∈S_k} h_{σ'}(θ), and importance of permutation σ evaluated by Ê_{h_c}[ρ_σ(θ)] = (1/M) ∑_{l=1}^{M} ρ_σ(θ^(l)), θ^(l) ∼ h_c(θ). The approximate set A(k) ⊆ S(k) consists of [σ_1, · · · , σ_n] for the smallest n that satisfies the condition ρ̂_n = (1/M) ∑_{l=1}^{M} · · ·
  • 101. Dual importance sampling with approximation (DIS2A). 1. Randomly select {z^(j), θ^(j)}_{j=1}^{J} from the Gibbs sample and un-switch; construct q(θ). 2. Choose h_c(θ) and generate particles {θ^(t)}_{t=1}^{T} ∼ h_c(θ). 3. Construct the approximation q̃(θ) using the first M-sample: 3.1 compute Ê_{h_c}[ρ_{σ_1}(θ)], . . . , Ê_{h_c}[ρ_{σ_{k!}}(θ)]; 3.2 reorder the σ's such that Ê_{h_c}[ρ_{σ_1}(θ)] ≥ · · · ≥ Ê_{h_c}[ρ_{σ_{k!}}(θ)]; 3.3 initially set n = 1 and compute the q̃_n(θ^(t))'s and ρ̂_n; if ρ̂_n ≈ 1, go to Step 4, otherwise increase n to n + 1. 4. Replace q(θ^(1)), . . . , q(θ^(T)) with q̃(θ^(1)), . . . , q̃(θ^(T)) to estimate Ê. [Lee & X, 2014]
  • 103. Illustrations.

  Fishery data:
    k    k!    |A(k)|     reduction rate
    3     6    1.0000     0.1675
    4    24    2.7333     0.1148

  Galaxy data:
    k    k!    |A(k)|     reduction rate
    3     6     1.000     0.1675
    4    24    15.7000    0.6545
    6   720   298.1200    0.4146

  Table: mean estimates of the approximate set sizes, |A(k)|, and the reduction rate of the number of evaluated h-terms, for (a) fishery and (b) galaxy datasets.
  • 105. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 106. Jeffreys priors for mixtures [teaser]. The true Jeffreys prior for mixtures of distributions is defined as the root of the determinant of the Fisher information, | E[ ∇^T∇ log f(X | θ) ] |^{1/2}: – an O(k)-dimensional matrix – unavailable in closed form except in special cases – unidimensional integrals approximated by Monte Carlo tools. [Grazian [talk tomorrow] et al., 2014+]
  • 112. Difficulties. – Complexity grows in O(k²). – Significant computing requirement (reduced by delayed acceptance) [Banterle et al., 2014]. – Differs from the component-wise Jeffreys prior [Diebolt & X, 1990; Stoneking, 2014]. – When is the posterior proper? – How to check properness via MCMC outputs?
  • 114. Outline Gibbs sampling weakly informative priors imperfect sampling SAME algorithm perfect sampling Bayes factor less informative prior no Bayes factor
  • 115. Difficulties with Bayes factors. – Delicate calibration towards supporting a given hypothesis or model. – Long-lasting impact of prior modelling, despite overall consistency. – Discontinuity in the use of improper priors in most settings. – Binary outcome more suited for immediate decision than for model evaluation. – Related impossibility to ascertain misfit or outliers. – Missing assessment of the uncertainty associated with the decision. – Difficult computation of marginal likelihoods in most settings.
  • 117. Reformulation. – Representation of the test problem as a two-component mixture estimation problem where the weights are formally equal to 0 or 1. – The mixture model thus contains both models under comparison as extreme cases. – Inspired by the consistency result of Rousseau and Mengersen (2011) on overfitting mixtures. – Use of the posterior distribution of the weight of a model instead of a single-digit posterior probability. [Kamari [see poster] et al., 2014+]
  • 119. Construction of Bayes tests. Given two statistical models, M1: x ∼ f1(x | θ1), θ1 ∈ Θ1, and M2: x ∼ f2(x | θ2), θ2 ∈ Θ2, embed both models within an encompassing mixture model Mα: x ∼ α f1(x | θ1) + (1 − α) f2(x | θ2), 0 ≤ α ≤ 1. (1) Both models are special cases of the mixture model, one for α = 1 and the other for α = 0. The test becomes inference on α.
  • 120. Arguments. – Substituting an estimate of the weight α for the posterior probability of model M1 produces an equally convergent indicator of which model is true, while removing the need for often artificial prior probabilities on the model indices. – Interpretation at least as natural as for the posterior probability, while avoiding the zero-one loss setting. – Highly problematic computation of marginal likelihoods bypassed by standard algorithms for mixture estimation. – Straightforward extension to a collection of models allows one to consider all models at once. – The posterior on α evaluates thoroughly the strength of support for a given model, compared with a single-digit Bayes factor. – The mixture model acknowledges the possibility that both models [or none] could be acceptable.
  • 122. Arguments. – Standard prior modelling can be reproduced here, but improper priors are now acceptable, when both models are reparameterised towards common-meaning parameters, e.g. location and scale. – Using the same parameters in both components is essential: the opposition between components is not an issue with different parameter values. – Parameters of the components, θ1 and θ2, are integrated out by Monte Carlo. – Contrary to common testing settings, the data signal lack of agreement with either model when the posterior on α stays away from both 0 and 1. – In most settings, the approach is easily calibrated by a parametric bootstrap providing the posterior of α under each model and the prior predictive error.
  • 123. Toy examples (1). Test of a Poisson P(λ) versus a geometric Geo(p) [as a number of failures, starting at zero]. Same parameter λ used in the Poisson P(λ) and the geometric Geo(p) with p = 1/(1 + λ). The improper noninformative prior π(λ) = 1/λ is valid. Posterior on λ conditional on the allocation vector ζ: π(λ | x, ζ) ∝ exp(−n1(ζ)λ) λ^{∑_{i=1}^n x_i − 1} (λ + 1)^{−(n2(ζ)+s2(ζ))}, and α ∼ Be(n1 + a0, n2 + a0).
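A minimal sketch of a Gibbs sampler for this example in Python, with a random-walk Metropolis step on log λ for the non-standard conditional above; the proposal scale and the value of a0 are placeholder choices.

```python
import numpy as np
from scipy.stats import poisson

def log_cond_lambda(lam, n1, n2, s_all, s2):
    """Log of the conditional pi(lambda | x, zeta) given on the slide."""
    return -n1 * lam + (s_all - 1) * np.log(lam) - (n2 + s2) * np.log(lam + 1)

def poisson_vs_geometric(x, a0=0.5, n_iter=10000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    n, s_all = len(x), x.sum()
    lam, alpha = x.mean() + 0.1, 0.5
    alpha_draws = np.empty(n_iter)
    for t in range(n_iter):
        # allocations: component 1 = Poisson, component 2 = geometric with p = 1/(1+lam)
        w1 = alpha * poisson.pmf(x, lam)
        w2 = (1 - alpha) * (1 / (1 + lam)) * (lam / (1 + lam)) ** x
        z1 = rng.random(n) < w1 / (w1 + w2)
        n1, n2 = z1.sum(), n - z1.sum()
        s2 = x[~z1].sum()
        # weight: alpha | zeta ~ Be(n1 + a0, n2 + a0)
        alpha = rng.beta(n1 + a0, n2 + a0)
        # lambda: random-walk Metropolis on log(lambda)
        prop = lam * np.exp(0.3 * rng.normal())
        log_ratio = (log_cond_lambda(prop, n1, n2, s_all, s2)
                     - log_cond_lambda(lam, n1, n2, s_all, s2)
                     + np.log(prop) - np.log(lam))   # Jacobian of the log transform
        if np.log(rng.random()) < log_ratio:
            lam = prop
        alpha_draws[t] = alpha
    return alpha_draws

# geometric(0.5) data as number of failures starting at zero
draws = poisson_vs_geometric(np.random.default_rng(2).geometric(0.5, 500) - 1)
print(draws[2000:].mean())   # posterior mean of the Poisson weight, expected near 0
```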
  • 124. Toy examples (1). [Figure: posterior of the Poisson weight when a0 = .1, .2, .3, .4, .5, 1, for a sample of 10^5 observations from a geometric Geo(0.5); lBF = −509684]
  • 125. Toy examples (2). Normal N(μ, 1) model versus double-exponential L(μ, √2) [the scale √2 is intentionally chosen to make both distributions share the same variance]. The location parameter μ can be shared by both models, with a single flat prior π(μ). Beta priors B(a0, a0) on the weight are compared with respect to their hyperparameter a0.
  • 126. Toy examples (2). Posterior of the double-exponential weight for L(0, √2) data, with 5, . . . , 10³ observations and 10^5 Gibbs iterations.
  • 127. Toy examples (2). Posterior of the normal weight for N(0, .7²) data, with 10³ observations and 10^4 Gibbs iterations.
  • 128. Toy examples (2). Posterior of the normal weight for N(0, 1) data, with 10³ observations and 10^4 Gibbs iterations.
  • 129. Toy examples (2). Posterior of the normal weight for double-exponential data, with 10³ observations and 10^4 Gibbs iterations.
  • 130. Danke schön! Enjoy BAYSM 2014!