1. Likelihood-free Methods
Likelihood-free Methods
Christian P. Robert
Université Paris-Dauphine & CREST
http://www.ceremade.dauphine.fr/~xian
February 16, 2012
3. Likelihood-free Methods
Introduction
Approximate Bayesian computation
Introduction
Monte Carlo basics
Population Monte Carlo
AMIS
The Metropolis-Hastings Algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
4. Likelihood-free Methods
Introduction
Monte Carlo basics
General purpose
Given a density π known up to a normalizing constant, π ∝ π̃, and an
integrable function h, compute

Π(h) = ∫ h(x)π(x)µ(dx) = ∫ h(x)π̃(x)µ(dx) / ∫ π̃(x)µ(dx)

when ∫ h(x)π̃(x)µ(dx) is intractable.
6. Likelihood-free Methods
Introduction
Monte Carlo basics
Monte Carlo basics
Generate an iid sample x_1, . . . , x_N from π and estimate Π(h) by

Π̂_N^MC(h) = N^{-1} ∑_{i=1}^N h(x_i).

LLN: Π̂_N^MC(h) → Π(h) a.s.

If Π(h²) = ∫ h²(x)π(x)µ(dx) < ∞,

CLT: √N (Π̂_N^MC(h) − Π(h)) →_L N(0, Π{[h − Π(h)]²}).

Caveat
Often impossible or inefficient to simulate directly from Π
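A minimal R sketch of this estimator, for the illustrative choice π = N(0, 1) and h(x) = x², so that Π(h) = 1:
N=1e5
x=rnorm(N)                # iid sample from pi
h=function(x) x^2
PiMC=mean(h(x))           # LLN: converges to Pi(h)=1
se=sd(h(x))/sqrt(N)       # CLT: asymptotic standard error
c(PiMC,se)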
8. Likelihood-free Methods
Introduction
Monte Carlo basics
Importance Sampling
For Q proposal distribution such that Q(dx) = q(x)µ(dx), alternative representation

Π(h) = ∫ h(x){π/q}(x) q(x)µ(dx).

Principle
Generate an iid sample x_1, . . . , x_N ∼ Q and estimate Π(h) by

Π̂_{Q,N}^IS(h) = N^{-1} ∑_{i=1}^N h(x_i){π/q}(x_i).
10. Likelihood-free Methods
Introduction
Monte Carlo basics
Importance Sampling (convergence)
Then

LLN: Π̂_{Q,N}^IS(h) → Π(h) a.s.

and if Q((hπ/q)²) < ∞,

CLT: √N (Π̂_{Q,N}^IS(h) − Π(h)) →_L N(0, Q{(hπ/q − Π(h))²}).

Caveat
If normalizing constant unknown, impossible to use Π̂_{Q,N}^IS
Generic problem in Bayesian Statistics: π(θ|x) ∝ f(x|θ)π(θ).
13. Likelihood-free Methods
Introduction
Monte Carlo basics
Self-Normalised Importance Sampling
Self normalized version

Π̂_{Q,N}^SNIS(h) = { ∑_{i=1}^N {π/q}(x_i) }^{-1} ∑_{i=1}^N h(x_i){π/q}(x_i).

LLN: Π̂_{Q,N}^SNIS(h) → Π(h) a.s.

and if Π((1 + h²)(π/q)) < ∞,

CLT: √N (Π̂_{Q,N}^SNIS(h) − Π(h)) →_L N(0, π{(π/q)(h − Π(h))²}).

© The quality of the SNIS approximation depends on the choice of Q
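A short R illustration of the difference, for the illustrative target π = N(0, 1), proposal Q = Cauchy and h(x) = x²; the SNIS version only requires π up to a constant:
N=1e5
x=rcauchy(N)                # iid sample from Q
w=dnorm(x)/dcauchy(x)       # importance weights pi/q
PiIS=mean(x^2*w)            # IS, requires the normalised pi
PiSNIS=sum(x^2*w)/sum(w)    # SNIS, constant-free
c(PiIS,PiSNIS)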
14. Likelihood-free Methods
Introduction
Monte Carlo basics
Iterated importance sampling
Introduction of an algorithmic temporal dimension:

x_i^(t) ∼ q_t(x|x_i^(t−1))   i = 1, . . . , n,  t = 1, . . .

and

Î_t = (1/n) ∑_{i=1}^n ϱ_i^(t) h(x_i^(t))

is still unbiased for

ϱ_i^(t) = π_t(x_i^(t)) / q_t(x_i^(t)|x_i^(t−1)),   i = 1, . . . , n
16. Likelihood-free Methods
Introduction
Population Monte Carlo
PMC: Population Monte Carlo Algorithm
At time t = 0
  Generate (x_{i,0})_{1≤i≤N} iid ∼ Q_0
  Set ω_{i,0} = {π/q_0}(x_{i,0})
  Generate (J_{i,0})_{1≤i≤N} iid ∼ M(1, (ω̄_{i,0})_{1≤i≤N})
  Set x̃_{i,0} = x_{J_{i,0},0}
At time t (t = 1, . . . , T),
  Generate x_{i,t} ind ∼ Q_{i,t}(x̃_{i,t−1}, ·)
  Set ω_{i,t} = π(x_{i,t})/q_{i,t}(x̃_{i,t−1}, x_{i,t})
  Generate (J_{i,t})_{1≤i≤N} iid ∼ M(1, (ω̄_{i,t})_{1≤i≤N})
  Set x̃_{i,t} = x_{J_{i,t},t}.
[Cappé, Douc, Guillin, Marin, & CPR, 2009, Stat. & Comput.]
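One possible R rendering of these steps for an illustrative target π = N(0, 1), with Gaussian kernels Q_{i,t} centred at the resampled particles (all tuning values are made up for the sketch):
N=1e4; T=10; sig=2
target=function(x) dnorm(x)            # pi, up to a constant
x=rnorm(N,sd=5)                        # t=0: x_{i,0} ~ Q_0 = N(0,5^2)
w=target(x)/dnorm(x,sd=5)              # omega_{i,0}
for (t in 1:T){
  xt=x[sample(N,N,rep=TRUE,prob=w)]    # multinomial resampling step
  x=rnorm(N,mean=xt,sd=sig)            # x_{i,t} ~ Q_{i,t}(xt,.)
  w=target(x)/dnorm(x,mean=xt,sd=sig)  # new importance weights
}
sum(w*x)/sum(w)                        # SNIS estimate of Pi(x) at time T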
18. Likelihood-free Methods
Introduction
Population Monte Carlo
Notes on PMC
After T iterations of PMC, PMC estimator of Π(h) given by

Π̄_{N,T}^PMC(h) = (1/T) ∑_{t=1}^T ∑_{i=1}^N ω̄_{i,t} h(x_{i,t}).

1. ω̄_{i,t} means normalising over whole sequence of simulations
2. Q_{i,t}'s chosen arbitrarily under support constraint
3. Q_{i,t}'s may depend on whole sequence of simulations
4. natural solution for ABC
20. Likelihood-free Methods
Introduction
Population Monte Carlo
Improving quality
The efficiency of the SNIS approximation depends on the choice of Q, ranging from optimal

q(x) ∝ |h(x) − Π(h)| π(x)

to useless

var Π̂_{Q,N}^SNIS(h) = +∞

Example (PMC = adaptive importance sampling)
Population Monte Carlo is producing a sequence of proposals Q_t aiming at improving efficiency

Kull(π, q_t) ≤ Kull(π, q_{t−1})   or   var Π̂_{Q_t,∞}^SNIS(h) ≤ var Π̂_{Q_{t−1},∞}^SNIS(h)

[Cappé, Douc, Guillin, Marin, Robert, 04, 07a, 07b, 08]
21. Likelihood-free Methods
Introduction
AMIS
Multiple Importance Sampling
Recycling: given several proposals Q_1, . . . , Q_T, for 1 ≤ t ≤ T generate an iid sample

x_1^t, . . . , x_N^t ∼ Q_t

and estimate Π(h) by

Π̂_{Q,N}^MIS(h) = T^{-1} ∑_{t=1}^T N^{-1} ∑_{i=1}^N h(x_i^t) ω_i^t

where

ω_i^t = π(x_i^t)/q_t(x_i^t)   correct...
22. Likelihood-free Methods
Introduction
AMIS
Multiple Importance Sampling
Recycling: given several proposals Q_1, . . . , Q_T, for 1 ≤ t ≤ T generate an iid sample

x_1^t, . . . , x_N^t ∼ Q_t

and estimate Π(h) by

Π̂_{Q,N}^MIS(h) = T^{-1} ∑_{t=1}^T N^{-1} ∑_{i=1}^N h(x_i^t) ω_i^t

where

ω_i^t = π(x_i^t) / { T^{-1} ∑_{ℓ=1}^T q_ℓ(x_i^t) }   still correct!
23. Likelihood-free Methods
Introduction
AMIS
Mixture representation
Deterministic mixture correction of the weights proposed by Owen and Zhou (JASA, 2000)
The corresponding estimator is still unbiased [if not self-normalised]
All particles are on the same weighting scale rather than their own
Large variance proposals Q_t do not take over
Variance reduction thanks to weight stabilization & recycling
Removes the randomness in the component choice [= Rao-Blackwellisation]
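A small R sketch of these deterministic mixture weights for two fixed proposals, with an illustrative target (Student t with 3 degrees of freedom, so that Π(x²) = 3):
N=1e4
x1=rnorm(N,sd=1); x2=rnorm(N,sd=3)       # samples from Q_1 and Q_2
x=c(x1,x2)
qmix=.5*dnorm(x,sd=1)+.5*dnorm(x,sd=3)   # T^{-1} sum_l q_l(x), T=2
w=dt(x,df=3)/qmix                        # deterministic mixture weights
sum(w*x^2)/sum(w)                        # SNIS estimate of E[X^2]=3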
24. Likelihood-free Methods
Introduction
AMIS
Global adaptation
Global Adaptation
At iteration t = 1, · · · , T,
1. For 1 ≤ i ≤ N_1, generate x_i^t ∼ T_3(µ̂^{t−1}, Σ̂^{t−1})
2. Calculate the mixture importance weight of particle x_i^t

ω_i^t = π(x_i^t) / δ_i^t

where

δ_i^t = ∑_{l=0}^{t−1} q_{T(3)}(x_i^t; µ̂^l, Σ̂^l)
25. Likelihood-free Methods
Introduction
AMIS
Backward reweighting
3. If t ≥ 2, actualize the weights of all past particles x_i^l, 1 ≤ l ≤ t − 1:

ω_i^l = π(x_i^l) / δ_i^l

where

δ_i^l = δ_i^l + q_{T(3)}(x_i^l; µ̂^{t−1}, Σ̂^{t−1})

4. Compute IS estimates of target mean and variance, µ̂^t and Σ̂^t, where

µ̂_j^t = ∑_{l=1}^t ∑_{i=1}^{N_1} ω_i^l (x_j)_i^l / ∑_{l=1}^t ∑_{i=1}^{N_1} ω_i^l   . . .
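A compact one-dimensional R sketch of these AMIS recursions, with Gaussian proposals standing in for the T_3 kernels and the past-mixture density δ recomputed at each iteration rather than updated incrementally (an illustrative simplification):
N1=1e3; T=10
target=function(x) dnorm(x,mean=4)       # illustrative target N(4,1)
mus=0; sigs=5                            # initial proposal parameters
x=NULL
for (t in 1:T){
  x=c(x,rnorm(N1,mus[t],sigs[t]))        # step 1: N1 new particles
  # steps 2-3: mixture of all proposals so far, over all particles
  del=rowMeans(sapply(1:t,function(l) dnorm(x,mus[l],sigs[l])))
  w=target(x)/del                        # mixture importance weights
  mus=c(mus,sum(w*x)/sum(w))             # step 4: IS mean estimate
  sigs=c(sigs,sqrt(sum(w*(x-mus[t+1])^2)/sum(w)))
}
c(mus[T+1],sigs[T+1])                    # should approach (4,1)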
26. Likelihood-free Methods
Introduction
AMIS
A toy example
[Figure]
Banana shape benchmark: marginal distribution of (x1, x2) for the parameters σ1² = 100 and b = 0.03. Contours represent 60% (red), 90% (black) and 99.9% (blue) confidence regions in the marginal space.
27. Likelihood-free Methods
Introduction
AMIS
A toy example
[Figure]
Banana shape example: boxplots of 10 replicate ESSs for the AMIS scheme (left) and the NOT-AMIS scheme (right) for p = 5, 10, 20.
28. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The Metropolis-Hastings Algorithm
Introduction
The Metropolis-Hastings Algorithm
Monte Carlo Methods based on Markov Chains
The Metropolis–Hastings algorithm
The random walk Metropolis-Hastings algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
30. Likelihood-free Methods
The Metropolis-Hastings Algorithm
Monte Carlo Methods based on Markov Chains
Running Monte Carlo via Markov Chains
Epiphany! It is not necessary to use a sample from the distribution f to approximate the integral

I = ∫ h(x)f(x)dx,

Principle: Obtain X_1, . . . , X_n ∼ f (approx) without directly simulating from f, using an ergodic Markov chain with stationary distribution f
[Metropolis et al., 1953]
33. Likelihood-free Methods
The Metropolis-Hastings Algorithm
Monte Carlo Methods based on Markov Chains
Running Monte Carlo via Markov Chains (2)
Idea
For an arbitrary starting value x^(0), an ergodic chain (X^(t)) is generated using a transition kernel with stationary distribution f
Ensures the convergence in distribution of (X^(t)) to a random variable from f.
For a "large enough" T_0, X^(T_0) can be considered as distributed from f
Produces a dependent sample X^(T_0), X^(T_0+1), . . ., which is generated from f, sufficient for most approximation purposes.
Problem: How can one build a Markov chain with a given stationary distribution?
34. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
The Metropolis–Hastings algorithm
Basics
The algorithm uses the objective (target) density
f
and a conditional density
q(y |x)
called the instrumental (or proposal) distribution
35. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
The MH algorithm
Algorithm (Metropolis–Hastings)
Given x^(t),
1. Generate Y_t ∼ q(y|x^(t)).
2. Take
X^(t+1) = Y_t with prob. ρ(x^(t), Y_t), and X^(t+1) = x^(t) with prob. 1 − ρ(x^(t), Y_t),
where

ρ(x, y) = min{ (f(y)/f(x)) (q(x|y)/q(y|x)), 1 }.
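A direct R transcription of this algorithm, for an illustrative target f = Gamma(2, 1) and a lognormal proposal q(y|x) centred at log x:
f=function(x) dgamma(x,2,1)
q=function(y,x) dlnorm(y,meanlog=log(x),sdlog=.5)
N=1e4; x=rep(1,N)
for (t in 1:(N-1)){
  y=rlnorm(1,meanlog=log(x[t]),sdlog=.5)         # 1. Y_t ~ q(y|x^(t))
  rho=min(1,f(y)*q(x[t],y)/(f(x[t])*q(y,x[t])))  # 2. acceptance prob.
  x[t+1]=ifelse(runif(1)<rho,y,x[t])
}
mean(x)                                          # approximates E_f[X]=2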
36. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
Features
Independent of normalizing constants for both f and q(·|x)
(ie, those constants independent of x)
Never move to values with f (y ) = 0
The chain (x (t) )t may take the same value several times in a
row, even though f is a density wrt Lebesgue measure
The sequence (yt )t is usually not a Markov chain
38. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
Convergence properties
The M-H Markov chain is reversible, with invariant/stationary density f since it satisfies the detailed balance condition

f(y) K(y, x) = f(x) K(x, y)

If
q(y|x) > 0 for every (x, y),
the chain is Harris recurrent and

lim_{T→∞} (1/T) ∑_{t=1}^T h(X^(t)) = ∫ h(x)f(x)dx   a.e. f.

lim_{n→∞} ‖ ∫ K^n(x, ·)µ(dx) − f ‖_TV = 0

for every initial distribution µ
39. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Random walk Metropolis–Hastings
Use of a local perturbation as proposal
Yt = X (t) + εt ,
where εt ∼ g , independent of X (t) .
The instrumental density is now of the form g (y − x) and the
Markov chain is a random walk if we take g to be symmetric
g (x) = g (−x)
40. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Algorithm (Random walk Metropolis)
Given x^(t)
1. Generate Y_t ∼ g(y − x^(t))
2. Take
X^(t+1) = Y_t with prob. min{1, f(Y_t)/f(x^(t))}, and X^(t+1) = x^(t) otherwise.
41. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
RW-MH on mixture posterior distribution
[Figure: sample in the (µ1, µ2) space]
Random walk MCMC output for .7N(µ1, 1) + .3N(µ2, 1)
43. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Acceptance rate
A high acceptance rate is not an indication of efficiency since the random walk may be moving "too slowly" on the target surface
If x^(t) and y_t are "too close", i.e. f(x^(t)) ≃ f(y_t), y_t is accepted with probability

min{ f(y_t)/f(x^(t)), 1 } ≃ 1.

and the acceptance rate is high
44. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Acceptance rate
A high acceptance rate is not an indication of efficiency since the random walk may be moving "too slowly" on the target surface
If the average acceptance rate is low, the proposed values f(y_t) tend to be small wrt f(x^(t)), i.e. the random walk [not the algorithm!] moves quickly on the target surface, often reaching its boundaries
45. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Rule of thumb
In small dimensions, aim at an average acceptance rate of 50%. In
large dimensions, at an average acceptance rate of 25%.
[Gelman, Gilks and Roberts, 1995]
46. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Noisy AR(1)
Target distribution of x given x1, x2 and y is

exp{ −[ (x − ϕx1)² + (x2 − ϕx)² + (τ²/σ²)(y − x²)² ] / 2τ² }.

For a Gaussian random walk with scale ω small enough, the random walk never jumps to the other mode. But if the scale ω is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
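An R sketch of this experiment (the values of ϕ, τ, σ and of the conditioning quantities x1, x2, y are illustrative choices, not those of the slides):
phi=.8; tau=.5; sigma=.2
x1=.5; x2=.4; y=1
ltarget=function(x)   # log target, up to a constant
 -((x-phi*x1)^2+(x2-phi*x)^2+(tau^2/sigma^2)*(y-x^2)^2)/(2*tau^2)
rwmh=function(omega,N=1e4){
  x=rep(1,N)
  for (t in 1:(N-1)){
    prop=x[t]+omega*rnorm(1)   # Gaussian random walk proposal
    x[t+1]=ifelse(log(runif(1))<ltarget(prop)-ltarget(x[t]),prop,x[t])
  }
  x}
small=rwmh(.1)   # typically stuck in one mode, as in the next figure
large=rwmh(.5)   # visits both modes near +1 and -1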
47. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Noisy AR(2)
Markov chain based on a random walk with scale ω = .1.
48. Likelihood-free Methods
The Metropolis-Hastings Algorithm
The random walk Metropolis-Hastings algorithm
Noisy AR(3)
Markov chain based on a random walk with scale ω = .5.
49. Likelihood-free Methods
simulation-based methods in Econometrics
Meanwhile, in a Gaulish village...
Similar exploration of simulation-based techniques in Econometrics
Simulated method of moments
Method of simulated moments
Simulated pseudo-maximum-likelihood
Indirect inference
[Gouriéroux & Monfort, 1996]
50. Likelihood-free Methods
simulation-based methods in Econometrics
Simulated method of moments
Given observations y_{1:n}^o from a model

y_t = r(y_{1:(t−1)}, ε_t, θ),   ε_t ∼ g(·)

simulate ε_{1:n}, derive

y_t(θ) = r(y_{1:(t−1)}, ε_t, θ)

and estimate θ by

arg min_θ ∑_{t=1}^n (y_t^o − y_t(θ))²
51. Likelihood-free Methods
simulation-based methods in Econometrics
Simulated method of moments
Given observations y_{1:n}^o from a model

y_t = r(y_{1:(t−1)}, ε_t, θ),   ε_t ∼ g(·)

simulate ε_{1:n}, derive

y_t(θ) = r(y_{1:(t−1)}, ε_t, θ)

and estimate θ by

arg min_θ ( ∑_{t=1}^n y_t^o − ∑_{t=1}^n y_t(θ) )²
52. Likelihood-free Methods
simulation-based methods in Econometrics
Method of simulated moments
Given a statistic vector K(y) with

E_θ[K(Y_t)|y_{1:(t−1)}] = k(y_{1:(t−1)}; θ)

find an unbiased estimator of k(y_{1:(t−1)}; θ),

k̃(ε_t, y_{1:(t−1)}; θ)

Estimate θ by

arg min_θ ∑_{t=1}^n ‖ K(y_t) − (1/S) ∑_{s=1}^S k̃(ε_t^s, y_{1:(t−1)}; θ) ‖

[Pakes & Pollard, 1989]
53. Likelihood-free Methods
simulation-based methods in Econometrics
Indirect inference
Minimise (in θ) the distance between estimators β̂ based on pseudo-models for genuine observations and for observations simulated under the true model and the parameter θ.
[Gouriéroux, Monfort, & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
54. Likelihood-free Methods
simulation-based methods in Econometrics
Indirect inference (PML vs. PSE)
Example of the pseudo-maximum-likelihood (PML)

β̂(y) = arg max_β ∑_t log f(y_t|β, y_{1:(t−1)})

leading to

arg min_θ ‖ β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ)) ‖²

when

y_s(θ) ∼ f(y|θ),   s = 1, . . . , S
55. Likelihood-free Methods
simulation-based methods in Econometrics
Indirect inference (PML vs. PSE)
Example of the pseudo-score-estimator (PSE)

β̂(y) = arg min_β ∑_t ( ∂log f/∂β (y_t|β, y_{1:(t−1)}) )²

leading to

arg min_θ ‖ β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ)) ‖²

when

y_s(θ) ∼ f(y|θ),   s = 1, . . . , S
57. Likelihood-free Methods
simulation-based methods in Econometrics
Consistent indirect inference
"...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier..."
Consistency depending on the criterion and on the asymptotic identifiability of θ
[Gouriéroux & Monfort, 1996, p. 66]
58. Likelihood-free Methods
simulation-based methods in Econometrics
AR(2) vs. MA(1) example
true (MA) model

y_t = ε_t − θ ε_{t−1}

and [wrong!] auxiliary (AR) model

y_t = β_1 y_{t−1} + β_2 y_{t−2} + u_t

R code
x=eps=rnorm(250)                # true MA(1) series, theta=0.5
x[2:250]=x[2:250]-0.5*x[1:249]
simeps=rnorm(250)               # fixed simulation noise
propeta=seq(-.99,.99,le=199)    # grid of candidate MA coefficients
dist=rep(0,199)
bethat=as.vector(arima(x,c(2,0,0),incl=FALSE)$coef)  # auxiliary AR(2) fit
for (t in 1:199)                # auxiliary fit on each simulated series
dist[t]=sum((as.vector(arima(c(simeps[1],simeps[2:250]-propeta[t]*
simeps[1:249]),c(2,0,0),incl=FALSE)$coef)-bethat)^2)
59. Likelihood-free Methods
simulation-based methods in Econometrics
AR(2) vs. MA(1) example
One sample:
[Figure: distance to the observed auxiliary estimate as a function of the candidate MA coefficient]
60. Likelihood-free Methods
simulation-based methods in Econometrics
AR(2) vs. MA(1) example
Many samples:
[Figure]
61. Likelihood-free Methods
simulation-based methods in Econometrics
Choice of pseudo-model
Pick model such that
1. β̂(θ) not flat (i.e. sensitive to changes in θ)
2. β̂(θ) not dispersed (i.e. robust against changes in y_s(θ))
[Frigessi & Heggland, 2004]
62. Likelihood-free Methods
simulation-based methods in Econometrics
ABC using indirect inference
"We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference (...) In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning (...)"
[Drovandi, Pettitt & Faddy, 2011]
© Indirect inference provides summary statistics for ABC...
63. Likelihood-free Methods
simulation-based methods in Econometrics
ABC using indirect inference
"...the above result shows that, in the limit as h → 0, ABC will be more accurate than an indirect inference method whose auxiliary statistics are the same as the summary statistic that is used for ABC (...) Initial analysis showed that which method is more accurate depends on the true value of θ."
[Fearnhead and Prangle, 2012]
© Indirect inference provides estimates rather than global inference...
64. Likelihood-free Methods
Genetics of ABC
Genetic background of ABC
ABC is a recent computational technique that only requires being able to sample from the likelihood f(·|θ)
This technique stemmed from population genetics models, about 15 years ago, and population geneticists still contribute significantly to methodological developments of ABC.
[Griffith & al., 1997; Tavaré & al., 1999]
65. Likelihood-free Methods
Genetics of ABC
Population genetics
[Part derived from the teaching material of Raphael Leblois, ENS Lyon, November 2010]
Describe the genotypes, estimate the allele frequencies, determine their distribution among individuals, populations and between populations;
Predict and understand the evolution of gene frequencies in populations as a result of various factors.
© Analyses the effect of various evolutionary forces (mutation, drift, migration, selection) on the evolution of gene frequencies in time and space.
66. Likelihood-free Methods
Genetics of ABC
A[B]ctors
Towards an explanation of the mechanisms of evolutionary changes, mathematical models of evolution started to be developed in the years 1920-1930.
Ronald A. Fisher (1890-1962), Sewall Wright (1889-1988), John B. S. Haldane (1892-1964)
67. Likelihood-free Methods
Genetics of ABC
The Wright-Fisher model
A population of constant size, in which individuals reproduce at the same time.
Each gene in a generation is a copy of a gene of the previous generation.
In the absence of mutation and selection, allele frequencies drift inevitably (up and down) until the fixation of an allele.
Drift therefore leads to the loss of genetic variation within populations.
68. Likelihood-free Methods
Genetics of ABC
Coalescent theory
[Kingman, 1982; Tajima, Tavaré, &tc]
Coalescence theory is interested in the genealogy of a sample of genes back in time to the common ancestor of the sample.
69. Likelihood-free Methods
Genetics of ABC
Common ancestor
Time of coalescence (T): the different lineages merge (coalesce) as one moves back towards the past, modelling the genetic drift process "backward in time" up to the common ancestor of a sample of genes.
70. Likelihood-free Methods
Genetics of ABC
Neutral mutations
Under the assumption of neutrality of the genetic markers under study, the mutations are independent of the genealogy, i.e. the genealogy only depends on the demographic processes.
The genealogy is thus constructed according to the demographic parameters (e.g. N), then the mutations are added a posteriori on the different branches, from the MRCA down to the leaves of the tree.
This produces polymorphism data under the demographic and mutational models under consideration.
71. Likelihood-free Methods
Genetics of ABC
Demo-genetic inference
Each model is characterized by a set of parameters θ that cover historical (divergence times, admixture times, ...), demographic (population sizes, admixture rates, migration rates, ...) and genetic (mutation rates, ...) factors
The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time
Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f(y|θ).
72. Likelihood-free Methods
Genetics of ABC
Untractable likelihood
Missing (too missing!) data structure:

f(y|θ) = ∫_G f(y|G, θ)f(G|θ)dG

The genealogies are considered as nuisance parameters.
This problem thus differs from the phylogenetic approach where the tree is the parameter of interest.
73. Likelihood-free Methods
Genetics of ABC
A genuine example of application
Pygmies populations: do they have a common origin? Have there been many exchanges between pygmy and non-pygmy populations?
74. Likelihood-free Methods
Genetics of ABC
!""#$%&'()*+,(-*.&(/+0$'"1)()&$/+2!,03!
1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+8-9':*.+
Alternative scenarios
Différents scénarios possibles, choix de scenario par ABC
Verdu et al. 2009
96
75. Likelihood-free Methods
Genetics of ABC
Simulation results
© Scenario 1a is chosen: it is largely supported relative to the alternatives, arguing for a common origin of the West African pygmy populations
[Verdu et al., 2009]
76. Likelihood-free Methods
Genetics of ABC
!""#$%&'()*+,(-*.&(/+0$'"1)()&$/+2!,03!
Most likely scenario
1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+8-9':*.+
Scénario
évolutif :
on « raconte »
une histoire à
partir de ces
inférences
Verdu et al. 2009
99
78. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Untractable likelihoods
Cases when the likelihood function f(y|θ) is unavailable and when the completion step

f(y|θ) = ∫_Z f(y, z|θ) dz

is impossible or too costly because of the dimension of z
© MCMC cannot be implemented!
80. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Illustrations
Example
Stochastic volatility model: for t = 1, . . . , T,

y_t = exp(z_t) ε_t,   z_t = a + b z_{t−1} + σ η_t,

T very large makes it difficult to include z within the simulated parameters
[Figure: highest weight trajectories]
81. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Illustrations
Example
Potts model: if y takes values on a grid Y of size k^n and

f(y|θ) ∝ exp( θ ∑_{l∼i} I_{y_l = y_i} )

where l∼i denotes a neighbourhood relation, n moderately large prohibits the computation of the normalising constant
82. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Illustrations
Example
Inference on CMB: in cosmology, study of the Cosmic Microwave Background via likelihoods immensely slow to compute (e.g. WMAP, Planck), because of numerically costly spectral transforms
[Data is a Fortran program]
[Kilbinger et al., 2010, MNRAS]
83. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Illustrations
Example
Phylogenetic tree: in population
genetics, reconstitution of a common
ancestor from a sample of genes via
a phylogenetic tree that is close to
impossible to integrate out
[100 processor days with 4
parameters]
[Cornuet et al., 2009, Bioinformatics]
86. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
The ABC method
Bayesian setting: target is π(θ)f(x|θ)
When likelihood f(x|θ) not in closed form, likelihood-free rejection technique:
ABC algorithm
For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating

θ′ ∼ π(θ),   z ∼ f(z|θ′),

until the auxiliary variable z is equal to the observed value, z = y.
[Tavaré et al., 1997]
87. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Why does it work?!
The proof is trivial:

f(θ_i) ∝ ∑_{z∈D} π(θ_i)f(z|θ_i)I_y(z)
       ∝ π(θ_i)f(y|θ_i)
       = π(θ_i|y).

[Accept–Reject 101]
88. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Earlier occurrence
‘Bayesian statistics and Monte Carlo methods are ideally
suited to the task of passing many models over one
dataset’
[Don Rubin, Annals of Statistics, 1984]
Note that Rubin (1984) does not promote this algorithm for likelihood-free simulation but rather as a frequentist intuition on posterior distributions: parameters from posteriors are more likely to be those that could have generated the data.
90. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
A as A...pproximative
When y is a continuous random variable, equality z = y is replaced with a tolerance condition,

ρ(y, z) ≤ ε

where ρ is a distance
Output distributed from

π(θ) P_θ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)

[Pritchard et al., 1999]
91. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ′ from the prior distribution π(·)
    generate z from the likelihood f(·|θ′)
  until ρ{η(z), η(y)} ≤ ε
  set θ_i = θ′
end for
where η(y) defines a (not necessarily sufficient) statistic
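A minimal R version of Algorithm 1 for an illustrative conjugate model, y ∼ N(θ, 1) with θ ∼ N(0, 10²), η the identity and ρ the absolute difference, so the output can be checked against the exact posterior mean 100y/101:
y=2.5; N=1e3; eps=.1
theta=rep(0,N)
for (i in 1:N){
  repeat{
    prop=rnorm(1,sd=10)          # theta' from the prior
    z=rnorm(1,mean=prop)         # z from the likelihood
    if (abs(z-y)<=eps) break     # until rho{eta(z),eta(y)} <= eps
  }
  theta[i]=prop
}
mean(theta)                      # approx. 100*y/101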
93. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Output
The likelihood-free algorithm samples from the marginal in z of:

π_ε(θ, z|y) = π(θ)f(z|θ)I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ)f(z|θ)dzdθ,

where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:

π_ε(θ|y) = ∫ π_ε(θ, z|y)dz ≈ π(θ|y).
97. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Convergence of ABC (first attempt)
What happens when ε → 0?
If f(·|θ) is continuous in y, uniformly in θ [!], given an arbitrary δ > 0, there exists ε_0 such that ε < ε_0 implies

π(θ) ∫ f(z|θ)I_{A_{ε,y}}(z)dz / ∫_{A_{ε,y}×Θ} π(θ)f(z|θ)dzdθ ∈ π(θ)f(y|θ)(1 ∓ δ) / ∫_Θ π(θ)f(y|θ)dθ (1 ± δ)

since the µ(B_ε) factors in the numerator and the denominator cancel.
[Proof extends to other continuous-in-0 kernels K]
99. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Convergence of ABC (second attempt)
What happens when ε → 0?
For B ⊂ Θ, we have

∫_B [ ∫_{A_{ε,y}} f(z|θ)dz / ∫_{A_{ε,y}×Θ} π(θ)f(z|θ)dzdθ ] π(θ)dθ
= ∫_{A_{ε,y}} [ ∫_B f(z|θ)π(θ)dθ / ∫_{A_{ε,y}×Θ} π(θ)f(z|θ)dzdθ ] dz
= ∫_{A_{ε,y}} [ ∫_B f(z|θ)π(θ)dθ / m(z) ] [ m(z) / ∫_{A_{ε,y}×Θ} π(θ)f(z|θ)dzdθ ] dz
= ∫_{A_{ε,y}} π(B|z) [ m(z) / ∫_{A_{ε,y}×Θ} π(θ)f(z|θ)dzdθ ] dz

which indicates convergence for a continuous π(B|z).
103. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes function of above variables

P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3),

Test of H0: β3 = 0 for 200 observations of Pima.tr based on a g-prior modelling:

β ∼ N3(0, n (X^T X)^{−1})

Use of importance function inspired from the MLE estimate distribution

β ∼ N(β̂, Σ̂)
104. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Pima Indian benchmark
[Figure]
Figure: Comparison between density estimates of the marginals on β1 (left), β2 (center) and β3 (right) from ABC rejection samples (red) and MCMC samples (black).
105. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
MA example
Back to the MA(q) model

x_t = ε_t + ∑_{i=1}^q ϑ_i ε_{t−i}

Simple prior: uniform over the inverse [real and complex] roots in

Q(u) = 1 − ∑_{i=1}^q ϑ_i u^i

under the identifiability conditions
106. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
MA example
Back to the MA(q) model

x_t = ε_t + ∑_{i=1}^q ϑ_i ε_{t−i}

Simple prior: uniform prior over the identifiability zone, e.g. triangle for MA(2)
108. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
MA example (2)
ABC algorithm thus made of
1. picking a new value (ϑ1, ϑ2) in the triangle
2. generating an iid sequence (ε_t)_{−q<t≤T}
3. producing a simulated series (x′_t)_{1≤t≤T}
Distance: basic distance between the series

ρ((x′_t)_{1≤t≤T}, (x_t)_{1≤t≤T}) = ∑_{t=1}^T (x_t − x′_t)²

or distance between summary statistics like the q autocorrelations

τ_j = ∑_{t=j+1}^T x_t x_{t−j}
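A possible R sketch of this scheme for an MA(2) series with the raw distance between series; the sample size, true coefficients and acceptance quantile are illustrative:
T=100; th=c(.6,.2)
e0=rnorm(T+2)
x=e0[3:(T+2)]+th[1]*e0[2:(T+1)]+th[2]*e0[1:T]   # observed series
N=1e4; prop=matrix(0,N,2); dist=rep(0,N)
for (i in 1:N){
  repeat{                     # 1. uniform draw over the MA(2) triangle
    t1=runif(1,-2,2); t2=runif(1,-1,1)
    if (t2+t1 > -1 && t2-t1 > -1) break
  }
  e=rnorm(T+2)                                  # 2. iid noise sequence
  z=e[3:(T+2)]+t1*e[2:(T+1)]+t2*e[1:T]          # 3. simulated series
  prop[i,]=c(t1,t2); dist[i]=sum((x-z)^2)
}
abc=prop[dist<=quantile(dist,.01),]             # keep the 1% closest
colMeans(abc)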
110. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Comparison of distance impact
[Figure: posterior samples of θ1 and θ2]
Evaluation of the tolerance on the ABC sample against both distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
112. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Homonymy
The ABC algorithm is not to be confused with the ABC algorithm
The Artificial Bee Colony algorithm is a swarm based meta-heuristic
algorithm that was introduced by Karaboga in 2005 for optimizing
numerical problems. It was inspired by the intelligent foraging
behavior of honey bees. The algorithm is specifically based on the
model proposed by Tereshko and Loengarov (2005) for the foraging
behaviour of honey bee colonies. The model consists of three
essential components: employed and unemployed foraging bees, and
food sources. The first two components, employed and unemployed
foraging bees, search for rich food sources (...) close to their hive.
The model also defines two leading modes of behaviour (...):
recruitment of foragers to rich food sources resulting in positive
feedback and abandonment of poor sources by foragers causing
negative feedback.
[Karaboga, Scholarpedia]
116. Likelihood-free Methods
Approximate Bayesian computation
ABC basics
ABC advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y...
[Marjoram et al, 2003; Bortot et al., 2007; Sisson et al., 2007]
...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger ε
[Beaumont et al., 2002]
.....or even by including ε in the inferential framework [ABCµ]
[Ratmann et al., 2009]
117. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP
Better usage of [prior] simulations by regression adjustment: instead of throwing away θ′ such that ρ(η(z), η(y)) > ε, replace θ's with locally regressed transforms

θ* = θ − {η(z) − η(y)}^T β̂   [Csilléry et al., TEE, 2010]

where β̂ is obtained by [NP] weighted least square regression on (η(z) − η(y)) with weights

K_δ{ρ(η(z), η(y))}

[Beaumont et al., 2002, Genetics]
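A sketch of this adjustment in R, reusing the toy normal setup above (identity summary statistic, Epanechnikov kernel weights; all settings illustrative):
y=2.5; N=1e4
theta=rnorm(N,sd=10)                  # theta from the prior
z=rnorm(N,mean=theta)                 # z from the likelihood
d=abs(z-y)                            # rho(eta(z),eta(y))
delta=quantile(d,.05)                 # tolerance as an empirical quantile
keep=d<=delta
w=1-(d[keep]/delta)^2                 # Epanechnikov weights K_delta
fit=lm(theta[keep]~I(z[keep]-y),weights=w)
thetastar=theta[keep]-coef(fit)[2]*(z[keep]-y)   # adjusted parameters
mean(thetastar)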
119. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (regression)
Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012): weight directly simulation by

K_δ{ρ(η(z(θ)), η(y))}

or

(1/S) ∑_{s=1}^S K_δ{ρ(η(z_s(θ)), η(y))}

[consistent estimate of f(η|θ)]
Curse of dimensionality: poor estimate when d = dim(η) is large...
120. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (density estimation)
Use of the kernel weights

K_δ{ρ(η(z(θ)), η(y))}

leads to the NP estimate of the posterior expectation

∑_i θ_i K_δ{ρ(η(z(θ_i)), η(y))} / ∑_i K_δ{ρ(η(z(θ_i)), η(y))}

[Blum, JASA, 2010]
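In R, this Nadaraya-Watson-type estimate reads as follows on the same illustrative normal toy model, with a Gaussian kernel for K_δ:
y=2.5; N=1e4; delta=.5
theta=rnorm(N,sd=10)
z=rnorm(N,mean=theta)
w=dnorm(abs(z-y)/delta)               # K_delta{rho(eta(z),eta(y))}
sum(theta*w)/sum(w)                   # NP estimate of E[theta|y]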
121. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (density estimation)
Use of the kernel weights

K_δ{ρ(η(z(θ)), η(y))}

leads to the NP estimate of the posterior conditional density

∑_i K̃_b(θ_i − θ)K_δ{ρ(η(z(θ_i)), η(y))} / ∑_i K_δ{ρ(η(z(θ_i)), η(y))}

[Blum, JASA, 2010]
124. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (density estimations)
Other versions incorporating regression adjustments

∑_i K̃_b(θ*_i − θ)K_δ{ρ(η(z(θ_i)), η(y))} / ∑_i K_δ{ρ(η(z(θ_i)), η(y))}

In all cases, error

E[ĝ(θ|y)] − g(θ|y) = cb² + cδ² + O_P(b² + δ²) + O_P(1/nδ^d)

var(ĝ(θ|y)) = c/(nbδ^d) (1 + o_P(1))

[standard NP calculations]
128. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NCH (2)
Why neural network?
fights curse of dimensionality
selects relevant summary statistics
provides automated dimension reduction
offers a model choice capability
improves upon multinomial logistic
[Blum & François, 2009]
130. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-MCMC
Markov chain (θ^(t)) created via the transition function

θ^(t+1) = θ′  if θ′ ∼ K_ω(θ′|θ^(t)), x ∼ f(x|θ′) is such that x = y,
              and u ∼ U(0, 1) ≤ π(θ′)K_ω(θ^(t)|θ′) / π(θ^(t))K_ω(θ′|θ^(t)),
θ^(t+1) = θ^(t)  otherwise,

has the posterior π(θ|y) as stationary distribution
[Marjoram et al, 2003]
131. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-MCMC (2)
Algorithm 2 Likelihood-free MCMC sampler
Use Algorithm 1 to get (θ^(0), z^(0))
for t = 1 to N do
  Generate θ′ from K_ω(·|θ^(t−1)),
  Generate z′ from the likelihood f(·|θ′),
  Generate u from U_[0,1],
  if u ≤ [π(θ′)K_ω(θ^(t−1)|θ′) / π(θ^(t−1))K_ω(θ′|θ^(t−1))] I_{A_{ε,y}}(z′) then
    set (θ^(t), z^(t)) = (θ′, z′)
  else
    (θ^(t), z^(t)) = (θ^(t−1), z^(t−1)),
  end if
end for
134. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Why does it work?
Acceptance probability does not involve calculating the likelihood, since

π_ε(θ′, z′|y) / π_ε(θ^(t−1), z^(t−1)|y) × q(θ^(t−1)|θ′)f(z^(t−1)|θ^(t−1)) / q(θ′|θ^(t−1))f(z′|θ′)

= [π(θ′) f(z′|θ′) I_{A_{ε,y}}(z′) / π(θ^(t−1)) f(z^(t−1)|θ^(t−1)) I_{A_{ε,y}}(z^(t−1))] × [q(θ^(t−1)|θ′) f(z^(t−1)|θ^(t−1)) / q(θ′|θ^(t−1)) f(z′|θ′)]

where the f(z′|θ′) and f(z^(t−1)|θ^(t−1)) terms cancel and I_{A_{ε,y}}(z^(t−1)) = 1, leaving

= π(θ′)q(θ^(t−1)|θ′) / π(θ^(t−1))q(θ′|θ^(t−1)) × I_{A_{ε,y}}(z′)
136. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example
Case of

x ∼ (1/2) N(θ, 1) + (1/2) N(−θ, 1)

under prior θ ∼ N(0, 10)
ABC sampler
thetas=rnorm(N,sd=10)                 # draws from the prior
zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1)   # z given theta
eps=quantile(abs(zed-x),.01)          # tolerance as the 1% quantile
abc=thetas[abs(zed-x)<eps]            # accepted parameters
137. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example
Case of

x ∼ (1/2) N(θ, 1) + (1/2) N(−θ, 1)

under prior θ ∼ N(0, 10)
ABC-MCMC sampler
metas=rep(0,N)
metas[1]=rnorm(1,sd=10)               # initial value from the prior
zed[1]=x
for (t in 2:N){
  metas[t]=rnorm(1,mean=metas[t-1],sd=5)          # random walk proposal
  zed[t]=rnorm(1,mean=(1-2*(runif(1)<.5))*metas[t],sd=1)
  if ((abs(zed[t]-x)>eps)||(runif(1)>dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){
    metas[t]=metas[t-1]               # reject: too far, or prior ratio fails
    zed[t]=zed[t-1]}
}
139. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example
x = 2
[Figure: posterior approximation of θ]
141. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example
x = 10
[Figure: posterior approximation of θ]
143. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example
x = 50
[Figure: posterior approximation of θ]
145. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ
[Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
Use of a joint density

f(θ, ε|y) ∝ ξ(ε|y, θ) × π_θ(θ) × π_ε(ε)

where y is the data, and ξ(ε|y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ)
Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel approximation.
147. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ details
Multidimensional distances ρ_k (k = 1, . . . , K) and errors ε_k = ρ_k(η_k(z), η_k(y)), with

ε_k ∼ ξ_k(ε|y, θ) ≈ ξ̂_k(ε|y, θ) = (1/Bh_k) ∑_b K[{ε_k − ρ_k(η_k(z_b), η_k(y))}/h_k]

then used in replacing ξ(ε|y, θ) with min_k ξ̂_k(ε|y, θ)
ABCµ involves acceptance probability

π(θ′, ε′) q(θ′, θ) q(ε′, ε) min_k ξ̂_k(ε′|y, θ′) / [ π(θ, ε) q(θ, θ′) q(ε, ε′) min_k ξ̂_k(ε|y, θ) ]
148. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ multiple errors
[© Ratmann et al., PNAS, 2009]
149. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ for model choice
[© Ratmann et al., PNAS, 2009]
151. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Questions about ABCµ
For each model under comparison, the marginal posterior on ε is used to assess the fit of the model (HPD includes 0 or not).
Is the data informative about ε? [Identifiability]
How is the prior π(ε) impacting the comparison?
How is using both ξ(ε|x0, θ) and π_ε(ε) compatible with a standard probability model? [remindful of Wilkinson]
Where is the penalisation for complexity in the model comparison?
[X, Mengersen & Chen, 2010, PNAS]
153. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-PRC
Another sequential version producing a sequence of Markov transition kernels K_t and of samples (θ_1^(t), . . . , θ_N^(t)) (1 ≤ t ≤ T)
ABC-PRC Algorithm
1. Select θ⋆ at random from the previous θ_i^(t−1)'s with probabilities ω_i^(t−1) (1 ≤ i ≤ N).
2. Generate θ_i^(t) ∼ K_t(θ|θ⋆), x ∼ f(x|θ_i^(t)),
3. Check that ρ(x, y) < ε, otherwise start again.
[Sisson et al., 2007]
155. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Why PRC?
Partial rejection control: resample from a population of weighted particles by pruning away particles with weights below a threshold C, replacing them by new particles obtained by propagating an existing particle by an SMC step and modifying the weights accordingly.
[Liu, 2001]
PRC justification in ABC-PRC:
"Suppose we then implement the PRC algorithm for some c > 0 such that only identically zero weights are smaller than c"
Trouble is, there is no such c...
158. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-PRC weight
Probability ω_i^(t) computed as

ω_i^(t) ∝ π(θ_i^(t)) L_{t−1}(θ⋆|θ_i^(t)) {π(θ⋆) K_t(θ_i^(t)|θ⋆)}^{−1},

where L_{t−1} is an arbitrary transition kernel.
In case

L_{t−1}(θ′|θ) = K_t(θ|θ′),

all weights are equal under a uniform prior.
Inspired from Del Moral et al. (2006), who use backward kernels L_{t−1} in SMC to achieve unbiasedness
159. Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-PRC bias
Lack of unbiasedness of the method