Talk in Telecom-Paris, Nov. 15, 2011
Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms

Christian P. Robert
Université Paris-Dauphine, IUF, and CREST
Joint works with Randal Douc, Pierre Jacob and Murray Smith
xian@ceremade.dauphine.fr
November 16, 2011
Main themes

1. Rao–Blackwellisation of MCMC
2. Can be performed in any Metropolis–Hastings algorithm
3. Asymptotically more efficient than usual MCMC, with a controlled additional computing cost
4. Takes advantage of parallel capacities at a very basic level (GPUs)
Metropolis Hastings revisited

Metropolis–Hastings algorithm

1. We wish to approximate
   I = ∫ h(x)π(x)dx / ∫ π(x)dx = ∫ h(x) π̄(x) dx.
2. π(x) is known but not ∫ π(x)dx.
3. Approximate I with δ = (1/N) Σ_{t=1}^{N} h(x^(t)), where (x^(t)) is a Markov chain with limiting distribution π̄.
4. Convergence obtained from the Law of Large Numbers or the CLT for Markov chains.
Metropolis–Hastings algorithm

Suppose that x^(t) is drawn.

1. Simulate y_t ∼ q(·|x^(t)).
2. Set x^(t+1) = y_t with probability
   α(x^(t), y_t) = min{ 1, [π(y_t)/π(x^(t))] · [q(x^(t)|y_t)/q(y_t|x^(t))] }.
   Otherwise, set x^(t+1) = x^(t).
3. α is such that the detailed balance equation is satisfied:
   π(x) q(y|x) α(x, y) = π(y) q(x|y) α(y, x),
   so π̄ is the stationary distribution of (x^(t)).

The accepted candidates are simulated with the rejection algorithm.
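As a concrete sketch (not from the slides), the update above can be written for a symmetric Gaussian random-walk proposal, where the ratio q(x^(t)|y_t)/q(y_t|x^(t)) cancels in α:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_iter, scale=1.0, seed=None):
    """Random-walk Metropolis-Hastings sampler for an unnormalised
    log-target log_pi; the symmetric proposal makes the ratio
    q(x|y)/q(y|x) cancel in the acceptance probability alpha."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    x = x0
    for t in range(n_iter):
        y = x + scale * rng.normal()                 # y_t ~ q(.|x^(t))
        alpha = min(1.0, np.exp(log_pi(y) - log_pi(x)))
        if rng.uniform() < alpha:                    # accept: x^(t+1) = y_t
            x = y
        chain[t] = x                                 # else: x^(t+1) = x^(t)
    return chain

# Illustrative target: standard normal, known up to its normalising constant.
chain = metropolis_hastings(lambda x: -0.5 * x**2, 0.0, 50_000, scale=2.0, seed=42)
delta = chain.mean()                                 # estimates E[X] = 0
```

The chain average `delta` is the estimator δ of the previous slide.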
Some properties of the HM algorithm

1. An alternative representation of the estimator δ is
   δ = (1/N) Σ_{t=1}^{N} h(x^(t)) = (1/N) Σ_{i=1}^{M_N} n_i h(z_i),
   where
   - the z_i's are the accepted y_j's,
   - M_N is the number of accepted y_j's till time N,
   - n_i is the number of times z_i appears in the sequence (x^(t))_t.
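The identity between the two representations can be checked numerically; a minimal sketch (the random-walk proposal and standard normal target are illustrative choices, not from the slides):

```python
import numpy as np

def mh_with_counts(log_pi, x0, n_iter, scale=2.0, seed=0):
    """Random-walk MH returning the chain together with the accepted
    states z_i and their occupation counts n_i (z_0 is the start)."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    zs, ns = [x0], [0]
    x = x0
    for t in range(n_iter):
        y = x + scale * rng.normal()
        if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(x))):
            x = y
            zs.append(x)                 # new accepted state z_i ...
            ns.append(0)                 # ... starts its duration count
        chain[t] = x
        ns[-1] += 1                      # current z_i occupies one more slot
    return chain, np.array(zs), np.array(ns)

h = lambda x: x**2
chain, zs, ns = mh_with_counts(lambda x: -0.5 * x**2, 0.0, 10_000)
delta_chain = h(chain).mean()                        # (1/N) sum_t h(x^(t))
delta_z = (ns * h(zs)).sum() / chain.size            # (1/N) sum_i n_i h(z_i)
```

Both expressions agree up to floating-point summation order, since the durations n_i sum to N.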
q̃(·|z_i) = α(z_i, ·) q(·|z_i) / p(z_i) ≤ q(·|z_i) / p(z_i),

where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate from q̃(·|z_i):

1. Propose a candidate y ∼ q(·|z_i).
2. Accept with probability
   q̃(y|z_i) / [q(y|z_i)/p(z_i)] = α(z_i, y).
   Otherwise, reject it and start again.

This is the transition of the HM algorithm. The transition kernel q̃ admits π̃ as a stationary distribution:

π̃(x) q̃(y|x) = [π(x)p(x) / ∫ π(u)p(u)du] · [α(x, y)q(y|x) / p(x)]
             = π(x) α(x, y) q(y|x) / ∫ π(u)p(u)du
             = π(y) α(y, x) q(x|y) / ∫ π(u)p(u)du     [detailed balance]
             = π̃(y) q̃(x|y).
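To make the rejection mechanism concrete, here is a sketch simulating from q̃(·|z) for a Gaussian random-walk proposal and a standard normal target (both illustrative choices); the number of attempts needed is exactly the Geometric(p(z)) duration n_i:

```python
import numpy as np

def sample_qtilde(z, log_pi, scale, rng):
    """Draw from q~(.|z) ∝ alpha(z,.) q(.|z) by rejection: propose
    y ~ q(.|z) and accept with probability alpha(z, y).  Returns the
    draw and the number of attempts, a Geometric(p(z)) variable."""
    attempts = 0
    while True:
        attempts += 1
        y = z + scale * rng.normal()                 # y ~ q(.|z)
        if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(z))):
            return y, attempts

rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x**2                       # standard normal target

# At z = 0 (where pi(0) = 1 for this unnormalised target),
# p(0) = E[alpha(0, y)] over y ~ N(0, scale^2): estimate it by plain
# Monte Carlo, and compare with the mean number of attempts.
ys = 2.0 * rng.normal(size=200_000)
p0 = np.minimum(1.0, np.exp(log_pi(ys))).mean()
counts = np.array([sample_qtilde(0.0, log_pi, 2.0, rng)[1]
                   for _ in range(20_000)])
```

The average of `counts` should be close to 1/p0, consistent with E[n_i | z_i] = 1/p(z_i).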
Lemma (Douc & X., AoS, 2011)

The sequence (z_i, n_i) satisfies

1. (z_i, n_i)_i is a Markov chain;
2. z_{i+1} and n_i are independent given z_i;
3. n_i is distributed as a geometric random variable with probability parameter
   p(z_i) := ∫ α(z_i, y) q(y|z_i) dy;   (1)
4. (z_i)_i is a Markov chain with transition kernel Q(z, dy) = q̃(y|z)dy and stationary distribution π̃ such that
   q̃(·|z) ∝ α(z, ·) q(·|z)  and  π̃(·) ∝ π(·)p(·).
Old bottle, new wine [or vice-versa]

[Dependency diagram: z_{i−1} → z_i → z_{i+1} forms a Markov chain, with each duration n_i independent of z_{i+1} given z_i.]

δ = (1/N) Σ_{t=1}^{N} h(x^(t)) = (1/N) Σ_{i=1}^{M_N} n_i h(z_i).
Rao–Blackwellisation: formal importance sampling

Importance sampling perspective

1. A natural idea:
   δ* = [Σ_{i=1}^{M_N} h(z_i)/p(z_i)] / [Σ_{i=1}^{M_N} 1/p(z_i)]
      = [Σ_{i=1}^{M_N} h(z_i) π(z_i)/π̃(z_i)] / [Σ_{i=1}^{M_N} π(z_i)/π̃(z_i)].
2. But p is not available in closed form.
3. The geometric n_i is the obvious replacement solution used in the original Metropolis–Hastings estimate, since E[n_i | z_i] = 1/p(z_i).
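A numerical sketch of δ*: since p is unavailable in closed form, it is replaced below by a brute-force Monte Carlo stand-in p̂(z_i) (purely for illustration; the target, proposal and test function h are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
log_pi = lambda x: -0.5 * x**2          # standard normal target (unnormalised)
scale, M = 2.0, 5_000

def next_z(z):
    """One step of the accepted-state chain: rejection sampling from q~."""
    while True:
        y = z + scale * rng.normal()
        if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(z))):
            return y

def p_hat(z, n_mc=500):
    """Monte Carlo stand-in for p(z) = ∫ alpha(z, y) q(y|z) dy."""
    ys = z + scale * rng.normal(size=n_mc)
    return np.minimum(1.0, np.exp(log_pi(ys) - log_pi(z))).mean()

zs = np.empty(M)
z = 0.0
for i in range(M):                       # chain of accepted states, ~ pi~
    z = next_z(z)
    zs[i] = z

w = 1.0 / np.array([p_hat(zi) for zi in zs])   # weights pi/pi~ ∝ 1/p
delta_star = (w * zs**2).sum() / w.sum()       # estimates E[X^2] = 1
```

The self-normalised weights 1/p̂(z_i) correct the π̃-distributed z_i's back towards π, here for h(x) = x².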
Variance reduction

The Bernoulli factory

The crude estimate of 1/p(z_i),

n_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)},

can be improved:

Lemma (Douc & X., AoS, 2011)
If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity

ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}

is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i).
Rao-Blackwellised, for sure?

ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}

1. An infinite sum, but with finitely many nonzero terms as soon as some α(z_i, y_ℓ) = 1, which occurs with positive probability:
   α(x^(t), y_t) = min{ 1, [π(y_t)/π(x^(t))] · [q(x^(t)|y_t)/q(y_t|x^(t))] }.
   For example: take a symmetric random walk as a proposal.
2. What if we wish to be sure that the sum is finite? Finite horizon improvement:
   ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}.
Variance improvement

Proposition (Douc & X., AoS, 2011)
If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity

ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}

is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,

V[ξ̂_i^k | z_i] = (1 − p(z_i))/p²(z_i) − [1 − (1 − 2p(z_i) + r(z_i))^k] / [2p(z_i) − r(z_i)] · (p(z_i) − r(z_i)) (2 − p(z_i))/p²(z_i),

where p(z_i) := ∫ α(z_i, y) q(y|z_i) dy and r(z_i) := ∫ α²(z_i, y) q(y|z_i) dy. Therefore, we have

V[ξ̂_i | z_i] ≤ V[ξ̂_i^k | z_i] ≤ V[ξ̂_i^0 | z_i] = V[n_i | z_i].
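An illustrative check of the Proposition, held at a fixed z for simplicity (standard normal target and scale-2 Gaussian proposal are illustrative choices): ξ̂^0 reduces to the geometric count n_i, while larger k trades indicators for the integrated factors (1 − α), lowering the variance.

```python
import numpy as np

def xi_hat_k(z, k, log_pi, scale, rng):
    """Finite-horizon estimator xi^k of 1/p(z): the first k factors are
    the integrated terms (1 - alpha), the remaining ones the raw
    indicators I{u >= alpha}; k = 0 recovers the geometric count n_i."""
    total, prod, j = 1.0, 1.0, 0
    while prod > 0.0:                    # indicators terminate the sum a.s.
        j += 1
        y = z + scale * rng.normal()
        a = min(1.0, np.exp(log_pi(y) - log_pi(z)))
        prod *= (1.0 - a) if j <= k else float(rng.uniform() >= a)
        total += prod
    return total

rng = np.random.default_rng(5)
log_pi = lambda x: -0.5 * x**2

# Monte Carlo value of p(0) for reference (pi(0) = 1, unnormalised).
ys = 2.0 * rng.normal(size=200_000)
p0 = np.minimum(1.0, np.exp(log_pi(ys))).mean()

xi0 = np.array([xi_hat_k(0.0, 0, log_pi, 2.0, rng) for _ in range(20_000)])
xi5 = np.array([xi_hat_k(0.0, 5, log_pi, 2.0, rng) for _ in range(20_000)])
```

Both samples should average near 1/p(0), with the k = 5 version visibly less variable.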
[Dependency diagram: unlike n_i, the weight ξ̂_{i−1}^k is not independent of z_i, and ξ̂_i^k is not independent of z_{i+1}, since the estimator reuses the proposals that drive the chain.]

ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}

δ_M^k = Σ_{i=1}^{M} ξ̂_i^k h(z_i) / Σ_{i=1}^{M} ξ̂_i^k.
Asymptotic results

Let

δ_M^k = Σ_{i=1}^{M} ξ̂_i^k h(z_i) / Σ_{i=1}^{M} ξ̂_i^k.

For any positive function ϕ, we denote C_ϕ = {h; |h/ϕ|_∞ < ∞}. Assume that there exists a positive function ϕ ≥ 1 such that

∀h ∈ C_ϕ,  [Σ_{i=1}^{M} h(z_i)/p(z_i)] / [Σ_{i=1}^{M} 1/p(z_i)]  →_P  π(h),

and a positive function ψ such that

∀h ∈ C_ψ,  √M ( [Σ_{i=1}^{M} h(z_i)/p(z_i)] / [Σ_{i=1}^{M} 1/p(z_i)] − π(h) )  →_L  N(0, Γ(h)).

Theorem (Douc & X., AoS, 2011)
Under the assumption that π(p) > 0, the following convergence properties hold:

i) If h ∈ C_ϕ, then δ_M^k →_P π(h) as M → ∞. (Consistency)

ii) If, in addition, h²/p ∈ C_ϕ and h ∈ C_ψ, then

√M (δ_M^k − π(h)) →_L N(0, V_k[h − π(h)]) as M → ∞, (CLT)

where V_k(h) := π(p) ∫ π̃(dz) V[ξ̂_i^k | z] h²(z) p(z) + Γ(h).
We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that for any starting point x,

∀h ∈ C_ζ,  P_x( sup_{0≤i≤N} | Σ_{j=0}^{i} [h(z_j) − π̃(h)] | > ε ) ≤ N C_h(x)/ε².

Moreover, assume that there exists φ ≥ 1 such that for any starting point x,

∀h ∈ C_φ,  Q̃^n(x, h) →_P π̃(h) = π(ph)/π(p).

Theorem (Douc & X., AoS, 2011)
Assume that h is such that h/p ∈ C_ζ and {C_{h/p}, h²/p²} ⊂ C_φ. Assume moreover that

√M (δ_M^0 − π(h)) →_L N(0, V_0[h − π(h)]).

Then, for any starting point x,

√M_N ( Σ_{t=1}^{N} h(x^(t)) / N − π(h) ) →_L N(0, V_0[h − π(h)]) as N → +∞,

where M_N is defined by

Σ_{i=1}^{M_N} ξ̂_i^0 ≤ N < Σ_{i=1}^{M_N+1} ξ̂_i^0.
Illustrations

Variance gain (1)

h(x)     |   x   |  x²   | I_{X>0} | p(x)
τ = .1   | 0.971 | 0.953 |  0.957  | 0.207
τ = 2    | 0.965 | 0.942 |  0.875  | 0.861
τ = 5    | 0.913 | 0.982 |  0.785  | 0.826
τ = 7    | 0.899 | 0.982 |  0.768  | 0.820

Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10³ replications of a random walk Gaussian proposal with scale τ.
Illustration (1)

Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ^∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ^∞ (pink), in the setting of a random walk Gaussian proposal with scale τ = 10.
Extra computational effort

         | median | mean | q.8 | q.9 | time
τ = .25  |  0.0   | 8.85 | 4.9 | 13  | 4.2
τ = .50  |  0.0   | 6.76 | 4   | 11  | 2.25
τ = 1.0  |  0.25  | 6.15 | 4   | 10  | 2.5
τ = 2.0  |  0.20  | 5.90 | 3.5 | 8.5 | 4.5

Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10⁵ simulations.
Illustration (2)

Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ^∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02.
Rao-Blackwellisation (2)

Outline

1. Metropolis Hastings revisited
2. Rao–Blackwellisation
   - Formal importance sampling
   - Variance reduction
   - Asymptotic results
   - Illustrations
3. Rao-Blackwellisation (2)
   - Independent case
   - General MH algorithms
Integrating out white noise

In the Casella & X. (1996) paper, averaging of possible past and future histories (by integrating out uniforms) to improve the weights of accepted values.

Rao–Blackwellised weight on the proposed values y_t:

ϕ_t = δ_t Σ_{j=t}^{p} ξ_{tj},

with δ_0 = 1, δ_t = Σ_{j=0}^{t−1} δ_j ξ_{j(t−1)} ρ_{jt},

and ξ_{tt} = 1, ξ_{tj} = Π_{u=t+1}^{j} (1 − ρ_{tu}),

the occurrence survivals of the y_t's, associated with the Metropolis–Hastings ratios

ω_t = π(y_t)/µ(y_t),  ρ_{tu} = ω_u/ω_t ∧ 1.
Integrating out white noise

Potentially large variance improvement, but at a cost of O(T²)...
Possible recovery of efficiency thanks to parallelisation:
moving from (ε_1, ..., ε_p) towards (ε_(1), ..., ε_(p)) by averaging over "all" possible orders.
Independent case

Case of the independent Metropolis–Hastings algorithm

Starting at time t with p processors and a pool of p proposed values,
(y_1, ..., y_p),
use the processors to examine in parallel p different "histories".
Improvement

The standard estimator τ̂_1 of E_π[h(X)],

τ̂_1(x_t, y_{1:p}) = (1/p) Σ_{k=1}^{p} h(x_{t+k}),

is necessarily dominated by the average

τ̂_2(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} n_k h(y_k),

where y_0 = x_t and n_0 is the number of times x_t is repeated.
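A sketch of τ̂_1 versus τ̂_2 in the independent case. The N(0,4) proposal for a N(0,1) target is an illustrative choice, as are the use of the p circular shifts as the permutation set and fresh uniforms for each pass:

```python
import numpy as np

def imh_pass(w0, ws, order, rng):
    """One independent-MH pass over the proposal pool in a given order,
    started at x_t (index 0).  Returns occupation counts of the states
    (x_t, y_1, ..., y_p); each pass contributes p counts in total."""
    counts = np.zeros(len(ws) + 1)
    cur, w_cur = 0, w0
    for k in order:                                  # k in {1, ..., p}
        if rng.uniform() < min(1.0, ws[k - 1] / w_cur):
            cur, w_cur = k, ws[k - 1]                # accept y_k
        counts[cur] += 1
    return counts

rng = np.random.default_rng(7)
p, x0 = 16, 0.0
ys = rng.normal(0.0, 2.0, size=p)                    # pool from mu = N(0, 4)
ratio = lambda x: np.exp(-0.5 * x**2 + x**2 / 8)     # omega = pi/mu, up to a constant
ws = ratio(ys)

h_vals = np.concatenate(([x0], ys))                  # h(x) = x here
# tau_1: a single pass in the natural order.
counts1 = imh_pass(ratio(x0), ws, range(1, p + 1), rng)
tau1 = np.dot(counts1, h_vals) / p
# tau_2: accumulate counts over the p circular shifts (one per processor).
counts2 = sum(imh_pass(ratio(x0), ws, np.roll(np.arange(1, p + 1), -s), rng)
              for s in range(p))
tau2 = np.dot(counts2, h_vals) / p**2
```

Each pass distributes exactly p visits over the pool, so the accumulated counts sum to p².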
Further Rao-Blackwellisation

E.g., use of the Metropolis–Hastings weights w_j: with j the index such that x_{t+i−1} = y_j, update the weights at each time t + i:

w_j = w_j + 1 − ρ(x_{t+i−1}, y_i)
w_i = w_i + ρ(x_{t+i−1}, y_i)

resulting in a more stable estimator

τ̂_3(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} w_k h(y_k).

E.g., Casella & X. (1996):

τ̂_4(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} ϕ_k h(y_k).
Markovian continuity

The Markov validity of the chain is not jeopardised! The chain continues by picking one sequence at random and taking the corresponding x_{t+p}^(j) as the starting point of the next parallel block.
Impact of Rao-Blackwellisations

Comparison of
- τ̂_1, the basic IMH estimator of E_π[h(X)],
- τ̂_2, improving upon τ̂_1 by averaging over permutations of the proposed values, using p times more uniforms,
- τ̂_3, improving upon τ̂_2 by a basic Rao-Blackwell argument,
- τ̂_4, improving upon τ̂_2 by integrating out the ancillary uniforms, at a cost of O(p²).
Illustration

Figure: Variations of estimates based on RB and standard versions of parallel chains and on a standard MCMC chain for the mean and variance of the target N(0,1) distribution (based on 10,000 independent replicas).
Impact of the order

Parallelisation allows for the partial integration of the uniforms. What about the permutation order? Comparison of
- τ̂_2^N with no permutation,
- τ̂_2^C with circular permutations,
- τ̂_2^R with random permutations,
- τ̂_2^H with half-random permutations,
- τ̂_2^S with stratified permutations.
Importance target

Comparison with the ultimate importance sampling.
General MH algorithms

Extension to the general case

The same principle can be applied to any Markov update: if

x_{t+1} = Ψ(x_t, ε_t),

then generate (ε_1, ..., ε_p) in advance and distribute them to the p processors in different permutation orders.

Plus use of Douc & X's (2011) Rao–Blackwellisation ξ̂_i^k.
Implementation

Similar run of p parallel chains (x_{t+i}^(j)), use of the averages

τ̂_2(x_{1:p}^(1:p)) = (1/p²) Σ_{k=1}^{p} Σ_{j=1}^{p} h(x_{t+k}^(j)),

and selection of a new starting value at random at time t + p.
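A sketch of the blocked parallel scheme, writing the random-walk update as x' = Ψ(x, ε) with ε = (z, u), and letting each of p (simulated) processors consume the same pool of p innovations in a different circular order (target, proposal and permutation set are all illustrative choices):

```python
import numpy as np

def psi(x, eps, log_pi, scale=2.0):
    """Deterministic MH update x' = Psi(x, eps), with eps = (z, u):
    z the Gaussian innovation, u the acceptance uniform."""
    z, u = eps
    y = x + scale * z
    return y if u < min(1.0, np.exp(log_pi(y) - log_pi(x))) else x

def parallel_block(x0, p, log_pi, rng):
    """One block: draw p innovations once, then run p chains from x0,
    chain j consuming the pool in the j-th circular order."""
    eps = [(rng.normal(), rng.uniform()) for _ in range(p)]
    chains = np.empty((p, p))
    for j in range(p):
        x = x0
        for i in range(p):
            x = psi(x, eps[(i + j) % p], log_pi)
            chains[j, i] = x
    return chains

rng = np.random.default_rng(3)
log_pi = lambda x: -0.5 * x**2
chains = parallel_block(0.0, 64, log_pi, rng)
tau2 = chains.mean()                     # (1/p^2) sum_{j,k} h(x_{t+k}^(j)), h = id
# Markovian continuity: pick one of the p end points at random to start
# the next block.
x_next = chains[rng.integers(64), -1]
```

Reusing one pool of p innovations across the p orderings is what keeps the extra cost at the level of bookkeeping rather than extra simulation.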
Illustration

Figure: Variations of estimates based on RB and standard versions of parallel chains and on a standard MCMC chain for the mean and variance of the target distribution (based on p = 64 parallel processors, 50 blocks of p MCMC steps and 500 independent replicas). [Boxplots labelled RB, par and org for each of the mean and variance panels.]