1. On resolving the Savage–Dickey paradox
On resolving the Savage–Dickey paradox
Christian P. Robert
Université Paris Dauphine & CREST-INSEE
http://www.ceremade.dauphine.fr/~xian
Frontiers...
San Antonio, March 19, 2010
Joint work with J.-M. Marin
2. On resolving the Savage–Dickey paradox
Outline
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
Happy B’day, Jim!!!
4. On resolving the Savage–Dickey paradox
Evidence
Bayesian model choice and hypothesis testing rely on a similar
quantity, the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, d\theta_k\,, \qquad k = 1, 2, \ldots$$
aka the marginal likelihood.
[Jeffreys, 1939]
5. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Importance sampling solutions
1 Importance sampling solutions compared
Regular importance
Bridge sampling
Harmonic means
Chib’s representation
2 The Savage–Dickey ratio
6. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Bayes factor approximation
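When approximating the Bayes factor
$$B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,d\theta_1}$$
use of importance functions $\varphi_0$ and $\varphi_1$ and
$$\widehat{B}_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)/\varphi_1(\theta_1^i)}$$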
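A minimal sketch of this importance sampling approximation, on a toy problem that is not part of the talk: both models take x ~ N(θ, 1), with prior θ ~ N(0, 1) under model 0 and θ ~ N(0, 10) under model 1, so the exact Bayes factor is available for comparison. The function names, the toy models and the choice of importance functions are assumptions made for illustration only.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def is_evidence(x, prior_sd, phi_mean, phi_sd, n, rng):
    """Importance sampling estimate of the evidence for x ~ N(theta, 1),
    theta ~ N(0, prior_sd^2), using the importance function
    N(phi_mean, phi_sd^2)."""
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(phi_mean, phi_sd)
        # weight = likelihood * prior / importance density
        weight = (normal_pdf(x, theta, 1.0) * normal_pdf(theta, 0.0, prior_sd)
                  / normal_pdf(theta, phi_mean, phi_sd))
        total += weight
    return total / n


def bayes_factor_is(x, n=20000, seed=1):
    """Ratio of the two importance sampling evidence estimates."""
    rng = random.Random(seed)
    sd0, sd1 = 1.0, math.sqrt(10.0)
    # Importance functions centred at each posterior mean, with a standard
    # deviation larger than the posterior one (fatter tails).
    z0 = is_evidence(x, sd0, x / 2.0, 1.0, n, rng)
    z1 = is_evidence(x, sd1, 10.0 * x / 11.0, 1.5, n, rng)
    return z0 / z1


def bayes_factor_exact(x):
    """Closed form: marginals are N(0, 2) and N(0, 11) in this toy setting."""
    return normal_pdf(x, 0.0, math.sqrt(2.0)) / normal_pdf(x, 0.0, math.sqrt(11.0))
```

Calling `bayes_factor_is(1.0)` and `bayes_factor_exact(1.0)` side by side gives a quick check of the approximation; the importance functions are deliberately wider than the posteriors so that the weights stay bounded.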
7. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes as a function of the above variables
$$P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3)\,,$$
Test of $H_0: \beta_3 = 0$ for 200 observations of Pima.tr based on a
g-prior modelling:
$$\beta \sim \mathcal{N}_3\left(0,\, n\,(X^T X)^{-1}\right)$$
Use of importance function inspired from the MLE estimate
distribution
$$\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$$
11. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Diabetes in Pima Indian women
Comparison of the variation of the Bayes factor approximations
based on 100 replicas for 20,000 simulations from the prior and
the above MLE importance sampler
[Figure: boxplots of the Bayes factor approximations, Monte Carlo vs. importance sampling]
13. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Optimal bridge sampling
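The optimal choice of auxiliary function is
$$\alpha \propto \frac{1}{n_1\pi_1(\theta|x) + n_2\pi_2(\theta|x)}$$
leading to
$$\widehat{B}_{12} \approx \frac{\dfrac{1}{n_2}\displaystyle\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\pi_1(\theta_{2i}|x) + n_2\pi_2(\theta_{2i}|x)}}{\dfrac{1}{n_1}\displaystyle\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\pi_1(\theta_{1i}|x) + n_2\pi_2(\theta_{1i}|x)}}$$
where $\theta_{ji} \sim \pi_j(\theta|x)$ and $\tilde\pi_j = Z_j\,\pi_j$ is the unnormalised posterior, so that the ratio estimates $B_{12} = Z_1/Z_2$.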
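Because the optimal auxiliary function involves the normalised posteriors, it depends on the very ratio being estimated. A common workaround, sketched here as an assumption rather than as the talk's implementation, is the fixed-point iteration of Meng and Wong (1996). The convention is B12 = Z1/Z2, estimated by E_π2[π̃1 α] / E_π1[π̃2 α]. The two unnormalised densities are toy choices (an N(0, 1) kernel and an N(1, 4) kernel) for which the true ratio is 1/2.

```python
import math
import random


def log_q1(theta):
    """Unnormalised log density of N(0, 1); Z1 = sqrt(2*pi)."""
    return -0.5 * theta * theta


def log_q2(theta):
    """Unnormalised log density of N(1, 2^2); Z2 = 2*sqrt(2*pi)."""
    return -0.5 * ((theta - 1.0) / 2.0) ** 2


def bridge_ratio(sample1, sample2, n_iter=50):
    """Iterative bridge sampling estimate of Z1/Z2.

    sample1 are draws from the density proportional to exp(log_q1),
    sample2 from the one proportional to exp(log_q2).
    """
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    # l = q1/q2 evaluated on each sample
    l1 = [math.exp(log_q1(t) - log_q2(t)) for t in sample1]
    l2 = [math.exp(log_q1(t) - log_q2(t)) for t in sample2]
    r = 1.0  # starting guess for Z1/Z2
    for _ in range(n_iter):
        num = sum(l / (s1 * l + s2 * r) for l in l2) / n2
        den = sum(1.0 / (s1 * l + s2 * r) for l in l1) / n1
        r = num / den
    return r


def demo(n=5000, seed=2):
    rng = random.Random(seed)
    sample1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sample2 = [rng.gauss(1.0, 2.0) for _ in range(n)]
    return bridge_ratio(sample1, sample2)
```

In this toy case both samples are exact draws; with MCMC output the same update applies, although the effective sample sizes would normally replace n1 and n2.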
14. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Extension to varying dimensions
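When $\dim(\Theta_1) \neq \dim(\Theta_2)$, e.g. $\theta_2 = (\theta_1, \psi)$, introduction of a
pseudo-posterior density, $\omega(\psi|\theta_1, x)$, augmenting $\pi_1(\theta_1|x)$ into
joint distribution
$$\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$$
on $\Theta_2$ so that
$$B_{12} = \frac{\int \tilde\pi_1(\theta_1|x)\,\alpha(\theta_1,\psi)\,\pi_2(\theta_1,\psi|x)\,d\theta_1\,\omega(\psi|\theta_1,x)\,d\psi}{\int \tilde\pi_2(\theta_1,\psi|x)\,\alpha(\theta_1,\psi)\,\pi_1(\theta_1|x)\,d\theta_1\,\omega(\psi|\theta_1,x)\,d\psi}$$
$$= E^{\pi_2}\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1,\psi)}\right] = \frac{E^{\varphi}\left[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)/\varphi(\theta_1,\psi)\right]}{E^{\varphi}\left[\tilde\pi_2(\theta_1,\psi)/\varphi(\theta_1,\psi)\right]}$$
for any conditional density $\omega(\psi|\theta_1)$ and any joint density $\varphi$.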
16. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Illustration for the Pima Indian dataset
Use of the MLE induced conditional of β3 given (β1 , β2 ) as a
pseudo-posterior and mixture of both MLE approximations on β3
in bridge sampling estimate
[Figure: boxplots of the Bayes factor approximations for MC, Bridge and IS]
18. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
The original harmonic mean estimator
When $\theta_{kt} \sim \pi_k(\theta|x)$,
$$\frac{1}{T}\sum_{t=1}^{T} \frac{1}{L(\theta_{kt}|x)}$$
is an unbiased estimator of $1/m_k(x)$
[Newton & Raftery, 1994]
Highly dangerous: Most often leads to an infinite variance!!!
20. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
“The Worst Monte Carlo Method Ever”
“The good news is that the Law of Large Numbers guarantees that this
estimator is consistent ie, it will very likely be very close to the correct
answer if you use a sufficiently large number of points from the posterior
distribution.
The bad news is that the number of points required for this estimator to
get close to the right answer will often be greater than the number of
atoms in the observable universe. The even worse news is that it’s easy
for people to not realize this, and to naïvely accept estimates that are
nowhere close to the correct value of the marginal likelihood.”
[Radford Neal’s blog, Aug. 23, 2008]
21. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Approximating Zk from a posterior sample
Use of the [harmonic mean] identity
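$$E^{\pi_k}\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)L_k(\theta_k)}\,\frac{\pi_k(\theta_k)L_k(\theta_k)}{Z_k}\,d\theta_k = \frac{1}{Z_k}$$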
no matter what the proposal ϕ(·) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output
23. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Comparison with regular importance sampling
Harmonic mean: Constraint opposed to usual importance sampling
constraints: $\varphi(\theta)$ must have lighter (rather than fatter) tails than
$\pi_k(\theta_k)L_k(\theta_k)$ for the approximation
$$\widehat{Z}_{1k} = 1 \Bigg/ \frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$$
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
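The contrast between the two harmonic mean estimators can be sketched on a toy conjugate model (x ~ N(θ, 1), θ ~ N(0, 1), so that Z is the N(0, 2) density at x and the posterior is N(x/2, 1/2)). This setting, the function names and the uniform choice of ϕ are illustrative assumptions. A uniform ϕ on an interval around the posterior mean stands in for a finite support kernel and keeps the ratio ϕ/(πL) bounded.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def evidence_estimates(x, T=20000, seed=3, half_width=0.5):
    """Return (original harmonic mean estimate, finite-support-phi estimate)
    of the evidence Z, both computed from the same posterior sample."""
    rng = random.Random(seed)
    post_mean, post_sd = x / 2.0, math.sqrt(0.5)
    sample = [rng.gauss(post_mean, post_sd) for _ in range(T)]

    # Original harmonic mean: 1/Z estimated by the average of 1/L(theta).
    inv_z_hm = sum(1.0 / normal_pdf(x, t, 1.0) for t in sample) / T

    # Generalised version: phi uniform on [post_mean - h, post_mean + h],
    # so phi / (prior * likelihood) is zero outside a bounded region.
    phi_height = 1.0 / (2.0 * half_width)
    inv_z_phi = 0.0
    for t in sample:
        if abs(t - post_mean) <= half_width:
            joint = normal_pdf(t, 0.0, 1.0) * normal_pdf(x, t, 1.0)
            inv_z_phi += phi_height / joint
    inv_z_phi /= T

    return 1.0 / inv_z_hm, 1.0 / inv_z_phi


def evidence_exact(x):
    return normal_pdf(x, 0.0, math.sqrt(2.0))
```

In this toy setting the original estimator sits right at the boundary where its variance becomes infinite, so only the finite-support version should be expected to track `evidence_exact(x)` reliably across replications.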
25. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
HPD indicator as ϕ
Use the convex hull of MCMC simulations corresponding to the
10% HPD region (easily derived!) and $\varphi$ as indicator:
$$\varphi(\theta) = \frac{10}{T}\sum_{t\in\mathrm{HPD}} \mathbb{I}_{d(\theta,\theta^{(t)})\le\epsilon}$$
26. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations
based on 100 replicas for 20,000 simulations from
the above harmonic mean sampler and importance samplers
[Figure: boxplots of the Bayes factor approximations, harmonic mean vs. importance sampling]
27. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Chib’s representation
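Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and
$\theta_k \sim \pi_k(\theta_k)$,
$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$
Use of an approximation to the posterior
$$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*|x)}\,.$$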
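The identity holds for every value of θ, which a toy conjugate example makes easy to check: with x ~ N(θ, 1) and θ ~ N(0, 1), the posterior is N(x/2, 1/2) and the evidence is the N(0, 2) density at x. The setting and function names below are illustrative assumptions; in practice the exact posterior density is replaced by an approximation at a single high-density point θ*.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def chib_evidence(x, theta_star):
    """Evidence via likelihood * prior / posterior, all taken at theta_star."""
    likelihood = normal_pdf(x, theta_star, 1.0)
    prior = normal_pdf(theta_star, 0.0, 1.0)
    posterior = normal_pdf(theta_star, x / 2.0, math.sqrt(0.5))
    return likelihood * prior / posterior


def evidence_exact(x):
    """Marginal density of x, here N(0, 2)."""
    return normal_pdf(x, 0.0, math.sqrt(2.0))
```

Evaluating `chib_evidence(1.0, theta_star)` for several values of `theta_star` returns the same number up to rounding, which is the property the approximation relies on.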
29. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell
estimate
$$\hat\pi_k(\theta_k^*|x) = \frac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)})\,,$$
where the $z_k^{(t)}$’s are Gibbs sampled latent variables
30. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Compensation for label switching
For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a
balanced way, despite the symmetry predicted by the theory
$$\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!}\sum_{\sigma\in\mathfrak{S}_k} \pi_k(\sigma(\theta_k)|x)$$
for all $\sigma$’s in $\mathfrak{S}_k$, set of all permutations of $\{1, \ldots, k\}$.
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
$$\hat\pi_k(\theta_k^*|x) = \frac{1}{T\,k!}\sum_{\sigma\in\mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*)|x, z_k^{(t)})\,.$$
[Berkhof, Mechelen, & Gelman, 2003]
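A small sketch of the symmetrised Rao-Blackwell estimate, under assumptions chosen purely for illustration: a mixture of k normal components with unit variances, known equal weights and independent N(0, τ²) priors on the means, so that the means are conditionally independent normals given the allocations. The function names are hypothetical, and so are the data and allocation vectors used in the check that follows the code.

```python
import itertools
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cond_posterior(mu, x, z, tau2=4.0):
    """pi(mu | x, z): product of conjugate normal posteriors, one per
    component, given the allocation vector z."""
    dens = 1.0
    for j, mu_j in enumerate(mu):
        xs = [xi for xi, zi in zip(x, z) if zi == j]
        prec = len(xs) + 1.0 / tau2
        dens *= normal_pdf(mu_j, sum(xs) / prec, math.sqrt(1.0 / prec))
    return dens


def rao_blackwell(mu_star, x, allocations):
    """Plain average over the sampled allocations."""
    return sum(cond_posterior(mu_star, x, z) for z in allocations) / len(allocations)


def symmetrised(mu_star, x, allocations):
    """Average of the Rao-Blackwell estimate over all k! relabellings."""
    perms = list(itertools.permutations(mu_star))
    return sum(rao_blackwell(p, x, allocations) for p in perms) / len(perms)
```

With made-up allocations that never leave one labelling, the plain estimate changes by orders of magnitude when the components of θ* are swapped, while the symmetrised one does not.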
32. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Case of the probit model
For the completion by z,
$$\hat\pi(\theta|x) = \frac{1}{T}\sum_{t} \pi(\theta|x, z^{(t)})$$
is a simple average of normal densities
[Figure: boxplots of the Bayes factor approximations, Chib's method vs. importance sampling]
33. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
Measure-theoretic aspects
Computational implications
35. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
The Savage–Dickey ratio representation
Special representation of the Bayes factor used for simulation
Original version (Dickey, AoMS, 1971)
37. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Savage’s density ratio theorem
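Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance
parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that
$$\pi_1(\psi|\theta_0) = \pi_0(\psi)$$
then
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\,,$$
with the obvious notations
$$\pi_1(\theta) = \int \pi_1(\theta, \psi)\,d\psi\,, \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,d\psi\,,$$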
[Dickey, 1971; Verdinelli & Wasserman, 1995]
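The theorem can be checked numerically on a toy model where every quantity is available in closed form. The setting below is an assumption made for illustration: two independent observations y ~ N(θ, 1) and w ~ N(ψ, 1), with θ0 = 0, π0(ψ) = N(0, 1) and π1(θ, ψ) = N(0, 1) × N(0, 1), so that π1(ψ|θ0) = π0(ψ) holds.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor_direct(y, w):
    """B01 as the ratio of the two marginal likelihoods."""
    # Under H0: y ~ N(0, 1); psi integrated out gives w ~ N(0, 2).
    m0 = normal_pdf(y, 0.0, 1.0) * normal_pdf(w, 0.0, math.sqrt(2.0))
    # Under H1: theta integrated out gives y ~ N(0, 2), and w ~ N(0, 2).
    m1 = normal_pdf(y, 0.0, math.sqrt(2.0)) * normal_pdf(w, 0.0, math.sqrt(2.0))
    return m0 / m1


def savage_dickey(y, theta0=0.0):
    """Posterior over prior marginal density of theta at theta0 under H1.
    The marginal posterior of theta is N(y/2, 1/2) in this toy model."""
    posterior = normal_pdf(theta0, y / 2.0, math.sqrt(0.5))
    prior = normal_pdf(theta0, 0.0, 1.0)
    return posterior / prior
```

Both functions agree for any (y, w); here the usual continuous versions of the densities are used throughout, which is the implicit choice discussed on the measure-theoretic slides.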
38. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Rephrased
“Suppose that $f_0(\theta) = f_1(\theta|\phi = \phi_0)$. As $f_0(x|\theta) = f_1(x|\theta, \phi = \phi_0)$,
$$f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\,f_1(\theta|\phi = \phi_0)\,d\theta = f_1(x|\phi = \phi_0)\,,$$
i.e., the numerator of the Bayes factor is the value of $f_1(x|\phi)$ at $\phi = \phi_0$, while the denominator is an average
of the values of $f_1(x|\phi)$ for $\phi \neq \phi_0$, weighted by the prior distribution $f_1(\phi)$ under the augmented model.
Applying Bayes’ theorem to the right-hand side of [the above] we get
$$f_0(x) = f_1(\phi_0|x)\,f_1(x) \big/ f_1(\phi_0)$$
and hence the Bayes factor is given by
$$B = f_0(x) \big/ f_1(x) = f_1(\phi_0|x) \big/ f_1(\phi_0)\,,$$
the ratio of the posterior to prior densities at $\phi = \phi_0$ under the augmented model.”
[O’Hagan & Forster, 1996]
40. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Measure-theoretic difficulty
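Representation depends on the choice of versions of conditional
densities:
$$\begin{aligned}
B_{01} &= \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,d\psi\,d\theta} && \text{[by definition]} \\
&= \frac{\int \pi_1(\psi|\theta_0)\,f(x|\theta_0, \psi)\,d\psi\;\pi_1(\theta_0)}{\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,d\psi\,d\theta\;\pi_1(\theta_0)} && \text{[specific version of } \pi_1(\psi|\theta_0) \text{ and arbitrary version of } \pi_1(\theta_0)\text{]} \\
&= \frac{\int \pi_1(\theta_0, \psi)\,f(x|\theta_0, \psi)\,d\psi}{m_1(x)\,\pi_1(\theta_0)} && \text{[specific version of } \pi_1(\theta_0, \psi)\text{]} \\
&= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} && \text{[version dependent]}
\end{aligned}$$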
42. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Choice of density version
Dickey’s (1971) condition is not a condition:
If
$$\frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{m_1(x)}$$
is chosen as a version, then Savage–Dickey’s representation holds
44. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Savage–Dickey paradox
Verdinelli-Wasserman extension:
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\psi|x,\theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
similarly depends on choices of versions...
...but Monte Carlo implementation relies on specific versions of all
densities without making mention of it
[Chen, Shao & Ibrahim, 2000]
46. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
A computational exploitation
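Starting from the (instrumental) prior
$$\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)$$
define the associated posterior
$$\tilde\pi_1(\theta, \psi|x) = \pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi) \big/ \tilde m_1(x)$$
and impose the choice of version
$$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{\tilde m_1(x)}$$
Then
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}$$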
48. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
First ratio
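If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi_1(\theta, \psi|x)$, then
$$\frac{1}{T}\sum_{t} \tilde\pi_1(\theta_0|x, \psi^{(t)})$$
converges to $\tilde\pi_1(\theta_0|x)$ provided the right version is used in $\theta_0$
$$\tilde\pi_1(\theta_0|x, \psi) = \frac{\pi_1(\theta_0)\,f(x|\theta_0, \psi)}{\int \pi_1(\theta)\,f(x|\theta, \psi)\,d\theta}$$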
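A minimal sketch of this Rao-Blackwell average, on a toy model assumed for illustration: x ~ N(θ + ψ, 1) with π1(θ) = N(0, 1) and π0(ψ) = N(0, 1). Under the product prior the pseudo-posterior can be simulated exactly (ψ|x ~ N(x/3, 2/3), then θ|x, ψ ~ N((x − ψ)/2, 1/2)), and the target value is the N(x/3, 2/3) density at θ0. Function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sample_pseudo_posterior(x, T, rng):
    """Exact draws of (theta, psi) from the pseudo-posterior of the toy model."""
    draws = []
    for _ in range(T):
        psi = rng.gauss(x / 3.0, math.sqrt(2.0 / 3.0))
        theta = rng.gauss((x - psi) / 2.0, math.sqrt(0.5))
        draws.append((theta, psi))
    return draws


def rb_density_at(theta0, x, draws):
    """Average of the full conditional density of theta, evaluated at theta0."""
    total = 0.0
    for _, psi in draws:
        # pi1(theta0) f(x|theta0, psi) / integral of pi1(theta) f(x|theta, psi)
        # reduces to the N((x - psi)/2, 1/2) density in this toy model.
        total += normal_pdf(theta0, (x - psi) / 2.0, math.sqrt(0.5))
    return total / len(draws)


def demo(x=1.0, theta0=0.0, T=10000, seed=5):
    rng = random.Random(seed)
    draws = sample_pseudo_posterior(x, T, rng)
    estimate = rb_density_at(theta0, x, draws)
    exact = normal_pdf(theta0, x / 3.0, math.sqrt(2.0 / 3.0))
    return estimate, exact
```

Dividing the estimate by the prior density π1(θ0) gives the first ratio of the decomposition; in a real application the draws would come from an MCMC sampler rather than from exact simulation.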
49. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Rao–Blackwellisation with latent variables
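When $\tilde\pi_1(\theta_0|x, \psi)$ unavailable, replace with
$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$
via data completion by latent variable z such that
$$f(x|\theta, \psi) = \int \tilde f(x, z|\theta, \psi)\,dz$$
and that $\tilde\pi_1(\theta, \psi, z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x, z|\theta, \psi)$ available in closed
form, including the normalising constant, based on version
$$\frac{\tilde\pi_1(\theta_0|x, z, \psi)}{\pi_1(\theta_0)} = \frac{\tilde f(x, z|\theta_0, \psi)}{\int \tilde f(x, z|\theta, \psi)\,\pi_1(\theta)\,d\theta}\,.$$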
51. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Bridge revival (1)
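Since $\tilde m_1(x)/m_1(x)$ is unknown, apparent failure!
Use of the bridge identity
$$E^{\tilde\pi_1(\theta,\psi|x)}\left[\frac{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}\right] = E^{\tilde\pi_1(\theta,\psi|x)}\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$
to (biasedly) estimate $\tilde m_1(x)/m_1(x)$ by
$$T \Bigg/ \sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}$$
based on the same sample from $\tilde\pi_1$.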
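Putting the two pieces together, a sketch of the resulting Bayes factor estimate on a toy model assumed for illustration: x ~ N(θ + ψ, 1), θ0 = 0, π0(ψ) = N(0, 1), π1(θ) = N(0, 1) and π1(ψ|θ) = N(ρθ, 1) with ρ = 0.5 (a hypothetical choice). The pseudo-posterior under the product prior is simulated exactly, the first ratio uses the Rao-Blackwell average, and the correction uses the (biased) bridge estimate of m̃1(x)/m1(x) from the same draws. The exact Bayes factor is the ratio of the N(0, 2) and N(0, (1 + ρ)² + 2) densities at x.

```python
import math
import random

RHO = 0.5      # hypothetical prior dependence: psi | theta ~ N(RHO * theta, 1)
THETA0 = 0.0   # tested value of theta


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def b01_estimate(x, T=20000, seed=7):
    """Savage-Dickey estimate with bridge correction, from one sample of
    the pseudo-posterior built on the product prior pi1(theta) * pi0(psi)."""
    rng = random.Random(seed)
    rb_sum = 0.0      # Rao-Blackwell terms for the density of theta at THETA0
    weight_sum = 0.0  # terms pi1(psi | theta) / pi0(psi)
    for _ in range(T):
        psi = rng.gauss(x / 3.0, math.sqrt(2.0 / 3.0))
        theta = rng.gauss((x - psi) / 2.0, math.sqrt(0.5))
        rb_sum += normal_pdf(THETA0, (x - psi) / 2.0, math.sqrt(0.5))
        weight_sum += normal_pdf(psi, RHO * theta, 1.0) / normal_pdf(psi, 0.0, 1.0)
    first_ratio = (rb_sum / T) / normal_pdf(THETA0, 0.0, 1.0)
    m_ratio = T / weight_sum  # estimate of m~1(x) / m1(x)
    return first_ratio * m_ratio


def b01_exact(x):
    """Closed form ratio of marginals m0(x) / m1(x) for the toy model."""
    var1 = (1.0 + RHO) ** 2 + 2.0
    return normal_pdf(x, 0.0, math.sqrt(2.0)) / normal_pdf(x, 0.0, math.sqrt(var1))
```

The value of ρ is kept moderate so that the weights π1(ψ|θ)/π0(ψ) have a finite variance under the pseudo-posterior; stronger prior dependence would call for more care.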
53. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Bridge revival (2)
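Alternative identity
$$E^{\pi_1(\theta,\psi|x)}\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}\right] = E^{\pi_1(\theta,\psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$
suggests using a second sample $(\bar\theta^{(t)}, \bar\psi^{(t)}, \bar z^{(t)}) \sim \pi_1(\theta, \psi|x)$ and
the ratio estimate
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$
Resulting unbiased estimate:
$$\widehat{B}_{01} = \frac{1}{T}\sum_{t} \frac{\tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)} \;\times\; \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$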
55. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Difference with Verdinelli–Wasserman representation
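The above leads to the representation
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\theta,\psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]$$
which shows how our approach differs from Verdinelli and Wasserman’s
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\psi|x,\theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$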
[for referees only!!]
56. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations
based on 100 replicas for 20,000 simulations from
the above importance, Chib’s, Savage–Dickey’s and bridge
samplers
[Figure: boxplots of the Bayes factor approximations for IS, Chib, Savage–Dickey and Bridge]