On resolving the Savage–Dickey paradox




                   On resolving the Savage–Dickey paradox

                                         Christian P. Robert

                                 Université Paris Dauphine & CREST-INSEE
                                 http://www.ceremade.dauphine.fr/~xian


                                          Frontiers...
                                  San Antonio, March 19, 2010
                                  Joint work with J.-M. Marin

Outline



      1    Importance sampling solutions compared

      2    The Savage–Dickey ratio


                                         Happy B’day, Jim!!!

Evidence



      Bayesian model choice and hypothesis testing rely on a similar
      quantity, the evidence

      \[ Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k , \qquad k = 1, 2, \ldots \]

      aka the marginal likelihood.
                                                                           [Jeffreys, 1939]

Importance sampling solutions


      1    Importance sampling solutions compared
             Regular importance
             Bridge sampling
             Harmonic means
             Chib’s representation

      2    The Savage–Dickey ratio

Bayes factor approximation

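      When approximating the Bayes factor

      \[ B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,\mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1} \]

      use of importance functions ϕ0 and ϕ1, with θ_k^i ∼ ϕk, and

      \[ B_{01} \approx \frac{n_0^{-1} \sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)/\varphi_0(\theta_0^i)}{n_1^{-1} \sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)/\varphi_1(\theta_1^i)} \]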
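
      The importance sampling ratio above can be sketched on a toy problem where
      both evidences are known in closed form. Everything in this block is an
      assumption made for illustration, not the Pima benchmark of the slides:
      the data are x_i ∼ N(θ, 1), the two models share the parameter space and
      differ only through their priors N(0, 1) and N(0, 10), and the importance
      function is a normal centred at the MLE with an inflated variance 2/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(0.4, 1.0, size=n)  # hypothetical data, x_i ~ N(theta, 1)


def log_normal_pdf(v, mean, var):
    """Log density of N(mean, var) at v."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mean) ** 2 / var


def log_lik(theta):
    """Log-likelihood of the whole sample, vectorised over an array of theta values."""
    sq = np.sum((x[:, None] - theta[None, :]) ** 2, axis=0)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * sq


def log_evidence_exact(tau2):
    """Closed-form log evidence under the prior theta ~ N(0, tau2)."""
    s = x.sum()
    quad = np.sum(x ** 2) - tau2 * s ** 2 / (1 + n * tau2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2) - 0.5 * quad


def log_evidence_is(tau2, n_draws=20000):
    """Importance sampling estimate of the log evidence, proposal N(MLE, 2/n)."""
    mle, var = x.mean(), 2.0 / n
    theta = rng.normal(mle, np.sqrt(var), size=n_draws)
    log_w = log_lik(theta) + log_normal_pdf(theta, 0.0, tau2) - log_normal_pdf(theta, mle, var)
    shift = log_w.max()  # log-sum-exp shift for numerical stability
    return shift + np.log(np.mean(np.exp(log_w - shift)))


log_b01_is = log_evidence_is(1.0) - log_evidence_is(10.0)
log_b01_exact = log_evidence_exact(1.0) - log_evidence_exact(10.0)
print(f"log B01: importance sampling {log_b01_is:.4f}, exact {log_b01_exact:.4f}")
```

      Working on the log scale and shifting by the largest log-weight avoids
      underflow of the likelihood; the proposal variance is deliberately larger
      than the posterior variance so that the weights stay bounded.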

Probit modelling on Pima Indian women
      Example (R benchmark)
      200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes


      Probability of diabetes as a function of the above variables
      \[ P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3) , \]
      Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
      g-prior modelling:
      \[ \beta \sim \mathcal{N}_3\big(0,\; n\,(X^{\mathsf T}X)^{-1}\big) \]
      Use of an importance function inspired by the distribution of the
      MLE estimate
      \[ \beta \sim \mathcal{N}(\hat\beta, \hat\Sigma) \]

Diabetes in Pima Indian women
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas for 20,000 simulations from the prior and
      the above MLE importance sampler

      [Figure: boxplots of the Bayes factor approximations, vertical axis from 2 to 5; panels: Monte Carlo, Importance sampling]

Bridge sampling

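      General identity:

      \[ B_{12} = \frac{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,\mathrm{d}\theta}{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,\mathrm{d}\theta} \qquad \forall\,\alpha(\cdot) \]

      \[ \approx \frac{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})}{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})} \qquad \theta_{ji} \sim \pi_j(\theta|x) \]

      with π̃j(θ|x) the unnormalised version of πj(θ|x), so that B12 = Z1/Z2.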

Optimal bridge sampling

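      The optimal choice of auxiliary function is

      \[ \alpha^\star \propto \frac{1}{n_1\,\pi_1(\theta|x) + n_2\,\pi_2(\theta|x)} \]

      leading to

      \[ B_{12} \approx \frac{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\,\pi_1(\theta_{2i}|x) + n_2\,\pi_2(\theta_{2i}|x)}}{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\,\pi_1(\theta_{1i}|x) + n_2\,\pi_2(\theta_{1i}|x)}} \]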
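
      Since the optimal α involves the normalised posteriors, hence the unknown
      constants, it is usually applied through a fixed-point iteration on the
      ratio r = Z1/Z2. The sketch below is one such iteration on a toy pair of
      unnormalised Gaussian densities; the function name bridge_ratio, the two
      densities and the exact simulation from each target are assumptions made
      for illustration, not the Pima implementation.

```python
import numpy as np


def bridge_ratio(log_q1, log_q2, draws1, draws2, n_iter=50):
    """Iterative optimal-bridge estimate of Z1/Z2 from unnormalised log densities.

    draws1 and draws2 are samples from the normalised targets pi_1 and pi_2.
    """
    n1, n2 = len(draws1), len(draws2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    l1 = np.exp(log_q1(draws1) - log_q2(draws1))  # q1/q2 at draws from pi_1
    l2 = np.exp(log_q1(draws2) - log_q2(draws2))  # q1/q2 at draws from pi_2
    r = 1.0  # starting value for Z1/Z2
    for _ in range(n_iter):
        num = np.mean(l2 / (s1 * l2 + s2 * r))   # average over draws from pi_2
        den = np.mean(1.0 / (s1 * l1 + s2 * r))  # average over draws from pi_1
        r = num / den
    return r


rng = np.random.default_rng(0)


def log_q1(t):
    """Unnormalised N(0, 1) density, so Z1 = sqrt(2*pi)."""
    return -0.5 * t ** 2


def log_q2(t):
    """Unnormalised N(1, 4) density, so Z2 = 2*sqrt(2*pi)."""
    return -0.125 * (t - 1.0) ** 2


draws1 = rng.normal(0.0, 1.0, size=20000)
draws2 = rng.normal(1.0, 2.0, size=20000)
bridge_estimate = bridge_ratio(log_q1, log_q2, draws1, draws2)
true_ratio = 0.5  # Z1 / Z2 for the two densities above
print(f"bridge estimate of Z1/Z2: {bridge_estimate:.4f} (exact {true_ratio})")
```

      Dividing numerator and denominator of each term by the unnormalised second
      density is what lets the iteration run on the ratios l1 and l2 alone.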

Extension to varying dimensions

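      When dim(Θ1) ≠ dim(Θ2), e.g. θ2 = (θ1, ψ), introduction of a
      pseudo-posterior density, ω(ψ|θ1, x), augmenting π1(θ1|x) into
      joint distribution
      \[ \pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x) \]
      on Θ2 so that

      \[ \begin{aligned}
      B_{12} &= \frac{\int \tilde\pi_1(\theta_1|x)\,\omega(\psi|\theta_1,x)\,\alpha(\theta_1,\psi)\,\pi_2(\theta_1,\psi|x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}{\int \tilde\pi_2(\theta_1,\psi|x)\,\alpha(\theta_1,\psi)\,\pi_1(\theta_1|x)\,\omega(\psi|\theta_1,x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi} \\
      &= E^{\pi_2}\!\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1,\psi)}\right] = \frac{E^{\varphi}\big[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)/\varphi(\theta_1,\psi)\big]}{E^{\varphi}\big[\tilde\pi_2(\theta_1,\psi)/\varphi(\theta_1,\psi)\big]}
      \end{aligned} \]

        for any conditional density ω(ψ|θ1) and any joint density ϕ.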

Illustration for the Pima Indian dataset




      Use of the MLE induced conditional of β3 given (β1 , β2 ) as a
      pseudo-posterior and mixture of both MLE approximations on β3
      in bridge sampling estimate

      [Figure: boxplots of the Bayes factor approximations, vertical axis from 2 to 5; panels: MC, Bridge, IS]

The original harmonic mean estimator



      When θkt ∼ πk(θ|x),

      \[ \frac{1}{T} \sum_{t=1}^{T} \frac{1}{L(\theta_{kt}|x)} \]

      is an unbiased estimator of 1/mk(x)
                                                                 [Newton & Raftery, 1994]

      Highly dangerous: Most often leads to an infinite variance!!!
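
      The instability can be seen on a toy conjugate model where the evidence is
      available in closed form. The block below is only a sketch under stated
      assumptions: x_i ∼ N(θ, 1) with a vague N(0, 100) prior, exact posterior
      draws standing in for MCMC output, and 20 replicas of 5,000 draws each.
      In this setting the inverse likelihood has infinite variance under the
      posterior, and the replicas tend to sit well above the exact log evidence.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau2 = 20, 100.0
x = rng.normal(0.5, 1.0, size=n)  # hypothetical data, x_i ~ N(theta, 1)

# Exact posterior N(mu_n, s2_n) and exact log evidence for the prior N(0, tau2)
s2_n = 1.0 / (n + 1.0 / tau2)
mu_n = s2_n * x.sum()
log_Z_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2)
               - 0.5 * (np.sum(x ** 2) - tau2 * x.sum() ** 2 / (1 + n * tau2)))


def log_lik(theta):
    """Log-likelihood of the whole sample, vectorised over an array of theta values."""
    sq = np.sum((x[:, None] - theta[None, :]) ** 2, axis=0)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * sq


def log_harmonic_mean(T):
    """Log of the harmonic mean estimate of the evidence from T posterior draws."""
    theta = rng.normal(mu_n, np.sqrt(s2_n), size=T)
    neg = -log_lik(theta)
    shift = neg.max()
    log_mean_inverse = shift + np.log(np.mean(np.exp(neg - shift)))  # log of (1/T) sum 1/L
    return -log_mean_inverse


replicas = np.array([log_harmonic_mean(5000) for _ in range(20)])
print(f"exact log evidence: {log_Z_exact:.3f}")
print(f"harmonic mean replicas: min {replicas.min():.3f}, "
      f"median {np.median(replicas):.3f}, max {replicas.max():.3f}")
```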

“The Worst Monte Carlo Method Ever”


      “The good news is that the Law of Large Numbers guarantees that this
      estimator is consistent ie, it will very likely be very close to the correct
      answer if you use a sufficiently large number of points from the posterior
      distribution.
      The bad news is that the number of points required for this estimator to
      get close to the right answer will often be greater than the number of
      atoms in the observable universe. The even worse news is that it’s easy
      for people to not realize this, and to naïvely accept estimates that are
      nowhere close to the correct value of the marginal likelihood.”
                                           [Radford Neal’s blog, Aug. 23, 2008]

Approximating Zk from a posterior sample



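      Use of the [harmonic mean] identity

      \[ E^{\pi_k}\!\left[ \frac{\varphi(\theta_k)}{\pi_k(\theta_k) L_k(\theta_k)} \,\Big|\, x \right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k) L_k(\theta_k)}\, \frac{\pi_k(\theta_k) L_k(\theta_k)}{Z_k}\, \mathrm{d}\theta_k = \frac{1}{Z_k} \]

      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output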

Comparison with regular importance sampling


      Harmonic mean: Constraint opposed to usual importance sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails than
      πk(θk)Lk(θk) for the approximation

      \[ \widehat{Z}_{1k} = 1 \Bigg/ \frac{1}{T} \sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\, L_k(\theta_k^{(t)})} \]

      to have a finite variance.
      E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
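
      A finite-support ϕ keeps the ratio ϕ/(πk Lk) bounded, which is the point of
      the constraint above. The sketch below uses a uniform density on one
      posterior standard deviation around the posterior mean; that choice, the
      toy model x_i ∼ N(θ, 1) with prior N(0, 4), and the exact posterior draws
      standing in for MCMC output are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau2 = 20, 4.0
x = rng.normal(0.5, 1.0, size=n)  # hypothetical data, x_i ~ N(theta, 1)

# Exact posterior N(mu_n, s2_n) and exact log evidence, used only as a check
s2_n = 1.0 / (n + 1.0 / tau2)
mu_n = s2_n * x.sum()
log_Z_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2)
               - 0.5 * (np.sum(x ** 2) - tau2 * x.sum() ** 2 / (1 + n * tau2)))


def log_prior_times_lik(theta):
    """log of pi(theta) * L(theta), vectorised over an array of theta values."""
    log_prior = -0.5 * np.log(2 * np.pi * tau2) - 0.5 * theta ** 2 / tau2
    sq = np.sum((x[:, None] - theta[None, :]) ** 2, axis=0)
    return log_prior - 0.5 * n * np.log(2 * np.pi) - 0.5 * sq


theta = rng.normal(mu_n, np.sqrt(s2_n), size=20000)  # stand-in for MCMC output

# phi: uniform density on [mu_n - half, mu_n + half], zero elsewhere
half = np.sqrt(s2_n)
inside = np.abs(theta - mu_n) <= half
log_ratio = -np.log(2 * half) - log_prior_times_lik(theta[inside])

# Average of phi / (pi * L) over ALL draws; draws outside the support contribute 0
shift = log_ratio.max()
log_inv_Z = shift + np.log(np.sum(np.exp(log_ratio - shift)) / theta.size)
log_Z_gd = -log_inv_Z
print(f"log evidence: finite-support estimate {log_Z_gd:.4f}, exact {log_Z_exact:.4f}")
```

      Dividing by the total number of draws, not by the number of draws inside
      the support, is what keeps the estimator of 1/Z unbiased.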

HPD indicator as ϕ


      Use the convex hull of MCMC simulations corresponding to the
      10% HPD region (easily derived!) and ϕ as indicator:

      \[ \varphi(\theta) = \frac{10}{T} \sum_{t\in\mathrm{HPD}} \mathbb{I}_{d(\theta,\theta^{(t)})\le\epsilon} \]

Diabetes in Pima Indian women (cont’d)

      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas of 20,000 simulations from the above
      harmonic mean sampler and importance samplers

      [Figure: boxplots of the Bayes factor approximations, vertical axis from 3.102 to 3.116; panels: Harmonic mean, Importance sampling]

Chib’s representation


      Direct application of Bayes’ theorem: given x ∼ fk(x|θk) and
      θk ∼ πk(θk),
      \[ Z_k = m_k(x) = \frac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)} \]
      Use of an approximation to the posterior
      \[ \widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*|x)} . \]
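
      The identity can be checked numerically on a toy conjugate model where the
      posterior density is known. The sketch below first plugs in the exact
      posterior ordinate, then replaces it by a Gaussian kernel density estimate
      built from posterior draws. The kernel estimate, its rule-of-thumb
      bandwidth, the model x_i ∼ N(θ, 1) with prior N(0, 4), and the choice of
      θ* at the posterior mean are assumptions for illustration; the slides
      rely on Rao-Blackwell estimates instead.

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau2 = 20, 4.0
x = rng.normal(0.5, 1.0, size=n)  # hypothetical data, x_i ~ N(theta, 1)

# Exact posterior N(mu_n, s2_n) and exact log evidence
s2_n = 1.0 / (n + 1.0 / tau2)
mu_n = s2_n * x.sum()
log_Z_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2)
               - 0.5 * (np.sum(x ** 2) - tau2 * x.sum() ** 2 / (1 + n * tau2)))


def log_normal_pdf(v, mean, var):
    """Log density of N(mean, var) at v (broadcasts over arrays)."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mean) ** 2 / var


def log_joint(theta_star):
    """log f(x|theta*) + log pi(theta*) at a single point theta*."""
    log_lik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((x - theta_star) ** 2)
    return log_lik + log_normal_pdf(theta_star, 0.0, tau2)


theta_star = mu_n  # a high-density point keeps the ordinate estimate stable

# 1. With the exact posterior ordinate the identity holds exactly
log_Z_chib_exact = log_joint(theta_star) - log_normal_pdf(theta_star, mu_n, s2_n)

# 2. With an estimated ordinate (Gaussian kernel, rule-of-thumb bandwidth)
draws = rng.normal(mu_n, np.sqrt(s2_n), size=20000)  # stand-in for MCMC output
h = 1.06 * draws.std() * draws.size ** (-0.2)
post_hat = np.mean(np.exp(log_normal_pdf(theta_star, draws, h ** 2)))
log_Z_chib_kde = log_joint(theta_star) - np.log(post_hat)

print(f"exact {log_Z_exact:.4f}, Chib (exact ordinate) {log_Z_chib_exact:.4f}, "
      f"Chib (kernel ordinate) {log_Z_chib_kde:.4f}")
```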

Case of latent variables



      For missing variable z as in mixture models, natural Rao-Blackwell
      estimate

      \[ \hat\pi_k(\theta_k^*|x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)}) , \]

      where the z_k^{(t)}’s are Gibbs sampled latent variables

Compensation for label switching
      For mixture models, z_k^{(t)} usually fails to visit all configurations in a
      balanced way, despite the symmetry predicted by the theory

      \[ \pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!} \sum_{\sigma\in S_k} \pi_k(\sigma(\theta_k)|x) \]

      for all σ’s in Sk, set of all permutations of {1, . . . , k}.
      Consequences on numerical approximation, biased by an order k!
      Recover the theoretical symmetry by using

      \[ \hat\pi_k(\theta_k^*|x) = \frac{1}{T\,k!} \sum_{\sigma\in S_k} \sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*)|x, z_k^{(t)}) . \]

                                                    [Berkhof, Mechelen, & Gelman, 2003]

Case of the probit model
      For the completion by z,

      \[ \hat\pi(\theta|x) = \frac{1}{T} \sum_{t} \pi(\theta|x, z^{(t)}) \]

      is a simple average of normal densities

      [Figure: boxplots, vertical axis from 0.0240 to 0.0255; panels: Chib's method, importance sampling]

1    Importance sampling solutions compared

2    The Savage–Dickey ratio
       Measure-theoretic aspects
       Computational implications

The Savage–Dickey ratio representation




      Special representation of the Bayes factor used for simulation

      Original version (Dickey, AoMS, 1971)

Savage’s density ratio theorem
      Given a test H0 : θ = θ0 in a model f(x|θ, ψ) with a nuisance
      parameter ψ, under priors π0(ψ) and π1(θ, ψ) such that

      \[ \pi_1(\psi|\theta_0) = \pi_0(\psi) \]

      then

      \[ B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} , \]

      with the obvious notations

      \[ \pi_1(\theta) = \int \pi_1(\theta, \psi)\,\mathrm{d}\psi , \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,\mathrm{d}\psi , \]

                                         [Dickey, 1971; Verdinelli & Wasserman, 1995]
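
      A quick numerical check of the representation, in the simplest possible
      setting. The assumptions here are mine and kept deliberately minimal:
      there is no nuisance parameter, the sample mean of n observations from
      N(θ, 1) is used as a sufficient statistic, the prior under the
      alternative is N(0, τ²) with continuous density, and the observed value
      of the sample mean is a made-up number.

```python
import math


def normal_pdf(v, mean, var):
    """Density of N(mean, var) at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


n, tau2, theta0 = 25, 2.0, 0.0
xbar = 0.3  # hypothetical observed sample mean

# Direct Bayes factor: marginal densities of xbar under H0 and under the alternative
m0 = normal_pdf(xbar, theta0, 1.0 / n)
m1 = normal_pdf(xbar, 0.0, tau2 + 1.0 / n)
b01_direct = m0 / m1

# Savage-Dickey ratio: posterior over prior density at theta0 under the alternative
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = post_var * n * xbar
b01_savage_dickey = normal_pdf(theta0, post_mean, post_var) / normal_pdf(theta0, 0.0, tau2)

print(f"direct B01 = {b01_direct:.6f}, Savage-Dickey B01 = {b01_savage_dickey:.6f}")
```

      Both numbers agree here because the continuous versions of the prior and
      posterior densities are used throughout, which is exactly the kind of
      implicit choice of version at stake in this section.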

Rephrased
      “Suppose that f0(θ) = f1(θ|φ = φ0). As f0(x|θ) = f1(x|θ, φ = φ0),

      \[ f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\, f_1(\theta|\phi = \phi_0)\, \mathrm{d}\theta = f_1(x|\phi = \phi_0) , \]

      i.e., the numerator of the Bayes factor is the value of f1(x|φ) at φ = φ0, while the denominator is an average
      of the values of f1(x|φ) for φ ≠ φ0, weighted by the prior distribution f1(φ) under the augmented model.
      Applying Bayes’ theorem to the right-hand side of [the above] we get

      \[ f_0(x) = f_1(\phi_0|x)\, f_1(x) \big/ f_1(\phi_0) \]

      and hence the Bayes factor is given by

      \[ B = f_0(x) \big/ f_1(x) = f_1(\phi_0|x) \big/ f_1(\phi_0) , \]

      the ratio of the posterior to prior densities at φ = φ0 under the augmented model.”

                                                                          [O’Hagan & Forster, 1996]

Measure-theoretic difficulty
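      Representation depends on the choice of versions of conditional
      densities:

      \[ \begin{aligned}
      B_{01} &= \frac{\int \pi_0(\psi) f(x|\theta_0,\psi)\,\mathrm{d}\psi}{\int \pi_1(\theta,\psi) f(x|\theta,\psi)\,\mathrm{d}\psi\,\mathrm{d}\theta} && \text{[by definition]} \\
      &= \frac{\int \pi_1(\psi|\theta_0) f(x|\theta_0,\psi)\,\mathrm{d}\psi\;\pi_1(\theta_0)}{\int \pi_1(\theta,\psi) f(x|\theta,\psi)\,\mathrm{d}\psi\,\mathrm{d}\theta\;\pi_1(\theta_0)} && \text{[specific version of } \pi_1(\psi|\theta_0) \text{ and arbitrary version of } \pi_1(\theta_0)\text{]} \\
      &= \frac{\int \pi_1(\theta_0,\psi) f(x|\theta_0,\psi)\,\mathrm{d}\psi}{m_1(x)\,\pi_1(\theta_0)} && \text{[specific version of } \pi_1(\theta_0,\psi)\text{]} \\
      &= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} && \text{[version dependent]}
      \end{aligned} \]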

Choice of density version



      Dickey’s (1971) condition is not a condition:
      If
      \[ \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi) f(x|\theta_0,\psi)\,\mathrm{d}\psi}{m_1(x)} \]
      is chosen as a version, then Savage–Dickey’s representation holds

Savage–Dickey paradox


      Verdinelli-Wasserman extension:

      \[ B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\psi|x,\theta_0)}\!\left[ \frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)} \right] \]

      similarly depends on choices of versions...
      ...but Monte Carlo implementation relies on specific versions of all
      densities without making mention of it
                                          [Chen, Shao & Ibrahim, 2000]

A computational exploitation
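      Starting from the (instrumental) prior
      \[
        \tilde\pi_1(\theta,\psi) = \pi_1(\theta)\,\pi_0(\psi)
      \]
      define the associated posterior
      \[
        \tilde\pi_1(\theta,\psi|x) = \pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta,\psi) \big/ \tilde m_1(x)
      \]
      and impose the choice of version
      \[
        \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}
          = \frac{\int \pi_0(\psi)\, f(x|\theta_0,\psi)\,\mathrm{d}\psi}{\tilde m_1(x)}
      \]
      Then
      \[
        B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}
      \]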
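
      The final step can be unpacked as follows. Here m_0(x) denotes the
      marginal likelihood under the null, a label assumed for this note rather
      than taken from the slide, and B_{01} is the ratio m_0(x)/m_1(x).
      \[
        B_{01} = \frac{m_0(x)}{m_1(x)}
               = \frac{m_0(x)}{\tilde m_1(x)} \cdot \frac{\tilde m_1(x)}{m_1(x)}
               = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} \cdot \frac{\tilde m_1(x)}{m_1(x)},
        \qquad
        m_0(x) = \int \pi_0(\psi)\, f(x|\theta_0,\psi)\,\mathrm{d}\psi .
      \]
      The last equality uses the imposed version. The first factor is a density
      ratio under the instrumental model, the second a ratio of normalising
      constants.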



First ratio


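      If $(\theta^{(1)},\psi^{(1)}),\ldots,(\theta^{(T)},\psi^{(T)}) \sim \tilde\pi_1(\theta,\psi|x)$, then
      \[
        \frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x,\psi^{(t)})
      \]
      converges to $\tilde\pi_1(\theta_0|x)$ provided the right version is used in $\theta_0$
      \[
        \tilde\pi_1(\theta_0|x,\psi)
          = \frac{\pi_1(\theta_0)\, f(x|\theta_0,\psi)}{\int \pi_1(\theta)\, f(x|\theta,\psi)\,\mathrm{d}\theta}
      \]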
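
      The averaging above can be sketched on a toy model where every quantity
      is known in closed form. The model is an assumption made for this
      illustration only: x | theta, psi ~ N(theta + psi, 1) with instrumental
      prior theta ~ N(0, 1) independent of psi ~ N(0, 1). The function names
      are hypothetical, and the posterior is sampled exactly rather than by
      MCMC.

```python
import numpy as np


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y (elementwise on arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def sample_instrumental_posterior(x, T, rng):
    """Exact draws of (theta, psi) from the instrumental posterior.

    Toy assumption: x | theta, psi ~ N(theta + psi, 1), with independent
    N(0, 1) priors on theta and psi, so the posterior is bivariate normal.
    """
    mean = np.array([x / 3.0, x / 3.0])
    cov = np.array([[2.0, -1.0], [-1.0, 2.0]]) / 3.0
    draws = rng.multivariate_normal(mean, cov, size=T)
    return draws[:, 0], draws[:, 1]


def rao_blackwell_density(theta0, x, psi):
    """Average over t of the conditional density of theta at theta0."""
    # In the toy model, theta | x, psi ~ N((x - psi) / 2, 1/2).
    return normal_pdf(theta0, (x - psi) / 2.0, 0.5).mean()


rng = np.random.default_rng(0)
x_obs, theta0 = 1.0, 0.0
_, psi = sample_instrumental_posterior(x_obs, 20_000, rng)
estimate = rao_blackwell_density(theta0, x_obs, psi)
# Closed-form target in the toy model: theta | x ~ N(x / 3, 2 / 3).
exact = normal_pdf(theta0, x_obs / 3.0, 2.0 / 3.0)
print(f"Rao-Blackwell estimate: {estimate:.4f}, exact value: {exact:.4f}")
```

      Here the conditional density is a genuine function of theta, so its value
      at theta0 is unambiguous; this is the sense in which a specific version
      is being used.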



Rao–Blackwellisation with latent variables
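      When $\tilde\pi_1(\theta_0|x,\psi)$ unavailable, replace with
      \[
        \frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x,z^{(t)},\psi^{(t)})
      \]
      via data completion by latent variable $z$ such that
      \[
        f(x|\theta,\psi) = \int \tilde f(x,z|\theta,\psi)\,\mathrm{d}z
      \]
      and that $\tilde\pi_1(\theta,\psi,z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x,z|\theta,\psi)$ available in closed
      form, including the normalising constant, based on version
      \[
        \frac{\tilde\pi_1(\theta_0|x,z,\psi)}{\pi_1(\theta_0)}
          = \frac{\tilde f(x,z|\theta_0,\psi)}{\int \tilde f(x,z|\theta,\psi)\,\pi_1(\theta)\,\mathrm{d}\theta} .
      \]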



Bridge revival (1)

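      Since $\tilde m_1(x)/m_1(x)$ is unknown, apparent failure!
      Use of the bridge identity
      \[
        \mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\theta,\psi)\, f(x|\theta,\psi)}{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta,\psi)}\right]
          = \mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right]
          = \frac{m_1(x)}{\tilde m_1(x)}
      \]
      to (biasedly) estimate $\tilde m_1(x)/m_1(x)$ by
      \[
        T \Big/ \sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}
      \]
      based on the same sample from $\tilde\pi_1$.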
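
      A minimal sketch of this inverse-mean estimate, again on an assumed toy
      Gaussian model: x | theta, psi ~ N(theta + psi, 1), with (theta, psi)
      standard bivariate normal with correlation RHO under model 1 and
      pi0 = N(0, 1). None of these choices come from the slides, and the
      function names are hypothetical.

```python
import numpy as np

RHO = 0.3  # assumed prior correlation of (theta, psi) under model 1


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y (elementwise on arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def sample_instrumental_posterior(x, T, rng):
    """Exact draws from the posterior under the prior pi1(theta) * pi0(psi)."""
    mean = np.array([x / 3.0, x / 3.0])
    cov = np.array([[2.0, -1.0], [-1.0, 2.0]]) / 3.0
    draws = rng.multivariate_normal(mean, cov, size=T)
    return draws[:, 0], draws[:, 1]


def bridge_ratio_estimate(theta, psi):
    """Estimate m~1(x) / m1(x) as T / sum_t pi1(psi_t | theta_t) / pi0(psi_t)."""
    # Under the toy model 1 prior, psi | theta ~ N(RHO * theta, 1 - RHO^2).
    weights = normal_pdf(psi, RHO * theta, 1.0 - RHO**2) / normal_pdf(psi, 0.0, 1.0)
    return 1.0 / weights.mean()


rng = np.random.default_rng(1)
x_obs = 1.0
theta, psi = sample_instrumental_posterior(x_obs, 20_000, rng)
estimate = bridge_ratio_estimate(theta, psi)
# Closed-form marginals in the toy model:
# m~1(x) = N(x; 0, 3) and m1(x) = N(x; 0, 3 + 2 * RHO).
exact = normal_pdf(x_obs, 0.0, 3.0) / normal_pdf(x_obs, 0.0, 3.0 + 2.0 * RHO)
print(f"bridge estimate: {estimate:.4f}, exact ratio: {exact:.4f}")
```

      The average of the weights is unbiased for m1(x)/m~1(x); taking its
      inverse is what introduces the bias mentioned above.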



Bridge revival (2)
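      Alternative identity
      \[
        \mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta,\psi)}{\pi_1(\theta,\psi)\, f(x|\theta,\psi)}\right]
          = \mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]
          = \frac{\tilde m_1(x)}{m_1(x)}
      \]
      suggests using a second sample $(\bar\theta^{(t)},\bar\psi^{(t)},z^{(t)}) \sim \pi_1(\theta,\psi|x)$ and
      the ratio estimate
      \[
        \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}
      \]
      Resulting unbiased estimate:
      \[
        \widehat{B}_{01}
          = \frac{1}{T}\,\frac{\sum_{t} \tilde\pi_1(\theta_0|x,z^{(t)},\psi^{(t)})}{\pi_1(\theta_0)}
            \;\cdot\;
            \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}
      \]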
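
      The two-sample estimate can be sketched end to end on an assumed toy
      Gaussian model, where the exact Bayes factor is available for comparison.
      The model (x | theta, psi ~ N(theta + psi, 1), correlation RHO under
      model 1, pi0 = N(0, 1)), the exact samplers, and all function names are
      assumptions of this illustration. No latent variable z is needed in this
      toy case.

```python
import numpy as np

RHO = 0.3  # assumed prior correlation of (theta, psi) under model 1


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y (elementwise on arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def sample_instrumental_posterior(x, T, rng):
    """Exact draws from the posterior under the prior pi1(theta) * pi0(psi)."""
    mean = np.array([x / 3.0, x / 3.0])
    cov = np.array([[2.0, -1.0], [-1.0, 2.0]]) / 3.0
    draws = rng.multivariate_normal(mean, cov, size=T)
    return draws[:, 0], draws[:, 1]


def sample_model1_posterior(x, T, rng):
    """Exact draws from the posterior under the correlated model 1 prior."""
    shrink = (1.0 + RHO) / (3.0 + 2.0 * RHO)
    mean = np.array([shrink * x, shrink * x])
    prior_cov = np.array([[1.0, RHO], [RHO, 1.0]])
    cov = prior_cov - (1.0 + RHO) * shrink * np.ones((2, 2))
    draws = rng.multivariate_normal(mean, cov, size=T)
    return draws[:, 0], draws[:, 1]


def savage_dickey_bridge_estimate(theta0, x, psi, theta_bar, psi_bar):
    """Product of the Rao-Blackwell density ratio and the bridge average."""
    # First factor: average of pi~1(theta0 | x, psi_t) / pi1(theta0).
    first = (normal_pdf(theta0, (x - psi) / 2.0, 0.5)
             / normal_pdf(theta0, 0.0, 1.0)).mean()
    # Second factor: average of pi0(psi_bar_t) / pi1(psi_bar_t | theta_bar_t).
    second = (normal_pdf(psi_bar, 0.0, 1.0)
              / normal_pdf(psi_bar, RHO * theta_bar, 1.0 - RHO**2)).mean()
    return first * second


rng = np.random.default_rng(2)
x_obs, theta0, T = 1.0, 0.0, 20_000
_, psi = sample_instrumental_posterior(x_obs, T, rng)
theta_bar, psi_bar = sample_model1_posterior(x_obs, T, rng)
estimate = savage_dickey_bridge_estimate(theta0, x_obs, psi, theta_bar, psi_bar)
# Exact value in the toy model: m0(x) = N(x; theta0, 2), m1(x) = N(x; 0, 3 + 2 RHO).
exact = normal_pdf(x_obs, theta0, 2.0) / normal_pdf(x_obs, 0.0, 3.0 + 2.0 * RHO)
print(f"estimated B01: {estimate:.4f}, exact B01: {exact:.4f}")
```

      Because the two samples are drawn independently, the product of the two
      averages is unbiased for the product of their expectations. In this toy
      model the second average has finite variance only for RHO below 1/2,
      which is why a moderate correlation is used.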



Difference with Verdinelli–Wasserman representation


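      The above leads to the representation
      \[
        B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\,
                 \mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]
      \]
      which shows how our approach differs from Verdinelli and Wasserman’s
      \[
        B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\,
                 \mathbb{E}^{\pi_1(\psi|\theta_0,x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]
      \]

                                                                        [for referees only!!]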
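
      For comparison, the Verdinelli–Wasserman form can be sketched on an
      assumed toy Gaussian model: x | theta, psi ~ N(theta + psi, 1), with
      (theta, psi) standard bivariate normal with correlation RHO under model
      1 and pi0 = N(0, 1). The expectation is now over psi given theta0 and x
      only. The function names are hypothetical, and pi1(theta0 | x) is taken
      in closed form here, whereas in practice it would itself have to be
      estimated.

```python
import numpy as np

RHO = 0.3  # assumed prior correlation of (theta, psi) under model 1


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y (elementwise on arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def sample_psi_given_theta0(theta0, x, T, rng):
    """Exact draws of psi from pi1(psi | theta0, x) in the toy model."""
    prior_var = 1.0 - RHO**2  # psi | theta0 ~ N(RHO * theta0, 1 - RHO^2)
    post_var = 1.0 / (1.0 / prior_var + 1.0)
    post_mean = post_var * (RHO * theta0 / prior_var + (x - theta0))
    return rng.normal(post_mean, np.sqrt(post_var), size=T)


def verdinelli_wasserman_estimate(theta0, x, psi):
    """Density ratio at theta0 times the average of pi0(psi) / pi1(psi | theta0)."""
    # Closed-form marginal posterior of theta under the toy model 1.
    shrink = (1.0 + RHO) / (3.0 + 2.0 * RHO)
    post_density = normal_pdf(theta0, shrink * x, 1.0 - (1.0 + RHO) * shrink)
    density_ratio = post_density / normal_pdf(theta0, 0.0, 1.0)
    correction = (normal_pdf(psi, 0.0, 1.0)
                  / normal_pdf(psi, RHO * theta0, 1.0 - RHO**2)).mean()
    return density_ratio * correction


rng = np.random.default_rng(3)
x_obs, theta0 = 1.0, 0.0
psi = sample_psi_given_theta0(theta0, x_obs, 20_000, rng)
estimate = verdinelli_wasserman_estimate(theta0, x_obs, psi)
# Exact value in the toy model: m0(x) = N(x; theta0, 2), m1(x) = N(x; 0, 3 + 2 RHO).
exact = normal_pdf(x_obs, theta0, 2.0) / normal_pdf(x_obs, 0.0, 3.0 + 2.0 * RHO)
print(f"Verdinelli-Wasserman estimate: {estimate:.4f}, exact B01: {exact:.4f}")
```

      The contrast with the representation above is in the sampling
      distribution: here psi is simulated conditionally on theta = theta0,
      while the approach of the talk averages over the full posterior of
      (theta, psi) under model 1.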



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas of 20,000 simulations each, for the above
      importance, Chib’s, Savage–Dickey’s and bridge samplers

      [Figure: Bayes factor approximations by method (IS, Chib, Savage–Dickey, Bridge); vertical axis ticks from 2.8 to 3.4]

Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 

Último (20)

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 

Savage-Dickey paradox

  • 1. On resolving the Savage–Dickey paradox. Christian P. Robert, Université Paris Dauphine & CREST-INSEE, http://www.ceremade.dauphine.fr/~xian. Frontiers..., San Antonio, March 19, 2010. Joint work with J.-M. Marin.
  • 2. Outline: (1) Importance sampling solutions compared; (2) The Savage–Dickey ratio.
  • 3. Outline (as above). Happy B’day, Jim!!!
  • 4. Evidence. Bayesian model choice and hypothesis testing rely on a similar quantity, the evidence
    $$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k, \qquad k = 1, 2, \ldots$$
    aka the marginal likelihood. [Jeffreys, 1939]
  • 5. Importance sampling solutions. (1) Importance sampling solutions compared: regular importance, bridge sampling, harmonic means, Chib’s representation; (2) The Savage–Dickey ratio.
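  • 6. Regular importance: Bayes factor approximation. When approximating the Bayes factor
    $$B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\, \pi_0(\theta_0)\, \mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\, \pi_1(\theta_1)\, \mathrm{d}\theta_1}$$
    use of importance functions $\varphi_0$ and $\varphi_1$ and
    $$\widehat{B}_{01} = \frac{n_0^{-1} \sum_{i=1}^{n_0} f_0(x|\theta_0^i)\, \pi_0(\theta_0^i) / \varphi_0(\theta_0^i)}{n_1^{-1} \sum_{i=1}^{n_1} f_1(x|\theta_1^i)\, \pi_1(\theta_1^i) / \varphi_1(\theta_1^i)}$$
    where the $\theta_k^i$ are simulated from $\varphi_k$.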
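The ratio of importance sampling estimates above can be sketched on a toy problem where the answer is known. This is a minimal illustration, not the probit benchmark of the talk: it assumes a conjugate model $x_i \sim N(\theta, 1)$ with two competing priors $\theta \sim N(0, \tau_k^2)$, and it builds each importance function from the closed-form posterior moments with an inflated scale. The model, the function names and the inflation factor 1.5 are all assumptions made for the example.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def exact_log_evidence(x, tau):
    """Closed-form log evidence for x_i ~ N(theta, 1), theta ~ N(0, tau^2)."""
    n, s, ss = x.size, x.sum(), np.sum(x ** 2)
    return (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log1p(n * tau ** 2)
            - 0.5 * (ss - tau ** 2 * s ** 2 / (1.0 + n * tau ** 2)))

def is_log_evidence(x, tau, n_sim, rng):
    """Importance sampling estimate of the log evidence (toy importance function)."""
    n = x.size
    post_var = tau ** 2 / (1.0 + n * tau ** 2)
    post_mean = post_var * x.sum()
    phi_sd = 1.5 * np.sqrt(post_var)  # fatter tails than the posterior
    theta = rng.normal(post_mean, phi_sd, size=n_sim)
    log_lik = log_normal_pdf(x[None, :], theta[:, None], 1.0).sum(axis=1)
    # log of f(x|theta) * pi(theta) / phi(theta) for each draw
    log_w = (log_lik + log_normal_pdf(theta, 0.0, tau)
             - log_normal_pdf(theta, post_mean, phi_sd))
    shift = log_w.max()  # stabilise the average of exponentials
    return shift + np.log(np.mean(np.exp(log_w - shift)))

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, size=30)
tau0, tau1 = 1.0, 10.0  # hypothetical prior scales for models 0 and 1

log_b01_is = is_log_evidence(x, tau0, 20_000, rng) - is_log_evidence(x, tau1, 20_000, rng)
log_b01_exact = exact_log_evidence(x, tau0) - exact_log_evidence(x, tau1)
print(f"IS log B01 = {log_b01_is:.4f}, exact log B01 = {log_b01_exact:.4f}")
```

Working on the log scale avoids underflow in the product of likelihood terms; the two printed values should agree closely because the importance weights are bounded in this setup.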
  • 7. Regular importance: Probit modelling on Pima Indian women. Example (R benchmark): 200 Pima Indian women with observed variables
    - plasma glucose concentration in oral glucose tolerance test
    - diastolic blood pressure
    - diabetes pedigree function
    - presence/absence of diabetes
    Probability of diabetes as a function of the above variables:
    $$P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3)$$
    Test of $H_0: \beta_3 = 0$ for 200 observations of Pima.tr based on a g-prior modelling:
    $$\beta \sim \mathcal{N}_3\left(0, n\,(X^{\mathsf{T}} X)^{-1}\right)$$
    Use of an importance function inspired by the distribution of the MLE: $\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$.
  • 11. Regular importance: Diabetes in Pima Indian women. Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the prior and the above MLE importance sampler. [Figure: boxplots of the Bayes factor approximations, Monte Carlo vs. importance sampling]
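  • 12. Bridge sampling. General identity, with $\tilde\pi_k(\theta|x) \propto \pi_k(\theta|x)$ the unnormalised posterior:
    $$B_{12} = \frac{\int \tilde\pi_1(\theta|x)\, \alpha(\theta)\, \pi_2(\theta|x)\, \mathrm{d}\theta}{\int \tilde\pi_2(\theta|x)\, \alpha(\theta)\, \pi_1(\theta|x)\, \mathrm{d}\theta} \qquad \forall\, \alpha(\cdot)$$
    $$\approx \frac{\frac{1}{n_2} \sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\, \alpha(\theta_{2i})}{\frac{1}{n_1} \sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\, \alpha(\theta_{1i})}, \qquad \theta_{ji} \sim \pi_j(\theta|x)$$
    Back later!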
  • 13. Bridge sampling: Optimal bridge sampling. The optimal choice of auxiliary function is
    $$\alpha^\star \propto \frac{1}{n_1 \pi_1(\theta|x) + n_2 \pi_2(\theta|x)}$$
    leading to
    $$\widehat{B}_{12} \approx \frac{\frac{1}{n_2} \sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1 \pi_1(\theta_{2i}|x) + n_2 \pi_2(\theta_{2i}|x)}}{\frac{1}{n_1} \sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1 \pi_1(\theta_{1i}|x) + n_2 \pi_2(\theta_{1i}|x)}}$$
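Because the optimal auxiliary function involves the normalised posteriors, and hence the unknown ratio itself, it is usually applied iteratively. The sketch below assumes the standard fixed-point scheme of Meng and Wong (1996), which the slides do not spell out, on two hypothetical unnormalised Gaussian kernels whose ratio of normalising constants is known to be 0.5. Function names and the toy densities are assumptions for illustration.

```python
import numpy as np

def log_q1(theta):
    """Unnormalised N(0, 1) kernel, Z1 = sqrt(2*pi)."""
    return -0.5 * theta ** 2

def log_q2(theta):
    """Unnormalised N(1, 2^2) kernel, Z2 = sqrt(8*pi)."""
    return -0.125 * (theta - 1.0) ** 2

def bridge_ratio(draws1, draws2, n_iter=50):
    """Iterative optimal bridge estimate of Z1/Z2.

    draws1 ~ pi_1 and draws2 ~ pi_2; l = q1/q2 is the ratio of kernels.
    """
    n1, n2 = draws1.size, draws2.size
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    l1 = np.exp(log_q1(draws1) - log_q2(draws1))
    l2 = np.exp(log_q1(draws2) - log_q2(draws2))
    r = 1.0  # arbitrary starting value
    for _ in range(n_iter):
        num = np.mean(l2 / (s1 * l2 + s2 * r))   # average over pi_2 draws
        den = np.mean(1.0 / (s1 * l1 + s2 * r))  # average over pi_1 draws
        r = num / den
    return r

rng = np.random.default_rng(2)
draws1 = rng.normal(0.0, 1.0, size=50_000)
draws2 = rng.normal(1.0, 2.0, size=50_000)
r_hat = bridge_ratio(draws1, draws2)
print(f"bridge estimate of Z1/Z2 = {r_hat:.4f} (true value 0.5)")
```

Each iteration plugs the current ratio into the optimal auxiliary function; in practice a handful of iterations is typically enough for the value to stabilise.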
  • 14. Bridge sampling: Extension to varying dimensions. When $\dim(\Theta_1) \neq \dim(\Theta_2)$, e.g. $\theta_2 = (\theta_1, \psi)$, introduction of a pseudo-posterior density, $\omega(\psi|\theta_1, x)$, augmenting $\pi_1(\theta_1|x)$ into the joint distribution $\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$ on $\Theta_2$ so that
    $$B_{12} = \frac{\int \tilde\pi_1(\theta_1|x)\, \alpha(\theta_1, \psi)\, \pi_2(\theta_1, \psi|x)\, \mathrm{d}\theta_1\, \omega(\psi|\theta_1, x)\, \mathrm{d}\psi}{\int \tilde\pi_2(\theta_1, \psi|x)\, \alpha(\theta_1, \psi)\, \pi_1(\theta_1|x)\, \mathrm{d}\theta_1\, \omega(\psi|\theta_1, x)\, \mathrm{d}\psi}$$
    $$= \mathbb{E}^{\pi_2}\left[\frac{\tilde\pi_1(\theta_1)\, \omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1, \psi)}\right] = \frac{\mathbb{E}^{\varphi}\left[\tilde\pi_1(\theta_1)\, \omega(\psi|\theta_1) / \varphi(\theta_1, \psi)\right]}{\mathbb{E}^{\varphi}\left[\tilde\pi_2(\theta_1, \psi) / \varphi(\theta_1, \psi)\right]}$$
    for any conditional density $\omega(\psi|\theta_1)$ and any joint density $\varphi$.
  • 16. Bridge sampling: Illustration for the Pima Indian dataset. Use of the MLE-induced conditional of $\beta_3$ given $(\beta_1, \beta_2)$ as a pseudo-posterior and a mixture of both MLE approximations on $\beta_3$ in the bridge sampling estimate. [Figure: boxplots of the Bayes factor approximations for MC, Bridge and IS]
  • 18. Harmonic means: The original harmonic mean estimator. When $\theta_{kt} \sim \pi_k(\theta|x)$,
    $$\frac{1}{T} \sum_{t=1}^{T} \frac{1}{L(\theta_{kt}|x)}$$
    is an unbiased estimator of $1/m_k(x)$. [Newton & Raftery, 1994] Highly dangerous: Most often leads to an infinite variance!!!
  • 20. Harmonic means: “The Worst Monte Carlo Method Ever”. “The good news is that the Law of Large Numbers guarantees that this estimator is consistent, ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.” [Radford Neal’s blog, Aug. 23, 2008]
  • 21. Harmonic means: Approximating $Z_k$ from a posterior sample. Use of the [harmonic mean] identity
    $$\mathbb{E}^{\pi_k}\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\, L_k(\theta_k)}\,\right|\, x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\, L_k(\theta_k)}\; \frac{\pi_k(\theta_k)\, L_k(\theta_k)}{Z_k}\, \mathrm{d}\theta_k = \frac{1}{Z_k}$$
    no matter what the proposal $\varphi(\cdot)$ is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output.
  • 23. Harmonic means: Comparison with regular importance sampling. Harmonic mean: constraint opposed to the usual importance sampling constraints: $\varphi(\theta)$ must have lighter (rather than fatter) tails than $\pi_k(\theta_k) L_k(\theta_k)$ for the approximation
    $$\widehat{Z}_{1k} = 1 \Bigg/ \frac{1}{T} \sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\, L_k(\theta_k^{(t)})}$$
    to have a finite variance. E.g., use finite support kernels (like Epanechnikov’s kernel) for $\varphi$.
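The finite-support recommendation can be tried on a toy case. This sketch assumes the conjugate model $x_i \sim N(\theta, 1)$, $\theta \sim N(0, \tau^2)$, uses exact posterior draws as a stand-in for MCMC output, and takes $\varphi$ uniform on an interval of one posterior standard deviation either side of the posterior mean. All of these choices, including the uniform kernel in place of Epanechnikov's, are assumptions for illustration.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

rng = np.random.default_rng(3)
x = rng.normal(0.3, 1.0, size=30)
tau, n = 2.0, x.size

# Closed-form posterior N(post_mean, post_sd^2) for this conjugate model.
post_var = tau ** 2 / (1.0 + n * tau ** 2)
post_sd = np.sqrt(post_var)
post_mean = post_var * x.sum()

T = 20_000
theta = rng.normal(post_mean, post_sd, size=T)  # stand-in for MCMC output

# Finite-support phi: uniform on [post_mean - post_sd, post_mean + post_sd].
inside = np.abs(theta - post_mean) <= post_sd
log_phi = -np.log(2.0 * post_sd)

# Terms phi / (prior * likelihood); draws outside the support contribute zero.
th_in = theta[inside]
log_lik = log_normal_pdf(x[None, :], th_in[:, None], 1.0).sum(axis=1)
log_terms = log_phi - log_normal_pdf(th_in, 0.0, tau) - log_lik
shift = log_terms.max()
log_inv_z = shift + np.log(np.sum(np.exp(log_terms - shift)) / T)
log_z_hat = -log_inv_z

# Closed-form log evidence for comparison.
s, ss = x.sum(), np.sum(x ** 2)
log_z_exact = (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log1p(n * tau ** 2)
               - 0.5 * (ss - tau ** 2 * s ** 2 / (1.0 + n * tau ** 2)))
print(f"estimated log Z = {log_z_hat:.4f}, exact log Z = {log_z_exact:.4f}")
```

Since $\varphi$ vanishes outside a region where the posterior is bounded away from zero, every term in the average is bounded, which is what keeps the variance finite here.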
  • 25. Harmonic means: HPD indicator as $\varphi$. Use the convex hull of MCMC simulations corresponding to the 10% HPD region (easily derived!) and $\varphi$ as indicator:
    $$\varphi(\theta) = \frac{10}{T} \sum_{t \in \text{HPD}} \mathbb{I}_{d(\theta, \theta^{(t)}) \le \epsilon}$$
  • 26. Harmonic means: Diabetes in Pima Indian women (cont’d). Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations for a simulation from the above harmonic mean sampler and importance samplers. [Figure: boxplots for harmonic mean and importance sampling; vertical axis from about 3.102 to 3.116]
  • 27. Chib’s representation. Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,
    $$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$
    Use of an approximation to the posterior:
    $$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\, \pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*|x)}$$
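When the posterior density is available in closed form, the identity above holds exactly at any point $\theta_k^*$, which makes it easy to check numerically. The sketch below assumes the conjugate toy model $x_i \sim N(\theta, 1)$, $\theta \sim N(0, \tau^2)$; no approximation of the posterior is needed in that case, so this illustrates the identity only and not Chib's estimator for models with latent variables. Names and settings are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

rng = np.random.default_rng(4)
x = rng.normal(0.3, 1.0, size=30)
tau, n = 2.0, x.size

# Closed-form posterior for the conjugate model.
post_var = tau ** 2 / (1.0 + n * tau ** 2)
post_mean = post_var * x.sum()

def chib_log_evidence(theta_star):
    """log Z = log f(x|theta*) + log pi(theta*) - log pi(theta*|x)."""
    log_lik = log_normal_pdf(x, theta_star, 1.0).sum()
    log_prior = log_normal_pdf(theta_star, 0.0, tau)
    log_post = log_normal_pdf(theta_star, post_mean, np.sqrt(post_var))
    return log_lik + log_prior - log_post

log_z_at_mean = chib_log_evidence(post_mean)
log_z_at_zero = chib_log_evidence(0.0)

# Closed-form log evidence for comparison.
s, ss = x.sum(), np.sum(x ** 2)
log_z_exact = (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log1p(n * tau ** 2)
               - 0.5 * (ss - tau ** 2 * s ** 2 / (1.0 + n * tau ** 2)))
print(log_z_at_mean, log_z_at_zero, log_z_exact)
```

The three printed values coincide up to floating-point error; with an approximate posterior density, a high-density point such as the posterior mean is the usual choice of $\theta_k^*$.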
  • 29. Chib’s representation: Case of latent variables. For missing variable $z$ as in mixture models, natural Rao-Blackwell estimate
    $$\hat\pi_k(\theta_k^*|x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)})$$
    where the $z_k^{(t)}$’s are Gibbs sampled latent variables.
  • 30. Chib’s representation: Compensation for label switching. For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory
    $$\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!} \sum_{\sigma \in \mathfrak{S}_k} \pi_k(\sigma(\theta_k)|x)$$
    for all $\sigma$’s in $\mathfrak{S}_k$, the set of all permutations of $\{1, \ldots, k\}$. Consequences on numerical approximation, biased by an order $k!$. Recover the theoretical symmetry by using
    $$\hat\pi_k(\theta_k^*|x) = \frac{1}{T\, k!} \sum_{\sigma \in \mathfrak{S}_k} \sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*)|x, z_k^{(t)})$$
    [Berkhof, Mechelen, & Gelman, 2003]
  • 32. Chib’s representation: Case of the probit model. For the completion by $z$,
    $$\hat\pi(\theta|x) = \frac{1}{T} \sum_{t} \pi(\theta|x, z^{(t)})$$
    is a simple average of normal densities. [Figure: boxplots for Chib’s method and importance sampling; vertical axis from about 0.0240 to 0.0255]
  • 33. The Savage–Dickey ratio. (1) Importance sampling solutions compared; (2) The Savage–Dickey ratio: measure-theoretic aspects, computational implications.
  • 35. Measure-theoretic aspects: The Savage–Dickey ratio representation. Special representation of the Bayes factor used for simulation. Original version (Dickey, AoMS, 1971).
  • 37. Measure-theoretic aspects: Savage’s density ratio theorem. Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that
    $$\pi_1(\psi|\theta_0) = \pi_0(\psi)$$
    then
    $$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}$$
    with the obvious notations
    $$\pi_1(\theta) = \int \pi_1(\theta, \psi)\, \mathrm{d}\psi, \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\, \mathrm{d}\psi$$
    [Dickey, 1971; Verdinelli & Wasserman, 1995]
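The theorem can be checked numerically in a setting where both sides are available in closed form. The sketch below assumes a Gaussian linear model $y = \psi + \theta z + \varepsilon$ with unit noise variance and independent normal priors on $\psi$ and $\theta$, so that $\pi_1(\psi|\theta_0) = \pi_0(\psi)$ holds by construction, and it uses the continuous (Gaussian) versions of all densities. The model, prior scales and variable names are assumptions for illustration only.

```python
import numpy as np

def mvn_logpdf_zero_mean(y, cov):
    """Log density of N(0, cov) evaluated at y."""
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + quad)

def norm_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
n = 40
z = rng.normal(size=n)
y = 0.5 + 0.3 * z + rng.normal(size=n)
tau_psi, tau_theta = 2.0, 1.0  # hypothetical prior standard deviations
theta0 = 0.0                   # null value: H0 is theta = 0

# Direct Bayes factor from the two Gaussian marginal likelihoods.
ones = np.ones(n)
cov0 = np.eye(n) + tau_psi ** 2 * np.outer(ones, ones)  # model 0: y = psi + noise
cov1 = cov0 + tau_theta ** 2 * np.outer(z, z)           # model 1 adds theta * z
log_b01_direct = mvn_logpdf_zero_mean(y, cov0) - mvn_logpdf_zero_mean(y, cov1)

# Savage-Dickey ratio: marginal posterior over prior density of theta at theta0.
X = np.column_stack([ones, z])
prior_prec = np.diag([1.0 / tau_psi ** 2, 1.0 / tau_theta ** 2])
post_cov = np.linalg.inv(X.T @ X + prior_prec)
post_mean = post_cov @ X.T @ y
log_b01_sd = (norm_logpdf(theta0, post_mean[1], post_cov[1, 1])
              - norm_logpdf(theta0, 0.0, tau_theta ** 2))
print(f"direct log B01 = {log_b01_direct:.6f}, Savage-Dickey log B01 = {log_b01_sd:.6f}")
```

The two values agree to floating-point precision here because the same continuous versions of the prior and posterior densities are used throughout, which is the implicit choice that the measure-theoretic discussion makes explicit.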
  • 38. Measure-theoretic aspects: Rephrased. “Suppose that $f_0(\theta) = f_1(\theta|\phi = \phi_0)$. As $f_0(x|\theta) = f_1(x|\theta, \phi = \phi_0)$,
    $$f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\, f_1(\theta|\phi = \phi_0)\, \mathrm{d}\theta = f_1(x|\phi = \phi_0),$$
    i.e., the numerator of the Bayes factor is the value of $f_1(x|\phi)$ at $\phi = \phi_0$, while the denominator is an average of the values of $f_1(x|\phi)$ for $\phi \neq \phi_0$, weighted by the prior distribution $f_1(\phi)$ under the augmented model. Applying Bayes’ theorem to the right-hand side of [the above] we get
    $$f_0(x) = f_1(\phi_0|x)\, f_1(x) \big/ f_1(\phi_0)$$
    and hence the Bayes factor is given by
    $$B = f_0(x) \big/ f_1(x) = f_1(\phi_0|x) \big/ f_1(\phi_0),$$
    the ratio of the posterior to prior densities at $\phi = \phi_0$ under the augmented model.” [O’Hagan & Forster, 1996]
  • 40. Measure-theoretic aspects: Measure-theoretic difficulty. Representation depends on the choice of versions of conditional densities:
    $$B_{01} = \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\, \mathrm{d}\psi}{\int \pi_1(\theta, \psi)\, f(x|\theta, \psi)\, \mathrm{d}\psi\, \mathrm{d}\theta} \qquad \text{[by definition]}$$
    $$= \frac{\int \pi_1(\psi|\theta_0)\, f(x|\theta_0, \psi)\, \mathrm{d}\psi\; \pi_1(\theta_0)}{\int \pi_1(\theta, \psi)\, f(x|\theta, \psi)\, \mathrm{d}\psi\, \mathrm{d}\theta\; \pi_1(\theta_0)} \qquad \text{[specific version of } \pi_1(\psi|\theta_0) \text{ and arbitrary version of } \pi_1(\theta_0)\text{]}$$
    $$= \frac{\int \pi_1(\theta_0, \psi)\, f(x|\theta_0, \psi)\, \mathrm{d}\psi}{m_1(x)\, \pi_1(\theta_0)} \qquad \text{[specific version of } \pi_1(\theta_0, \psi)\text{]}$$
    $$= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} \qquad \text{[version dependent]}$$
  • 42. Measure-theoretic aspects: Choice of density version. Dickey’s (1971) condition is not a condition: if
    $$\frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\, \mathrm{d}\psi}{m_1(x)}$$
    is chosen as a version, then Savage–Dickey’s representation holds.
  • 44. Measure-theoretic aspects: Savage–Dickey paradox. Verdinelli–Wasserman extension:
    $$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; \mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
    similarly depends on choices of versions... but the Monte Carlo implementation relies on specific versions of all densities without making mention of it. [Chen, Shao & Ibrahim, 2000]
  • 46. Computational implications: A computational exploitation. Starting from the (instrumental) prior
    $$\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\, \pi_0(\psi)$$
    define the associated posterior
    $$\tilde\pi_1(\theta, \psi|x) = \pi_0(\psi)\, \pi_1(\theta)\, f(x|\theta, \psi) \big/ \tilde m_1(x)$$
    and impose the choice of version
    $$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\, \mathrm{d}\psi}{\tilde m_1(x)}$$
    Then
    $$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; \frac{\tilde m_1(x)}{m_1(x)}$$
  • 48. Computational implications: First ratio. If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi_1(\theta, \psi|x)$, then
    $$\frac{1}{T} \sum_{t} \tilde\pi_1(\theta_0|x, \psi^{(t)})$$
    converges to $\tilde\pi_1(\theta_0|x)$ provided the right version is used in $\theta_0$:
    $$\tilde\pi_1(\theta_0|x, \psi) = \frac{\pi_1(\theta_0)\, f(x|\theta_0, \psi)}{\int \pi_1(\theta)\, f(x|\theta, \psi)\, \mathrm{d}\theta}$$
  • 49. Computational implications: Rao–Blackwellisation with latent variables. When $\tilde\pi_1(\theta_0|x, \psi)$ is unavailable, replace with
    $$\frac{1}{T} \sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$
    via data completion by a latent variable $z$ such that
    $$f(x|\theta, \psi) = \int \tilde f(x, z|\theta, \psi)\, \mathrm{d}z$$
    and that $\tilde\pi_1(\theta, \psi, z|x) \propto \pi_0(\psi)\, \pi_1(\theta)\, \tilde f(x, z|\theta, \psi)$ is available in closed form, including the normalising constant, based on the version
    $$\frac{\tilde\pi_1(\theta_0|x, z, \psi)}{\pi_1(\theta_0)} = \frac{\tilde f(x, z|\theta_0, \psi)}{\int \tilde f(x, z|\theta, \psi)\, \pi_1(\theta)\, \mathrm{d}\theta}$$
  • 51. Computational implications: Bridge revival (1). Since $\tilde m_1(x)/m_1(x)$ is unknown, apparent failure! Use of the bridge identity
    $$\mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\left[\frac{\pi_1(\theta, \psi)\, f(x|\theta, \psi)}{\pi_0(\psi)\, \pi_1(\theta)\, f(x|\theta, \psi)}\right] = \mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$
    to (biasedly) estimate $\tilde m_1(x)/m_1(x)$ by
    $$T \Bigg/ \sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}$$
    based on the same sample from $\tilde\pi_1$.
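For completeness, the bridge identity above follows in one line from the definition of the instrumental posterior, $\tilde\pi_1(\theta, \psi|x) = \pi_0(\psi)\pi_1(\theta) f(x|\theta, \psi)/\tilde m_1(x)$, together with $\pi_1(\psi|\theta) = \pi_1(\theta, \psi)/\pi_1(\theta)$. The derivation below is a sketch that assumes the ratio $\pi_1(\psi|\theta)/\pi_0(\psi)$ is well defined wherever the instrumental posterior puts mass:

$$
\mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right]
= \iint \frac{\pi_1(\theta,\psi)}{\pi_0(\psi)\,\pi_1(\theta)}\;
\frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta,\psi)}{\tilde m_1(x)}\,\mathrm{d}\psi\,\mathrm{d}\theta
= \frac{1}{\tilde m_1(x)} \iint \pi_1(\theta,\psi)\,f(x|\theta,\psi)\,\mathrm{d}\psi\,\mathrm{d}\theta
= \frac{m_1(x)}{\tilde m_1(x)}
$$

The empirical average of $\pi_1(\psi^{(t)}|\theta^{(t)})/\pi_0(\psi^{(t)})$ is therefore unbiased for $m_1(x)/\tilde m_1(x)$, while its reciprocal is a biased (though consistent) estimate of $\tilde m_1(x)/m_1(x)$, since the expectation of a reciprocal is not the reciprocal of the expectation.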
  • 53. Computational implications: Bridge revival (2). Alternative identity
    $$\mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)\, \pi_1(\theta)\, f(x|\theta, \psi)}{\pi_1(\theta, \psi)\, f(x|\theta, \psi)}\right] = \mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$
    suggests using a second sample $(\bar\theta^{(t)}, \bar\psi^{(t)}, \bar z^{(t)}) \sim \pi_1(\theta, \psi|x)$ and the ratio estimate
    $$\frac{1}{T} \sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$
    Resulting unbiased estimate:
    $$\widehat{B}_{01} = \frac{1}{T} \sum_{t} \frac{\tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\; \frac{1}{T} \sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$
  • 55. Computational implications: Difference with Verdinelli–Wasserman representation. The above leads to the representation
    $$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; \mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]$$
    which shows how our approach differs from Verdinelli and Wasserman’s
    $$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; \mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
    [for referees only!!]
  • 56. Computational implications: Diabetes in Pima Indian women (cont’d). Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations for a simulation from the above importance, Chib’s, Savage–Dickey’s and bridge samplers. [Figure: boxplots for IS, Chib, Savage–Dickey and Bridge; vertical axis from about 2.8 to 3.4]