How approximate is Approximate Bayesian
            Computation?

              Christian P. Robert
      ISBA IWCBTA, Varanasi, Jan. 9, 2013
      Joint work with J.-M. Cornuet, J.-M. Marin,
  K.L. Mengersen, N. Pillai, P. Pudlo and J. Rousseau
Advertisement



   MCMSki IV to be held in Chamonix Mt Blanc, France, from
   Monday, Jan. 6 to Wed., Jan. 8, 2014
   All aspects of MCMC++ theory and methodology
   Parallel (invited and contributed) sessions: call for proposals on
   website http://www.pages.drexel.edu/~mwl25/mcmski/
Outline




Unavailable likelihoods

ABC methods

ABC as an inference machine

ABCel
Intractable likelihood



   Case of a well-defined statistical model where the likelihood
   function
                          ℓ(θ|y) = f(y_1, . . . , y_n | θ)


       is (really!) not available in closed form
       can (easily!) be neither completed nor demarginalised
       cannot be estimated by an unbiased estimator
    ⇒ Prohibits direct implementation of a generic MCMC algorithm
   like Metropolis–Hastings
The abc alternative




   Approximations to the original B problem
       Degrading the precision down to a tolerance ε
       Replacing the likelihood with a non-parametric approximation
       Summarising the data with insufficient statistics
Different worries about abc



   Impact on B inference
       a mere computational issue (that will eventually end up being
       solved by more powerful computers, &tc, even if too costly in
       the short term, as for regular Monte Carlo methods)
        an inferential issue (opening opportunities for a new inference
        machine, with a legitimacy different from that of the classical B
        approach)
       a Bayesian conundrum (how closely related to the/a B
       approach?)
Econom’ections


   Similar exploration of simulation-based and approximation
   techniques in Econometrics
       Simulated method of moments
       Method of simulated moments
       Simulated pseudo-maximum-likelihood
       Indirect inference
                                        [Gouriéroux & Monfort, 1996]

   even though the motivation is partially defined models rather than
   complex likelihoods
Indirect inference




   Minimise [in θ] a distance between estimators β̂ based on a
   pseudo-model for genuine observations and for observations
   simulated under the true model and the parameter θ.

                              [Gouriéroux, Monfort, & Renault, 1993;
                              Smith, 1993; Gallant & Tauchen, 1996]
Indirect inference (PML vs. PSE)


   Example of the pseudo-maximum-likelihood (PML)

                 β̂(y) = arg max_β ∑_t log f(y_t | β, y_{1:(t−1)})


   leading to

                 arg min_θ ‖ β̂(y^o) − β̂(y^1(θ), . . . , y^S(θ)) ‖²

   when
                     y^s(θ) ∼ f(y|θ) ,     s = 1, . . . , S
Indirect inference (PML vs. PSE)


   Example of the pseudo-score-estimator (PSE)
                 β̂(y) = arg min_β ∑_t [ ∂ log f/∂β (y_t | β, y_{1:(t−1)}) ]²

    leading to

                    arg min_θ ‖ β̂(y^o) − β̂(y^1(θ), . . . , y^S(θ)) ‖²

    when
                        y^s(θ) ∼ f(y|θ) ,     s = 1, . . . , S
Consistent indirect inference



           “...in order to get a unique solution the dimension of
       the auxiliary parameter β must be larger than or equal to
       the dimension of the initial parameter θ. If the problem is
       just identified the different methods become easier...”

   Consistency depending on the criterion and on the asymptotic
   identifiability of θ
                                 [Gouriéroux & Monfort, 1996, p. 66]


   Which connection [if any] with the B perspective?
Approximate Bayesian computation



 Unavailable likelihoods

 ABC methods
   Genesis of ABC
   ABC basics
   Advances and interpretations
   ABC as knn

 ABC as an inference machine

 ABCel
Genetic background of ABC



    skip genetics


   ABC is a recent computational technique that only requires being
   able to sample from the likelihood f (·|θ)
   This technique stemmed from population genetics models, about
   15 years ago, and population geneticists still contribute
   significantly to methodological developments of ABC.
                              [Griffiths & al., 1997; Tavaré & al., 1999]
Demo-genetic inference



   Each model is characterized by a set of parameters θ that cover
   historical (divergence times, admixture times, ...), demographic
   (population sizes, admixture rates, migration rates, ...) and genetic
   (mutation rate, ...) factors
   The goal is to estimate these parameters from a dataset of
   polymorphism (DNA sample) y observed at the present time

   Problem:
   most of the time, we cannot calculate the likelihood of the
   polymorphism data f (y|θ)...
Neutral model at a given microsatellite locus, in a closed
panmictic population at equilibrium

   Sample of 8 genes

   Mutations according to the Simple stepwise Mutation Model (SMM)
   • dates of the mutations ∼ Poisson process with intensity θ/2 over the branches
   • MRCA = 100
   • independent mutations: ±1 with pr. 1/2
Neutral model at a given microsatellite locus, in a closed
panmictic population at equilibrium

   Kingman’s genealogy: when the time axis is normalized,
   T(k) ∼ Exp(k(k − 1)/2)

   Mutations according to the Simple stepwise Mutation Model (SMM)
   • dates of the mutations ∼ Poisson process with intensity θ/2 over the branches
   • MRCA = 100
   • independent mutations: ±1 with pr. 1/2

   Observations: leaves of the tree, θ̂ = ?
Much more interesting models. . .

        several independent loci
        Independent gene genealogies and mutations
        different populations
        linked by an evolutionary scenario made of divergences,
        admixtures, migrations between populations, etc.
        larger sample size
        usually between 50 and 100 genes

   A typical evolutionary scenario: three populations (POP 0, POP 1, POP 2)
   related by divergences at times τ1 and τ2 back to the MRCA [figure]
Intractable likelihood



   Missing (too missing!) data structure:

                      f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG

   cannot be computed in a manageable way...
   The genealogies are considered as nuisance parameters
       This modelling clearly differs from the phylogenetic perspective
       where the tree is the parameter of interest.
A?B?C?




    A stands for approximate
    [wrong likelihood /
    picture]
    B stands for Bayesian
    C stands for computation
    [producing a parameter
    sample]
A?B?C?

    A stands for approximate [wrong likelihood / picture]
    B stands for Bayesian
    C stands for computation [producing a parameter sample]

    [Figure: panels of ABC posterior density estimates of θ, each labelled with its
    effective sample size, ESS ranging from about 76 to 156]
How Bayesian is aBc?



  Could we turn the resolution into a Bayesian answer?
       ideally so (not meaningful: requires an ∞-ly powerful computer)
      approximation error unknown (w/o costly simulation)
      true Bayes for wrong model (formal and artificial)
      true Bayes for noisy model (much more convincing)
      true Bayes for estimated likelihood (back to econometrics?)
      illuminating the tension between information and precision
Intractable likelihood



Back to stage zero: what can we do
when a likelihood function f (y|θ) is
well-defined but impossible / too
costly to compute...?
    MCMC cannot be implemented!
    shall we give up Bayesian
    inference altogether?!
    or settle for an almost Bayesian
    inference/picture...?
ABC methodology

  Bayesian setting: target is π(θ)f (x|θ)
  When likelihood f (x|θ) not in closed form, likelihood-free rejection
  technique:
  Foundation
  For an observation y ∼ f(y|θ), under the prior π(θ), if one keeps
  jointly simulating
                        θ′ ∼ π(θ) ,  z ∼ f(z|θ′) ,
  until the auxiliary variable z is equal to the observed value, z = y,
  then the selected
                                 θ′ ∼ π(θ|y)

           [Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
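As a toy illustration of this exact-matching foundation (not part of the original slides), here is a minimal Python sketch for a binomial model, where the event z = y has positive probability; the model, prior, and numbers are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discrete model: y ~ Binomial(n, theta), theta ~ Uniform(0, 1)
n, y_obs = 20, 13

def exact_rejection(n_draws):
    """Keep theta' ~ pi(theta) whenever the simulated z ~ f(z|theta') hits z = y exactly."""
    kept = []
    while len(kept) < n_draws:
        theta = rng.uniform(0.0, 1.0)   # theta' ~ pi(theta)
        z = rng.binomial(n, theta)      # z ~ f(z|theta')
        if z == y_obs:                  # exact match: z = y
            kept.append(theta)
    return np.array(kept)

draws = exact_rejection(2000)
# Sanity check: with a uniform prior the exact posterior is Beta(y+1, n-y+1)
print(draws.mean(), (y_obs + 1) / (n + 2))
```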
A as A...pproximative



    When y is a continuous random variable, strict equality z = y is
    replaced with a tolerance zone

                               ρ(y, z) < ε

    where ρ is a distance
    Output distributed from

                π(θ) P_θ{ρ(y, z) < ε}  ∝  π(θ | ρ(y, z) < ε)

                                               [Pritchard et al., 1999]
ABC algorithm


  In most implementations, further degree of A...pproximation:

  Algorithm 1 Likelihood-free rejection sampler
    for i = 1 to N do
      repeat
         generate θ′ from the prior distribution π(·)
         generate z from the likelihood f(·|θ′)
      until ρ{η(z), η(y)} ≤ ε
      set θ_i = θ′
    end for

  where η(y) defines a (not necessarily sufficient) statistic
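A minimal Python sketch of Algorithm 1, assuming the user supplies the prior sampler, the simulator f(·|θ), the summary η, the distance ρ and the tolerance ε; the normal-mean example at the bottom is purely illustrative.

```python
import numpy as np

def abc_rejection(y_obs, prior_sample, simulate, summary, distance, eps, N):
    """Likelihood-free rejection sampler: for i = 1..N, repeat
    {theta' ~ pi, z ~ f(.|theta')} until rho(eta(z), eta(y)) <= eps, then set theta_i = theta'."""
    eta_y = summary(y_obs)
    thetas = []
    for _ in range(N):
        while True:
            theta = prior_sample()
            z = simulate(theta)
            if distance(summary(z), eta_y) <= eps:
                thetas.append(theta)
                break
    return np.array(thetas)

# Illustrative choices (not from the slides): normal mean with known unit variance
rng = np.random.default_rng(1)
y_obs = rng.normal(2.0, 1.0, size=50)
sample = abc_rejection(
    y_obs,
    prior_sample=lambda: rng.normal(0.0, 5.0),          # theta ~ N(0, 5^2)
    simulate=lambda th: rng.normal(th, 1.0, size=50),   # z ~ f(.|theta)
    summary=np.mean,                                    # eta: sample mean (sufficient here)
    distance=lambda a, b: abs(a - b),                   # rho: absolute difference
    eps=0.05, N=500)
print(sample.mean(), sample.std())
```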
Output


  The likelihood-free algorithm samples from the marginal in z of:

      π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,

  where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
  The idea behind ABC is that the summary statistics coupled with a
  small tolerance should provide a good approximation of the
  posterior distribution:

                     π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|y) .


                                                              ...does it?!
Output

  The likelihood-free algorithm samples from the marginal in z of:

      π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,

  where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
  The idea behind ABC is that the summary statistics coupled with a
  small tolerance should provide a good approximation of the
  restricted posterior distribution:

                 π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)) .

                                                                Not so good..!
    skip convergence details!
Convergence of ABC


  What happens when ε → 0?
  For B ⊂ Θ, we have

   ∫_B [ ∫_{A_{ε,y}} f(z|θ) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] π(θ) dθ
     = ∫_{A_{ε,y}} [ ∫_B f(z|θ) π(θ) dθ / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz
     = ∫_{A_{ε,y}} [ ∫_B f(z|θ) π(θ) dθ / m(z) ] [ m(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz
     = ∫_{A_{ε,y}} π(B|z) [ m(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz

  which indicates convergence for a continuous π(B|z).
Convergence (do not attempt!)


    ...and the above does not apply to insufficient statistics:
    If η(y) is not a sufficient statistic, the best one can hope for is

                         π(θ|η(y)) ,   not π(θ|y)

    If η(y) is an ancillary statistic, the whole information contained in
    y is lost! The “best” one can “hope” for is

                             π(θ|η(y)) = π(θ)

                                                               Bummer!!!
Comments




      Role of distance paramount (because ε ≠ 0)
      Scaling of components of η(y) is also determinant
      ε matters little if “small enough”
      representative of “curse of dimensionality”
      small is beautiful!
      the data as a whole may be paradoxically weakly informative
      for ABC
ABC (simul’) advances


                         how approximative is ABC?                ABC as knn

   Simulating from the prior is often poor in efficiency
   Either modify the proposal distribution on θ to increase the density
   of x’s within the vicinity of y ...
        [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]

    ...or by viewing the problem as a conditional density estimation
    and by developing techniques to allow for larger ε
                                                [Beaumont et al., 2002]

    .....or even by including ε in the inferential framework [ABCµ]
                                                      [Ratmann et al., 2009]
ABC-NP


   Better usage of [prior] simulations by
  adjustment: instead of throwing away
  θ such that ρ(η(z), η(y)) > ε, replace
  θ’s with locally regressed transforms

        θ* = θ − {η(z) − η(y)}^T β̂
                                             [Csilléry et al., TEE, 2010]

    where β̂ is obtained by [NP] weighted least square regression on
    (η(z) − η(y)) with weights

                            K_δ {ρ(η(z), η(y))}

                                     [Beaumont et al., 2002, Genetics]
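A sketch of this local regression adjustment in Python, assuming a scalar parameter, an Epanechnikov-type weight for K_δ and a plain weighted least-squares fit; all implementation choices below are illustrative, not prescribed by the slides.

```python
import numpy as np

def regression_adjust(thetas, etas, eta_obs, delta):
    """Replace accepted theta's by theta* = theta - {eta(z) - eta(y)}^T beta_hat,
    with beta_hat from weighted least squares using weights K_delta{rho(eta(z), eta(y))}."""
    etas = np.asarray(etas, float)
    if etas.ndim == 1:                                   # allow a single summary statistic
        etas = etas[:, None]
    d = etas - np.asarray(eta_obs, float)                # discrepancies eta(z_i) - eta(y)
    rho = np.linalg.norm(d, axis=1)                      # distances rho_i
    w = np.clip(1.0 - (rho / delta) ** 2, 0.0, None)     # Epanechnikov-type kernel weights
    keep = w > 0
    d, w, th = d[keep], w[keep], np.asarray(thetas, float)[keep]
    X = np.column_stack([np.ones(len(d)), d])            # local-linear design matrix
    WX = X * w[:, None]
    coef = np.linalg.solve(X.T @ WX, WX.T @ th)          # weighted LS: (intercept, beta_hat)
    theta_star = th - d @ coef[1:]                       # locally regressed transforms
    return theta_star, w
```

The adjusted values θ*, together with the weights w, then act as an approximate (weighted) posterior sample.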
ABC-NP (regression)



    Also found in the subsequent literature, e.g. in Fearnhead–Prangle (2012):
    weight the simulations directly by

                            K_δ {ρ(η(z(θ)), η(y))}

    or
                    (1/S) ∑_{s=1}^{S} K_δ {ρ(η(z_s(θ)), η(y))}

                                        [consistent estimate of f(η|θ)]
    Curse of dimensionality: poor estimate when d = dim(η) is large...
ABC-NP (density estimation)



   Use of the kernel weights

                             Kδ {ρ(η(z(θ)), η(y))}

    leads to the NP estimate of the posterior expectation

          ∑_i θ_i K_δ {ρ(η(z(θ_i)), η(y))} / ∑_i K_δ {ρ(η(z(θ_i)), η(y))}

                                                          [Blum, JASA, 2010]
ABC-NP (density estimation)



   Use of the kernel weights

                            Kδ {ρ(η(z(θ)), η(y))}

    leads to the NP estimate of the posterior conditional density

          ∑_i K̃_b(θ_i − θ) K_δ {ρ(η(z(θ_i)), η(y))} / ∑_i K_δ {ρ(η(z(θ_i)), η(y))}

                                                     [Blum, JASA, 2010]
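A compact sketch of both estimates above (Nadaraya–Watson style), assuming the kernel weights w_i = K_δ{ρ(η(z(θ_i)), η(y))} have already been computed, and taking a Gaussian K̃_b purely as an example.

```python
import numpy as np

def abc_np_estimates(thetas, weights, theta_grid, b):
    """From kernel weights w_i = K_delta{rho(eta(z_i), eta(y))}, return
    (i) the posterior expectation  sum_i theta_i w_i / sum_i w_i  and
    (ii) the conditional density estimate  sum_i K_b(theta_i - t) w_i / sum_i w_i  on a grid of t."""
    th = np.asarray(thetas, float)
    w = np.asarray(weights, float)
    post_mean = np.sum(th * w) / np.sum(w)
    u = (np.asarray(theta_grid, float)[:, None] - th[None, :]) / b
    Kb = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * b)   # Gaussian K_b (illustrative)
    post_density = (Kb * w[None, :]).sum(axis=1) / np.sum(w)
    return post_mean, post_density
```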
ABC-NP (density estimations)

    Other versions incorporating regression adjustments

          ∑_i K̃_b(θ*_i − θ) K_δ {ρ(η(z(θ_i)), η(y))} / ∑_i K_δ {ρ(η(z(θ_i)), η(y))}

    In all cases, error

       E[ĝ(θ|y)] − g(θ|y) = cb² + cδ² + O_P(b² + δ²) + O_P(1/nδ^d)

               var(ĝ(θ|y)) = (c/(nbδ^d)) (1 + o_P(1))

                                                          [Blum, JASA, 2010]
                                                     [standard NP calculations]
ABC as knn


                                    [Biau et al., 2012, arxiv:1207.6461]

   Practice of ABC: determine tolerance ε as a quantile on observed
   distances, say 10% or 1% quantile,

                         ε = ε_N = q_α(d_1, . . . , d_N)

       Interpretation of ε as nonparametric bandwidth only
       approximation of the actual practice
                                             [Blum & François, 2010]

       ABC is a k-nearest neighbour (knn) method with k_N = N ε_N
                                     [Loftsgaarden & Quesenberry, 1965]
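In code, this quantile calibration of ε amounts to keeping the k_N nearest simulations; a short sketch, with the 1% level as an illustrative default.

```python
import numpy as np

def abc_knn_select(thetas, distances, alpha=0.01):
    """Tolerance as a quantile of the simulated distances:
    eps_N = q_alpha(d_1, ..., d_N), i.e. keep the k_N ~ alpha*N nearest neighbours."""
    distances = np.asarray(distances, float)
    N = len(distances)
    eps_N = np.quantile(distances, alpha)
    k_N = max(1, int(np.ceil(alpha * N)))
    nearest = np.argsort(distances)[:k_N]      # indices of the k_N nearest simulations
    return np.asarray(thetas)[nearest], eps_N
```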
ABC consistency
    Provided

                  k_N / log log N −→ ∞   and   k_N / N −→ 0

    as N → ∞, for almost all s_0 (with respect to the distribution of
    S), with probability 1,

                 (1/k_N) ∑_{j=1}^{k_N} ϕ(θ_j) −→ E[ϕ(θ_j) | S = s_0]

                                                               [Devroye, 1982]
    Biau et al. (2012) also recall pointwise and integrated mean square error
    consistency results on the corresponding kernel estimate of the
    conditional posterior distribution, under constraints

             k_N → ∞,    k_N /N → 0,    h_N → 0    and    h_N^p k_N → ∞,
Rates of convergence


   Further assumptions (on target and kernel) allow for precise
   (integrated mean square) convergence rates (as a power of the
   sample size N), derived from classical k-nearest neighbour
   regression, like
        when m = 1, 2, 3,   k_N ≈ N^{(p+4)/(p+8)}   and rate N^{−4/(p+8)}
        when m = 4,   k_N ≈ N^{(p+4)/(p+8)}   and rate N^{−4/(p+8)} log N
        when m > 4,   k_N ≈ N^{(p+4)/(m+p+4)}   and rate N^{−4/(m+p+4)}
                                  [Biau et al., 2012, arxiv:1207.6461]


   Drag: Only applies to sufficient summary statistics
ABC inference machine



 Unavailable likelihoods

 ABC methods

 ABC as an inference machine
   Error inc.
   Exact BC and approximate
   targets
   summary statistic

 ABCel
How much Bayesian aBc is..?




      maybe a convergent method of inference (meaningful?
      sufficient? foreign?)
      approximation error unknown (w/o simulation)
      pragmatic Bayes (there is no other solution!)
      many calibration issues (tolerance, distance, statistics)
      the NP side should be incorporated into the whole B picture

                                                                  to ABCel
ABCµ



   Idea: Infer about the error as well as about the parameter:
   Use of a joint density

                 f(θ, ε|y) ∝ ξ(ε|y, θ) × π_θ(θ) × π_ε(ε)

   where y is the data, and ξ(ε|y, θ) is the prior predictive density of
   ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ)
   Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel
   approximation.
             [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
ABCµ details


    Multidimensional distances ρ_k (k = 1, . . . , K) and errors
    ε_k = ρ_k(η_k(z), η_k(y)), with

     ε_k ∼ ξ_k(ε|y, θ) ≈ ξ̂_k(ε|y, θ) = (1/(B h_k)) ∑_b K[{ε_k − ρ_k(η_k(z_b), η_k(y))}/h_k]

    then used in replacing ξ(ε|y, θ) with min_k ξ̂_k(ε|y, θ)
    ABCµ involves acceptance probability

        π(θ′, ε′) q(θ′, θ) q(ε′, ε) min_k ξ̂_k(ε′|y, θ′) / [ π(θ, ε) q(θ, θ′) q(ε, ε′) min_k ξ̂_k(ε|y, θ) ]
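A sketch of the kernel approximation ξ̂_k used above, taking a Gaussian kernel K purely for illustration; rho_sims stands for the B simulated discrepancies ρ_k(η_k(z_b), η_k(y)).

```python
import numpy as np

def xi_hat_k(eps_k, rho_sims, h_k):
    """Kernel estimate xi_hat_k(eps|y,theta) =
    (1/(B*h_k)) * sum_b K[{eps_k - rho_k(eta_k(z_b), eta_k(y))}/h_k]."""
    rho_sims = np.asarray(rho_sims, float)
    B = len(rho_sims)
    u = (eps_k - rho_sims) / h_k
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel (illustrative choice)
    return K.sum() / (B * h_k)
```

The minimum of these estimates over the K components then replaces ξ(ε|y, θ) in the acceptance ratio above.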
Wilkinson’s exact BC (not exactly!)

   ABC approximation error (i.e. non-zero tolerance) replaced with
   exact simulation from a controlled approximation to the target,
   convolution of true posterior with kernel function

        π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫ π(θ) f(z|θ) K_ε(y − z) dz dθ ,

    with K_ε a kernel parameterised by the bandwidth ε.
                                                      [Wilkinson, 2008]

    Theorem
    The ABC algorithm based on the assumption of a randomised
    observation ỹ = y + ξ, ξ ∼ K_ε, and an acceptance probability of

                               K_ε(y − z)/M

    gives draws from the posterior distribution π(θ|y).
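A sketch of the corresponding acceptance step, here with a Gaussian K_ε (an illustrative choice), so that the bound M = K_ε(0) gives acceptance probability exp(−‖y − z‖²/2ε²).

```python
import numpy as np

rng = np.random.default_rng(2)

def wilkinson_accept(y_obs, z, eps):
    """Accept a proposed (theta, z) with probability K_eps(y - z) / M.
    With a Gaussian kernel, M = K_eps(0), so the ratio is exp(-||y - z||^2 / (2 eps^2))."""
    u = np.asarray(y_obs, float) - np.asarray(z, float)
    accept_prob = np.exp(-0.5 * np.sum(u * u) / eps ** 2)
    return rng.uniform() < accept_prob
```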
How exact a BC?


  Pros
         Pseudo-data from true model and observed data from noisy
         model
         Interesting perspective in that outcome is completely
         controlled
         Link with ABCµ and assuming y is observed with a
         measurement error with density K
         Relates to the theory of model approximation
                                               [Kennedy & O’Hagan, 2001]
  Cons
         Requires K to be bounded by M
         True approximation error never assessed
Noisy ABC



   Specific case of a hidden Markov model

                              X_{t+1} ∼ Q_θ(X_t , ·)
                              Y_{t+1} ∼ g_θ(·|x_t)

   where only y^0_{1:n} is observed.
                                    [Dean, Singh, Jasra, & Peters, 2011]
   Use of specific constraints, adapted to the Markov structure:

                 y_1 ∈ B(y^0_1 , ε) × · · · × y_n ∈ B(y^0_n , ε)
Noisy ABC-MLE



   Idea: Modify instead the data from the start

                       (y^0_1 + ζ_1 , . . . , y^0_n + ζ_n)

                                                      [ see Fearnhead–Prangle ]
   noisy ABC-MLE estimate

      arg max_θ  P_θ( Y_1 ∈ B(y^0_1 + ζ_1 , ε), . . . , Y_n ∈ B(y^0_n + ζ_n , ε) )

                                [Dean, Singh, Jasra, & Peters, 2011]
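
   A rough sketch of what a noisy ABC-MLE computation could look like, assuming
   a toy Gaussian AR(1) hidden Markov model, a uniform jitter ζ on B(0, ε), a
   Monte Carlo estimate of the hitting probability and a crude grid search; this
   is illustrative, not the authors' implementation:

     # Sketch: noisy ABC-MLE by grid search on a toy AR(1)-plus-noise HMM
     import numpy as np

     rng = np.random.default_rng(1)

     def simulate_hmm(theta, n):
         # toy HMM: X_{t+1} = theta X_t + N(0,1), Y_t = X_t + 0.5 N(0,1)
         x, y = 0.0, np.empty(n)
         for t in range(n):
             x = theta * x + rng.normal()
             y[t] = x + 0.5 * rng.normal()
         return y

     def noisy_abc_mle(y_obs, eps, theta_grid, n_rep=500):
         zeta = rng.uniform(-eps, eps, size=len(y_obs))   # jitter the data once
         y_noisy = np.asarray(y_obs) + zeta
         probs = [np.mean([np.all(np.abs(simulate_hmm(th, len(y_obs)) - y_noisy) <= eps)
                           for _ in range(n_rep)])
                  for th in theta_grid]
         return theta_grid[int(np.argmax(probs))]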
Consistent noisy ABC-MLE




      Degrading the data improves the estimation performance:
          Noisy ABC-MLE is asymptotically (in n) consistent
          under further assumptions, the noisy ABC-MLE is
          asymptotically normal
          increase in variance of order ε⁻²
      likely degradation in precision or computing time due to the
      lack of summary statistic [curse of dimensionality]
Which summary?




   Fundamental difficulty of the choice of the summary statistic when
   there is no non-trivial sufficient statistic
      Loss of statistical information balanced against gain in data
      roughening
      Approximation error remains unknown
      Choice of statistics induces choice of distance function
      towards standardisation
Which summary for model choice?



   Depending on the choice of η(·), the Bayes factor based on this
   insufficient statistic,

              B^η_12(y) = ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 ,

   is consistent or not.
                                    [X, Cornuet, Marin, & Pillai, 2012]
   Consistency only depends on the range of Ei [η(y)] under both
   models.
                                [Marin, Pillai, X, & Rousseau, 2012]
Semi-automatic ABC



  Fearnhead and Prangle (2012) study ABC and the selection of the
  summary statistic in close proximity to Wilkinson’s proposal
      ABC considered as inferential method and calibrated as such
      randomised (or ‘noisy’) version of the summary statistics

                            η̃(y) = η(y) + τε

      derivation of a well-calibrated version of ABC, i.e. an
      algorithm that gives proper predictions for the distribution
      associated with this randomised summary statistic
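
   A hedged sketch of the Fearnhead–Prangle regression step producing such a
   summary, assuming a toy normal-mean model and a hand-picked set of candidate
   statistics (everything below is illustrative):

     # Sketch: estimate E[theta|y] by linear regression on candidate summaries
     # and use the fitted value as the semi-automatic summary statistic.
     import numpy as np

     rng = np.random.default_rng(2)

     def simulate(theta, n=50):
         return rng.normal(theta, 1.0, size=n)

     thetas = rng.normal(0.0, 10.0, size=5_000)              # pilot prior draws
     sims = np.array([simulate(t) for t in thetas])
     S = np.column_stack([sims.mean(axis=1),
                          np.median(sims, axis=1),
                          sims.var(axis=1)])                  # candidate summaries
     X = np.column_stack([np.ones(len(thetas)), S])
     beta, *_ = np.linalg.lstsq(X, thetas, rcond=None)        # E[theta|y] ~ X beta

     def eta(y):
         # semi-automatic summary: fitted posterior expectation of theta
         s = np.array([1.0, y.mean(), np.median(y), y.var()])
         return float(s @ beta)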
Summary [of F&P/statistics]

      optimality of the posterior expectation

                                  E[θ|y]

      of the parameter of interest as summary statistics η(y)!
                                              [requires iterative process]
      use of the standard quadratic loss function

                           (θ − θ0 )T A(θ − θ0 )

      recent extension to model choice, optimality of Bayes factor

                                  B12 (y)

                                                   [F&P, ISBA 2012, Kyoto]
Summary [about summaries]



      Choice of summary statistics is paramount for ABC
      validation/performances
      At best, ABC approximates π(. | η(y))
      Model selection feasible with ABC [with caution!]
      For estimation, consistency if {θ; µ(θ) = µ0 } = θ0 when
      Eθ [η(y)] = µ(θ)
      For testing consistency if
      {µ1 (θ1 ), θ1 ∈ Θ1 } ∩ {µ2 (θ2 ), θ2 ∈ Θ2 } = ∅
                                                        [Marin et al., 2011]
Empirical likelihood (EL)



 Unavailable likelihoods

 ABC methods

 ABC as an inference machine

 ABCel
   ABC and EL
   Composite likelihood
   Illustrations
Empirical likelihood (EL)

   Dataset x made of n independent replicates x = (x1 , . . . , xn ) of
   some X ∼ F
   Generalized moment condition model
                          EF h(X , φ) = 0,
   where h is a known function, and φ an unknown parameter

   Corresponding empirical likelihood
                           L_el(φ|x) = max_p ∏_{i=1}^n p_i

   for all p such that 0 ≤ p_i ≤ 1,  Σ_i p_i = 1,  Σ_i p_i h(x_i, φ) = 0.

                   [Owen, 1988, Bio’ka, & Empirical Likelihood, 2001]
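
   For concreteness, a sketch of how L_el can be evaluated for the simplest
   moment condition h(x, φ) = x − φ, using Owen's dual representation
   p_i = 1/{n(1 + λ h_i)} with λ solving Σ_i h_i/(1 + λ h_i) = 0 (the function
   name and numerical safeguards are mine):

     # Sketch: empirical log-likelihood for a mean constraint h(x, phi) = x - phi
     import numpy as np
     from scipy.optimize import brentq

     def log_el_mean(phi, x):
         h = np.asarray(x, dtype=float) - phi
         if h.min() >= 0 or h.max() <= 0:        # phi outside the convex hull
             return -np.inf
         n = len(h)
         # lambda must keep every 1 + lambda*h_i strictly positive
         lo = (-1.0 / h.max()) * (1 - 1e-8)
         hi = (-1.0 / h.min()) * (1 - 1e-8)
         lam = brentq(lambda l: np.sum(h / (1 + l * h)), lo, hi)
         p = 1.0 / (n * (1 + lam * h))           # optimal weights, sum to one
         return float(np.sum(np.log(p)))         # log L_el(phi | x)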
Convergence of EL [3.4]

   Theorem 3.4 Let X , Y1 , . . . , Yn be independent rv’s with common
   distribution F0 . For θ ∈ Θ, and the function h(X , θ) ∈ R^s , let
   θ0 ∈ Θ be such that
                              Var(h(Yi , θ0 ))
   is finite and has rank q > 0. If θ0 satisfies

                            E(h(X , θ0 )) = 0,

   then
                  −2 log ( Lel(θ0 |Y1 , . . . , Yn ) / n^{−n} ) → χ²_(q)
   in distribution when n → ∞.
                                                                 [Owen, 2001]
Convergence of EL [3.4]




   “...The interesting thing about Theorem 3.4 is what is not there. It
   includes no conditions to make θ̂ a good estimate of θ0 , nor even
   conditions to ensure a unique value for θ0 , nor even that any solution θ0
   exists. Theorem 3.4 applies in the just determined, over-determined, and
   under-determined cases. When we can prove that our estimating
   equations uniquely define θ0 , and provide a consistent estimator θ̂ of it,
   then confidence regions and tests follow almost automatically through
   Theorem 3.4.”
                                                                [Owen, 2001]
Raw ABCel sampler



   Act as if EL was an exact likelihood
                                               [Lazar, 2003]

     for i = 1 → N do
       generate φi from the prior distribution π(·)
       set the weight ωi = Lel (φi |xobs )
     end for
     return (φi , ωi ), i = 1, . . . , N

       Output weighted sample of size N
Raw ABCel sampler


   Act as if EL was an exact likelihood
                                                      [Lazar, 2003]

     for i = 1 → N do
       generate φi from the prior distribution π(·)
       set the weight ωi = Lel (φi |xobs )
     end for
     return (φi , ωi ), i = 1, . . . , N

       Performance evaluated through effective sample size
                                              2
                              N        N      
                   ESS = 1         ωi       ωj
                                              
                               i=1        j=1
Raw ABCel sampler

   Act as if EL was an exact likelihood
                                                        [Lazar, 2003]

     for i = 1 → N do
       generate φi from the prior distribution π(·)
       set the weight ωi = Lel (φi |xobs )
     end for
     return (φi , ωi ), i = 1, . . . , N

       More advanced algorithms can be adapted to EL:
       E.g., adaptive multiple importance sampling (AMIS) of
       Cornuet et al. to speed up computations
                                               [Cornuet et al., 2012]
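
   A compact sketch of the raw ABCel sampler, assuming a N(0, 10²) prior for
   illustration and any log empirical-likelihood routine (e.g. the
   mean-constraint sketch earlier) passed in as log_el:

     # Sketch: weight prior draws by the empirical likelihood, report the ESS
     import numpy as np

     rng = np.random.default_rng(3)

     def raw_abcel(x_obs, log_el, n_draws=10_000):
         phis = rng.normal(0.0, 10.0, size=n_draws)       # phi_i ~ pi(.)
         logw = np.array([log_el(phi, x_obs) for phi in phis])
         w = np.exp(logw - logw.max())                    # stabilised weights
         w /= w.sum()
         ess = 1.0 / np.sum(w ** 2)                       # effective sample size
         return phis, w, ess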
Moment condition in population genetics?

   EL does not require a fully defined and often complex (hence
   debatable) parametric model

   Main difficulty
   Derive a constraint
                            EF h(X , φ) = 0,
   on the parameters of interest φ when X is made of the genotypes
   of the sample of individuals at a given locus

   E.g., in phylogeography, φ is composed of
       dates of divergence between populations,
       ratio of population sizes,
       mutation rates, etc.
   None of them are moments of the distribution of the allelic states
   of the sample
Moment condition in population genetics?


   EL does not require a fully defined and often complex (hence
   debatable) parametric model

   Main difficulty
   Derive a constraint
                            EF h(X , φ) = 0,
   on the parameters of interest φ when X is made of the genotypes
   of the sample of individuals at a given locus


   c h made of pairwise composite scores (whose zero is the pairwise
   maximum likelihood estimator)
Pairwise composite likelihood


   The intra-locus pairwise likelihood
                       ℓ₂(x_k |φ) = ∏_{i<j} ℓ₂(x_k^i , x_k^j |φ)

   with x_k^1 , . . . , x_k^n : allelic states of the gene sample at the k-th locus

   The pairwise score function
               ∇_φ log ℓ₂(x_k |φ) = Σ_{i<j} ∇_φ log ℓ₂(x_k^i , x_k^j |φ)

       Composite likelihoods are often much narrower than the
       original likelihood of the model

   Safe with EL because we only use position of its mode
Pairwise likelihood: a simple case

   Assumptions
         sample ⊂ closed, panmictic population at equilibrium
         marker: microsatellite
         mutation rate: θ/2

   If x_k^i and x_k^j are two genes of the sample, ℓ₂(x_k^i , x_k^j |θ) depends
   only on δ = x_k^i − x_k^j, and

                  ℓ₂(δ|θ) = ρ(θ)^|δ| / √(1 + 2θ)
   with
                  ρ(θ) = θ / (1 + θ + √(1 + 2θ))

   Pairwise score function

                  ∂_θ log ℓ₂(δ|θ) = −1/(1 + 2θ) + |δ| / (θ √(1 + 2θ))
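
   An illustrative implementation of this pairwise likelihood and of its score
   (function names are mine):

     # Sketch: intra-locus pairwise likelihood and score, delta = x_k^i - x_k^j
     import numpy as np

     def rho(theta):
         return theta / (1.0 + theta + np.sqrt(1.0 + 2.0 * theta))

     def pairwise_lik(delta, theta):
         # l2(delta | theta) = rho(theta)^|delta| / sqrt(1 + 2 theta)
         return rho(theta) ** np.abs(delta) / np.sqrt(1.0 + 2.0 * theta)

     def pairwise_score(delta, theta):
         # d/dtheta log l2(delta | theta)
         return (-1.0 / (1.0 + 2.0 * theta)
                 + np.abs(delta) / (theta * np.sqrt(1.0 + 2.0 * theta)))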
Pairwise likelihood: 2 diverging populations

   [Figure: populations a and b diverging from their MRCA at time τ]

   Assumptions
       τ: divergence date of pop. a and b
       θ/2: mutation rate

   Let x_k^i and x_k^j be two genes coming resp. from pop. a and b, and
   set δ = x_k^i − x_k^j. Then

        ℓ₂(δ|θ, τ) = e^{−τθ}/√(1 + 2θ) Σ_{k=−∞}^{+∞} ρ(θ)^|k| I_{δ−k}(τθ),

   where I_n(z) is the nth-order modified Bessel function of the first kind.
Pairwise likelihood: 2 diverging populations

   [Figure: populations a and b diverging from their MRCA at time τ]

   Assumptions
       τ: divergence date of pop. a and b
       θ/2: mutation rate

   Let x_k^i and x_k^j be two genes coming resp. from pop. a and b, and
   set δ = x_k^i − x_k^j.

   A 2-dim score function

     ∂_τ log ℓ₂(δ|θ, τ) = −θ + (θ/2) [ℓ₂(δ − 1|θ, τ) + ℓ₂(δ + 1|θ, τ)] / ℓ₂(δ|θ, τ)

     ∂_θ log ℓ₂(δ|θ, τ) = −τ − 1/(1 + 2θ) + q(δ|θ, τ)/ℓ₂(δ|θ, τ)
                          + (τ/2) [ℓ₂(δ − 1|θ, τ) + ℓ₂(δ + 1|θ, τ)] / ℓ₂(δ|θ, τ)

   where

     q(δ|θ, τ) := e^{−τθ} ρ′(θ) / (√(1 + 2θ) ρ(θ)) Σ_{k=−∞}^{∞} |k| ρ(θ)^|k| I_{δ−k}(τθ)
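
   A sketch of ℓ₂(δ|θ, τ) with the infinite sum truncated at |k| ≤ kmax, relying
   on scipy.special.iv for the modified Bessel function I_n (truncation level
   and names are illustrative):

     # Sketch: pairwise likelihood between the two diverging populations
     import numpy as np
     from scipy.special import iv

     def rho(theta):
         return theta / (1.0 + theta + np.sqrt(1.0 + 2.0 * theta))

     def pairwise_lik_div(delta, theta, tau, kmax=200):
         k = np.arange(-kmax, kmax + 1)
         terms = rho(theta) ** np.abs(k) * iv(delta - k, tau * theta)
         return np.exp(-tau * theta) / np.sqrt(1.0 + 2.0 * theta) * terms.sum()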
Example: normal posterior
            ABCel with two constraints
  [Figure: histograms of the ABCel posterior sample of θ over 15 replications
  (x-axis: θ, y-axis: Density), with panel ESS values between 75.9 and 155.7.
  Sample sizes are of 25 (column 1), 50 (column 2) and 75 (column 3)
  observations.]
Example: normal posterior
            ABCel with three constraints
  [Figure: histograms of the ABCel posterior sample of θ over 15 replications
  (x-axis: θ, y-axis: Density), with panel ESS values between 134.8 and 331.5.
  Sample sizes are of 25 (column 1), 50 (column 2) and 75 (column 3)
  observations.]
Example: Superposition of gamma processes


   Example of superposition of N renewal processes with waiting
   times τij (i = 1, . . . , N, j = 1, . . .) ∼ G(α, β), when N is unknown.
   Renewal processes

                       ζi1 = τi1 , ζi2 = ζi1 + τi2 , . . .

   with observations made of first n values of the ζij ’s,

                 z1 = min{ζij }, z2 = min{ζij ; ζij > z1 }, . . .

   ending with
                          zn = min{ζij ; ζij > zn−1 } .
                                           [Cox & Kartsonaki, B’ka, 2012]
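
   A small simulation sketch of this superposition (assuming β is a rate, so
   numpy's gamma is called with scale 1/β; the values of N, α, β, n are
   arbitrary):

     # Sketch: first n pooled event times from N superposed Gamma renewal processes
     import numpy as np

     rng = np.random.default_rng(4)

     def superposed_gamma(N, alpha, beta, n):
         # return z_1 < ... < z_n, the n smallest event times over all N processes
         times = []
         for _ in range(N):
             t, zeta = 0.0, []
             while len(zeta) < n:                   # n events per process suffice
                 t += rng.gamma(alpha, 1.0 / beta)  # numpy's gamma uses a scale
                 zeta.append(t)
             times.extend(zeta)
         return np.sort(np.array(times))[:n]

     z = superposed_gamma(N=3, alpha=2.0, beta=1.0, n=50)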
ABC in Varanasi

 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxDr. Santhosh Kumar. N
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.raviapr7
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapitolTechU
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and stepobaje godwin sunday
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesCeline George
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfYu Kanazawa / Osaka University
 

Último (20)

The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
Finals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quizFinals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quiz
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptx
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Presentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a ParagraphPresentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a Paragraph
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptx
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptx
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and step
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 

ABC in Varanasi

  • 1. How approximate is Approximate Bayesian Computation? Christian P. Robert ISBA IWCBTA, Varanasi, Jan. 9, 2013 Joint work with J.-M. Cornuet, J.-M. Marin, K.L. Mengersen, N. Pillai, P. Pudlo and J. Rousseau
  • 2. Advertisment MCMSki IV to be held in Chamonix Mt Blanc, France, from Monday, Jan. 6 to Wed., Jan. 8, 2014 All aspects of MCMC++ theory and methodology Parallel (invited and contributed) sessions: call for proposals on website http://www.pages.drexel.edu/ mwl25/mcmski/
  • 3. Outline Unavailable likelihoods ABC methods ABC as an inference machine ABCel
  • 4. Intractable likelihood Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1 , . . . , yn |θ) is (really!) not available in closed form can (easily!) be neither completed nor demarginalised cannot be estimated by an unbiased estimator c Prohibits direct implementation of a generic MCMC algorithm like Metropolis–Hastings
  • 5. Intractable likelihood Case of a well-defined statistical model where the likelihood function (θ|y) = f (y1 , . . . , yn |θ) is (really!) not available in closed form can (easily!) be neither completed nor demarginalised cannot be estimated by an unbiased estimator c Prohibits direct implementation of a generic MCMC algorithm like Metropolis–Hastings
  • 6. The abc alternative Approximations to the original B problem Degrading the precision down to a tolerance ε Replacing the likelihood with a non-parametric approximation Summarising the data with insufficient statistics
  • 7. The abc alternative Approximations to the original B problem Degrading the precision down to a tolerance ε Replacing the likelihood with a non-parametric approximation Summarising the data with insufficient statistics
  • 8. The abc alternative Approximations to the original B problem Degrading the precision down to a tolerance ε Replacing the likelihood with a non-parametric approximation Summarising the data with insufficient statistics
  • 9. Different worries about abc Impact on B inference a mere computational issue (that will eventually end up being solved by more powerful computers, &tc, even if too costly in the short term, as for regular Monte Carlo methods) an inferential issue (opening opportunities for new inference machine, with legitimity different than for classical B approach) a Bayesian conundrum (how closely related to the/a B approach?)
  • 10. Different worries about abc Impact on B inference a mere computational issue (that will eventually end up being solved by more powerful computers, &tc, even if too costly in the short term, as for regular Monte Carlo methods) an inferential issue (opening opportunities for new inference machine, with legitimity different than for classical B approach) a Bayesian conundrum (how closely related to the/a B approach?)
  • 11. Different worries about abc Impact on B inference a mere computational issue (that will eventually end up being solved by more powerful computers, &tc, even if too costly in the short term, as for regular Monte Carlo methods) an inferential issue (opening opportunities for new inference machine, with legitimity different than for classical B approach) a Bayesian conundrum (how closely related to the/a B approach?)
  • 12. Econom’ections Similar exploration of simulation-based and approximation techniques in Econometrics Simulated method of moments Method of simulated moments Simulated pseudo-maximum-likelihood Indirect inference [Gouriéroux & Monfort, 1996] even though motivation is partly-defined models rather than complex likelihoods
  • 13. Econom’ections Similar exploration of simulation-based and approximation techniques in Econometrics Simulated method of moments Method of simulated moments Simulated pseudo-maximum-likelihood Indirect inference [Gouriéroux & Monfort, 1996] even though motivation is partly-defined models rather than complex likelihoods
  • 14. Indirect inference Minimise [in θ] a distance between estimators β̂ based on a pseudo-model for genuine observations and for observations simulated under the true model and the parameter θ. [Gouriéroux, Monfort, & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
  • 15. Indirect inference (PML vs. PSE) Example of the pseudo-maximum-likelihood (PML) β̂(y) = arg max_β Σ_t log f(y_t | β, y_{1:(t−1)}), leading to arg min_θ ||β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ))||² when y_s(θ) ∼ f(y|θ), s = 1, . . . , S
  • 16. Indirect inference (PML vs. PSE) Example of the pseudo-score-estimator (PSE) β̂(y) = arg min_β Σ_t {∂ log f/∂β (y_t | β, y_{1:(t−1)})}², leading to arg min_θ ||β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ))||² when y_s(θ) ∼ f(y|θ), s = 1, . . . , S
  • 17. Consistent indirect inference “...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier...” Consistency depending on the criterion and on the asymptotic identifiability of θ [Gouriéroux & Monfort, 1996, p. 66] Which connection [if any] with the B perspective?
  • 18. Consistent indirect inference “...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier...” Consistency depending on the criterion and on the asymptotic identifiability of θ [Gouriéroux & Monfort, 1996, p. 66] Which connection [if any] with the B perspective?
  • 19. Consistent indirect inference “...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier...” Consistency depending on the criterion and on the asymptotic identifiability of θ [Gouriéroux & Monfort, 1996, p. 66] Which connection [if any] with the B perspective?
  • 20. Approximate Bayesian computation Unavailable likelihoods ABC methods Genesis of ABC ABC basics Advances and interpretations ABC as knn ABC as an inference machine ABCel
  • 21. Genetic background of ABC skip genetics ABC is a recent computational technique that only requires being able to sample from the likelihood f (·|θ) This technique stemmed from population genetics models, about 15 years ago, and population geneticists still contribute significantly to methodological developments of ABC. [Griffith & al., 1997; Tavaré & al., 1999]
  • 22. Demo-genetic inference Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time ...), demographics (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f (y|θ)...
  • 23. Demo-genetic inference Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time ...), demographics (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f (y|θ)...
  • 24. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches • MRCA = 100 • independent mutations: ±1 with pr. 1/2 Sample of 8 genes
  • 25. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T (k) ∼ Exp(k(k − 1)/2) Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches • MRCA = 100 • independent mutations: ±1 with pr. 1/2
  • 26. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T (k) ∼ Exp(k(k − 1)/2) Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches • MRCA = 100 • independent mutations: ±1 with pr. 1/2
  • 27. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T (k) ∼ Exp(k(k − 1)/2) Mutations according to the Simple stepwise Mutation Model (SMM) • date of the mutations ∼ Poisson process with intensity θ/2 over the branches Observations: leaves of the tree, θ̂ = ? • MRCA = 100 • independent mutations: ±1 with pr. 1/2
  • 28. Much more interesting models. . . several independent loci Independent gene genealogies and mutations different populations linked by an evolutionary scenario made of divergences, admixtures, migrations between populations, etc. larger sample size usually between 50 and 100 genes MRCA τ2 τ1 A typical evolutionary scenario: POP 0 POP 1 POP 2
  • 29. Intractable likelihood Missing (too missing!) data structure: f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG cannot be computed in a manageable way... The genealogies are considered as nuisance parameters This modelling clearly differs from the phylogenetic perspective where the tree is the parameter of interest.
  • 30. Intractable likelihood Missing (too missing!) data structure: f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG cannot be computed in a manageable way... The genealogies are considered as nuisance parameters This modelling clearly differs from the phylogenetic perspective where the tree is the parameter of interest.
  • 31. A?B?C? A stands for approximate [wrong likelihood / picture] B stands for Bayesian C stands for computation [producing a parameter sample]
  • 32. A?B?C? A stands for approximate [wrong likelihood / picture] B stands for Bayesian C stands for computation [producing a parameter sample]
  • 33. A?B?C? A stands for approximate [wrong likelihood / picture] B stands for Bayesian C stands for computation [producing a parameter sample] [figure: panels of ABC posterior density estimates with their effective sample sizes (ESS)]
  • 34. How Bayesian is aBc? Could we turn the resolution into a Bayesian answer? ideally so (not meaningful: requires ∞-ly powerful computer) approximation error unknown (w/o costly simulation) true Bayes for wrong model (formal and artificial) true Bayes for noisy model (much more convincing) true Bayes for estimated likelihood (back to econometrics?) illuminating the tension between information and precision
  • 35. Untractable likelihood Back to stage zero: what can we do when a likelihood function f (y|θ) is well-defined but impossible / too costly to compute...? MCMC cannot be implemented! shall we give up Bayesian inference altogether?!
  • 36. Untractable likelihood Back to stage zero: what can we do when a likelihood function f (y|θ) is well-defined but impossible / too costly to compute...? MCMC cannot be implemented! shall we give up Bayesian inference altogether?! or settle for an almost Bayesian inference/picture...?
  • 37. ABC methodology Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: Foundation For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps jointly simulating θ′ ∼ π(θ), z ∼ f (z|θ′), until the auxiliary variable z is equal to the observed value, z = y, then the selected θ′ ∼ π(θ|y) [Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
  • 38. ABC methodology Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: Foundation For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps jointly simulating θ′ ∼ π(θ), z ∼ f (z|θ′), until the auxiliary variable z is equal to the observed value, z = y, then the selected θ′ ∼ π(θ|y) [Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
  • 39. ABC methodology Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: Foundation For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps jointly simulating θ′ ∼ π(θ), z ∼ f (z|θ′), until the auxiliary variable z is equal to the observed value, z = y, then the selected θ′ ∼ π(θ|y) [Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
  • 40. A as A...pproximative When y is a continuous random variable, strict equality z = y is replaced with a tolerance zone ρ(y, z) ≤ ε where ρ is a distance Output distributed from π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε) [Pritchard et al., 1999]
  • 41. A as A...pproximative When y is a continuous random variable, strict equality z = y is replaced with a tolerance zone ρ(y, z) ≤ ε where ρ is a distance Output distributed from π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε) [Pritchard et al., 1999]
  • 42. ABC algorithm In most implementations, further degree of A...pproximation: Algorithm 1 Likelihood-free rejection sampler for i = 1 to N do repeat generate θ′ from the prior distribution π(·) generate z from the likelihood f(·|θ′) until ρ{η(z), η(y)} ≤ ε set θi = θ′ end for where η(y) defines a (not necessarily sufficient) statistic
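
A minimal Python sketch of this rejection sampler, for illustration only: draw_prior, simulate, eta and rho are user-supplied placeholders (prior sampler, simulator from f(·|θ), summary statistic and distance), and eps is the tolerance ε.

    import numpy as np

    def abc_rejection(y_obs, draw_prior, simulate, eta, rho, eps, N, rng=None):
        # Likelihood-free rejection sampler: keep theta' only when the simulated
        # summary falls within tolerance eps of the observed summary.
        rng = np.random.default_rng() if rng is None else rng
        s_obs = eta(y_obs)
        sample = []
        for _ in range(N):
            while True:
                theta = draw_prior(rng)          # theta' ~ pi(.)
                z = simulate(theta, rng)         # z ~ f(.|theta')
                if rho(eta(z), s_obs) <= eps:    # until rho{eta(z), eta(y)} <= eps
                    sample.append(theta)
                    break
        return np.asarray(sample)

For instance, a normal location toy model could use draw_prior = lambda r: r.normal(0, 10), simulate = lambda t, r: r.normal(t, 1, size=20), eta = np.mean and rho = lambda a, b: abs(a - b).
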
  • 43. Output The likelihood-free algorithm samples from the marginal in z of: πε(θ, z|y) = π(θ)f(z|θ) I_{Aε,y}(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ, where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|y). ...does it?!
  • 44. Output The likelihood-free algorithm samples from the marginal in z of: πε(θ, z|y) = π(θ)f(z|θ) I_{Aε,y}(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ, where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|y). ...does it?!
  • 45. Output The likelihood-free algorithm samples from the marginal in z of: πε(θ, z|y) = π(θ)f(z|θ) I_{Aε,y}(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ, where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|y). ...does it?!
  • 46. Output The likelihood-free algorithm samples from the marginal in z of: πε(θ, z|y) = π(θ)f(z|θ) I_{Aε,y}(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ, where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the restricted posterior distribution: πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|η(y)). Not so good..! skip convergence details!
  • 47. Convergence of ABC What happens when ε → 0? For B ⊂ Θ, we have ∫_B [∫_{Aε,y} f(z|θ) dz / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] π(θ) dθ = ∫_{Aε,y} [∫_B f(z|θ)π(θ) dθ / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] dz = ∫_{Aε,y} [∫_B f(z|θ)π(θ) dθ / m(z)] [m(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] dz = ∫_{Aε,y} π(B|z) [m(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] dz, which indicates convergence for a continuous π(B|z).
  • 48. Convergence of ABC What happens when ε → 0? For B ⊂ Θ, we have ∫_B [∫_{Aε,y} f(z|θ) dz / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] π(θ) dθ = ∫_{Aε,y} [∫_B f(z|θ)π(θ) dθ / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] dz = ∫_{Aε,y} [∫_B f(z|θ)π(θ) dθ / m(z)] [m(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] dz = ∫_{Aε,y} π(B|z) [m(z) / ∫_{Aε,y×Θ} π(θ)f(z|θ) dz dθ] dz, which indicates convergence for a continuous π(B|z).
  • 49. Convergence (do not attempt!) ...and the above does not apply to insufficient statistics: If η(y) is not a sufficient statistic, the best one can hope for is π(θ|η(y)), not π(θ|y) If η(y) is an ancillary statistic, the whole information contained in y is lost! The “best” one can “hope” for is π(θ|η(y)) = π(θ) Bummer!!!
  • 50. Convergence (do not attempt!) ...and the above does not apply to insufficient statistics: If η(y) is not a sufficient statistic, the best one can hope for is π(θ|η(y)), not π(θ|y) If η(y) is an ancillary statistic, the whole information contained in y is lost! The “best” one can “hope” for is π(θ|η(y)) = π(θ) Bummer!!!
  • 51. Convergence (do not attempt!) ...and the above does not apply to insufficient statistics: If η(y) is not a sufficient statistic, the best one can hope for is π(θ|η(y)), not π(θ|y) If η(y) is an ancillary statistic, the whole information contained in y is lost! The “best” one can “hope” for is π(θ|η(y)) = π(θ) Bummer!!!
  • 52. Convergence (do not attempt!) ...and the above does not apply to insufficient statistics: If η(y) is not a sufficient statistic, the best one can hope for is π(θ|η(y)), not π(θ|y) If η(y) is an ancillary statistic, the whole information contained in y is lost! The “best” one can “hope” for is π(θ|η(y)) = π(θ) Bummer!!!
  • 53. Comments Role of distance paramount (because ε ≠ 0) Scaling of components of η(y) is also determinant ε matters little if ε “small enough” representative of “curse of dimensionality” small ε is beautiful! the data as a whole may be paradoxically weakly informative for ABC
  • 54. ABC (simul’) advances how approximative is ABC? ABC as knn Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger ε [Beaumont et al., 2002] .....or even by including ε in the inferential framework [ABCµ] [Ratmann et al., 2009]
  • 55. ABC (simul’) advances how approximative is ABC? ABC as knn Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger ε [Beaumont et al., 2002] .....or even by including ε in the inferential framework [ABCµ] [Ratmann et al., 2009]
  • 56. ABC (simul’) advances how approximative is ABC? ABC as knn Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger ε [Beaumont et al., 2002] .....or even by including ε in the inferential framework [ABCµ] [Ratmann et al., 2009]
  • 57. ABC (simul’) advances how approximative is ABC? ABC as knn Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger ε [Beaumont et al., 2002] .....or even by including ε in the inferential framework [ABCµ] [Ratmann et al., 2009]
  • 58. ABC-NP Better usage of [prior] simulations by adjustment: instead of throwing away θ such that ρ(η(z), η(y)) > ε, replace θ’s with locally regressed transforms θ* = θ − {η(z) − η(y)}ᵀ β̂ [Csilléry et al., TEE, 2010] where β̂ is obtained by [NP] weighted least square regression on (η(z) − η(y)) with weights Kδ{ρ(η(z), η(y))} [Beaumont et al., 2002, Genetics]
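
A hedged sketch of this local-linear adjustment in the spirit of Beaumont et al. (2002); the function and argument names are mine: thetas and summaries hold the accepted pairs (θi, η(zi)), s_obs is η(y), and an Epanechnikov kernel plays the role of Kδ.

    import numpy as np

    def regression_adjust(thetas, summaries, s_obs, delta):
        # Weighted least squares of theta on eta(z) - eta(y), then theta* = theta - X beta_hat.
        thetas = np.asarray(thetas, float)
        if thetas.ndim == 1:
            thetas = thetas[:, None]
        X = np.asarray(summaries, float) - np.asarray(s_obs, float)   # eta(z) - eta(y)
        dist = np.linalg.norm(X, axis=1)
        w = np.clip(1.0 - (dist / delta) ** 2, 0.0, None)              # Epanechnikov weights K_delta
        keep = w > 0                                                   # assumes enough points fall within delta
        Xd = np.hstack([np.ones((keep.sum(), 1)), X[keep]])            # intercept + regressors
        W = np.diag(w[keep])
        coef = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ thetas[keep]) # (X'WX)^{-1} X'W theta
        beta = coef[1:]                                                # slope part, beta-hat
        theta_star = thetas[keep] - X[keep] @ beta                     # adjusted draws theta*
        return theta_star, w[keep]
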
  • 59. ABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012): weight directly simulation by Kδ{ρ(η(z(θ)), η(y))} or (1/S) Σ_{s=1}^S Kδ{ρ(η(z_s(θ)), η(y))} [consistent estimate of f(η|θ)] Curse of dimensionality: poor estimate when d = dim(η) is large...
  • 60. ABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012): weight directly simulation by Kδ{ρ(η(z(θ)), η(y))} or (1/S) Σ_{s=1}^S Kδ{ρ(η(z_s(θ)), η(y))} [consistent estimate of f(η|θ)] Curse of dimensionality: poor estimate when d = dim(η) is large...
  • 61. ABC-NP (density estimation) Use of the kernel weights Kδ{ρ(η(z(θ)), η(y))} leads to the NP estimate of the posterior expectation Σ_i θi Kδ{ρ(η(z(θi)), η(y))} / Σ_i Kδ{ρ(η(z(θi)), η(y))} [Blum, JASA, 2010]
  • 62. ABC-NP (density estimation) Use of the kernel weights Kδ{ρ(η(z(θ)), η(y))} leads to the NP estimate of the posterior conditional density Σ_i K̃b(θi − θ) Kδ{ρ(η(z(θi)), η(y))} / Σ_i Kδ{ρ(η(z(θi)), η(y))} [Blum, JASA, 2010]
  • 63. ABC-NP (density estimations) Other versions incorporating regression adjustments Σ_i K̃b(θi* − θ) Kδ{ρ(η(z(θi)), η(y))} / Σ_i Kδ{ρ(η(z(θi)), η(y))} In all cases, error E[ĝ(θ|y)] − g(θ|y) = c_b b² + c_δ δ² + OP(b² + δ²) + OP(1/nδ^d), var(ĝ(θ|y)) = c/(nbδ^d) (1 + oP(1))
  • 64. ABC-NP (density estimations) Other versions incorporating regression adjustments Σ_i K̃b(θi* − θ) Kδ{ρ(η(z(θi)), η(y))} / Σ_i Kδ{ρ(η(z(θi)), η(y))} In all cases, error E[ĝ(θ|y)] − g(θ|y) = c_b b² + c_δ δ² + OP(b² + δ²) + OP(1/nδ^d), var(ĝ(θ|y)) = c/(nbδ^d) (1 + oP(1)) [Blum, JASA, 2010]
  • 65. ABC-NP (density estimations) Other versions incorporating regression adjustments Σ_i K̃b(θi* − θ) Kδ{ρ(η(z(θi)), η(y))} / Σ_i Kδ{ρ(η(z(θi)), η(y))} In all cases, error E[ĝ(θ|y)] − g(θ|y) = c_b b² + c_δ δ² + OP(b² + δ²) + OP(1/nδ^d), var(ĝ(θ|y)) = c/(nbδ^d) (1 + oP(1)) [standard NP calculations]
  • 66. ABC as knn [Biau et al., 2012, arxiv:1207.6461] Practice of ABC: determine tolerance as a quantile on observed distances, say 10% or 1% quantile, ε = εN = qα(d1, . . . , dN) Interpretation of ε as nonparametric bandwidth only approximation of the actual practice [Blum & François, 2010] ABC is a k-nearest neighbour (knn) method with kN = NεN [Loftsgaarden & Quesenberry, 1965]
  • 67. ABC as knn [Biau et al., 2012, arxiv:1207.6461] Practice of ABC: determine tolerance as a quantile on observed distances, say 10% or 1% quantile, ε = εN = qα(d1, . . . , dN) Interpretation of ε as nonparametric bandwidth only approximation of the actual practice [Blum & François, 2010] ABC is a k-nearest neighbour (knn) method with kN = NεN [Loftsgaarden & Quesenberry, 1965]
  • 68. ABC as knn [Biau et al., 2012, arxiv:1207.6461] Practice of ABC: determine tolerance as a quantile on observed distances, say 10% or 1% quantile, ε = εN = qα(d1, . . . , dN) Interpretation of ε as nonparametric bandwidth only approximation of the actual practice [Blum & François, 2010] ABC is a k-nearest neighbour (knn) method with kN = NεN [Loftsgaarden & Quesenberry, 1965]
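
A sketch of this practice: simulate N parameter/summary pairs, set εN to the α-quantile of the distances, which amounts to retaining the kN nearest simulations; draw_prior, simulate and eta remain user-supplied placeholders.

    import numpy as np

    def abc_knn(y_obs, draw_prior, simulate, eta, N, alpha=0.01, rng=None):
        # ABC with quantile tolerance, i.e. a k-nearest-neighbour selection.
        rng = np.random.default_rng() if rng is None else rng
        s_obs = np.asarray(eta(y_obs), float)
        thetas, dists = [], []
        for _ in range(N):
            theta = draw_prior(rng)
            z = simulate(theta, rng)
            thetas.append(theta)
            dists.append(np.linalg.norm(np.asarray(eta(z), float) - s_obs))
        thetas, dists = np.asarray(thetas), np.asarray(dists)
        eps = np.quantile(dists, alpha)           # epsilon_N = q_alpha(d_1, ..., d_N)
        k_N = int(np.ceil(alpha * N))             # number of retained neighbours
        keep = np.argsort(dists)[:k_N]            # k_N nearest simulations
        return thetas[keep], eps
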
  • 69. ABC consistency Provided kN / log log N → ∞ and kN /N → 0 as N → ∞, for almost all s0 (with respect to the distribution of S), with probability 1, (1/kN) Σ_{j=1}^{kN} ϕ(θj) → E[ϕ(θj)|S = s0] [Devroye, 1982] Biau et al. (2012) also recall pointwise and integrated mean square error consistency results on the corresponding kernel estimate of the conditional posterior distribution, under constraints kN → ∞, kN /N → 0, hN → 0 and hN^p kN → ∞,
  • 70. ABC consistency Provided kN / log log N → ∞ and kN /N → 0 as N → ∞, for almost all s0 (with respect to the distribution of S), with probability 1, (1/kN) Σ_{j=1}^{kN} ϕ(θj) → E[ϕ(θj)|S = s0] [Devroye, 1982] Biau et al. (2012) also recall pointwise and integrated mean square error consistency results on the corresponding kernel estimate of the conditional posterior distribution, under constraints kN → ∞, kN /N → 0, hN → 0 and hN^p kN → ∞,
  • 71. Rates of convergence Further assumptions (on target and kernel) allow for precise (integrated mean square) convergence rates (as a power of the sample size N), derived from classical k-nearest neighbour regression, like when m = 1, 2, 3, kN ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)}; when m = 4, kN ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)} log N; when m > 4, kN ≈ N^{(p+4)/(m+p+4)} and rate N^{−4/(m+p+4)} [Biau et al., 2012, arxiv:1207.6461] Drag: Only applies to sufficient summary statistics
  • 72. Rates of convergence Further assumptions (on target and kernel) allow for precise (integrated mean square) convergence rates (as a power of the sample size N), derived from classical k-nearest neighbour regression, like when m = 1, 2, 3, kN ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)}; when m = 4, kN ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)} log N; when m > 4, kN ≈ N^{(p+4)/(m+p+4)} and rate N^{−4/(m+p+4)} [Biau et al., 2012, arxiv:1207.6461] Drag: Only applies to sufficient summary statistics
  • 73. ABC inference machine Unavailable likelihoods ABC methods ABC as an inference machine Error inc. Exact BC and approximate targets summary statistic ABCel
  • 74. How much Bayesian aBc is..? maybe a convergent method of inference (meaningful? sufficient? foreign?) approximation error unknown (w/o simulation) pragmatic Bayes (there is no other solution!) many calibration issues (tolerance, distance, statistics) the NP side should be incorporated into the whole B picture to ABCel
  • 75. ABCµ Idea Infer about the error as well as about the parameter: Use of a joint density f(θ, ε|y) ∝ ξ(ε|y, θ) × πθ(θ) × πε(ε) where y is the data, and ξ(ε|y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ) Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel approximation. [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
  • 76. ABCµ Idea Infer about the error as well as about the parameter: Use of a joint density f(θ, ε|y) ∝ ξ(ε|y, θ) × πθ(θ) × πε(ε) where y is the data, and ξ(ε|y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ) Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel approximation. [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
  • 77. ABCµ Idea Infer about the error as well as about the parameter: Use of a joint density f(θ, ε|y) ∝ ξ(ε|y, θ) × πθ(θ) × πε(ε) where y is the data, and ξ(ε|y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ) Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel approximation. [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
  • 78. ABCµ details Multidimensional distances ρk (k = 1, . . . , K) and errors εk = ρk(ηk(z), ηk(y)), with εk ∼ ξk(ε|y, θ) ≈ ξ̂k(ε|y, θ) = (1/Bhk) Σ_b K[{εk − ρk(ηk(zb), ηk(y))}/hk] then used in replacing ξ(ε|y, θ) with mink ξ̂k(ε|y, θ) ABCµ involves acceptance probability π(θ′, ε′) q(θ′, θ) q(ε′, ε) mink ξ̂k(ε′|y, θ′) / [π(θ, ε) q(θ, θ′) q(ε, ε′) mink ξ̂k(ε|y, θ)]
  • 79. ABCµ details Multidimensional distances ρk (k = 1, . . . , K) and errors εk = ρk(ηk(z), ηk(y)), with εk ∼ ξk(ε|y, θ) ≈ ξ̂k(ε|y, θ) = (1/Bhk) Σ_b K[{εk − ρk(ηk(zb), ηk(y))}/hk] then used in replacing ξ(ε|y, θ) with mink ξ̂k(ε|y, θ) ABCµ involves acceptance probability π(θ′, ε′) q(θ′, θ) q(ε′, ε) mink ξ̂k(ε′|y, θ′) / [π(θ, ε) q(θ, θ′) q(ε, ε′) mink ξ̂k(ε|y, θ)]
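
A sketch of the nonparametric estimate ξ̂k(εk|y, θ) above, with a Gaussian density standing in for the kernel K; the number of pseudo-datasets B, the bandwidth h_k and the simulator are user-chosen placeholders.

    import numpy as np

    def xi_hat(eps_k, theta, y_obs, simulate, eta_k, rho_k, B, h_k, rng=None):
        # Kernel estimate of xi_k(eps_k | y, theta): simulate B pseudo-datasets
        # z_b ~ f(.|theta) and smooth the discrepancies rho_k(eta_k(z_b), eta_k(y)).
        rng = np.random.default_rng() if rng is None else rng
        s_obs = eta_k(y_obs)
        d = np.array([rho_k(eta_k(simulate(theta, rng)), s_obs) for _ in range(B)])
        u = (eps_k - d) / h_k
        return np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)) / h_k
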
  • 80. Wilkinson’s exact BC (not exactly!) ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of true posterior with kernel function πε(θ, z|y) = π(θ)f(z|θ)Kε(y − z) / ∫ π(θ)f(z|θ)Kε(y − z) dz dθ, with Kε kernel parameterised by bandwidth ε. [Wilkinson, 2008] Theorem The ABC algorithm based on the assumption of a randomised observation ỹ = y + ξ, ξ ∼ Kε, and an acceptance probability of Kε(y − z)/M gives draws from the posterior distribution π(θ|y).
  • 81. Wilkinson’s exact BC (not exactly!) ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of true posterior with kernel function πε(θ, z|y) = π(θ)f(z|θ)Kε(y − z) / ∫ π(θ)f(z|θ)Kε(y − z) dz dθ, with Kε kernel parameterised by bandwidth ε. [Wilkinson, 2008] Theorem The ABC algorithm based on the assumption of a randomised observation ỹ = y + ξ, ξ ∼ Kε, and an acceptance probability of Kε(y − z)/M gives draws from the posterior distribution π(θ|y).
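
A sketch of the corresponding accept/reject step, with a Gaussian kernel standing in for Kε (so that M = Kε(0) and Kε(y − z)/M is simply the unnormalised kernel); per the theorem above, accepted draws are exact under the convolved (noisy) model.

    import numpy as np

    def exact_bc(y_obs, draw_prior, simulate, eps, N, rng=None):
        # Accept (theta, z) with probability K_eps(y - z)/M, Gaussian kernel version.
        rng = np.random.default_rng() if rng is None else rng
        y_obs = np.atleast_1d(np.asarray(y_obs, float))
        out = []
        while len(out) < N:
            theta = draw_prior(rng)
            z = np.atleast_1d(np.asarray(simulate(theta, rng), float))
            accept_prob = np.exp(-0.5 * np.sum(((y_obs - z) / eps) ** 2))  # K_eps(y - z)/K_eps(0)
            if rng.uniform() < accept_prob:
                out.append(theta)
        return np.asarray(out)
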
  • 82. How exact a BC? Pros Pseudo-data from true model and observed data from noisy model Interesting perspective in that outcome is completely controlled Link with ABCµ and assuming y is observed with a measurement error with density Kε Relates to the theory of model approximation [Kennedy & O’Hagan, 2001] Cons Requires Kε to be bounded by M True approximation error never assessed
  • 83. How exact a BC? Pros Pseudo-data from true model and observed data from noisy model Interesting perspective in that outcome is completely controlled Link with ABCµ and assuming y is observed with a measurement error with density Kε Relates to the theory of model approximation [Kennedy & O’Hagan, 2001] Cons Requires Kε to be bounded by M True approximation error never assessed
  • 84. Noisy ABC Specific case of a hidden Markov model Xt+1 ∼ Qθ(Xt, ·), Yt+1 ∼ gθ(·|xt) where only y⁰_{1:n} is observed. [Dean, Singh, Jasra, & Peters, 2011] Use of specific constraints, adapted to the Markov structure: y1 ∈ B(y⁰1, ε) × · · · × yn ∈ B(y⁰n, ε)
  • 85. Noisy ABC Specific case of a hidden Markov model Xt+1 ∼ Qθ(Xt, ·), Yt+1 ∼ gθ(·|xt) where only y⁰_{1:n} is observed. [Dean, Singh, Jasra, & Peters, 2011] Use of specific constraints, adapted to the Markov structure: y1 ∈ B(y⁰1, ε) × · · · × yn ∈ B(y⁰n, ε)
  • 86. Noisy ABC-MLE Idea: Modify instead the data from the start (y⁰1 + εζ1, . . . , y⁰n + εζn) [see Fearnhead-Prangle] noisy ABC-MLE estimate arg maxθ Pθ(Y1 ∈ B(y⁰1 + εζ1, ε), . . . , Yn ∈ B(y⁰n + εζn, ε)) [Dean, Singh, Jasra, & Peters, 2011]
  • 87. Consistent noisy ABC-MLE Degrading the data improves the estimation performances: Noisy ABC-MLE is asymptotically (in n) consistent under further assumptions, the noisy ABC-MLE is asymptotically normal increase in variance of order ε⁻² likely degradation in precision or computing time due to the lack of summary statistic [curse of dimensionality]
  • 88. Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistics Loss of statistical information balanced against gain in data roughening Approximation error remains unknown Choice of statistics induces choice of distance function towards standardisation
  • 89. Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistics Loss of statistical information balanced against gain in data roughening Approximation error remains unknown Choice of statistics induces choice of distance function towards standardisation
  • 90. Which summary for model choice? Depending on the choice of η(·), the Bayes factor based on this insufficient statistic, B12^η(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2, is consistent or not. [X, Cornuet, Marin, & Pillai, 2012] Consistency only depends on the range of Ei[η(y)] under both models. [Marin, Pillai, X, & Rousseau, 2012]
  • 91. Which summary for model choice? Depending on the choice of η(·), the Bayes factor based on this insufficient statistic, B12^η(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2, is consistent or not. [X, Cornuet, Marin, & Pillai, 2012] Consistency only depends on the range of Ei[η(y)] under both models. [Marin, Pillai, X, & Rousseau, 2012]
  • 92. Semi-automatic ABC Fearnhead and Prangle (2012) study ABC and the selection of the summary statistic in close proximity to Wilkinson’s proposal ABC considered as inferential method and calibrated as such randomised (or ‘noisy’) version of the summary statistics η̃(y) = η(y) + τε derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic
  • 93. Summary [of F&P/statistics] optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistics η(y)! [requires iterative process] use of the standard quadratic loss function (θ − θ0)ᵀ A(θ − θ0) recent extension to model choice, optimality of Bayes factor B12(y) [F&P, ISBA 2012, Kyoto]
  • 94. Summary [of F&P/statistics] optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistics η(y)! [requires iterative process] use of the standard quadratic loss function (θ − θ0)ᵀ A(θ − θ0) recent extension to model choice, optimality of Bayes factor B12(y) [F&P, ISBA 2012, Kyoto]
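
A sketch of one way to build such a statistic along Fearnhead and Prangle's regression idea: a pilot wave of simulations, a linear regression of θ on candidate features of the data, and the fitted value (an approximation of E[θ|y]) then used as η(y) in a subsequent ABC run. The helper names (features, M) are mine.

    import numpy as np

    def semi_auto_summary(draw_prior, simulate, features, M, rng=None):
        # Build eta(y) ~ E[theta | y] by linear regression on pilot simulations.
        # features(data) -> 1-D vector of candidate statistics (raw summaries, powers, ...).
        rng = np.random.default_rng() if rng is None else rng
        thetas, X = [], []
        for _ in range(M):
            theta = draw_prior(rng)
            X.append(features(simulate(theta, rng)))
            thetas.append(theta)
        X = np.column_stack([np.ones(M), np.asarray(X, float)])        # add intercept
        coef, *_ = np.linalg.lstsq(X, np.asarray(thetas, float), rcond=None)
        # the returned map is the fitted posterior-expectation summary
        return lambda data: np.atleast_1d(np.concatenate([[1.0], features(data)]) @ coef)
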
  • 95. Summary [about summaries] Choice of summary statistics is paramount for ABC validation/performances At best, ABC approximates π(. | η(y)) Model selection feasible with ABC [with caution!] For estimation, consistency if {θ; µ(θ) = µ0} = θ0 when Eθ[η(y)] = µ(θ) For testing consistency if {µ1(θ1), θ1 ∈ Θ1} ∩ {µ2(θ2), θ2 ∈ Θ2} = ∅ [Marin et al., 2011]
  • 96. Summary [about summaries] Choice of summary statistics is paramount for ABC validation/performances At best, ABC approximates π(. | η(y)) Model selection feasible with ABC [with caution!] For estimation, consistency if {θ; µ(θ) = µ0} = θ0 when Eθ[η(y)] = µ(θ) For testing consistency if {µ1(θ1), θ1 ∈ Θ1} ∩ {µ2(θ2), θ2 ∈ Θ2} = ∅ [Marin et al., 2011]
  • 97. Summary [about summaries] Choice of summary statistics is paramount for ABC validation/performances At best, ABC approximates π(. | η(y)) Model selection feasible with ABC [with caution!] For estimation, consistency if {θ; µ(θ) = µ0} = θ0 when Eθ[η(y)] = µ(θ) For testing consistency if {µ1(θ1), θ1 ∈ Θ1} ∩ {µ2(θ2), θ2 ∈ Θ2} = ∅ [Marin et al., 2011]
  • 98. Summary [about summaries] Choice of summary statistics is paramount for ABC validation/performances At best, ABC approximates π(. | η(y)) Model selection feasible with ABC [with caution!] For estimation, consistency if {θ; µ(θ) = µ0} = θ0 when Eθ[η(y)] = µ(θ) For testing consistency if {µ1(θ1), θ1 ∈ Θ1} ∩ {µ2(θ2), θ2 ∈ Θ2} = ∅ [Marin et al., 2011]
  • 99. Summary [about summaries] Choice of summary statistics is paramount for ABC validation/performances At best, ABC approximates π(. | η(y)) Model selection feasible with ABC [with caution!] For estimation, consistency if {θ; µ(θ) = µ0} = θ0 when Eθ[η(y)] = µ(θ) For testing consistency if {µ1(θ1), θ1 ∈ Θ1} ∩ {µ2(θ2), θ2 ∈ Θ2} = ∅ [Marin et al., 2011]
  • 100. Empirical likelihood (EL) Unavailable likelihoods ABC methods ABC as an inference machine ABCel ABC and EL Composite likelihood Illustrations
  • 101. Empirical likelihood (EL) Dataset x made of n independent replicates x = (x1, . . . , xn) of some X ∼ F Generalized moment condition model EF h(X, φ) = 0, where h is a known function, and φ an unknown parameter Corresponding empirical likelihood Lel(φ|x) = max_p Π_{i=1}^n pi for all p such that 0 ≤ pi ≤ 1, Σ_i pi = 1, Σ_i pi h(xi, φ) = 0. [Owen, 1988, Bio’ka, & Empirical Likelihood, 2001]
  • 102. Empirical likelihood (EL) Dataset x made of n independent replicates x = (x1, . . . , xn) of some X ∼ F Generalized moment condition model EF h(X, φ) = 0, where h is a known function, and φ an unknown parameter Corresponding empirical likelihood Lel(φ|x) = max_p Π_{i=1}^n pi for all p such that 0 ≤ pi ≤ 1, Σ_i pi = 1, Σ_i pi h(xi, φ) = 0. [Owen, 1988, Bio’ka, & Empirical Likelihood, 2001]
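
In practice Lel(φ|x) can be evaluated through Owen's dual representation, with pi = 1/{n(1 + λᵀ h(xi, φ))} and λ maximising a concave criterion. A sketch, assuming scipy is available; the pseudo-logarithm guard is a standard numerical device, and the returned value is Σ log(n pi), i.e. log Lel(φ|x) up to the constant −n log n.

    import numpy as np
    from scipy.optimize import minimize

    def log_el(h):
        # log empirical likelihood ratio from the constraint matrix h,
        # with rows h(x_i, phi) of shape (n, q).
        h = np.asarray(h, float)
        if h.ndim == 1:
            h = h[:, None]
        n, q = h.shape
        def log_star(z):                     # Owen's pseudo-log: log z above 1/n, quadratic below
            eps = 1.0 / n
            quad = np.log(eps) - 1.5 + 2.0 * z / eps - z ** 2 / (2.0 * eps ** 2)
            return np.where(z > eps, np.log(np.maximum(z, eps)), quad)
        def neg_dual(lam):                   # concave dual in lambda, maximised here by minimisation
            return -np.sum(log_star(1.0 + h @ lam))
        lam = minimize(neg_dual, np.zeros(q), method="BFGS").x
        return -np.sum(log_star(1.0 + h @ lam))

When the zero vector lies well outside the convex hull of the h(xi, φ), this value is very negative, reflecting an (essentially) zero empirical likelihood for that φ.
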
  • 103. Convergence of EL [3.4] Theorem 3.4 Let X, Y1, . . . , Yn be independent rv’s with common distribution F0. For θ ∈ Θ, and the function h(X, θ) ∈ Rs, let θ0 ∈ Θ be such that Var(h(Yi, θ0)) is finite and has rank q > 0. If θ0 satisfies E(h(X, θ0)) = 0, then −2 log{Lel(θ0|Y1, . . . , Yn)/n⁻ⁿ} → χ²(q) in distribution when n → ∞. [Owen, 2001]
  • 104. Convergence of EL [3.4] “...The interesting thing about Theorem 3.4 is what is not there. It includes no conditions to make θ̂ a good estimate of θ0, nor even conditions to ensure a unique value for θ0, nor even that any solution θ0 exists. Theorem 3.4 applies in the just determined, over-determined, and under-determined cases. When we can prove that our estimating equations uniquely define θ0, and provide a consistent estimator θ̂ of it, then confidence regions and tests follow almost automatically through Theorem 3.4.”. [Owen, 2001]
  • 105. Raw ABCel sampler Act as if EL was an exact likelihood [Lazar, 2003] for i = 1 → N do generate φi from the prior distribution π(·) set the weight ωi = Lel (φi |xobs ) end for return (φi , ωi ), i = 1, . . . , N Output weighted sample of size N
  • 106. Raw ABCel sampler Act as if EL was an exact likelihood [Lazar, 2003] for i = 1 → N do generate φi from the prior distribution π(·) set the weight ωi = Lel(φi|xobs) end for return (φi, ωi), i = 1, . . . , N Performance evaluated through effective sample size ESS = 1 / Σ_{i=1}^N {ωi / Σ_{j=1}^N ωj}²
  • 107. Raw ABCel sampler Act as if EL was an exact likelihood [Lazar, 2003] for i = 1 → N do generate φi from the prior distribution π(·) set the weight ωi = Lel (φi |xobs ) end for return (φi , ωi ), i = 1, . . . , N More advanced algorithms can be adapted to EL: E.g., adaptive multiple importance sampling (AMIS) of Cornuet et al. to speed up computations [Cornuet et al., 2012]
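
A sketch of the raw sampler above, reusing the log_el function from the earlier sketch; draw_prior and the constraint builder h(x, φ) → (n, q) matrix are user-supplied placeholders, and weights are handled on the log scale for stability.

    import numpy as np

    def abc_el(x_obs, draw_prior, h, N, rng=None):
        # Raw ABCel sampler: weight prior draws by the empirical likelihood.
        rng = np.random.default_rng() if rng is None else rng
        phis = [draw_prior(rng) for _ in range(N)]
        logw = np.array([log_el(h(x_obs, phi)) for phi in phis])   # log EL up to the n^-n constant
        w = np.exp(logw - logw.max())                              # stabilised (unnormalised) weights
        ess = w.sum() ** 2 / np.sum(w ** 2)                        # ESS = 1 / sum_i (w_i / sum_j w_j)^2
        return np.asarray(phis), w / w.sum(), ess
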
  • 108. Moment condition in population genetics? EL does not require a fully defined and often complex (hence debatable) parametric model Main difficulty Derive a constraint EF h(X , φ) = 0, on the parameters of interest φ when X is made of the genotypes of the sample of individuals at a given locus E.g., in phylogeography, φ is composed of dates of divergence between populations, ratio of population sizes, mutation rates, etc. None of them are moments of the distribution of the allelic states of the sample
  • 109. Moment condition in population genetics? EL does not require a fully defined and often complex (hence debatable) parametric model Main difficulty Derive a constraint EF h(X , φ) = 0, on the parameters of interest φ when X is made of the genotypes of the sample of individuals at a given locus c h made of pairwise composite scores (whose zero is the pairwise maximum likelihood estimator)
  • 110. Pairwise composite likelihood The intra-locus pairwise likelihood ℓ2(xk|φ) = Π_{i<j} ℓ2(xk^i, xk^j|φ) with xk^1, . . . , xk^n: allelic states of the gene sample at the k-th locus The pairwise score function ∇φ log ℓ2(xk|φ) = Σ_{i<j} ∇φ log ℓ2(xk^i, xk^j|φ) Composite likelihoods are often much narrower than the original likelihood of the model Safe with EL because we only use position of its mode
  • 111. Pairwise likelihood: a simple case Assumptions sample ⊂ closed, panmictic population at equilibrium marker: microsatellite mutation rate: θ/2 Then ℓ2(δ|θ) = ρ(θ)^|δ| / √(1 + 2θ) with ρ(θ) = θ / (1 + θ + √(1 + 2θ)) Pairwise score function ∂θ log ℓ2(δ|θ) = −1/(1 + 2θ) + |δ| / (θ√(1 + 2θ)) if xk^i and xk^j are two genes of the sample, ℓ2(xk^i, xk^j|θ) depends only on δ = xk^i − xk^j
  • 112. Pairwise likelihood: a simple case Assumptions sample ⊂ closed, panmictic population at equilibrium marker: microsatellite mutation rate: θ/2 Then ℓ2(δ|θ) = ρ(θ)^|δ| / √(1 + 2θ) with ρ(θ) = θ / (1 + θ + √(1 + 2θ)) Pairwise score function ∂θ log ℓ2(δ|θ) = −1/(1 + 2θ) + |δ| / (θ√(1 + 2θ)) if xk^i and xk^j are two genes of the sample, ℓ2(xk^i, xk^j|θ) depends only on δ = xk^i − xk^j
  • 113. Pairwise likelihood: a simple case Assumptions sample ⊂ closed, panmictic population at equilibrium marker: microsatellite mutation rate: θ/2 Then ℓ2(δ|θ) = ρ(θ)^|δ| / √(1 + 2θ) with ρ(θ) = θ / (1 + θ + √(1 + 2θ)) Pairwise score function ∂θ log ℓ2(δ|θ) = −1/(1 + 2θ) + |δ| / (θ√(1 + 2θ)) if xk^i and xk^j are two genes of the sample, ℓ2(xk^i, xk^j|θ) depends only on δ = xk^i − xk^j
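
These closed forms translate directly into code; a sketch (function names mine), where locus_score sums the pairwise score over all gene pairs at one locus and could serve as one coordinate of the EL constraint h mentioned earlier, since each pairwise score has zero expectation at the true θ.

    import numpy as np

    def pair_loglik(delta, theta):
        # log l2(delta | theta) = |delta| log rho(theta) - 0.5 log(1 + 2 theta)
        s = np.sqrt(1.0 + 2.0 * theta)
        rho = theta / (1.0 + theta + s)
        return np.abs(delta) * np.log(rho) - 0.5 * np.log(1.0 + 2.0 * theta)

    def pair_score(delta, theta):
        # d/dtheta log l2(delta | theta) = -1/(1+2 theta) + |delta| / (theta sqrt(1+2 theta))
        s = np.sqrt(1.0 + 2.0 * theta)
        return -1.0 / (1.0 + 2.0 * theta) + np.abs(delta) / (theta * s)

    def locus_score(x, theta):
        # pairwise composite score at one locus: sum of pair_score over gene pairs
        x = np.asarray(x, float)
        i, j = np.triu_indices(len(x), k=1)
        return np.sum(pair_score(x[i] - x[j], theta))
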
  • 114. Pairwise likelihood: 2 diverging populations MRCA POP a POP b Assumptions τ: divergence date of pop. a and b θ/2: mutation rate Let xk^i and xk^j be two genes coming resp. from pop. a and b Set δ = xk^i − xk^j. Then ℓ2(δ|θ, τ) = e^{−τθ}/√(1 + 2θ) Σ_{k=−∞}^{+∞} ρ(θ)^|k| I_{δ−k}(τθ), where In(z) is the nth-order modified Bessel function of the first kind
  • 115. Pairwise likelihood: 2 diverging populations A 2-dim score function ∂τ log ℓ2(δ|θ, τ) = −θ + (θ/2) {ℓ2(δ − 1|θ, τ) + ℓ2(δ + 1|θ, τ)} / ℓ2(δ|θ, τ) ∂θ log ℓ2(δ|θ, τ) = −τ − 1/(1 + 2θ) + q(δ|θ, τ)/ℓ2(δ|θ, τ) + (τ/2) {ℓ2(δ − 1|θ, τ) + ℓ2(δ + 1|θ, τ)} / ℓ2(δ|θ, τ) where q(δ|θ, τ) := e^{−τθ}/√(1 + 2θ) · ρ′(θ)/ρ(θ) Σ_{k=−∞}^{∞} |k| ρ(θ)^|k| I_{δ−k}(τθ) Assumptions τ: divergence date of pop. a and b θ/2: mutation rate
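
A sketch of ℓ2(δ|θ, τ), truncating the infinite sum at |k| ≤ kmax (an assumption on the truncation level) and using scipy.special.iv for the modified Bessel function of the first kind.

    import numpy as np
    from scipy.special import iv

    def pair_lik_two_pops(delta, theta, tau, kmax=200):
        # l2(delta | theta, tau) = exp(-tau*theta)/sqrt(1+2 theta) * sum_k rho^|k| I_{delta-k}(tau*theta)
        s = np.sqrt(1.0 + 2.0 * theta)
        rho = theta / (1.0 + theta + s)
        k = np.arange(-kmax, kmax + 1)
        terms = rho ** np.abs(k) * iv(delta - k, tau * theta)
        return np.exp(-tau * theta) / s * np.sum(terms)
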
  • 116. Example: normal posterior ABCel with two constraints [figure: posterior density estimates with their effective sample sizes (ESS)] Sample sizes are of 25 (column 1), 50 (column 2) and 75 (column 3) observations
  • 117. Example: normal posterior ABCel with three constraints [figure: posterior density estimates with their effective sample sizes (ESS)] Sample sizes are of 25 (column 1), 50 (column 2) and 75 (column 3) observations
  • 118. Example: Superposition of gamma processes Example of superposition of N renewal processes with waiting times τij (i = 1, . . . , M, j = 1, . . .) ∼ G(α, β), when N is unknown. Renewal processes ζi1 = τi1 , ζi2 = ζi1 + τi2 , . . . with observations made of first n values of the ζij ’s, z1 = min{ζij }, z2 = min{ζij ; ζij > z1 }, . . . ending with zn = min{ζij ; ζij > zn−1 } . [Cox & Kartsonaki, B’ka, 2012]
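
ABCel only needs a simulator for this example; a sketch (parameter names mine, with β treated as a scale parameter of the Gamma waiting times), producing the first n event times z1 < · · · < zn of the superposed process.

    import numpy as np

    def simulate_superposed_gamma(n_proc, alpha, beta, n_obs, rng=None):
        # First n_obs event times of n_proc superposed renewal processes
        # with Gamma(alpha, beta) waiting times (beta used as scale here).
        rng = np.random.default_rng() if rng is None else rng
        events = []
        for _ in range(n_proc):
            tau = rng.gamma(alpha, beta, size=n_obs)   # waiting times tau_ij ~ G(alpha, beta)
            events.append(np.cumsum(tau))              # zeta_i1 < zeta_i2 < ... for process i
        # n_obs events per component suffice, since the first n_obs superposed
        # events can include at most n_obs from any single component
        return np.sort(np.concatenate(events))[:n_obs]
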