The 21st Bayesian Century




               “The 21st Century belongs to Bayes”
         as argued by a discussion on Bayesian testing and
                      Bayesian model choice

                                     Christian P. Robert

                            Université Paris Dauphine and CREST-INSEE
                            http://www.ceremade.dauphine.fr/~xian
                                   http://xianblog.wordpress.com


                                         July 1, 2009
The 21st Bayesian Century




              A consequence of Bayesian statistics being given a proper
               name is that it encourages too much historical deference
                  from people who think that the bibles of Jeffreys, de
                       Finetti, Jaynes, and others have all the answers.
                               —Gelman, Bayesian Analysis 3(3), 2008
The 21st Bayesian Century




Outline

                    Anyone not shocked by the Bayesian theory of inference has not
                                                                     understood it
                                                                  Senn, BA., 2008


      Introduction

      Tests and model choice

      Bayesian Calculations

      A Defense of the Bayesian Choice
The 21st Bayesian Century
  Introduction




Vocabulary and concepts
                            Bayesian inference is a coherent mathematical theory
                                     but I don’t trust it in scientific applications.
                                                                 Gelman, BA, 2008


      Introduction
          Models
          The Bayesian framework
          Improper prior distributions
          Noninformative prior distributions

      Tests and model choice

      Bayesian Calculations

      A Defense of the Bayesian Choice
The 21st Bayesian Century
  Introduction
     Models



Parametric model

             Bayesians promote the idea that a multiplicity of parameters can be
            handled via hierarchical, typically exchangeable, models, but it seems
          implausible that this could really work automatically [instead of] giving
                                  reasonable answers using minimal assumptions.
                                                               Gelman, BA, 2008

      Observations x1 , . . . , xn generated from a probability distribution
      fi (xi |θi , x1 , . . . , xi−1 ) = fi (xi |θi , x1:i−1 )

                     x = (x1 , . . . , xn ) ∼ f (x|θ),     θ = (θ1 , . . . , θn )

      Associated likelihood
                                        ℓ(θ|x) = f (x|θ)
                                                 [inverted density & starting point]
The 21st Bayesian Century
  Introduction
     Models



And [B] nonparametrics?!
      Equally very active and definitely very 21st, thank you,
      but not mentioned in this talk!
      [Screenshot of the workshop page, http://bnpworkshop.carloalberto.org/ —
       7th Workshop on Bayesian Nonparametrics, Collegio Carlo Alberto,
       Moncalieri, 21–25 June 2009:]

      The 7th Workshop on Bayesian Nonparametrics will be held at the
      Collegio Carlo Alberto from June 21 to 25, 2009. The Collegio is a
      research institution housed in a historical building located in
      Moncalieri on the outskirts of Turin, Italy. The meeting will feature
      the latest developments in the area and will cover a wide variety of
      both theoretical and applied topics such as: foundations of the
      Bayesian nonparametric approach, construction and properties of prior
      distributions, asymptotics, interplay with probability theory and
      stochastic processes, statistical modelling, computational algorithms
      and applications in machine learning, biostatistics, bioinformatics,
      economics and econometrics. The Workshop will be structured in 4
      tutorials on special topics, a series of invited talks and contributed
      poster sessions.
The 21st Bayesian Century
  Introduction
     The Bayesian framework



Bayes theorem 101

      Bayes theorem = inversion of probabilities
      If A and E are events such that P(E) ≠ 0, P(A|E) and P(E|A)
      are related by

         P(A|E) = P(E|A)P(A) / [P(E|A)P(A) + P(E|A^c)P(A^c)]
                = P(E|A)P(A) / P(E)

                                                   [Thomas Bayes (?)]
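
      As a sanity check, the inversion can be run numerically; the sketch
      below uses hypothetical placeholder probabilities, not values from the
      talk.

p_A = 0.01           # prior P(A), hypothetical
p_E_given_A = 0.95   # P(E|A), hypothetical
p_E_given_Ac = 0.10  # P(E|A^c), hypothetical

p_E = p_E_given_A * p_A + p_E_given_Ac * (1 - p_A)  # total probability
p_A_given_E = p_E_given_A * p_A / p_E               # the inversion step
print(p_A_given_E)                                  # ~0.0876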
The 21st Bayesian Century
  Introduction
     The Bayesian framework



Bayesian approach
                                      The impact of treating x as a fixed constant
                                      is to increase statistical power as an artefact
                                                     Templeton, Molec. Ecol., 2009

      New perspective
          ◮      Uncertainty on the parameters θ of a model modeled through
                 a probability distribution π on Θ, called prior distribution
          ◮      Inference based on the distribution of θ conditional on x,
                 π(θ|x), called posterior distribution

                  π(θ|x) = f(x|θ) π(θ) / ∫_Θ f(x|θ) π(θ) dθ .
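
      A minimal sketch of this formula on a grid, for a hypothetical model
      x ∼ N(θ, 1) with prior θ ∼ N(0, 2²); the normalising integral is the
      evidence.

import numpy as np
from scipy.stats import norm

# Hypothetical model: x ~ N(theta, 1), prior theta ~ N(0, 2^2).
x = 1.3
theta = np.linspace(-10.0, 10.0, 2001)
unnorm = norm.pdf(x, loc=theta, scale=1.0) * norm.pdf(theta, loc=0.0, scale=2.0)
posterior = unnorm / np.trapz(unnorm, theta)     # divide by the evidence
# posterior mean; conjugacy gives tau^2 x/(tau^2 + 1) = 0.8 * 1.3 = 1.04
print(np.trapz(theta * posterior, theta))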
The 21st Bayesian Century
  Introduction
     The Bayesian framework



[Nonphilosophical] justifications

                                            Ignoring the sampling error of x undermines
                            the statistical validity of all inferences made by the method
                                                             Templeton, Molec. Ecol., 2009
          ◮      Semantic drift from unknown to random
          ◮      Actualization of the information on θ by extracting the
                 information on θ contained in the observation x
          ◮      Allows incorporation of imperfect information in the decision
                 process
          ◮      Unique mathematical way to condition upon the observations
                 (conditional perspective)
          ◮      Unique way to give meaning to statements like P(θ > 0)
The 21st Bayesian Century
  Introduction
     The Bayesian framework



Posterior distribution


                  Bayesian methods are presented as an automatic inference engine,
                         and this raises suspicion in anyone with applied experience
                                                                  Gelman, BA, 2008

                              π(θ|x) central to Bayesian inference
          ◮      Operates conditional upon the observations
          ◮      Incorporates the requirement of the Likelihood Principle
          ◮      Avoids averaging over the unobserved values of x
          ◮      Coherent updating of the information available on θ
          ◮      Provides a complete inferential machinery
The 21st Bayesian Century
  Introduction
     Improper prior distributions



Improper distributions

             If we take P (dσ) ∝ dσ as a statement that σ may have any value
        between 0 and ∞ (...), we must use ∞ instead of 1 to denote certainty.
                                                          Jeffreys, ToP, 1939

      Necessary extension from a prior distribution to a prior σ-finite
      measure π such that

         ∫_Θ π(θ) dθ = +∞


                                                  Improper prior distribution
                                             [Weird? Inappropriate?? report!! ]
The 21st Bayesian Century
  Introduction
     Improper prior distributions



Justifications


                                 If the parameter may have any value from −∞ to +∞,
                            its prior probability should be taken as uniformly distributed
                                                                      Jeffreys, ToP, 1939
      Automated prior determination often leads to improper priors
         1. Similar performances of estimators derived from these
            generalized distributions
         2. Improper priors as limits of proper distributions in many
            [mathematical] senses
The 21st Bayesian Century
  Introduction
     Improper prior distributions



More justifications


         There is no good objective principle for choosing a noninformative prior
             (even if that concept were mathematically defined, which it is not)
                                                              Gelman, BA, 2008


         4. Robust answers against possible misspecifications of the prior
         5. Frequentist justifications, such as:
                   (i) minimaxity
                  (ii) admissibility
                 (iii) invariance (Haar measure)
         6. Improper priors [much] preferred to vague proper priors like
            N (0, 10^6)
The 21st Bayesian Century
  Introduction
     Improper prior distributions



Validation

                               The mistake is to think of them as representing ignorance
                                                                    Lindley, JASA, 1990
      Extension of the posterior distribution π(θ|x) associated with an
      improper prior π, as given by Bayes’s formula

         π(θ|x) = f(x|θ) π(θ) / ∫_Θ f(x|θ) π(θ) dθ ,

      when

         ∫_Θ f(x|θ) π(θ) dθ < ∞
          Delete all emotional names
The 21st Bayesian Century
  Introduction
     Noninformative prior distributions



Noninformative priors



                 ...cannot be expected to represent exactly total ignorance about the
                 problem, but should rather be taken as reference priors, upon which
                      everyone could fall back when the prior information is missing.
                                                  Kass and Wasserman, JASA, 1996

      What if all we know is that we know “nothing” ?!
      In the absence of prior information, prior distributions solely
      derived from the sample distribution f (x|θ)
      Difficulty with uniform priors, lacking invariance properties.
The 21st Bayesian Century
  Introduction
     Noninformative prior distributions



Jeffreys’ prior

             If we took the prior density for the parameters to be proportional to
        |I(θ)|^{1/2}, it could be stated for any law that is differentiable with
         respect to all parameters that the total probability in any region of the
            θi would be equal to the total probability in the corresponding region
                                                                        of the θi′
                                                                Jeffreys, ToP, 1939

      Based on the Fisher information

         I(θ) = E_θ[ ∂ℓ/∂θ^T ∂ℓ/∂θ ]

      Jeffreys’ prior distribution is

         π*(θ) ∝ |I(θ)|^{1/2}
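
      For instance, for a Bernoulli(θ) likelihood the Fisher information is
      I(θ) = 1/θ(1 − θ), so Jeffreys’ prior is the Beta(1/2, 1/2)
      distribution; a quick numerical check (a sketch, not part of the talk):

import numpy as np

# Fisher information of a Bernoulli(theta) likelihood: I(theta) = 1/{theta(1-theta)},
# so pi*(theta) ∝ |I(theta)|^{1/2} = theta^{-1/2}(1-theta)^{-1/2}, i.e. Beta(1/2, 1/2).
theta = np.linspace(0.001, 0.999, 999)
jeffreys = np.sqrt(1.0 / (theta * (1.0 - theta)))
# normalising constant tends to B(1/2, 1/2) = pi as the endpoints are refined
print(np.trapz(jeffreys, theta), np.pi)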
The 21st Bayesian Century
  Tests and model choice




Tests and model choice
                   The Jeffreys-subjective synthesis betrays a much more dangerous
                    confusion than the Neyman-Pearson-Fisher synthesis as regards
                                                                   hypothesis tests
                                                                   Senn, BA, 2008


      Introduction

      Tests and model choice
         Bayesian tests
         Bayes factors
         Opposition to classical tests
         Model choice
         Compatible priors
         Variable selection
The 21st Bayesian Century
  Tests and model choice
     Bayesian tests



Construction of Bayes tests


              What is almost never used, however, is the Jeffreys significance test.
                                                                  Senn, BA, 2008


      Definition (Test)
      Given a hypothesis H0 : θ ∈ Θ0 on the parameter θ ∈ Θ of a
      statistical model, a test is a statistical procedure that takes its
      values in {0, 1}.

      Example (Normal mean)
      For x ∼ N (θ, 1), decide whether or not θ ≤ 0.
The 21st Bayesian Century
  Tests and model choice
     Bayesian tests



Decision-theoretic perspective
                             Loss functions [are] not relevant to statistical inference
                                                                   Gelman, BA, 2008

      Theorem (Optimal Bayes decision)
      Under the 0–1 loss function

                     ⎧ 0    if d = I_{Θ0}(θ)
         L(θ, d)  =  ⎨ a0   if d = 1 and θ ∉ Θ0
                     ⎩ a1   if d = 0 and θ ∈ Θ0

      the Bayes procedure is

         δ^π(x) = 1 if Pr^π(θ ∈ Θ0 |x) ≥ a0/(a0 + a1), and 0 otherwise
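
      A sketch of this decision rule for the normal-mean example above,
      under a hypothetical N(0, 10²) prior on θ; a0 and a1 are the losses of
      the theorem.

from scipy.stats import norm

def bayes_test(x, a0=1.0, a1=1.0, tau=10.0):
    # posterior under the N(0, tau^2) prior: theta|x ~ N(tau^2 x/(tau^2+1), tau^2/(tau^2+1))
    m = tau**2 * x / (tau**2 + 1)
    s = (tau**2 / (tau**2 + 1)) ** 0.5
    p_H0 = norm.cdf(0.0, loc=m, scale=s)          # Pr(theta <= 0 | x)
    return 1 if p_H0 >= a0 / (a0 + a1) else 0     # accept H0 iff prob. high enough

print(bayes_test(-0.5), bayes_test(2.0))          # 1 (accept H0), 0 (reject)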
The 21st Bayesian Century
  Tests and model choice
     Bayes factors



A function of posterior probabilities
            The method posits two or more alternative hypotheses and tests their
                                        relative fits to some observed statistics
                                                   Templeton, Mol. Ecol., 2009


      Definition (Bayes factors)
      For hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0,

         B01 = [π(Θ0|x) / π(Θ0^c|x)] ÷ [π(Θ0) / π(Θ0^c)]
             = ∫_{Θ0} f(x|θ) π0(θ) dθ / ∫_{Θ0^c} f(x|θ) π1(θ) dθ


                                                [Good, 1958 & Jeffreys, 1961]
The 21st Bayesian Century
  Tests and model choice
     Bayes factors



Self-contained concept

      Having a high relative probability does not mean that a hypothesis is true
                                                      or supported by the data
                                                   Templeton, Mol. Ecol., 2009

      Non-decision-theoretic:
          ◮   eliminates choice of π(Θ0 )
          ◮   Bayesian/marginal equivalent to the likelihood ratio
          ◮   Jeffreys’ scale of evidence:
                 ◮   if log10(B10^π) between 0 and 0.5, evidence against H0 weak,
                 ◮   if log10(B10^π) between 0.5 and 1, evidence substantial,
                 ◮   if log10(B10^π) between 1 and 2, evidence strong, and
                 ◮   if log10(B10^π) above 2, evidence decisive
The 21st Bayesian Century
  Tests and model choice
     Bayes factors



A major modification



           Considering whether a location parameter α is 0. The prior is uniform
          and we should have to take f (α) = 0 and B10 would always be infinite
                                                            Jeffreys, ToP, 1939

      When the null hypothesis is supported by a set of measure 0,
      π(Θ0 ) = 0 and thus π(Θ0 |x) = 0.
                                                     [End of the story?!]
The 21st Bayesian Century
  Tests and model choice
     Bayes factors



Changing the prior to fit the hypotheses


      Requirement
      Defined prior distributions under both assumptions,

                            π0 (θ) ∝ π(θ)IΘ0 (θ),   π1 (θ) ∝ π(θ)IΘ1 (θ),

      (under the standard dominating measures on Θ0 and Θ1 )

      Using the prior probabilities π(Θ0 ) = ̺0 and π(Θ1 ) = ̺1 ,

                                    π(θ) = ̺0 π0 (θ) + ̺1 π1 (θ).
The 21st Bayesian Century
  Tests and model choice
     Bayes factors



Point null hypotheses

        I have no patience for statistical methods that assign positive probability
           to point hypotheses of the θ = 0 type that can never actually be true
                                                               Gelman, BA, 2008

      Take ρ0 = Pr^π(θ = θ0) and g1 the prior density under Ha. Then

         π(Θ0|x) = f(x|θ0)ρ0 / ∫ f(x|θ)π(θ) dθ
                 = f(x|θ0)ρ0 / [f(x|θ0)ρ0 + (1 − ρ0)m1(x)]

      and the Bayes factor is

         B01^π(x) = [f(x|θ0)ρ0 / m1(x)(1 − ρ0)] ÷ [ρ0/(1 − ρ0)]
                  = f(x|θ0) / m1(x)
The 21st Bayesian Century
  Tests and model choice
     Bayes factors



Point null hypotheses (cont’d)


      Example (Normal mean)
      Test of H0 : θ = 0 when x ∼ N(θ, σ²): taking π1 as N(0, τ²),

         m1(x)/f(x|0) = √(σ²/(σ² + τ²)) exp[τ²x² / {2σ²(σ² + τ²)}]

      and the posterior probability of H0 is

         τ²/σ² \ x     0       0.68    1.28    1.96
             1        0.586   0.557   0.484   0.351
            10        0.768   0.729   0.612   0.366
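
      The table can be reproduced in a few lines (a sketch assuming
      ρ0 = 1/2, σ² = 1, and reading the rows as values of τ²):

import numpy as np

def post_prob_null(x, tau2, rho0=0.5, sigma2=1.0):
    # m1(x)/f(x|0), then pi(Theta0|x) = [1 + (1-rho0)/rho0 * ratio]^{-1}
    ratio = np.sqrt(sigma2 / (sigma2 + tau2)) * np.exp(
        tau2 * x**2 / (2 * sigma2 * (sigma2 + tau2)))
    return 1.0 / (1.0 + (1.0 - rho0) / rho0 * ratio)

for tau2 in (1.0, 10.0):
    print([round(post_prob_null(x, tau2), 3) for x in (0.0, 0.68, 1.28, 1.96)])
# [0.586, 0.557, 0.484, 0.351]
# [0.768, 0.729, 0.612, 0.366]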
The 21st Bayesian Century
  Tests and model choice
     Opposition to classical tests



Comparison with classical tests



                  The 95 percent frequentist intervals will live up to their advertised
                                                                       coverage claims
                                                                Wasserman, BA, 2008

      Standard answer
      Definition (p-value)
      The p-value p(x) associated with a test is the smallest significance
      level for which H0 is rejected
The 21st Bayesian Century
  Tests and model choice
     Opposition to classical tests



Problems with p-values



       The use of P implies that a hypothesis that may be true may be rejected
         because it had not predicted observable results that have not occurred
                                                             Jeffreys, ToP, 1939


          ◮    Evaluation of the wrong quantity, namely the probability of
               exceeding the observed quantity (wrong conditioning)
          ◮    Evaluation only under the null hypothesis
          ◮    Huge numerical difference with the Bayesian range of answers
The 21st Bayesian Century
  Tests and model choice
     Opposition to classical tests



Bayesian lower bounds

                                        If the Bayes estimator has good frequency behavior
                                          then we might as well use the frequentist method.
                                 If it has bad frequency behavior then we shouldn’t use it.
                                                                     Wasserman, BA, 2008

      The least favourable Bayesian answer is

         B(x, GA) = inf_{g∈GA} f(x|θ0) / ∫_Θ f(x|θ) g(θ) dθ ,

      i.e., if there exists an mle θ̂(x) for θ,

         B(x, GA) = f(x|θ0) / f(x|θ̂(x))
The 21st Bayesian Century
  Tests and model choice
     Opposition to classical tests



Illustration

      Example (Normal case)
      When x ∼ N(θ, 1) and H0 : θ = 0, the lower bounds are

         B(x, GA) = e^{−x²/2}   and   P(x, GA) = (1 + e^{x²/2})^{−1} ,

      i.e.

         p-value    0.10    0.05    0.01    0.001
            P      0.205   0.128   0.035   0.004
            B      0.256   0.146   0.036   0.004

                                                       [Quite different!]
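
      A numerical check of the table (a sketch assuming two-sided p-values,
      so x = Φ^{-1}(1 − p/2)):

import numpy as np
from scipy.stats import norm

for p in (0.10, 0.05, 0.01, 0.001):
    x = norm.ppf(1.0 - p / 2.0)           # observation matching the p-value
    B = np.exp(-x**2 / 2.0)               # lower bound on the Bayes factor
    P = 1.0 / (1.0 + np.exp(x**2 / 2.0))  # lower bound on the posterior prob.
    print(f"p={p}: P={P:.3f}, B={B:.3f}")
# matches the table up to rounding (e.g. p=0.05 gives P=0.128, B=0.146)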
The 21st Bayesian Century
  Tests and model choice
     Model choice



Model choice and model comparison



                 There is no null hypothesis, which complicates the computation of
                                                                    sampling error
                                                      Templeton, Mol. Ecol., 2009

      Choice among models
      Several models available for the same observation(s)

                              Mi : x ∼ fi (x|θi ),      i∈I

      where I can be finite or infinite
The 21st Bayesian Century
  Tests and model choice
     Model choice



Bayesian resolution
       The posterior probabilities are constructed by using a numerator that is a
           function of the observation for a particular model, then divided by a
                   denominator that ensures that the ”probabilities” sum to one
                                                    Templeton, Mol. Ecol., 2009

      Probabilise the entire model/parameter space
        ◮ allocate probabilities pi to all models Mi

        ◮ define priors πi (θi ) for each parameter space Θi

        ◮ compute

             π(Mi|x) = pi ∫_{Θi} fi(x|θi) πi(θi) dθi
                       / Σ_j pj ∫_{Θj} fj(x|θj) πj(θj) dθj
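
      A toy illustration of this computation, with two hypothetical models
      for x successes out of n Bernoulli trials whose evidences are
      available in closed form:

from math import comb

# Hypothetical pair of models for x successes in n Bernoulli trials:
# M0: p = 1/2 fixed; M1: p ~ U(0,1), whose marginal is 1/(n+1);
# p0 is the prior weight of M0.
def model_posteriors(x, n, p0=0.5):
    m0 = comb(n, x) * 0.5**n                 # evidence of M0
    m1 = 1.0 / (n + 1)                       # evidence of M1
    denom = p0 * m0 + (1.0 - p0) * m1
    return p0 * m0 / denom, (1.0 - p0) * m1 / denom

print(model_posteriors(7, 10))    # balanced counts mildly favour M0
print(model_posteriors(10, 10))   # extreme counts strongly favour M1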
The 21st Bayesian Century
  Tests and model choice
     Model choice



Bayesian resolution(2)


             The numerators are not co-measurable across hypotheses, and the
       denominators are sums of non-co-measurable entities. This means that it
                      is mathematically impossible for them to be probabilities.
                                                  Templeton, Mol. Ecol., 2009


          ◮   take the largest π(Mi|x) to determine the “best” model,
              or use the averaged predictive

                 Σ_j π(Mj|x) ∫_{Θj} fj(x′|θj) πj(θj|x) dθj
The 21st Bayesian Century
  Tests and model choice
     Model choice



Natural Ockham’s razor

      Pluralitas non est ponenda sine necessitate

           Variation is random until the
           contrary is shown; and new
           parameters in laws, when they
           are suggested, must be tested
           one at a time, unless there is
           specific reason to the contrary.

                              Jeffreys, ToP, 1939

      The Bayesian approach naturally weights differently models with
      different parameter dimensions (BIC).
The 21st Bayesian Century
  Tests and model choice
     Compatible priors



Compatibility principle


       Further complicating dimensionality of test statistics is the fact that the
      models are often not nested, and one model may contain parameters that
                      do not have analogues in the other models and vice versa
                                                   Templeton, Mol. Ecol., 2009

      Difficulty of simultaneously finding priors on a collection of models.
      Easier to start from a single prior on a “big” [encompassing] model
      and to derive the others from a coherence principle
                                                [Dawid & Lauritzen, 2000]
The 21st Bayesian Century
  Tests and model choice
     Compatible priors



An illustration for linear regression
      In the case where M1 and M2 are two nested Gaussian linear regression
      models with Zellner’s g-priors and the same variance σ² ∼ π(σ²):
          ◮   M1 : y|β1, σ² ∼ N(X1 β1, σ² In) with

                 β1|σ² ∼ N(s1, σ² n1 (X1^T X1)^{-1})

              where X1 is a (n × k1) matrix of rank k1 ≤ n
          ◮   M2 : y|β2, σ² ∼ N(X2 β2, σ² In) with

                 β2|σ² ∼ N(s2, σ² n2 (X2^T X2)^{-1}),

              where X2 is a (n × k2) matrix with span(X2) ⊆ span(X1)
                                          [© Marin & Robert, Bayesian Core]
The 21st Bayesian Century
  Tests and model choice
     Compatible priors



Compatible g-priors

       I don’t see any role for squared error loss, minimax, or the rest of what is
                                       sometimes called statistical decision theory
                                                                Gelman, BA, 2008

      Since σ² is a nuisance parameter, minimize the Kullback–Leibler
      divergence between both marginal distributions conditional on σ²,
      m1(y|σ²; s1, n1) and m2(y|σ²; s2, n2), with solution

         β2|X2, σ² ∼ N(s2*, σ² n2* (X2^T X2)^{-1})

      with

         s2* = (X2^T X2)^{-1} X2^T X1 s1 ,    n2* = n1
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Variable selection


 Regression setup where y is regressed on a
 set {x1 , . . . , xp } of p potential
 explanatory regressors (plus intercept)
 Corresponding 2^p submodels Mγ , where
 γ ∈ Γ = {0, 1}^p indicates
 inclusion/exclusion of variables by a
 binary representation,
 e.g. γ = 101001011 means that x1 , x3 ,
 x6 , x8 and x9 are included.
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Notations
      For model Mγ ,
          ◮    qγ variables included
          ◮    t1 (γ) = {t1,1 (γ), . . . , t1,qγ (γ)} indices of those variables and
               t0 (γ) indices of the variables not included
          ◮    For β ∈ R^{p+1},

                  βt1(γ) = (β0, βt1,1(γ), . . . , βt1,qγ(γ))

                  Xt1(γ) = [1n | xt1,1(γ) | . . . | xt1,qγ(γ)] .

      Submodel Mγ is thus

         y|β, γ, σ² ∼ N(Xt1(γ) βt1(γ), σ² In)
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Global and compatible priors
      Use Zellner’s g-prior, i.e. a normal prior for β conditional on σ²,

         β|σ² ∼ N(β̃, cσ²(X^T X)^{-1})

      and a Jeffreys prior for σ²,

         π(σ²) ∝ σ^{-2}

      Resulting compatible prior

         βt1(γ) ∼ N((Xt1(γ)^T Xt1(γ))^{-1} Xt1(γ)^T X β̃,
                    cσ² (Xt1(γ)^T Xt1(γ))^{-1})
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Posterior model probability

      Can be obtained in closed form:

         π(γ|y) ∝ (c + 1)^{-(qγ+1)/2} [ y^T y − c y^T P1 y/(c + 1)
                   + β̃^T X^T P1 X β̃/(c + 1) − 2 y^T P1 X β̃/(c + 1) ]^{-n/2} .

      Conditionally on γ, the posterior distributions of β and σ² are

         βt1(γ)|σ², y, γ ∼ N( c(U1 y + U1 X β̃/c)/(c + 1),
                              σ² c (Xt1(γ)^T Xt1(γ))^{-1}/(c + 1) ) ,

         σ²|y, γ ∼ IG( n/2, y^T y/2 − c y^T P1 y/2(c + 1)
                       + β̃^T X^T P1 X β̃/2(c + 1) − y^T P1 X β̃/(c + 1) ) .
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Noninformative case


      Use the same compatible informative g-prior distribution with
      β̃ = 0_{p+1} and a hierarchical diffuse prior distribution on c,

         π(c) ∝ c^{-1} I_{N*}(c)    or    π(c) ∝ c^{-1} I_{c>0}

      The choice of this hierarchical diffuse prior distribution on c is due
      to the sensitivity of the model posterior to large values of c:

                  taking β̃ = 0_{p+1} and c large does not work
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Processionary caterpillar

      Influence of some forest settlement characteristics on the
      development of caterpillar colonies




      Response y log-transform of the average number of nests of
      caterpillars per tree on an area of 500 square meters (n = 33 areas)
                                        [ c Marin & Robert, Bayesian Core]
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Processionary caterpillar (cont’d)

      Potential explanatory variables
               x1 altitude (in meters), x2 slope (in degrees),
               x3 number of pines in the square,
               x4 height (in meters) of the tree at the center of the square,
               x5 diameter of the tree at the center of the square,
               x6 index of the settlement density,
               x7 orientation of the square (from 1 if southbound to 2 otherwise),
               x8 height (in meters) of the dominant tree,
               x9 number of vegetation strata,
               x10 mix settlement index (from 1 if not mixed to 2 if mixed).

      [Figure: pairwise plots of the response against x1–x9]
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Bayesian regression output
                            Estimate   BF       log10(BF)

         (Intercept)        9.2714     26.334   1.4205 (***)
         X1                 -0.0037    7.0839   0.8502 (**)
         X2                 -0.0454    3.6850   0.5664 (**)
         X3                 0.0573     0.4356   -0.3609
         X4                 -1.0905    2.8314   0.4520 (*)
         X5                 0.1953     2.5157   0.4007 (*)
         X6                 -0.3008    0.3621   -0.4412
         X7                 -0.2002    0.3627   -0.4404
         X8                 0.1526     0.4589   -0.3383
         X9                 -1.0835    0.9069   -0.0424
         X10                -0.3651    0.4132   -0.3838
      evidence against H0: (****) decisive, (***) strong, (**)
      substantial, (*) poor
The 21st Bayesian Century
  Tests and model choice
     Variable selection



Bayesian variable selection

                                 t1 (γ)         π(γ|y, X)
                                 0,1,2,4,5          0.0929
                                 0,1,2,4,5,9        0.0325
                                 0,1,2,4,5,10       0.0295
                                 0,1,2,4,5,7        0.0231
                                 0,1,2,4,5,8        0.0228
                                 0,1,2,4,5,6        0.0228
                                 0,1,2,3,4,5        0.0224
                                 0,1,2,3,4,5,9      0.0167
                                 0,1,2,4,5,6,9      0.0167
                                 0,1,2,4,5,8,9      0.0137
                            Noninformative G-prior model choice
The 21st Bayesian Century
  Bayesian Calculations




Bayesian Calculations

                 Bayesian methods seem to quickly move to elaborate computation
                                                              Gelman, BA, 2008


      Introduction

      Tests and model choice

      Bayesian Calculations
         Implementation difficulties
         Bayes factor approximation
         ABC model choice

      A Defense of the Bayesian Choice
The 21st Bayesian Century
  Bayesian Calculations
     Implementation difficulties



Implementation difficulties
          ◮   Computing the posterior distribution

                                      π(θ|x) ∝ π(θ)f (x|θ)


          ◮   Resolution of

                 arg min_δ ∫_Θ L(θ, δ) π(θ) f(x|θ) dθ

          ◮   Maximisation of the marginal posterior

                 arg max_{θ1} ∫_{Θ−1} π(θ|x) dθ−1

              (with θ−1 the components of θ other than θ1)
The 21st Bayesian Century
  Bayesian Calculations
     Implementation difficulties



Further implementation difficulties
         A statistical test returns a probability value, but rarely is the probability
                    value per se the reason for an investigator performing the test
                                                       Templeton, Mol. Ecol., 2009


          ◮   Computing posterior quantities

                 δ^π(x) = ∫_Θ h(θ) π(θ|x) dθ
                        = ∫_Θ h(θ) π(θ) f(x|θ) dθ / ∫_Θ π(θ) f(x|θ) dθ


          ◮   Resolution (in k) of

                                            P (π(θ|x) ≥ k|x) = α
The 21st Bayesian Century
  Bayesian Calculations
     Implementation difficulties



Monte Carlo methods
                Bayesian simulation seems stuck in an infinite regress of inferential
                                                                       uncertainty
                                                                Gelman, BA, 2008

      Approximation of

         I = ∫_Θ g(θ) f(x|θ) π(θ) dθ

      takes advantage of the fact that f(x|θ)π(θ) is proportional to a
      density: if the θi’s are simulated from π(θ),

         (1/m) Σ_{i=1}^m g(θi) f(x|θi)

      converges (almost surely) to I
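
      A minimal sketch of this vanilla Monte Carlo estimate, for the
      hypothetical choices x ∼ N(θ, 1), θ ∼ N(0, 1) and g(θ) = θ:

import numpy as np

rng = np.random.default_rng(0)
x = 1.0
theta = rng.normal(0.0, 1.0, size=10**6)                # theta_i ~ pi
f = np.exp(-0.5 * (x - theta)**2) / np.sqrt(2 * np.pi)  # f(x|theta_i)
print(np.mean(theta * f))
# exact value: I = N(x; 0, 2) * x/2 ~ 0.110 for x = 1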
The 21st Bayesian Century
  Bayesian Calculations
     Implementation difficulties



Importance function

                      A simulation method of inference hides unrealistic assumptions
                                                       Templeton, Mol. Ecol., 2009

      No need to simulate from π(·|x) or from π: if h is a probability
      density,

         ∫_Θ g(θ) f(x|θ) π(θ) dθ = ∫ [g(θ) f(x|θ) π(θ) / h(θ)] h(θ) dθ

      and, for θi ∼ h,

         Σ_{i=1}^m g(θi) ω(θi) / Σ_{i=1}^m ω(θi)
                      with   ω(θi) = f(x|θi) π(θi) / h(θi)

      approximates E^π[g(θ)|x]
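
      A sketch of the corresponding self-normalised estimator on the same
      hypothetical normal model, with a heavy-tailed Student t proposal h:

import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(1)
x = 1.0
theta = student_t.rvs(df=3, size=10**5, random_state=rng)   # theta_i ~ h
w = norm.pdf(x, loc=theta) * norm.pdf(theta) / student_t.pdf(theta, df=3)
print(np.sum(theta * w) / np.sum(w))    # ~ posterior mean x/2 = 0.5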
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Bayes factor approximation
      When approximating the Bayes factor

         B12 = ∫_{Θ1} f1(x|θ1) π1(θ1) dθ1 / ∫_{Θ2} f2(x|θ2) π2(θ2) dθ2
             = Z1/Z2 ,

      use of importance functions ̟1 and ̟2 and

         B̂12 = [n1^{-1} Σ_{i=1}^{n1} f1(x|θ1^i) π1(θ1^i)/̟1(θ1^i)]
              / [n2^{-1} Σ_{i=1}^{n2} f2(x|θ2^i) π2(θ2^i)/̟2(θ2^i)] ,
                                                           θj^i ∼ ̟j(θ)

                                                [Chopin & Robert, 2007]
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Bridge sampling


      Special case:
      If

         π1(θ1|x) ∝ π̃1(θ1|x)
         π2(θ2|x) ∝ π̃2(θ2|x)

      live on the same space (Θ1 = Θ2), then

         B12 ≈ (1/n) Σ_{i=1}^n π̃1(θi|x) / π̃2(θi|x) ,   θi ∼ π2(θ|x)

                            [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



(Further) bridge sampling

      In addition, for any function α(·),

         B12 = ∫ π̃1(θ|x) α(θ) π2(θ|x) dθ / ∫ π̃2(θ|x) α(θ) π1(θ|x) dθ

             ≈ [n2^{-1} Σ_{i=1}^{n2} π̃1(θ2i|x) α(θ2i)]
             / [n1^{-1} Σ_{i=1}^{n1} π̃2(θ1i|x) α(θ1i)] ,   θji ∼ πj(θ|x)
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Optimal bridge sampling

      The optimal choice of auxiliary function is

         α⋆(θ) = (n1 + n2) / {n1 π1(θ|x) + n2 π2(θ|x)}

      leading to

         B12 ≈ [n2^{-1} Σ_{i=1}^{n2} π̃1(θ2i|x) / {n1 π1(θ2i|x) + n2 π2(θ2i|x)}]
             / [n1^{-1} Σ_{i=1}^{n1} π̃2(θ1i|x) / {n1 π1(θ1i|x) + n2 π2(θ1i|x)}]
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Approximating Zk from a posterior sample



      Use of the [harmonic mean] identity

         E^{πk}[ ϕ(θk) / {πk(θk) Lk(θk)} | x ]
            = ∫ [ϕ(θk) / {πk(θk) Lk(θk)}] [πk(θk) Lk(θk) / Zk] dθk = 1/Zk

      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Comparison with regular importance sampling


      Harmonic mean: constraint opposed to the usual importance sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails than
      πk(θk)Lk(θk) for the approximation

         Ẑ1k = 1 / [ (1/T) Σ_{t=1}^T ϕ(θk^(t)) / {πk(θk^(t)) Lk(θk^(t))} ]

      to have a finite variance.
      E.g., use finite-support kernels (like Epanechnikov’s kernel) for ϕ
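
      A sketch on a conjugate toy posterior (prior N(0, 1), likelihood
      N(θ, 1), x = 1, so Zk = N(x; 0, 2) exactly), with a finite-support
      uniform ϕ as suggested:

import numpy as np
from scipy.stats import norm, uniform

rng = np.random.default_rng(4)
x = 1.0
theta = rng.normal(x / 2, np.sqrt(0.5), size=10**5)  # stand-in for MCMC output
phi = uniform(loc=x / 2 - 0.5, scale=1.0)            # finite-support proposal
num = phi.pdf(theta)
den = norm.pdf(theta) * norm.pdf(x, loc=theta)       # pi_k(theta) L_k(theta)
Z_hat = 1.0 / np.mean(num / den)
print(Z_hat, norm.pdf(x, scale=np.sqrt(2.0)))        # both ~ 0.2197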
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Approximating Z using a mixture representation




      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                  ϕk (θk ) ∝ ω1 πk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
         1. Take δ^(t) = 1 with probability

               ω1 πk(θk^(t−1)) Lk(θk^(t−1))
                  / {ω1 πk(θk^(t−1)) Lk(θk^(t−1)) + ϕ(θk^(t−1))}

            and δ^(t) = 2 otherwise;
         2. If δ^(t) = 1, generate θk^(t) ∼ MCMC(θk^(t−1), θk) where
            MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated
            with the posterior πk(θk|x) ∝ πk(θk)Lk(θk);
         3. If δ^(t) = 2, generate θk^(t) ∼ ϕ(θk) independently
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Evidence approximation by mixtures
      Rao-Blackwellised estimate

         ξ̂ = (1/T) Σ_{t=1}^T ω1 πk(θk^(t)) Lk(θk^(t))
                      / {ω1 πk(θk^(t)) Lk(θk^(t)) + ϕ(θk^(t))}

      converges to ω1 Zk /{ω1 Zk + 1}
      Deduce Ẑ3k from ω1 Ẑ3k /{ω1 Ẑ3k + 1} = ξ̂, i.e.

         Ẑ3k = [Σ_{t=1}^T ω1 πk(θk^(t)) Lk(θk^(t))
                    / {ω1 πk(θk^(t)) Lk(θk^(t)) + ϕ(θk^(t))}]
             / [ω1 Σ_{t=1}^T ϕ(θk^(t))
                    / {ω1 πk(θk^(t)) Lk(θk^(t)) + ϕ(θk^(t))}]

                                                                           [Bridge sampler]
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Chib’s representation


    Direct application of Bayes’ theorem: given
    x ∼ fk(x|θk) and θk ∼ πk(θk),

       Zk = mk(x) = fk(x|θk) πk(θk) / πk(θk|x)

    holds for any value of θk. Use of an approximation to the posterior
    at a point θk*:

       Ẑk = m̂k(x) = fk(x|θk*) πk(θk*) / π̂k(θk*|x) .
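
      On the conjugate toy model used earlier the identity can be checked
      exactly, since the posterior is available in closed form (a sketch;
      θ* is an arbitrary evaluation point):

import numpy as np
from scipy.stats import norm

x, theta_star = 1.0, 0.3
num = norm.pdf(x, loc=theta_star) * norm.pdf(theta_star)   # f(x|theta*) pi(theta*)
den = norm.pdf(theta_star, loc=x / 2, scale=np.sqrt(0.5))  # exact posterior density
print(num / den, norm.pdf(x, scale=np.sqrt(2.0)))          # identical: 0.2197...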
The 21st Bayesian Century
  Bayesian Calculations
     Bayes factor approximation



Case of latent variables



      For a missing variable z, as in mixture models, a natural
      Rao-Blackwell estimate is

         π̂k(θk*|x) = (1/T) Σ_{t=1}^T πk(θk*|x, zk^(t)) ,

      where the zk^(t)’s are Gibbs-sampled latent variables
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Approximate Bayesian Computation

      Simulation target is π(θ)f (x|θ) with likelihood f (x|θ) not in
      closed form.
      Likelihood-free rejection technique:
      ABC algorithm
      For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly
      simulating
                           θ′ ∼ π(θ) , x ∼ f (x|θ′ ) ,
      until the auxiliary variable x is equal to the observed value, x = y.

                                                    [Pritchard et al., 1999]
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



A as approximative


      When y is a continuous random variable, equality x = y is replaced
      with a tolerance condition,

                                          ̺(x, y) ≤ ǫ

      where ̺ is a distance between summary statistics
      Output distributed from

                            π(θ) Pθ {̺(x, y) < ǫ} ∝ π(θ|̺(x, y) < ǫ)
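
      A sketch of the resulting ABC rejection sampler on a hypothetical
      normal model, with the sample mean as summary statistic and ǫ set to
      the 1% quantile of the simulated distances:

import numpy as np

rng = np.random.default_rng(5)
n, theta_true = 20, 1.5
y = rng.normal(theta_true, 1.0, size=n)                # pseudo-observed data

N = 10**5
theta_prop = rng.uniform(-5.0, 5.0, size=N)            # theta' ~ pi
x = rng.normal(theta_prop[:, None], 1.0, size=(N, n))  # x ~ f(x|theta')
dist = np.abs(x.mean(axis=1) - y.mean())               # rho on summary statistics
eps = np.quantile(dist, 0.01)                          # keep the closest 1%
kept = theta_prop[dist <= eps]
print(kept.mean(), y.mean())                           # ABC posterior mean ~ ybar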
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Gibbs random fields

      Gibbs distribution
      The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
      the graph G if

         f(y) = (1/Z) exp{− Σ_{c∈C} Vc(yc)} ,

      where Z is the normalising constant, C is the set of cliques of G,
      and Vc is any function, also called a potential;
      U(y) = Σ_{c∈C} Vc(yc) is the energy function

       Z is usually unavailable in closed form
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Potts model
      Potts model
      Vc(y) is of the form

         Vc(y) = θS(y) = θ Σ_{l∼i} δ_{yl=yi}

      where l∼i denotes a neighbourhood structure

      In most realistic settings, the summation

         Zθ = Σ_{x∈X} exp{θ^T S(x)}

      involves too many terms to be manageable, and numerical
      approximations cannot always be trusted
                      [Cucala, Marin, CPR & Titterington, JASA, 2009]
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Neighbourhood relations



      Choice to be made between M neighbourhood relations

         i ∼m i′    (0 ≤ m ≤ M − 1)

      with

         Sm(x) = Σ_{i∼m i′} I{xi = xi′} ,

      driven by the posterior probabilities of the models.
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Model index



      Formalisation via a model index M, a new parameter with prior
      distribution π(M = m) and π(θ|M = m) = πm(θm)
      Computational target:

         P(M = m|x) ∝ ∫_{Θm} fm(x|θm) πm(θm) dθm π(M = m)
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Sufficient statistics
      If S(x) is a sufficient statistic for the joint parameters
      (M, θ0 , . . . , θM−1 ), then

         P(M = m|x) = P(M = m|S(x)) .

      For each model m, a sufficient statistic Sm(·) makes
      S(·) = (S0(·), . . . , SM−1(·)) also sufficient.
      For Gibbs random fields,

         x|M = m ∼ fm(x|θm) = fm^1(x|S(x)) fm^2(S(x)|θm)
                            = fm^2(S(x)|θm) / n(S(x))

      where

         n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)}

       S(x) is also sufficient for the joint parameters
                                     [Specific to Gibbs random fields!]
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



ABC model choice Algorithm


      ABC-MC
          ◮   Generate m* from the prior π(M = m).
          ◮   Generate θ*_{m*} from the prior π_{m*}(·).
          ◮   Generate x* from the model f_{m*}(·|θ*_{m*}).
          ◮   Compute the distance ρ(S(x0), S(x*)).
          ◮   Accept (θ*_{m*}, m*) if ρ(S(x0), S(x*)) < ǫ.

                              [Cornuet, Grelaud, Marin & Robert, BA, 2008]

      Note: when ǫ = 0 the algorithm is exact
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Toy example

      iid Bernoulli model versus two-state first-order Markov chain, i.e.

         f0(x|θ0) = exp(θ0 Σ_{i=1}^n I{xi=1}) / {1 + exp(θ0)}^n ,

      versus

         f1(x|θ1) = (1/2) exp(θ1 Σ_{i=2}^n I{xi=xi−1}) / {1 + exp(θ1)}^{n−1} ,

      with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase
      transition” boundaries).
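
      A sketch of ABC-MC on this toy example, using the sufficient pair
      S(x) = (Σ I{xi=1}, Σ I{xi=xi−1}); the tolerance, sample sizes and seed
      below are all hypothetical tuning choices.

import numpy as np

rng = np.random.default_rng(6)

def S(x):
    return np.array([np.sum(x == 1), np.sum(x[1:] == x[:-1])])

def sim_m0(theta0, n):          # iid Bernoulli, P(x_i = 1) = logistic(theta0)
    return (rng.random(n) < 1.0 / (1.0 + np.exp(-theta0))).astype(int)

def sim_m1(theta1, n):          # Markov chain, P(x_i = x_{i-1}) = logistic(theta1)
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)      # uniform start: the 1/2 factor in f1
    stay = 1.0 / (1.0 + np.exp(-theta1))
    for i in range(1, n):
        x[i] = x[i - 1] if rng.random() < stay else 1 - x[i - 1]
    return x

n = 100
x0 = sim_m1(2.0, n)             # pseudo-observed data from M1
s0, keep = S(x0), []
for _ in range(10**4):          # increase for a stabler estimate
    m = rng.integers(2)                                  # m* ~ uniform prior
    th = rng.uniform(-5, 5) if m == 0 else rng.uniform(0, 6)
    x = sim_m0(th, n) if m == 0 else sim_m1(th, n)
    if np.abs(S(x) - s0).sum() <= 4:                     # rho(S(x0), S(x*)) <= eps
        keep.append(m)
print(np.mean(keep))            # ~ P(M = 1 | x0), should be close to 1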
The 21st Bayesian Century
  Bayesian Calculations
     ABC model choice



Toy example (2)




      (left) Comparison of the true BF m0/m1(x0) with the ABC approximation
      B̂F m0/m1(x0) (in logs), over 2,000 simulations and 4·10^6 proposals
      from the prior. (right) Same comparison when using a tolerance ǫ
      corresponding to the 1% quantile of the distances.
The 21st Bayesian Century
  A Defense of the Bayesian Choice




A Defense of the Bayesian Choice


                  Given the advances in practical Bayesian methods in the past two
                            decades, anti-Bayesianism is no longer a serious option
                                                                Gelman, BA, 2009



                   Bayesians are of course their own worst enemies. They make
         non-Bayesians accuse them of religious fervour, and an unwillingness to
                                                      see another point of view.
                                                                Davidson, 2009
The 21st Bayesian Century
  A Defense of the Bayesian Choice




1. Choosing a probabilistic representation



                             Bayesian statistics is about making probability statements
                                                                    Gelman, BA, 2009

              Bayesian Statistics appears as the calculus of uncertainty
      Reminder:
      A probabilistic model is nothing but an interpretation of a given
      phenomenon
      What is the meaning of RD’s t test example?!
The 21st Bayesian Century
  A Defense of the Bayesian Choice




1. Choosing a probabilistic representation (2)


                                                   Inference is impossible.
                                                           Davidson, 2009


      The Bahadur–Savage problem stems from the inability to make
      choices about the shape of a statistical model, not from an
      impossibility to draw [Bayesian] inference.
      Further, a probability distribution is more than the sum of its
      moments. Ill-posed problems thus highlight issues with the model,
      not the inference.
The 21st Bayesian Century
  A Defense of the Bayesian Choice




2. Conditioning on the data


           Bayesian data analysis is a method for summarizing uncertainty and
      making estimates and predictions using probability statements conditional
                                       on observed data and an assumed model
                                                             Gelman, BA, 2009

      At the basis of statistical inference lies an inversion process
      between cause and effect. Using a prior distribution brings a
      necessary balance between observations and parameters, and enables
      one to operate conditional upon x
      What is the data in RD’s t test example?! U ’s? Y ’s?
The 21st Bayesian Century
  A Defense of the Bayesian Choice




3. Exhibiting the true likelihood


        Frequentist statistics is an approach for evaluating statistical procedures
                        conditional on some family of posited probability models
                                                                Gelman, BA, 2009

      Provides a complete quantitative inference on the parameters and
      predictives that points out the inadequacies of frequentist
      statistics, while implementing the Likelihood Principle.
      There needs to be a true likelihood, including in
      non-parametric settings
                                            [Rousseau, Van der Vaart]
The 21st Bayesian Century
  A Defense of the Bayesian Choice




4. Using priors as tools and summaries


              Bayesian techniques allow prior beliefs to be tested and discarded as
                                                                        appropriate
                                                                 Gelman, BA, 2009

      The choice of a prior distribution π does not require any kind of
      belief in this distribution: rather consider it as a tool that
      summarizes the available prior information and the uncertainty
      surrounding this information
      Non-identifiability is an issue in that the prior may strongly
      impact inference about identifiable bits
The 21st Bayesian Century
  A Defense of the Bayesian Choice




4. Using priors as tools and summaries (2)



                                     No uninformative prior exists for such models.
                                                                    Davidson, 2009

      Reference priors can be deduced from the sampling distribution by
      an automated procedure, based on a minimal information principle
      that maximises the information brought by the data.
      Important literature on prior modelling for non-parametric
      problems, incl. smoothness constraints.
The 21st Bayesian Century
  A Defense of the Bayesian Choice




5. Accepting the subjective basis of knowledge

      Knowledge is a critical confrontation between a prioris and
      experiments. Ignoring these a prioris impoverishes analysis.
                  We have, for one thing, to use a language and our
              language is entirely made of preconceived ideas and has to be
              so. However, these are unconscious preconceived ideas, which
              are a million times more dangerous than the other ones. Were
              we to assert that if we are including other preconceived ideas,
              consciously stated, we would aggravate the evil! I do not
              believe so: I rather maintain that they would balance one
              another.
                                                              Henri Poincaré, 1902
The 21st Bayesian Century
  A Defense of the Bayesian Choice




6. Choosing a coherent system of inference

                        Bayesian data analysis has three stages: formulating a model,
                            fitting the model to data, and checking the model fit.
                            The second step—inference—gets most of the attention,
                                        but the procedure as a whole is not automatic
                                                                    Gelman, BA, 2009

      To force inference into a decision-theoretic mold allows for a
      clarification of the way inferential tools should be evaluated, and
      therefore implies a conscious (although subjective) choice of the
      retained optimality.
      Logical inference process: start with the requested properties, i.e.
      a loss function and a prior distribution, then derive the best
      solution satisfying these properties.
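      A minimal sketch of this logic, with a hypothetical posterior sample
      standing in for the output of an actual analysis: changing the
      requested loss changes the derived solution.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical draws from pi(theta | x), e.g. the output of an MCMC run
theta = rng.gamma(3.0, 1.0, size=10_000)

# the Bayes solution is whatever minimises the posterior expected loss:
d_quadratic = theta.mean()     # optimal under L(theta, d) = (d - theta)^2
d_absolute = np.median(theta)  # optimal under L(theta, d) = |d - theta|
print(d_quadratic, d_absolute)
```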
The 21st Bayesian Century
  A Defense of the Bayesian Choice




6. Choosing a coherent system of inference (2)


                                             Asymptopia annoys Bayesians.
                                                          Davidson, 2009

      Asymptotics [for inference] sounds like a proxy for not completely
      specifying the model, and thus for using another model, while
      asymptotics [for simulation] is quite acceptable. Bayesian inference
      does not escape asymptotic difficulties; see e.g. mixtures.
      The NP bootstrap aims at inference with no[t enough]
      modelling, while the P Bayesian bootstrap essentially uses the
      Bayesian predictive, as in the sketch below.
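      For concreteness, a sketch contrasting the nonparametric bootstrap with
      Rubin's (1981) Bayesian bootstrap, which reweights the fixed
      observations by Dirichlet(1, . . . , 1) draws; the data are simulated
      purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=30)   # hypothetical sample
B = 1_000

# Bayesian bootstrap: random Dirichlet weights on the fixed observations
bb = np.array([rng.dirichlet(np.ones(len(y))) @ y for _ in range(B)])

# nonparametric bootstrap: resample the observations with replacement
npb = np.array([rng.choice(y, size=len(y), replace=True).mean()
                for _ in range(B)])
print(bb.std(), npb.std())  # the two spreads are typically very close
```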
The 21st Bayesian Century
  A Defense of the Bayesian Choice




7. Looking for optimal frequentist procedures


       At intermediate levels of a Bayesian model, frequency properties typically
                  take care of themselves. It is typically only at the top level of
                                  unreplicated parameters that we have to worry
                                                               Gelman, BA, 2009

      Bayesian inference widely intersects with the three notions of
      minimaxity, admissibility and equivariance (Haar). Looking for an
      optimal estimator most often ends up finding a Bayes estimator.
                Optimality is easier to attain through the Bayes “filter”
The 21st Bayesian Century
  A Defense of the Bayesian Choice




8. Solving the actual problem



        Frequentist methods have coverage guarantees; Bayesian methods don’t.
                                                  In science, coverage matters
                                                         Wasserman, BA, 2009

      Frequentist methods are justified on a long-term basis, i.e., from
      the statistician's viewpoint. From a decision-maker's point of view,
      only the problem at hand matters! That is, he/she calls for an
      inference conditional on x.
The 21st Bayesian Century
  A Defense of the Bayesian Choice




9. Providing a universal system of inference



                  Bayesian methods are presented as an automatic inference engine
                                                              Gelman, BA, 2009

      Given the three factors

                            (X , f (x|θ)), (Θ, π(θ)), (D, L(θ, d)) ,

      the Bayesian approach validates one and only one inferential
      procedure
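      Spelled out, the validated procedure is the minimiser of the posterior
      expected loss, a sketch consistent with the three factors above:

```latex
\delta^\pi(x) \;=\; \arg\min_{d \in \mathcal{D}}
   \int_\Theta L(\theta, d)\, \pi(\theta \mid x)\, \mathrm{d}\theta
```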
The 21st Bayesian Century
  A Defense of the Bayesian Choice




10. Computing procedures as a minimization problem



        The discussion of computational issues should not be allowed to obscure
                            the need for further analysis of inferential questions
                                                             Bernardo, BA, 2009

      Bayesian procedures are easier to compute than procedures of
      alternative theories, in the sense that there exists a universal
      method for the computation of Bayes estimators, sketched below.
      Convergence assessment is an issue, but recent developments
      in adaptive MCMC allow for more confidence in the output.
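      As a sketch of this universal recipe, a self-normalised
      importance-sampling estimate of a posterior mean, simulating from the
      prior and weighting by the likelihood; the model and numbers are
      hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
x = 0.8  # hypothetical observation from N(theta, 1)

# simulate from the prior N(0, 10), weight by the likelihood f(x | theta);
# the normalising constant of the likelihood cancels in the ratio
theta = rng.normal(0.0, np.sqrt(10.0), size=100_000)
w = np.exp(-0.5 * (x - theta) ** 2)
print((w * theta).sum() / w.sum())  # estimate of E[theta | x]
```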
