A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data





                             Christian P. Robert and George Casella

                                   Université Paris-Dauphine, IUF, & CREST
                                            and University of Florida


                                                   April 2, 2011




       In memoriam, Julian Besag, 1945–2010
Introduction



       Markov chain Monte Carlo (MCMC) methods have been around for almost
       as long as Monte Carlo techniques, even though their impact on
       statistics was not truly felt until the late 1980s / early 1990s.
       Contents: the distinction between Metropolis-Hastings-based
       algorithms and those related to Gibbs sampling, and a brief entry
       into the “second-generation MCMC revolution”.
A few landmarks


       Realization that Markov chains could be used in a wide variety of
       situations only came to “mainstream statisticians” with Gelfand
       and Smith (1990), despite earlier publications in the statistical
       literature like Hastings (1970) and growing awareness in spatial
       statistics (Besag, 1986).
       Several reasons:
                 lack of computing machinery
                 lack of background on Markov chains
                 lack of trust in the practicality of the method
  Before the revolution
     Los Alamos



Bombs before the revolution


    Monte Carlo methods born in Los
    Alamos, New Mexico, during WWII,
    mostly by physicists working on atomic
    bombs, eventually producing the
    Metropolis algorithm in the early
    1950s.

                  [Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]
Monte Carlo genesis
    Monte Carlo method usually traced to
    Ulam and von Neumann:
           Stanislaw Ulam associates idea
           with an intractable combinatorial
           computation attempted in 1946
           about “solitaire”
           Idea was enthusiastically adopted
           by John von Neumann for
           implementation on neutron
           diffusion
           Name “Monte Carlo” being
           suggested by Nicholas Metropolis
                                                                                         [Eckhardt, 1987]
Monte Carlo with computers


       Very close “coincidence” with
       appearance of very first
       computer, ENIAC, born Feb.
       1946, on which von Neumann
       implemented Monte Carlo in
       1947
       Same year, Ulam and von Neumann (re)invented inversion and
       accept-reject techniques.
       In 1949, very first symposium on Monte Carlo and very first paper.
                                          [Metropolis and Ulam, 1949]
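The accept-reject idea that Ulam and von Neumann (re)invented can be sketched in a few lines; the Beta(2,2) target and the bound M = 1.5 below are assumptions chosen for illustration, not one of the 1947 computations.

```python
import random

def accept_reject(target_pdf, M, n):
    """Draw n samples from a density on [0, 1] by accept-reject:
    propose from a uniform envelope, accept x with probability
    target_pdf(x) / M, where M bounds the density from above."""
    samples = []
    while len(samples) < n:
        x = random.random()          # proposal from the U(0, 1) envelope
        u = random.random()          # auxiliary uniform for the accept test
        if u * M <= target_pdf(x):   # accept with probability target(x)/M
            samples.append(x)
    return samples

# toy target: Beta(2, 2) density 6x(1-x), bounded by M = 1.5 on [0, 1]
beta22 = lambda x: 6 * x * (1 - x)
draws = accept_reject(beta22, M=1.5, n=10_000)
```

The inversion method uses the same ingredient differently: a single uniform is pushed through the quantile function instead of being tested.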
  Before the revolution
     Metropolis et al., 1953



The Metropolis et al. (1953) paper


    Very first MCMC algorithm associated
    with the second computer, MANIAC,
    Los Alamos, early 1952.
    Besides Metropolis, Arianna W.
    Rosenbluth, Marshall N. Rosenbluth,
    Augusta H. Teller, and Edward Teller
    contributed to creating the Metropolis
    algorithm...
Motivating problem

       Computation of integrals of the form

       $$ I = \frac{\int F(p,q)\,\exp\{-E(p,q)/kT\}\,dp\,dq}{\int \exp\{-E(p,q)/kT\}\,dp\,dq}, $$

       with energy $E$ defined as

       $$ E(p,q) = \frac{1}{2}\sum_{i=1}^{N}\,\sum_{\substack{j=1\\ j\neq i}}^{N} V(d_{ij}), $$

       with $N$ the number of particles, $V$ a potential function and $d_{ij}$ the
       distance between particles $i$ and $j$.
Boltzmann distribution



       Boltzmann distribution $\exp\{-E(p,q)/kT\}$ parameterised by
       temperature $T$, $k$ being the Boltzmann constant, with a
       normalisation factor

       $$ Z(T) = \int \exp\{-E(p,q)/kT\}\,dp\,dq $$

       not available in closed form.
Computational challenge



       Since p and q are 2N-dimensional vectors, numerical integration is
       infeasible.
       Plus, standard Monte Carlo techniques fail to correctly
       approximate I: exp{−E(p, q)/kT } is very small for most
       realizations of random configurations (p, q) of the particle system.
Metropolis algorithm

       Consider a random walk modification of the N particles: for each
       1 ≤ i ≤ N , values

       $$ x_i' = x_i + \alpha\xi_{1i} \quad\text{and}\quad y_i' = y_i + \alpha\xi_{2i} $$

       are proposed, where both $\xi_{1i}$ and $\xi_{2i}$ are uniform $\mathcal{U}(-1,1)$. The
       energy difference between new and previous configurations is $\Delta E$
       and the new configuration is accepted with probability

       $$ 1 \wedge \exp\{-\Delta E/kT\}, $$

       and otherwise the previous configuration is replicated*

           *counting one more time in the average of the $F(p_t, q_t)$'s over the $\tau$ moves
       of the random walk.
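A minimal sketch of this random-walk scheme, applied here to a hypothetical one-dimensional energy E(x) = x²/2 at kT = 1 (an assumption for illustration) rather than to the original N-particle system:

```python
import math
import random

def metropolis(energy, x0, alpha, n_steps, kT=1.0):
    """Random-walk Metropolis as in Metropolis et al. (1953): propose a
    uniform perturbation, accept with probability 1 ∧ exp(-ΔE/kT),
    and replicate the current configuration on rejection."""
    x, chain = x0, []
    for _ in range(n_steps):
        proposal = x + alpha * random.uniform(-1.0, 1.0)
        delta_e = energy(proposal) - energy(x)
        # downhill moves (ΔE ≤ 0) are always accepted; testing the sign
        # first also avoids overflow in exp() for very negative ΔE
        if delta_e <= 0 or random.random() < math.exp(-delta_e / kT):
            x = proposal
        chain.append(x)  # a rejected move replicates the current state
    return chain

# toy energy E(x) = x^2/2, so exp(-E/kT) is a standard normal at kT = 1
chain = metropolis(lambda x: 0.5 * x * x, x0=0.0, alpha=1.0, n_steps=50_000)
```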
Convergence

       Validity of the algorithm established by proving
         1. irreducibility
         2. ergodicity, that is convergence to the stationary distribution.
       Second part obtained via discretization of the space: Metropolis et
       al. note that the proposal is reversible, then establish that
       exp{−E/kT } is invariant.
       Application to the specific problem of the rigid-sphere collision
       model. The number of iterations of the Metropolis algorithm
       seems to be limited: 16 steps for burn-in and 48 to 64 subsequent
       iterations (that still required four to five hours on the Los Alamos
       MANIAC).



Physics and chemistry

              The method of Markov chain Monte Carlo immediately
              had wide use in physics and chemistry.
                                                                        [Geyer & Thompson, 1992]


              Statistics has always been fuelled by energetic mining of
              the physics literature.
                                                                                             [Clifford, 1993]


              Hammersley and Handscomb, 1967
              Piekaar and Clarenburg, 1967
              Kennedy and Kutil, 1985
              Sokal, 1989
              &tc...
  Before the revolution
     Hastings, 1970



A fair generalisation


       In Biometrika 1970, Hastings defines MCMC methodology for
       finite and reversible Markov chains, the continuous case being
       discretised:
       Generic acceptance probability for a move from state $i$ to state $j$ is

       $$ \alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i\,q_{ij}}{\pi_j\,q_{ji}}}, $$

       where $s_{ij}$ is a symmetric function.
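In code, Hastings' generic form and the two classical choices of the symmetric function s_ij look as follows (a sketch over abstract states, with hypothetical numeric inputs):

```python
def alpha_hastings(pi_i, pi_j, q_ij, q_ji, s):
    """Hastings' acceptance: alpha_ij = s_ij / (1 + pi_i q_ij / (pi_j q_ji))."""
    lam = (pi_i * q_ij) / (pi_j * q_ji)
    return s(lam) / (1.0 + lam)

# s_ij = 1 gives Barker's (1965) rule ...
barker = lambda lam: 1.0
# ... while s_ij = 1 + min(lam, 1/lam) recovers Metropolis et al. (1953),
# i.e. alpha_ij = min(1, pi_j q_ji / (pi_i q_ij))
metropolis = lambda lam: 1.0 + min(lam, 1.0 / lam)
```

For instance, with a symmetric proposal and π_j = 2π_i, Metropolis accepts the uphill move with probability 1, Barker only with probability 2/3.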
State of the art


       Note
       Generic form that encompasses both Metropolis et al. (1953) and
       Barker (1965).
       Peskun's ordering not yet discovered: Hastings mentions that little
       is known about the relative merits of those two choices, even
       though Metropolis's method may be preferable.
       Warning against high rejection rates as indicative of a poor choice
       of transition matrix, but no mention of the opposite pitfall of low
       rejection rates.
What else?!
       Items included in the paper are
              a Poisson target with a ±1 random walk proposal;
              a normal target with a uniform random walk proposal mixed with its
              reflection (i.e. centered at −X(t) rather than X(t));
              a multivariate target where Hastings introduces Gibbs sampling,
              updating one component at a time and defining the composed
              transition as satisfying the stationarity condition because each
              component does leave the target invariant;
              a reference to Erhman, Fosdick and Handscomb (1960) as a
              preliminary if specific instance of this Metropolis-within-Gibbs
              sampler;
              an importance sampling version of MCMC;
              some remarks about error assessment;
              a Gibbs sampler for random orthogonal matrices.
Three years later

       Peskun (1973) compares Metropolis’ and Barker’s acceptance
       probabilities and shows (again in a discrete setup) that Metropolis’
       is optimal (in terms of the asymptotic variance of any empirical
       average).
       Proof is a direct consequence of Kemeny and Snell (1960) on
       asymptotic variance. Peskun also establishes that this variance can
       improve upon the iid case if and only if the eigenvalues of P − A
       are all negative, where A is the transition matrix corresponding to
       the iid simulation and P the transition matrix corresponding to the
       Metropolis algorithm, but he concludes that the trace of P − A is
       always positive.
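Peskun's ordering can be illustrated numerically: on a toy three-state target with a symmetric proposal (an assumed example), the Metropolis acceptance yields off-diagonal transition probabilities at least as large as Barker's, while both chains leave the target invariant.

```python
# toy target and symmetric proposal on three states (assumed example)
pi = [0.2, 0.3, 0.5]
q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

def transition(accept):
    """Build the transition matrix for a given acceptance rule,
    called as accept(pi_i * q_ij, pi_j * q_ji)."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = q[i][j] * accept(pi[i] * q[i][j], pi[j] * q[j][i])
        P[i][i] = 1.0 - sum(P[i])   # rejected mass stays at state i
    return P

P_metropolis = transition(lambda a, b: min(1.0, b / a))  # Metropolis et al. (1953)
P_barker = transition(lambda a, b: b / (a + b))          # Barker (1965)
```

Larger off-diagonal entries mean more movement, which is the mechanism behind Peskun's variance ordering.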
  Before the revolution
     Julian’s early works



Julian’s early works (1)


       Early 1970’s, Hammersley, Clifford, and Besag were working on the
       specification of joint distributions from conditional distributions
       and on necessary and sufficient conditions for the conditional
       distributions to be compatible with a joint distribution.




                                                                [Hammersley and Clifford, 1971]
       What is the most general form of the conditional probability
       functions that define a coherent joint function? And what will the
       joint look like?
                                                           [Besag, 1972]
Hammersley-Clifford theorem



       Theorem (Hammersley-Clifford)
       Joint distribution of vector associated with a dependence graph
       must be represented as product of functions over the cliques of the
       graphs, i.e., of functions depending only on the components
       indexed by the labels in the clique.

                                                                 [Cressie, 1993; Lauritzen, 1996]
       Theorem (Hammersley-Clifford)
       A probability distribution P with positive and continuous density f
       satisfies the pairwise Markov property with respect to an undirected
       graph G if and only if it factorizes according to G , i.e., (F ) ≡ (G)

                                                                 [Cressie, 1993; Lauritzen, 1996]
       Theorem (Hammersley-Clifford)
       Under the positivity condition, the joint distribution g satisfies
       $$ g(y_1,\ldots,y_p) \propto \prod_{j=1}^{p} \frac{g_{\ell_j}(y_{\ell_j} \mid y_{\ell_1},\ldots,y_{\ell_{j-1}},\, y'_{\ell_{j+1}},\ldots,y'_{\ell_p})}{g_{\ell_j}(y'_{\ell_j} \mid y_{\ell_1},\ldots,y_{\ell_{j-1}},\, y'_{\ell_{j+1}},\ldots,y'_{\ell_p})} $$

       for every permutation $\ell$ on $\{1, 2, \ldots, p\}$ and every $y' \in \mathcal{Y}$.

                                                                 [Cressie, 1993; Lauritzen, 1996]
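The factorization can be checked numerically on a small positive joint distribution; the joint on {0,1}² below is an assumed toy example, using the identity permutation and reference point y′ = (0, 0):

```python
import itertools

# an arbitrary positive joint on {0,1}^2 (assumed toy example)
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def cond1(y1, y2):
    """Full conditional g1(y1 | y2)."""
    return joint[(y1, y2)] / (joint[(0, y2)] + joint[(1, y2)])

def cond2(y2, y1):
    """Full conditional g2(y2 | y1)."""
    return joint[(y1, y2)] / (joint[(y1, 0)] + joint[(y1, 1)])

ref = (0, 0)  # reference point y'; positivity guarantees it has mass

def hc_unnormalised(y1, y2):
    """Hammersley-Clifford product of conditional ratios for p = 2."""
    return (cond1(y1, ref[1]) / cond1(ref[0], ref[1])) * \
           (cond2(y2, y1) / cond2(ref[1], y1))

Z = sum(hc_unnormalised(*y) for y in itertools.product((0, 1), repeat=2))
recovered = {y: hc_unnormalised(*y) / Z for y in joint}
```

Here `recovered` agrees with `joint`, confirming that the full conditionals determine the joint up to its normalising constant.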
An apocryphal theorem


       The Hammersley-Clifford theorem was never published by its
       authors, but only through Grimmett (1973), Preston (1973),
       Sherman (1973), and Besag (1974). The authors were dissatisfied with
       the positivity constraint: the joint density could only be recovered
       from the full conditionals when the support of the joint was made
       of the product of the supports of the full conditionals (with
       obvious counter-examples). Moussouris' counter-example put a full
       stop to their endeavors.
                                                [Hammersley, 1974]
To Gibbs or not to Gibbs?

       Julian Besag should certainly be credited to a large extent with
       the (re?-)discovery of the Gibbs sampler.
                   The simulation procedure is to consider the sites
               cyclically and, at each stage, to amend or leave unaltered
               the particular site value in question, according to a
               probability distribution whose elements depend upon the
               current value at neighboring sites (...) However, the
               technique is unlikely to be particularly helpful in many
               other than binary situations and the Markov chain itself
               has no practical interpretation.
                                                                                             [Besag, 1974]
Broader perspective

       In 1964, Hammersley and Handscomb wrote a (the first?)
       textbook on Monte Carlo methods.
       They cover such topics as
               “Crude Monte Carlo”;
               importance sampling;
               control variates; and
               “Conditional Monte Carlo”, which looks surprisingly like a
               missing-data Gibbs completion approach.
       They state in the Preface
               We are convinced nevertheless that Monte Carlo methods
               will one day reach an impressive maturity.
Clicking in

       After Peskun (1973), MCMC remained mostly dormant in the mainstream
       statistical world for about 10 years, until several papers/books
       highlighted its usefulness in specific settings:
               Geman and Geman (1984)
               Besag (1986)
               Strauss (1986)
               Ripley (Stochastic Simulation, 1987)
               Tanner and Wong (1987)
               Younes (1988)
Enters the Gibbs sampler


       Geman and Geman (1984), building on Metropolis et al. (1953),
       Hastings (1970), and Peskun (1973), constructed a Gibbs sampler
       for optimisation in a discrete image processing problem without
       completion.
       Responsible for the name Gibbs sampling, because the method was used
       for the Bayesian study of Gibbs random fields, linked to the
       physicist Josiah Willard Gibbs (1839–1903).
       Back to Metropolis et al. (1953): the Gibbs sampler is used as a
       simulated annealing algorithm and ergodicity is proven on the
       collection of global maxima.
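A two-stage Gibbs sampler can be sketched on a standard textbook target, a bivariate normal with correlation ρ (an assumed example, not Geman and Geman's image model); each full conditional is N(ρ · other, 1 − ρ²):

```python
import random

def gibbs_bivariate_normal(rho, n_iter, burn_in=500):
    """Two-stage Gibbs sampler: alternately draw X | Y and Y | X,
    each full conditional being N(rho * other, 1 - rho^2)."""
    x = y = 0.0
    sd = (1.0 - rho * rho) ** 0.5
    out = []
    for t in range(n_iter + burn_in):
        x = random.gauss(rho * y, sd)   # draw X | Y = y
        y = random.gauss(rho * x, sd)   # draw Y | X = x
        if t >= burn_in:                # discard the settling-in period
            out.append((x, y))
    return out

draws = gibbs_bivariate_normal(rho=0.8, n_iter=20_000)
```

Sampling one component at a time from its full conditional is exactly the cyclic site-by-site updating Besag describes, here with continuous rather than binary sites.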
Besag (1986) integrates GS for SA...


               ...easy to construct the transition matrix Q, of a discrete
               time Markov chain, with state space Ω and limit
               distribution (4). Simulated annealing proceeds by
               running an associated time inhomogeneous Markov chain
               with transition matrices QT , where T is progressively
               decreased according to a prescribed “schedule” to a value
               close to zero.
                                                                                             [Besag, 1986]
...and links with Metropolis-Hastings...

               There are various related methods of constructing a
               manageable QT (Hastings, 1970). Geman and Geman
                (1984) adopt the simplest, which they term the “Gibbs
               sampler” (...) time reversibility, a common ingredient in
               this type of problem (see, for example, Besag, 1977a), is
               present at individual stages but not over complete cycles,
               though Peter Green has pointed out that it returns if QT
               is taken over a pair of cycles, the second of which visits
               pixels in reverse order
                                                                                             [Besag, 1986]
...seeing the larger picture,...
               As Geman and Geman (1984) point out, any property of
               the (posterior) distribution P (x|y) can be simulated by
               running the Gibbs sampler at “temperature” T = 1.
                Thus, if x̂i maximizes P (xi |y), then it is the most
                frequently occurring colour at pixel i in an infinite
                realization of the Markov chain with transition matrix Q
                of Section 2.3. The x̂i 's can therefore be simultaneously
                estimated from a single finite realization of the chain. It
               is not yet clear how long the realization needs to be,
               particularly for estimation near colour boundaries, but the
               amount of computation required is generally prohibitive
               for routine purposes
                                                                                             [Besag, 1986]
               P (x|y) can be simulated using the Gibbs sampler, as
               suggested by Grenander (1983) and by Geman and
               Geman (1984). My dismissal of such an approach for
               routine applications was somewhat cavalier:
               purpose-built array processors could become relatively
               inexpensive (...) suppose that, for 100 complete cycles
               say, images have been collected from the Gibbs sampler
               (or by Metropolis’ method), following a “settling-in”
               period of perhaps another 100 cycles, which should cater
               for fairly intricate priors (...) These 100 images should
               often be adequate for estimating properties of the
               posterior (...) and for making approximate associated
               confidence statements, as mentioned by Mr Haslett.
                                                                                             [Besag, 1986]



...if not going fully Bayes!
               ...a neater and more efficient procedure [for parameter
                estimation] is to adopt maximum “pseudo-likelihood”
               estimation (Besag, 1975)

               I have become increasingly enamoured with the Bayesian
               paradigm
                                                                                              [Besag, 1986]


               The pair (xi , βi ) is then a (bivariate) Markov field and
               can be reconstructed as a bivariate process by the
               methods described in Professor Besag’s paper.
                                                                                             [Clifford, 1986]
                The simulation-based estimator Epost Ψ(X) will differ
                from the m.a.p. estimator Ψ̂(x).

                                                                         [Silverman, 1986]
Discussants of Besag (1986)




       Impressive who’s who: D.M. Titterington, P. Clifford, P. Green, P.
       Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C.
       Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J.
       Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  Before the revolution
     Julian’s early works



A comment on Besag (1986)

               While special purpose algorithms will determine the
               utility of the Bayesian methods, the general purpose
                methods – stochastic relaxation and simulation of solutions
               of the Langevin equation (Grenander, 1983; Geman and
               Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986)
               have proven enormously convenient and versatile. We are
               able to apply a single computer program to every new
               problem by merely changing the subroutine that
               computes the energy function in the Gibbs representation
               of the posterior distribution.
                                                                      [Geman and McClure, 1986]
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  Before the revolution
     Julian’s early works



Another one

               It is easy to compute exact marginal and joint posterior
               probabilities of currently unobserved features, conditional
               on those clinical findings currently available
               (Spiegelhalter, 1986a,b), the updating taking the form of
               ‘propagating evidence’ through the network (...) it would
               be interesting to see if the techniques described tonight,
               which are of intermediate complexity, may have any
               applications in this new and exciting area [causal
               networks].

                                                                                   [Spiegelhalter, 1986]
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  Before the revolution
     Julian’s early works



The candidate’s formula
       Representation of the marginal likelihood as

                                        m(x) = π(θ) f(x|θ) / π(θ|x)

       or of the marginal predictive as

                            p_n(y′|y) = f(y′|θ) π_n(θ|y) / π_{n+1}(θ|y, y′)

                                                                                             [Besag, 1989]

       Why candidate?
       “Equation (2) appeared without explanation in a Durham
       University undergraduate final examination script of 1984.
       Regrettably, the student’s name is no longer known to me.”
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  Before the revolution
     Julian’s early works



Implications



               Newton and Raftery (1994) used this representation to derive
               the [infamous] harmonic mean approximation to the marginal
               likelihood
               Gelfand and Dey (1994) also relied on this formula for the
               same purpose, in a more general perspective
               Geyer and Thompson (1995) derived MLEs by a Monte Carlo
               approximation to the normalising constant
               Chib (1995) used this representation to build an MCMC
               approximation to the marginal likelihood
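Numerically, the candidate's formula is easy to check in a conjugate model. A minimal sketch (a toy normal likelihood with a normal prior, my choice for illustration): the right-hand side π(θ)f(x|θ)/π(θ|x) gives the same m(x) at any θ where it is evaluated, and matches brute-force integration of f(x|θ)π(θ).

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

# assumed toy setup: x_i ~ N(theta, s2) with a conjugate N(mu0, t02) prior on theta
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=20)
s2, mu0, t02 = 1.0, 0.0, 4.0
n, xbar = len(x), x.mean()

# conjugate posterior theta | x ~ N(mu_n, t2_n)
t2_n = 1.0 / (1.0 / t02 + n / s2)
mu_n = t2_n * (mu0 / t02 + n * xbar / s2)

def log_marginal(theta):
    """Candidate's formula: log m(x) = log pi(theta) + log f(x|theta) - log pi(theta|x)."""
    return (log_norm_pdf(theta, mu0, t02)
            + log_norm_pdf(x, theta, s2).sum()
            - log_norm_pdf(theta, mu_n, t2_n))

# the right-hand side does not depend on where it is evaluated...
lm0, lm1 = log_marginal(0.0), log_marginal(1.0)

# ...and agrees with brute-force trapezoid integration over theta
grid = np.linspace(-10.0, 10.0, 20001)
vals = np.exp(log_norm_pdf(x[:, None], grid[None, :], s2).sum(axis=0)
              + log_norm_pdf(grid, mu0, t02))
lm_quad = np.log(((vals[1:] + vals[:-1]) * np.diff(grid) / 2.0).sum())
```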
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  The Revolution
     Final steps to



Impact


       “This is surely a revolution.”
                                                                                             [Clifford, 1993]
       Geman and Geman (1984) is one more spark that led to the
       explosion, as it had a clear influence on Gelfand, Green, Smith,
       Spiegelhalter and others.
       Sparked new interest in Bayesian methods, statistical computing,
       algorithms, and stochastic processes through the use of computing
       algorithms such as the Gibbs sampler and the Metropolis–Hastings
       algorithm.

        “[Gibbs sampler] use seems to have been isolated in the spatial
        statistics community until Gelfand and Smith (1990)”
                                                             [Geyer, 1990]
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  The Revolution
     Final steps to



Data augmentation
       Tanner and Wong (1987) has essentially the same ingredients as
       Gelfand and Smith (1990): simulating from conditionals is
       simulating from the joint
       Lower impact:
               emphasis on missing data problems (hence data augmentation)
                MCMC approximation to the target at every iteration

                           π(θ|x) ≈ (1/K) Σ_{k=1}^{K} π(θ|x, z^{t,k}),   z^{t,k} ∼ π̂_{t−1}(z|x),


               too close to Rubin’s (1978) multiple imputation
               theoretical backup based on functional analysis (Markov kernel had
               to be uniformly bounded and equicontinuous)
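To give a flavour of the data-augmentation idea, a toy example (mine, not from the paper): right-censored normal observations, alternating between imputing the censored values and updating θ, with a flat prior on θ assumed so the complete-data conditional is a plain normal.

```python
import numpy as np

rng = np.random.default_rng(42)

# toy censored-data problem: y_i ~ N(theta, 1), values above c reported only as "censored"
theta_true, c, n = 1.0, 2.0, 200
full = rng.normal(theta_true, 1.0, n)
obs = full[full < c]                  # observed values
n_cens = int((full >= c).sum())       # number of censored observations

theta, thetas = obs.mean(), []
for _ in range(2000):
    # augmentation step: impute z_i ~ N(theta, 1) truncated to [c, inf) by rejection
    z = np.empty(n_cens)
    for i in range(n_cens):
        draw = rng.normal(theta, 1.0)
        while draw < c:
            draw = rng.normal(theta, 1.0)
        z[i] = draw
    # posterior step: theta | complete data, flat prior  =>  N(complete mean, 1/n)
    comp_mean = (obs.sum() + z.sum()) / n
    theta = rng.normal(comp_mean, np.sqrt(1.0 / n))
    thetas.append(theta)

theta_hat = np.mean(thetas[500:])     # posterior mean after burn-in
```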
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  The Revolution
     Gelfand and Smith, 1990



Epiphany

                                In June 1989, at a Bayesian workshop in Sherbrooke,
                                Québec, Adrian Smith exposed for the first time (?)
                                the generic features of the Gibbs sampler, exhibiting a
                                ten-line Fortran program handling a random effect model

                     Y_ij = θ_i + ε_ij,   i = 1, . . . , K,   j = 1, . . . , J,
                     θ_i ∼ N(μ, σ_θ²),    ε_ij ∼ N(0, σ_ε²),

       by full conditionals on μ, σ_θ, σ_ε...
                                                   [Gelfand and Smith, 1990]
                     This was enough to convince the whole audience!
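In the same spirit, a sketch of such a sampler for the random effect model above. The priors are my assumptions (the slide does not specify them): a flat prior on μ and InvGamma(1, 1) priors on both variances, chosen so that all full conditionals are standard.

```python
import numpy as np

def gibbs_random_effect(y, n_iter=2000, seed=0):
    """Gibbs sampler for y_ij = theta_i + eps_ij, theta_i ~ N(mu, s2t), eps ~ N(0, s2e).
    Assumed priors: mu flat, s2t and s2e ~ InvGamma(a, b) with a = b = 1."""
    rng = np.random.default_rng(seed)
    K, J = y.shape
    a = b = 1.0                      # hypothetical prior hyperparameters
    mu, s2t, s2e = y.mean(), 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # theta_i | rest: precision-weighted combination of cell mean and mu
        prec = J / s2e + 1.0 / s2t
        m = (J * y.mean(axis=1) / s2e + mu / s2t) / prec
        theta = rng.normal(m, np.sqrt(1.0 / prec))
        # mu | rest (flat prior)
        mu = rng.normal(theta.mean(), np.sqrt(s2t / K))
        # variances | rest: inverse-gamma full conditionals
        s2t = 1.0 / rng.gamma(a + K / 2, 1.0 / (b + ((theta - mu) ** 2).sum() / 2))
        s2e = 1.0 / rng.gamma(a + K * J / 2, 1.0 / (b + ((y - theta[:, None]) ** 2).sum() / 2))
        draws.append((mu, s2t, s2e))
    return np.array(draws)

# simulated data from the model itself: mu = 5, sigma_theta = sigma_eps = 1
rng = np.random.default_rng(1)
theta_true = rng.normal(5.0, 1.0, size=10)
y = theta_true[:, None] + rng.normal(0.0, 1.0, size=(10, 20))
draws = gibbs_random_effect(y)
mu_hat = draws[500:, 0].mean()       # posterior mean of mu after burn-in
```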
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  The Revolution
     Gelfand and Smith, 1990



Garden of Eden

       In the early 1990s, researchers found that Gibbs and then
       Metropolis–Hastings algorithms would crack almost any problem!
       Flood of papers followed applying MCMC:
              linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991;
              Wang et al., 1993, 1994)
              generalized linear mixed models (Albert and Chib, 1993)
              mixture models (Tanner and Wong, 1987; Diebolt and X., 1990,
              1994; Escobar and West, 1993)
              changepoint analysis (Carlin et al., 1992)
              point processes (Grenander and Møller, 1994)
              genomics (Stephens and Smith, 1993; Lawrence et al., 1993;
              Churchill, 1995; Geyer and Thompson, 1995)
              ecology (George and X, 1992; Dupuis, 1995)
              variable selection in regression (George and McCulloch, 1993)
              spatial statistics (Raftery and Banfield, 1991)
              longitudinal studies (Lange et al., 1992)
              &tc
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  The Revolution
     Gelfand and Smith, 1990



[some of the] early theoretical advances

       “It may well be remembered as the afternoon of the 11 Bayesians”
                                                         [Clifford, 1993]
              Geyer and Thompson, 1992, relied on MCMC methods for ML
              estimation
              Smith and Roberts, 1993, discussed convergence diagnoses and
              applications, incl. mixtures, for Gibbs and Metropolis–Hastings
              Besag and Green, 1993, stated the desiderata for convergence
              and connected MCMC with auxiliary and antithetic variables
              Tierney, 1994, laid out all of the assumptions needed to
              analyze the Markov chains and then developed their properties,
              in particular convergence of ergodic averages and central
              limit theorems
              Liu, Wong and Kong, 1994–95, analyzed the covariance
              structure of Gibbs sampling, and formally established the
              validity of Rao-Blackwellization in Gibbs sampling
              Mengersen and Tweedie, 1996, set the tone for the study of
              the speed of convergence of MCMC algorithms to the target
              distribution
              Gilks, Clayton and Spiegelhalter, 1993
              &tc...
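The Rao-Blackwellization result can be illustrated on a toy two-stage Gibbs sampler for a bivariate normal target (my example): averaging the conditional means E[X|Y] = ρY instead of the X draws themselves provably cannot increase the variance of the estimate of E[X].

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng):
    """Gibbs sampler for (X, Y) standard bivariate normal with correlation rho."""
    s = np.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    xs, ys = [], []
    for _ in range(n_steps):
        x = rng.normal(rho * y, s)       # X | Y ~ N(rho * Y, 1 - rho^2)
        y = rng.normal(rho * x, s)       # Y | X ~ N(rho * X, 1 - rho^2)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
rho = 0.3
plain, rao_black = [], []
for _ in range(200):                     # replicate chains to compare estimator variances
    xs, ys = gibbs_bivariate_normal(rho, 500, rng)
    plain.append(xs.mean())              # crude estimate of E[X] = 0
    rao_black.append((rho * ys).mean())  # Rao-Blackwellized: average of E[X | Y]

var_plain, var_rb = np.var(plain), np.var(rao_black)
```

With ρ = 0.3 the variance reduction is substantial, since the draw X = ρY + ε adds independent noise of variance 1 − ρ² that the conditional mean averages out.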
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  The Revolution
     Convergence diagnoses



Convergence diagnoses
              Can we really tell when a complicated Markov chain has
              reached equilibrium? Frankly, I doubt it.
                                                                                             [Clifford, 1993]

       Explosion of methods
              Gelman and Rubin (1991)
              Besag and Green (1992)
              Geyer (1992)
              Raftery and Lewis (1992)
              Cowles and Carlin (1996) coda
              Brooks and Roberts (1998)
              &tc
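As an illustration of one such diagnostic, a minimal sketch (my implementation) of the basic, unsplit Gelman and Rubin potential scale reduction factor, which compares between-chain and within-chain variability:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m, n) array of m parallel chains."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled estimate of the target variance
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))               # four chains, same target
stuck = mixed + np.array([[0.0], [0.0], [5.0], [5.0]])     # two chains stuck in another mode
r_ok, r_bad = gelman_rubin(mixed), gelman_rubin(stuck)
```

Values near 1 suggest the chains have mixed; markedly larger values flag disagreement between chains.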
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  After the Revolution
     Particle systems



Particles, again


       Iterating importance sampling is about as old as Monte Carlo
       methods themselves!
                 [Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955]
       Found in the molecular simulation literature of the 1950s with
       self-avoiding random walks, and later in signal processing
                                                [Marshall, 1965; Handschin and Mayne, 1969]

       Use of the term “particle” dates back to Kitagawa (1996), and Carpenter
       et al. (1997) coined the term “particle filter”.
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  After the Revolution
     Particle systems



Bootstrap filter and sequential Monte Carlo



       Gordon, Salmond and Smith (1993) introduced the bootstrap filter
       which, while formally connected with importance sampling,
       involves past simulations and possible MCMC steps (Gilks and
       Berzuini, 2001).
       Sequential imputation was developed in Kong, Liu and Wong
       (1994), while Liu and Chen (1995) first formally pointed out the
       importance of resampling in “sequential Monte Carlo”, a term they
       coined
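A minimal sketch of the bootstrap filter on an assumed toy linear Gaussian state-space model: propagate particles through the state equation, weight by the observation density, resample.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy state-space model: x_t = 0.9 x_{t-1} + N(0, 1),  y_t = x_t + N(0, 0.25)
T, phi, obs_var = 100, 0.9, 0.25
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal()
y = x + rng.normal(0.0, np.sqrt(obs_var), T)

N = 1000                                                  # number of particles
particles = rng.normal(0.0, 1.0, N)
means = np.zeros(T)
for t in range(T):
    particles = phi * particles + rng.normal(0.0, 1.0, N)  # propagate (state equation)
    logw = -0.5 * (y[t] - particles) ** 2 / obs_var        # weight (observation density)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    means[t] = (w * particles).sum()                       # filtered mean E[x_t | y_1:t]
    particles = rng.choice(particles, size=N, p=w)         # multinomial resampling

rmse = np.sqrt(np.mean((means - x) ** 2))                  # tracking error vs true states
```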
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  After the Revolution
     Particle systems



pMC versus pMCMC


               Recycling of past simulations is legitimate to build better
               importance sampling functions, as in population Monte Carlo
                               [Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]


               Recent synthesis by Andrieu, Doucet, and Holenstein (2010)
               using particles to build an evolving MCMC kernel p̂_θ(y_1:T) in
               state space models p(x_1:T)p(y_1:T|x_1:T), along with Andrieu
               and Roberts’ (2009) use of approximations in MCMC
               acceptance steps
                                                                             [Kennedy and Kuti, 1985]
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  After the Revolution
     Reversible jump



Reversible jump

       Generally considered as the second Revolution.

   Formalisation of a Markov chain moving across
   models and parameter spaces allows for the
   Bayesian processing of a wide variety of models
   and underlies the success of Bayesian model choice

       Definition of a proper balance condition on cross-model Markov
       kernels gives a generic setup for exploring variable-dimension
       spaces, even when the number of models under comparison is
       infinite.
                                                               [Green, 1995]
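A toy illustration of a cross-model move (my example, not Green's): choosing between M0: θ = 0 and M1: θ ~ N(0, τ²) for N(θ, 1) data, with equal prior model weights. Proposing θ from its prior makes the dimension-matching map the identity (Jacobian 1), so prior and proposal cancel in the acceptance ratio, and the chain's occupancy of M1 can be checked against the exact posterior model probability.

```python
import numpy as np

def log_lik(x, theta):
    return -0.5 * ((x - theta) ** 2).sum()   # N(theta, 1) likelihood, up to constants

rng = np.random.default_rng(0)
x = rng.normal(0.2, 1.0, size=30)            # simulated data, truth mildly away from 0
tau = 2.0                                    # assumed prior sd of theta under M1

# exact P(M1 | x) for equal model weights, from the conjugate Bayes factor
n, xbar = len(x), x.mean()
prec = n + 1.0 / tau ** 2
log_b10 = -0.5 * np.log(tau ** 2 * prec) + (n * xbar) ** 2 / (2.0 * prec)
p1_exact = 1.0 / (1.0 + np.exp(-log_b10))

model, theta = 0, 0.0
visits = np.zeros(2)
for _ in range(50000):
    if rng.uniform() < 0.5:                  # attempt a between-model (reversible) jump
        if model == 0:
            # birth: draw theta* from its prior; prior and proposal cancel, Jacobian 1
            prop = rng.normal(0.0, tau)
            if np.log(rng.uniform()) < log_lik(x, prop) - log_lik(x, 0.0):
                model, theta = 1, prop
        else:
            # death: exact reverse of the birth move
            if np.log(rng.uniform()) < log_lik(x, 0.0) - log_lik(x, theta):
                model, theta = 0, 0.0
    elif model == 1:                         # within-model random-walk Metropolis step
        prop = theta + rng.normal(0.0, 0.3)
        dlog = (log_lik(x, prop) - log_lik(x, theta)
                + (theta ** 2 - prop ** 2) / (2.0 * tau ** 2))
        if np.log(rng.uniform()) < dlog:
            theta = prop
    visits[model] += 1

p1_mcmc = visits[1] / visits.sum()           # occupancy frequency of M1
```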
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  After the Revolution
     Perfect sampling



Perfect sampling


       Seminal paper of Propp and Wilson (1996) showed how to use
       MCMC methods to produce an exact (or perfect) simulation from
       the target.
        An outburst of papers followed, particularly from Jesper Møller and
        coauthors, but the excitement somehow died down [except in
        dedicated areas] as construction of perfect samplers is hard and
        coalescence times very high...
                                         [Møller and Waagepetersen, 2003]
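The mechanics can be sketched on a toy finite chain (my example): run coupled copies started from every state at time −T with shared randomness, doubling T until all copies coalesce by time 0; the coalesced value is then an exact draw from the stationary distribution.

```python
import numpy as np

# coupling from the past on a toy 3-state ergodic chain
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
CDF = P.cumsum(axis=1)

def step(state, u):
    """Deterministic update function: inverse-CDF move, shared by all coupled copies."""
    return int(np.searchsorted(CDF[state], u))

def cftp(rng):
    """Propp-Wilson: run all starting states from time -T until coalescence at time 0."""
    us = {}                               # reuse the SAME randomness for times near 0
    T = 1
    while True:
        for t in range(-T, 0):
            if t not in us:
                us[t] = rng.uniform()
        states = list(range(3))
        for t in range(-T, 0):
            states = [step(s, us[t]) for s in states]
        if len(set(states)) == 1:         # coalescence => exact stationary draw
            return states[0]
        T *= 2                            # otherwise restart from further in the past

rng = np.random.default_rng(0)
draws = np.array([cftp(rng) for _ in range(4000)])
emp = np.bincount(draws, minlength=3) / len(draws)

# stationary distribution from the left eigenvector of P, for comparison
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
```

Note the dictionary of uniforms: when T is doubled, fresh randomness is added only further in the past, while the draws nearest time 0 are reused, which is essential for the algorithm's correctness.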
A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
  After the Revolution
     Envoi



To be continued...
       ...standing on the shoulders of giants

 
photon2022116p23.pdf
photon2022116p23.pdfphoton2022116p23.pdf
photon2022116p23.pdf
 
Quantum Entanglement
Quantum EntanglementQuantum Entanglement
Quantum Entanglement
 

Más de Christian Robert

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 

Más de Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 

Último

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 

Último (20)

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 

A short history of MCMC

  • 1. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data. Christian P. Robert and George Casella. Université Paris-Dauphine, IUF, & CREST, and University of Florida. April 2, 2011
  • 2. In memoriam, Julian Besag, 1945–2010
  • 4. Introduction. Markov Chain Monte Carlo (MCMC) methods have been around for almost as long as Monte Carlo techniques, even though their impact on statistics was not truly felt until the late 1980s / early 1990s. Contents: the distinction between Metropolis-Hastings-based algorithms and those related to Gibbs sampling, and a brief entry into the “second-generation MCMC revolution”.
  • 6. A few landmarks. The realization that Markov chains could be used in a wide variety of situations only came to “mainstream statisticians” with Gelfand and Smith (1990), despite earlier publications in the statistical literature such as Hastings (1970) and growing awareness in spatial statistics (Besag, 1986). Several reasons: lack of computing machinery; lack of background on Markov chains; lack of trust in the practicality of the method.
  • 7. Before the revolution: Los Alamos. Bombs before the revolution. Monte Carlo methods were born in Los Alamos, New Mexico, during WWII, mostly among physicists working on atomic bombs, eventually producing the Metropolis algorithm in the early 1950s. [Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]
  • 9. Monte Carlo genesis. The Monte Carlo method is usually traced to Ulam and von Neumann: Stanislaw Ulam associated the idea with an intractable combinatorial computation attempted in 1946 about “solitaire”; the idea was enthusiastically adopted by John von Neumann for implementation on neutron diffusion; the name “Monte Carlo” was suggested by Nicholas Metropolis. [Eckhardt, 1987]
  • 14. Monte Carlo with computers. A very close “coincidence” with the appearance of the very first computer, ENIAC, born in February 1946, on which von Neumann implemented Monte Carlo in 1947. The same year, Ulam and von Neumann (re)invented the inversion and accept-reject techniques. In 1949 came the very first symposium on Monte Carlo and the very first paper. [Metropolis and Ulam, 1949]
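The two techniques (re)invented by Ulam and von Neumann can be illustrated in a few lines. This is our own toy sketch, not code from the period: inversion turns a uniform draw into an exponential draw through the inverse cdf, and accept-reject simulates a normal draw using an exponential envelope.

```python
import math
import random

def exponential_by_inversion(rate):
    # Inversion: if U ~ U(0, 1), then -log(1 - U) / rate follows the
    # Exponential(rate) distribution, since F^{-1}(u) = -log(1 - u) / rate.
    return -math.log(1.0 - random.random()) / rate

def standard_normal_by_accept_reject():
    # Accept-reject for the half-normal with an Exponential(1) envelope:
    # the ratio f(x) / g(x) is maximised at x = 1, and the acceptance
    # probability simplifies to exp(-(x - 1)^2 / 2).
    while True:
        x = exponential_by_inversion(1.0)
        if random.random() <= math.exp(-0.5 * (x - 1.0) ** 2):
            # A random sign turns the half-normal draw into a N(0, 1) draw.
            return x if random.random() < 0.5 else -x
```

The envelope constant sqrt(2e/pi) cancels out of the acceptance test, which is why only the quadratic term appears in the code.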
  • 15. The Metropolis et al. (1953) paper. The very first MCMC algorithm is associated with the second computer, MANIAC, built in Los Alamos in early 1952. Besides Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller contributed to create the Metropolis algorithm...
  • 16. Motivating problem. Computation of integrals of the form
$$ I = \frac{\int F(p,q)\,\exp\{-E(p,q)/kT\}\,dp\,dq}{\int \exp\{-E(p,q)/kT\}\,dp\,dq}, $$
with the energy defined as
$$ E(p,q) = \frac{1}{2}\sum_{i=1}^{N}\,\sum_{\substack{j=1\\ j\neq i}}^{N} V(d_{ij}), $$
where N is the number of particles, V a potential function, and d_{ij} the distance between particles i and j.
  • 17. Boltzmann distribution. The Boltzmann distribution exp{−E(p,q)/kT} is parameterised by the temperature T, k being the Boltzmann constant, with a normalisation factor
$$ Z(T) = \int \exp\{-E(p,q)/kT\}\,dp\,dq $$
not available in closed form.
  • 18. Computational challenge. Since p and q are 2N-dimensional vectors, numerical integration is impossible. Moreover, standard Monte Carlo techniques fail to correctly approximate I: exp{−E(p,q)/kT} is very small for most realizations of random configurations (p,q) of the particle system.
  • 19. Metropolis algorithm. Consider a random walk modification of the N particles: for each 1 ≤ i ≤ N, values
$$ x_i' = x_i + \alpha\,\xi_{1i} \qquad\text{and}\qquad y_i' = y_i + \alpha\,\xi_{2i} $$
are proposed, where both ξ_{1i} and ξ_{2i} are uniform U(−1, 1). The energy difference between the new and the previous configuration is ΔE, and the new configuration is accepted with probability
$$ 1 \wedge \exp\{-\Delta E/kT\}, $$
and otherwise the previous configuration is replicated*
*counting one more time in the average of the F(p_t, q_t)'s over the τ moves of the random walk.
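The 1953 update rule can be sketched in a few lines. This is our illustration, not the original MANIAC code: a toy one-dimensional quadratic energy stands in for the N-particle system, and alpha and kT are the tuning constants named on the slide.

```python
import math
import random

def metropolis_1953_step(x, energy, alpha, kT):
    """One Metropolis move: a uniform random-walk proposal on (x - alpha,
    x + alpha), accepted with probability min(1, exp(-dE / kT)); otherwise
    the current configuration is replicated, exactly as in the 1953 rule."""
    proposal = x + alpha * random.uniform(-1.0, 1.0)
    dE = energy(proposal) - energy(x)
    if dE <= 0 or random.random() < math.exp(-dE / kT):
        return proposal
    return x

def run_chain(n, alpha=2.0, kT=1.0):
    # Toy stand-in for the particle system: energy E(x) = x^2 / 2, whose
    # Boltzmann distribution at kT = 1 is the standard normal N(0, 1).
    energy = lambda y: 0.5 * y * y
    x, chain = 0.0, []
    for _ in range(n):
        x = metropolis_1953_step(x, energy, alpha, kT)
        chain.append(x)
    return chain
```

Averaging any function F over the chain, rejected moves included, approximates its expectation under the Boltzmann distribution, which is the point of the replication footnote.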
  • 22. Convergence. The validity of the algorithm is established by proving 1. irreducibility and 2. ergodicity, that is, convergence to the stationary distribution. The second part is obtained via a discretization of the space: Metropolis et al. note that the proposal is reversible, then establish that exp{−E/kT} is invariant. Application to the specific problem of the rigid-sphere collision model. The number of iterations of the Metropolis algorithm seems limited: 16 steps for burn-in and 48 to 64 subsequent iterations (which still required four to five hours on the Los Alamos MANIAC).
  • 23. Physics and chemistry. “The method of Markov chain Monte Carlo immediately had wide use in physics and chemistry.” [Geyer & Thompson, 1992] Hammersley and Handscomb, 1967; Piekaar and Clarenburg, 1967; Kennedy and Kutil, 1985; Sokal, 1989; &tc...
  • 24. Physics and chemistry. “Statistics has always been fuelled by energetic mining of the physics literature.” [Clifford, 1993] Hammersley and Handscomb, 1967; Piekaar and Clarenburg, 1967; Kennedy and Kutil, 1985; Sokal, 1989; &tc...
  • 26. A fair generalisation. In Biometrika 1970, Hastings defines the MCMC methodology for finite and reversible Markov chains, the continuous case being discretised. The generic acceptance probability for a move from state i to state j is
$$ \alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i\, q_{ij}}{\pi_j\, q_{ji}}}, $$
where s_{ij} is a symmetric function.
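Hastings' generic acceptance probability can be checked in a few lines (an illustrative sketch with our own function names): writing t_ij = pi_i q_ij / (pi_j q_ji), the choice s_ij = 1 recovers Barker's rule, while s_ij = 1 + min(t, 1/t), symmetric since t_ji = 1/t_ij, recovers the Metropolis rule.

```python
def hastings_alpha(pi_i, pi_j, q_ij, q_ji, s):
    # Generic Hastings (1970) acceptance: alpha_ij = s_ij / (1 + t_ij),
    # where t_ij = (pi_i * q_ij) / (pi_j * q_ji) and s is symmetric in (i, j).
    t = (pi_i * q_ij) / (pi_j * q_ji)
    return s(t) / (1.0 + t)

def s_barker(t):
    # s_ij = 1 recovers Barker (1965):
    # alpha = pi_j q_ji / (pi_j q_ji + pi_i q_ij).
    return 1.0

def s_metropolis(t):
    # s_ij = 1 + min(t, 1/t) recovers Metropolis et al. (1953):
    # alpha = min(1, pi_j q_ji / (pi_i q_ij)).
    return 1.0 + min(t, 1.0 / t)
```

For example, with pi = (1, 2) and a symmetric proposal, Barker accepts the uphill move with probability 2/3 while Metropolis accepts it with probability 1.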
  • 27. State of the art. Note the generic form that encompasses both Metropolis et al. (1953) and Barker (1965). Peskun's ordering is not yet discovered: Hastings mentions that “little is known about the relative merits of those two choices (even though) Metropolis's method may be preferable”. He warns against high rejection rates as indicative of a poor choice of transition matrix, but does not mention the opposite pitfall of low rejection rates.
  • 28. What else?! Items included in the paper: a Poisson target with a ±1 random walk proposal; a normal target with a uniform random walk proposal mixed with its reflection (i.e., centered at −X(t) rather than X(t)); a multivariate target for which Hastings introduces Gibbs sampling, updating one component at a time and defining the composed transition as satisfying the stationarity condition because each component leaves the target invariant; a reference to Erhman, Fosdick and Handscomb (1960) as a preliminary if specific instance of this Metropolis-within-Gibbs sampler; an importance sampling version of MCMC; some remarks about error assessment; a Gibbs sampler for random orthogonal matrices.
  • 30. Three years later. Peskun (1973) compares Metropolis's and Barker's acceptance probabilities and shows (again in a discrete setup) that Metropolis's is optimal (in terms of the asymptotic variance of any empirical average). The proof is a direct consequence of Kemeny and Snell (1960) on asymptotic variance. Peskun also establishes that this variance can improve upon the iid case if and only if the eigenvalues of P − A are all negative, where A is the transition matrix corresponding to iid simulation and P the transition matrix corresponding to the Metropolis algorithm, but he concludes that the trace of P − A is always positive.
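Peskun's ordering can be seen numerically on a toy two-state chain; this is our illustration, not an example from Peskun's paper. For a reversible two-state chain the non-unit eigenvalue is lam = 1 − p01 − p10, and the asymptotic variance of an ergodic average inflates var_pi(f) by (1 + lam)/(1 − lam). With target pi = (1/3, 2/3) and a symmetric flip proposal, Metropolis yields a strictly smaller factor than Barker.

```python
def asymptotic_variance_factor(p01, p10):
    # Reversible two-state chain with off-diagonal transition
    # probabilities p01 and p10: non-unit eigenvalue lam = 1 - p01 - p10,
    # asymptotic variance of an ergodic average = var_pi(f) * (1+lam)/(1-lam).
    lam = 1.0 - p01 - p10
    return (1.0 + lam) / (1.0 - lam)

# Target pi = (1/3, 2/3) with a symmetric flip proposal.
# Metropolis accepts with min(1, pi_j / pi_i); Barker with pi_j / (pi_i + pi_j).
factor_metropolis = asymptotic_variance_factor(1.0, 0.5)
factor_barker = asymptotic_variance_factor(2.0 / 3.0, 1.0 / 3.0)
```

Here the Metropolis factor is 1/3, so this chain even beats iid sampling (factor 1), while Barker's factor is exactly 1.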
  • 32. Julian's early works (1). In the early 1970s, Hammersley, Clifford, and Besag were working on the specification of joint distributions from conditional distributions and on necessary and sufficient conditions for the conditional distributions to be compatible with a joint distribution. [Hammersley and Clifford, 1971]
  • 33. “What is the most general form of the conditional probability functions that define a coherent joint function? And what will the joint look like?” [Besag, 1972]
  • 34. Hammersley-Clifford theorem. Theorem (Hammersley-Clifford): the joint distribution of a vector associated with a dependence graph must be represented as a product of functions over the cliques of the graph, i.e., of functions depending only on the components indexed by the labels in the clique. [Cressie, 1993; Lauritzen, 1996]
  • 35. Theorem (Hammersley-Clifford): a probability distribution P with positive and continuous density f satisfies the pairwise Markov property with respect to an undirected graph G if and only if it factorizes according to G, i.e., (F) ≡ (G). [Cressie, 1993; Lauritzen, 1996]
  • 36. Theorem (Hammersley-Clifford): under the positivity condition, the joint distribution g satisfies
$$ g(y_1,\ldots,y_p) \propto \prod_{j=1}^{p} \frac{g_j(y_j \mid y_1,\ldots,y_{j-1},\, y'_{j+1},\ldots,y'_p)}{g_j(y'_j \mid y_1,\ldots,y_{j-1},\, y'_{j+1},\ldots,y'_p)} $$
for every permutation of {1, 2, …, p} and every y′ ∈ Y. [Cressie, 1993; Lauritzen, 1996]
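The factorization above can be verified numerically. The sketch below is our own check, for p = 2 components on a finite state space: starting from the two full conditionals and an arbitrary fixed anchor point (y1', y2') = (0, 0), it reconstructs the joint up to the normalising constant.

```python
def reconstruct_joint_from_conditionals(joint):
    # Besag's factorization for p = 2: g(y1, y2) / g(y1', y2') equals
    # [g1(y1 | y2') / g1(y1' | y2')] * [g2(y2 | y1) / g2(y2' | y1)].
    # `joint` is only used here to derive the two full conditionals.
    n1, n2 = len(joint), len(joint[0])

    def g1(a, b):  # conditional of component 1 given component 2 = b
        return joint[a][b] / sum(joint[i][b] for i in range(n1))

    def g2(b, a):  # conditional of component 2 given component 1 = a
        return joint[a][b] / sum(joint[a][j] for j in range(n2))

    unnorm = [[(g1(a, 0) / g1(0, 0)) * (g2(b, a) / g2(0, a))
               for b in range(n2)] for a in range(n1)]
    total = sum(sum(row) for row in unnorm)
    return [[v / total for v in row] for row in unnorm]
```

Running it on any positive 2x2 table returns the table itself, which is exactly the content of the theorem under positivity.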
  • 37. An apocryphal theorem. The Hammersley-Clifford theorem was never published by its authors, but only through Grimmett (1973), Preston (1973), Sherman (1973), and Besag (1974). The authors were dissatisfied with the positivity constraint: the joint density could only be recovered from the full conditionals when the support of the joint was the product of the supports of the full conditionals (with obvious counter-examples). Moussouris' counter-example put a full stop to their endeavors. [Hammersley, 1974]
• 39. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works To Gibbs or not to Gibbs? Julian Besag should certainly be credited, to a large extent, with the (re?)discovery of the Gibbs sampler. The simulation procedure is to consider the sites cyclically and, at each stage, to amend or leave unaltered the particular site value in question, according to a probability distribution whose elements depend upon the current value at neighboring sites (...) However, the technique is unlikely to be particularly helpful in many other than binary situations and the Markov chain itself has no practical interpretation. [Besag, 1974]
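The cyclic single-site updating Besag describes can be sketched for an Ising-type binary field; the interaction strength, grid size, and free boundary conditions below are illustrative choices, not taken from the 1974 paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(x, beta, rng):
    """One cyclic sweep of single-site Gibbs updates on an Ising-type
    binary field: each site in turn is redrawn from its conditional
    distribution given the current values at the neighboring sites."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            # Sum of the nearest-neighbor spins (free boundary).
            s = 0
            if i > 0:     s += x[i - 1, j]
            if i < n - 1: s += x[i + 1, j]
            if j > 0:     s += x[i, j - 1]
            if j < m - 1: s += x[i, j + 1]
            # Conditional probability that the site takes value +1.
            p = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p else -1
    return x

x = rng.choice([-1, 1], size=(16, 16))
for _ in range(100):
    x = gibbs_sweep(x, beta=0.3, rng=rng)
```

Each sweep leaves the Gibbs random field invariant, so long runs of such sweeps produce (dependent) samples from the target field.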
• 40. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Broader perspective In 1964, Hammersley and Handscomb wrote a (the first?) textbook on Monte Carlo methods: they cover such topics as “Crude Monte Carlo”; importance sampling; control variates; and “Conditional Monte Carlo”, which looks surprisingly like a missing-data Gibbs completion approach. They state in the Preface We are convinced nevertheless that Monte Carlo methods will one day reach an impressive maturity.
  • 41. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Clicking in After Peskun (1973), MCMC mostly dormant in mainstream statistical world for about 10 years, then several papers/books highlighted its usefulness in specific settings: Geman and Geman (1984) Besag (1986) Strauss (1986) Ripley (Stochastic Simulation, 1987) Tanner and Wong (1987) Younes (1988)
  • 42. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Enters the Gibbs sampler Geman and Geman (1984), building on Metropolis et al. (1953), Hastings (1970), and Peskun (1973), constructed a Gibbs sampler for optimisation in a discrete image processing problem without completion. Responsible for the name Gibbs sampling, because method used for the Bayesian study of Gibbs random fields linked to the physicist Josiah Willard Gibbs (1839–1903) Back to Metropolis et al., 1953: the Gibbs sampler is used as a simulated annealing algorithm and ergodicity is proven on the collection of global maxima
  • 43. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Besag (1986) integrates GS for SA... ...easy to construct the transition matrix Q, of a discrete time Markov chain, with state space Ω and limit distribution (4). Simulated annealing proceeds by running an associated time inhomogeneous Markov chain with transition matrices QT , where T is progressively decreased according to a prescribed “schedule” to a value close to zero. [Besag, 1986]
  • 44. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...and links with Metropolis-Hastings... There are various related methods of constructing a manageable QT (Hastings, 1970). Geman and Geman (1984) adopt the simplest, which they term the ”Gibbs sampler” (...) time reversibility, a common ingredient in this type of problem (see, for example, Besag, 1977a), is present at individual stages but not over complete cycles, though Peter Green has pointed out that it returns if QT is taken over a pair of cycles, the second of which visits pixels in reverse order [Besag, 1986]
• 45. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...seeing the larger picture,... As Geman and Geman (1984) point out, any property of the (posterior) distribution P (x|y) can be simulated by running the Gibbs sampler at “temperature” T = 1. Thus, if x̂i maximizes P (xi |y), then it is the most frequently occurring colour at pixel i in an infinite realization of the Markov chain with transition matrix Q of Section 2.3. The x̂i ’s can therefore be simultaneously estimated from a single finite realization of the chain. It is not yet clear how long the realization needs to be, particularly for estimation near colour boundaries, but the amount of computation required is generally prohibitive for routine purposes [Besag, 1986]
  • 46. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...seeing the larger picture,... P (x|y) can be simulated using the Gibbs sampler, as suggested by Grenander (1983) and by Geman and Geman (1984). My dismissal of such an approach for routine applications was somewhat cavalier: purpose-built array processors could become relatively inexpensive (...) suppose that, for 100 complete cycles say, images have been collected from the Gibbs sampler (or by Metropolis’ method), following a “settling-in” period of perhaps another 100 cycles, which should cater for fairly intricate priors (...) These 100 images should often be adequate for estimating properties of the posterior (...) and for making approximate associated confidence statements, as mentioned by Mr Haslett. [Besag, 1986]
  • 47. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...if not going fully Bayes! ...a neater and more efficient procedure [for parameter estimation] is to adopt maximum ”pseudo-likelihood” estimation (Besag, 1975)
  • 48. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...if not going fully Bayes! ...a neater and more efficient procedure [for parameter estimation] is to adopt maximum ”pseudo-likelihood” estimation (Besag, 1975) I have become increasingly enamoured with the Bayesian paradigm [Besag, 1986]
  • 49. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...if not going fully Bayes! ...a neater and more efficient procedure [for parameter estimation] is to adopt maximum ”pseudo-likelihood” estimation (Besag, 1975) I have become increasingly enamoured with the Bayesian paradigm [Besag, 1986] The pair (xi , βi ) is then a (bivariate) Markov field and can be reconstructed as a bivariate process by the methods described in Professor Besag’s paper. [Clifford, 1986]
  • 50. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works ...if not going fully Bayes! ...a neater and more efficient procedure [for parameter estimation] is to adopt maximum ”pseudo-likelihood” estimation (Besag, 1975) I have become increasingly enamoured with the Bayesian paradigm [Besag, 1986] The simulation-based estimator Epost Ψ(X) will differ ˆ from the m.a.p. estimator Ψ(x). [Silverman, 1986]
• 51. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Discussants of Besag (1986) Impressive who’s who: D.M. Titterington, P. Clifford, P. Green, P. Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C. Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J. Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc
  • 52. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works A comment on Besag (1986) While special purpose algorithms will determine the utility of the Bayesian methods, the general purpose methods-stochastic relaxation and simulation of solutions of the Langevin equation (Grenander, 1983; Geman and Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986) have proven enormously convenient and versatile. We are able to apply a single computer program to every new problem by merely changing the subroutine that computes the energy function in the Gibbs representation of the posterior distribution. [Geman and McClure, 1986]
  • 53. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Another one It is easy to compute exact marginal and joint posterior probabilities of currently unobserved features, conditional on those clinical findings currently available (Spiegelhalter, 1986a,b), the updating taking the form of ‘propagating evidence’ through the network (...) it would be interesting to see if the techniques described tonight, which are of intermediate complexity, may have any applications in this new and exciting area [causal networks]. [Spiegelhalter, 1986]
• 54. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works The candidate’s formula Representation of the marginal likelihood as m(x) = π(θ)f (x|θ) / π(θ|x) or of the marginal predictive as pn (y′ |y) = f (y′ |θ)πn (θ|y) / πn+1 (θ|y, y′ ) [Besag, 1989]
• 55. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works The candidate’s formula Representation of the marginal likelihood as m(x) = π(θ)f (x|θ) / π(θ|x) or of the marginal predictive as pn (y′ |y) = f (y′ |θ)πn (θ|y) / πn+1 (θ|y, y′ ) [Besag, 1989] Why candidate? “Equation (2) appeared without explanation in a Durham University undergraduate final examination script of 1984. Regrettably, the student’s name is no longer known to me.”
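The identity holds at every value of θ, which can be checked numerically in a conjugate normal model where the marginal likelihood is available in closed form; the numerical values below, including the arbitrary evaluation point θ = 0.7, are purely illustrative.

```python
from math import exp, pi, sqrt

def npdf(x, m, v):
    """Normal density with mean m and variance v."""
    return exp(-(x - m) ** 2 / (2 * v)) / sqrt(2 * pi * v)

# Conjugate normal model: x | theta ~ N(theta, s2), theta ~ N(mu0, t2).
mu0, t2, s2, x = 1.0, 2.0, 0.5, 0.3

# Posterior theta | x is N(mu_post, v_post).
v_post = 1.0 / (1.0 / t2 + 1.0 / s2)
mu_post = v_post * (mu0 / t2 + x / s2)

# Candidate's formula m(x) = pi(theta) f(x|theta) / pi(theta|x),
# evaluated at an arbitrary theta.
theta = 0.7
m_candidate = npdf(theta, mu0, t2) * npdf(x, theta, s2) / npdf(theta, mu_post, v_post)

# Direct marginal: x ~ N(mu0, s2 + t2).
m_direct = npdf(x, mu0, s2 + t2)

assert abs(m_candidate - m_direct) < 1e-9
```

In non-conjugate settings π(θ|x) is unknown, and plugging an MCMC-based estimate of it into the same identity is precisely what the implications on the following slides exploit.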
  • 56. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Implications Newton and Raftery (1994) used this representation to derive the [infamous] harmonic mean approximation to the marginal likelihood Gelfand and Dey (1994) Geyer and Thompson (1995) Chib (1995)
• 57. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Implications Newton and Raftery (1994) Gelfand and Dey (1994) also relied on this formula for the same purpose, from a more general perspective Geyer and Thompson (1995) Chib (1995)
  • 58. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Implications Newton and Raftery (1994) Gelfand and Dey (1994) Geyer and Thompson (1995) derived MLEs by a Monte Carlo approximation to the normalising constant Chib (1995)
• 59. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data Before the revolution Julian’s early works Implications Newton and Raftery (1994) Gelfand and Dey (1994) Geyer and Thompson (1995) Chib (1995) uses this representation to build an MCMC approximation to the marginal likelihood
  • 60. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Final steps to Impact “This is surely a revolution.” [Clifford, 1993] Geman and Geman (1984) is one more spark that led to the explosion, as it had a clear influence on Gelfand, Green, Smith, Spiegelhalter and others. Sparked new interest in Bayesian methods, statistical computing, algorithms, and stochastic processes through the use of computing algorithms such as the Gibbs sampler and the Metropolis–Hastings algorithm.
  • 61. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Final steps to Impact “[Gibbs sampler] use seems to have been isolated in the spatial statistics community until Gelfand and Smith (1990)” [Geyer, 1990] Geman and Geman (1984) is one more spark that led to the explosion, as it had a clear influence on Gelfand, Green, Smith, Spiegelhalter and others. Sparked new interest in Bayesian methods, statistical computing, algorithms, and stochastic processes through the use of computing algorithms such as the Gibbs sampler and the Metropolis–Hastings algorithm.
• 62. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Final steps to Data augmentation Tanner and Wong (1987) has essentially the same ingredients as Gelfand and Smith (1990): simulating from conditionals is simulating from the joint
• 63. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Final steps to Data augmentation Tanner and Wong (1987) has essentially the same ingredients as Gelfand and Smith (1990): simulating from conditionals is simulating from the joint Lower impact: emphasis on missing data problems (hence data augmentation) MCMC approximation to the target at every iteration π̂t (θ|x) = (1/K) ∑_{k=1}^{K} π(θ|x, z^{t,k} ) , z^{t,k} ∼ π̂t−1 (z|x) , too close to Rubin’s (1978) multiple imputation theoretical backup based on functional analysis (Markov kernel had to be uniformly bounded and equicontinuous)
• 64. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 Epiphany In June 1989, at a Bayesian workshop in Sherbrooke, Québec, Adrian Smith exposed for the first time (?) the generic features of the Gibbs sampler, exhibiting a ten line Fortran program handling a random effect model Yij = θi + εij , i = 1, . . . , K, j = 1, . . . , J, θi ∼ N(µ, σθ²), εij ∼ N(0, σε²) by full conditionals on µ, σθ , σε ... [Gelfand and Smith, 1990] This was enough to convince the whole audience!
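The ten-line Fortran program itself is not reproduced in the slides; the sketch below is a plausible Python analogue of such a Gibbs sampler under hypothetical conjugate priors (a flat prior on µ and inverse-gamma(1, 1) priors on both variances; these prior choices are assumptions, not from the talk).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from the random-effects model Y_ij = theta_i + eps_ij.
K, J = 10, 25
true_theta = rng.normal(5.0, 2.0, size=K)
y = true_theta[:, None] + rng.normal(0.0, 1.0, size=(K, J))

a = b = 1.0  # hypothetical inverse-gamma(a, b) hyperparameters

n_iter = 2000
mu, s2_th, s2_eps = 0.0, 1.0, 1.0
mu_draws = np.empty(n_iter)

for t in range(n_iter):
    # theta_i | rest : normal, combining the J observations and the prior.
    v = 1.0 / (J / s2_eps + 1.0 / s2_th)
    m = v * (y.sum(axis=1) / s2_eps + mu / s2_th)
    theta = rng.normal(m, np.sqrt(v))
    # mu | rest : normal around the mean of the theta_i (flat prior on mu).
    mu = rng.normal(theta.mean(), np.sqrt(s2_th / K))
    # variances | rest : conjugate inverse-gamma updates.
    s2_th = 1.0 / rng.gamma(a + K / 2, 1.0 / (b + 0.5 * ((theta - mu) ** 2).sum()))
    s2_eps = 1.0 / rng.gamma(a + K * J / 2, 1.0 / (b + 0.5 * ((y - theta[:, None]) ** 2).sum()))
    mu_draws[t] = mu

posterior_mean_mu = mu_draws[500:].mean()  # discard burn-in
```

Cycling through these four full conditionals is the whole algorithm, which is what made the demonstration so striking: no tuning, no rejection, just conjugate draws.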
• 65. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 Garden of Eden In early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem! Flood of papers followed applying MCMC: linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991; Wang et al., 1993, 1994) generalized linear mixed models (Albert and Chib, 1993) mixture models (Tanner and Wong, 1987; Diebolt and X., 1990, 1994; Escobar and West, 1993) changepoint analysis (Carlin et al., 1992) point processes (Grenander and Møller, 1994) &tc
• 66. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 Garden of Eden In early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem! Flood of papers followed applying MCMC: genomics (Stephens and Smith, 1993; Lawrence et al., 1993; Churchill, 1995; Geyer and Thompson, 1995) ecology (George and X, 1992; Dupuis, 1995) variable selection in regression (George and McCulloch, 1993) spatial statistics (Raftery and Banfield, 1991) longitudinal studies (Lange et al., 1992) &tc
  • 67. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, relied on MCMC methods for ML estimation Smith and Roberts, 1993 Besag and Green, 1993 Tierney, 1994 Liu, Wong and Kong, 1994,95 Mengersen and Tweedie, 1996
  • 68. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, Smith and Roberts, 1993 discussed convergence diagnoses and applications, incl. mixtures for Gibbs and Metropolis–Hastings Besag and Green, 1993 Tierney, 1994 Liu, Wong and Kong, 1994,95 Mengersen and Tweedie, 1996
• 69. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, Smith and Roberts, 1993 Besag and Green, 1993 stated the desiderata for convergence, and connected MCMC with auxiliary and antithetic variables Tierney, 1994 Liu, Wong and Kong, 1994,95 Mengersen and Tweedie, 1996
  • 70. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, Smith and Roberts, 1993 Besag and Green, 1993 Tierney, 1994 laid out all of the assumptions needed to analyze the Markov chains and then developed their properties, in particular, convergence of ergodic averages and central limit theorems Liu, Wong and Kong, 1994,95 Mengersen and Tweedie, 1996
  • 71. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, Smith and Roberts, 1993 Besag and Green, 1993 Tierney, 1994 Liu, Wong and Kong, 1994,95 analyzed the covariance structure of Gibbs sampling, and were able to formally establish the validity of Rao-Blackwellization in Gibbs sampling Mengersen and Tweedie, 1996
  • 72. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, Smith and Roberts, 1993 Besag and Green, 1993 Tierney, 1994 Liu, Wong and Kong, 1994,95 Mengersen and Tweedie, 1996 set the tone for the study of the speed of convergence of MCMC algorithms to the target distribution
  • 73. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Gelfand and Smith, 1990 [some of the] early theoretical advances “It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993] Geyer and Thompson, 1992, Smith and Roberts, 1993 Besag and Green, 1993 Tierney, 1994 Liu, Wong and Kong, 1994,95 Mengersen and Tweedie, 1996 Gilks, Clayton and Spiegelhalter, 1993 &tc...
  • 74. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data The Revolution Convergence diagnoses Convergence diagnoses Can we really tell when a complicated Markov chain has reached equilibrium? Frankly, I doubt it. [Clifford, 1993] Explosion of methods Gelman and Rubin (1991) Besag and Green (1992) Geyer (1992) Raftery and Lewis (1992) Cowles and Carlin (1996) coda Brooks and Roberts (1998) &tc
• 75. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Particle systems Particles, again Iterating importance sampling is about as old as Monte Carlo methods themselves! [Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955] Found in the molecular simulation literature of the 50’s with self-avoiding random walks and signal processing [Marshall, 1965; Handschin and Mayne, 1969]
• 76. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Particle systems Particles, again Iterating importance sampling is about as old as Monte Carlo methods themselves! [Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955] Found in the molecular simulation literature of the 50’s with self-avoiding random walks and signal processing [Marshall, 1965; Handschin and Mayne, 1969] Use of the term “particle” dates back to Kitagawa (1996), and Carpenter et al. (1997) coined the term “particle filter”.
• 77. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Particle systems Bootstrap filter and sequential Monte Carlo Gordon, Salmond and Smith (1993) introduced the bootstrap filter which, while formally connected with importance sampling, involves past simulations and possible MCMC steps (Gilks and Berzuini, 2001). Sequential imputation was developed in Kong, Liu and Wong (1994), while Liu and Chen (1995) first formally pointed out the importance of resampling in “sequential Monte Carlo”, a term they coined
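The propagate-weight-resample cycle of the bootstrap filter can be sketched on a toy linear-Gaussian state-space model; the model, its coefficients, and the particle count below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy state-space model (illustrative choice):
#   x_t = 0.9 x_{t-1} + w_t,  w_t ~ N(0, 1)
#   y_t = x_t + v_t,          v_t ~ N(0, 1)
T, N = 50, 1000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = x + rng.normal(size=T)

# Bootstrap filter: propagate particles through the state equation,
# weight them by the observation likelihood, then resample.
particles = rng.normal(size=N)
means = np.zeros(T)
for t in range(T):
    particles = 0.9 * particles + rng.normal(size=N)   # propagate
    logw = -0.5 * (y[t] - particles) ** 2              # Gaussian log-likelihood
    w = np.exp(logw - logw.max())
    w /= w.sum()
    means[t] = np.dot(w, particles)                    # filtering mean estimate
    idx = rng.choice(N, size=N, p=w)                   # multinomial resampling
    particles = particles[idx]
```

The resampling step, whose importance Liu and Chen emphasized, is what keeps the particle weights from degenerating as t grows.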
• 78. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Particle systems pMC versus pMCMC Recycling of past simulations legitimate to build better importance sampling functions as in population Monte Carlo [Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007] Recent synthesis by Andrieu, Doucet, and Holenstein (2010) using particles to build an evolving MCMC kernel p̂θ (y1:T ) in state space models p(x1:T )p(y1:T |x1:T ), along with Andrieu’s and Roberts’ (2009) use of approximations in MCMC acceptance steps [Kennedy and Kuti, 1985]
• 79. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Reversible jump Reversible jump Generally considered as the second Revolution. Formalisation of a Markov chain moving across models and parameter spaces allows for the Bayesian processing of a wide variety of models and led to the success of Bayesian model choice Definition of a proper balance condition on cross-model Markov kernels gives a generic setup for exploring variable dimension spaces, even when the number of models under comparison is infinite. [Green, 1995]
  • 80. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Perfect sampling Perfect sampling Seminal paper of Propp and Wilson (1996) showed how to use MCMC methods to produce an exact (or perfect) simulation from the target.
  • 81. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Perfect sampling Perfect sampling Seminal paper of Propp and Wilson (1996) showed how to use MCMC methods to produce an exact (or perfect) simulation from the target. Outburst of papers, particularly from Jesper Møller and coauthors, but the excitement somehow dried out [except in dedicated areas] as construction of perfect samplers is hard and coalescence times very high... [Møller and Waagepetersen, 2003]
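The Propp–Wilson construction can be illustrated on a toy three-state chain (the transition matrix is invented): coupled chains started from every state, re-using the same random numbers from further and further in the past, coalesce by time 0 to an exact draw from the stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small ergodic transition matrix whose stationary law is the target.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
cum = P.cumsum(axis=1)

def update(state, u):
    """Deterministic update driven by a common uniform u
    (inverse-CDF of the transition row), shared by all copies."""
    return int(np.searchsorted(cum[state], u))

def cftp(rng):
    """Coupling from the past: restart coupled chains from ALL states
    ever further in the past, reusing the SAME randomness near time 0,
    until every copy has coalesced by time 0."""
    us = []
    T = 1
    while True:
        while len(us) < T:
            us.insert(0, rng.random())  # extend the noise sequence into the past
        states = list(range(3))         # one chain per starting state, at time -T
        for u in us[-T:]:
            states = [update(s, u) for s in states]
        if len(set(states)) == 1:       # coalescence: exact stationary draw
            return states[0]
        T *= 2

draws = [cftp(rng) for _ in range(5000)]
freq = np.bincount(draws, minlength=3) / len(draws)

# Exact stationary distribution (left eigenvector of P for eigenvalue 1).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
```

For larger state spaces running one chain per state is impractical, which is why practical perfect samplers exploit monotonicity or bounding chains, and why construction quickly becomes hard.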
  • 82. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data After the Revolution Envoi To be continued... ...standing on the shoulders of giants