Does MCMC converge?
        Postprocessing MCMC output




Multimodality and label switching: a discussion

                      Christian P. Robert

          Université Paris-Dauphine and CREST, INSEE
           http://www.ceremade.dauphine.fr/~xian


            Workshop on mixtures, ICMS
                 February 28, 2010




                 Christian P. Robert   Multimodality and label switching: a discussion


Outline




  1   Does MCMC converge?

  2   Postprocessing MCMC output






Monte Carlo perspective

  When, given a target π, an MCMC sampler never visits more than
  50% of the support of π, it can be argued that the sampler does
  not converge!
  Two-component normal mixture and Gibbs sampler




  Case when both means µi are the only unknowns, with different weights and
  same variance: identifiable model

  [Figure: plot in the (µ1, µ2) plane; both axes run from −1 to 4]




                (C.) Simple MCMC does not work
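
A minimal sketch of the Gibbs sampler for the two-mean mixture above, under assumptions the slide does not state: known weights p = 0.7 and 1 − p, unit variances, independent N(0, τ²) priors on the means, and simulated data. The name `gibbs_two_means` and all settings are hypothetical. Two chains are started from different points; whether the second one remains near a secondary mode depends on the data and the seed, so the sketch only prints where each chain ends up.

```python
import numpy as np

def gibbs_two_means(x, p, mu_init, n_iter, tau2=100.0, seed=0):
    """Gibbs sampler for p*N(mu1, 1) + (1 - p)*N(mu2, 1) with known p.

    Assumed prior (not from the slides): mu_j ~ N(0, tau2), independently.
    """
    rng = np.random.default_rng(seed)
    mu = np.array(mu_init, dtype=float)
    log_w = np.log(np.array([p, 1.0 - p]))
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Step 1: allocations z_i given the current means
        log_post = log_w - 0.5 * (x[:, None] - mu) ** 2  # shape (n, 2)
        prob_first = 1.0 / (1.0 + np.exp(log_post[:, 1] - log_post[:, 0]))
        z = np.where(rng.uniform(size=x.size) < prob_first, 0, 1)
        # Step 2: each mean given the observations allocated to it
        for j in range(2):
            members = x[z == j]
            post_var = 1.0 / (members.size + 1.0 / tau2)
            mu[j] = rng.normal(post_var * members.sum(), np.sqrt(post_var))
        chain[t] = mu
    return chain

# Simulated data: 70% from N(0, 1), 30% from N(2.5, 1) (illustrative values)
data_rng = np.random.default_rng(1)
n, p = 500, 0.7
from_first = data_rng.uniform(size=n) < p
x = np.where(from_first, data_rng.normal(0.0, 1.0, n), data_rng.normal(2.5, 1.0, n))

chain_a = gibbs_two_means(x, p, mu_init=(0.0, 2.5), n_iter=2000, seed=2)
chain_b = gibbs_two_means(x, p, mu_init=(2.5, 0.0), n_iter=2000, seed=3)
mean_a = chain_a[500:].mean(axis=0)
mean_b = chain_b[500:].mean(axis=0)
print("start (0, 2.5) ends near", mean_a)
print("start (2.5, 0) ends near", mean_b)
```

Comparing the two printed averages shows whether both starting points lead to the same region of the (µ1, µ2) plane for this particular data set.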




Imposed permutations may miss the mark




  While duplicating the MCMC sampler according to all
  permutations ρ in S_k produces perfect exchangeability [nice!],
      - it does not bring additional energy to the MCMC sampler
      - it does not identify other modes (under- or over-fitting)
      - it does not apply in nearly-but-not exchangeable settings
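
The point can be illustrated with a short sketch. The array `stuck_chain` below is a synthetic stand-in for the output of a sampler that stays in one labelling (it is not real MCMC output), and `symmetrise` is a hypothetical helper that stacks one copy of the chain per permutation in S_k.

```python
from itertools import permutations

import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a chain stuck in one labelling: (mu1, mu2) near (0, 2.5)
stuck_chain = rng.normal(loc=[0.0, 2.5], scale=0.1, size=(1000, 2))

def symmetrise(chain):
    """Stack one relabelled copy of the chain per permutation of the k columns."""
    k = chain.shape[1]
    return np.concatenate([chain[:, list(rho)] for rho in permutations(range(k))])

sym_chain = symmetrise(stuck_chain)
print("original marginal means   ", stuck_chain.mean(axis=0))
print("symmetrised marginal means", sym_chain.mean(axis=0))
```

The symmetrised sample has identical marginals for both components, yet every row is a relabelled copy of a value the original chain already visited, so no new region of the parameter space is explored.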






Illustrations

   Example (Two-mean Gaussian mixture)




  Case of
  p N(µ1, 1) + (1 − p) N(µ2, 1)
  (p = 0.5)

  [Figure: plot with axes µ1 and µ2, each running from −1 to 4]






Illustrations (2)

  Example (Two-mean Gaussian mixture and outliers)
  Same model, but data from a 5-component mixture






Illustrations (3)

  Example (Outlier Gaussian mixture)
  Case of p N(0, 1) + (1 − p) N(µ, σ²) with p known




Does MCMC converge?      Chib’s solution
                Postprocessing MCMC output     Nested sampling




Postprocessing issues


  When assessing the number k of components via the evidence

  $$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k,$$

  aka the marginal likelihood, label switching is a liability and an
  uninteresting phenomenon.
  Indeed,

  $$\int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k = k! \int_{\Theta_k / S_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k$$

  means that integrating over the restricted space is [more than] ok!
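
The k! identity can be checked numerically for k = 2. The sketch below makes several assumptions that are not in the slides: an equal-weight two-mean mixture with unit variances, independent N(0, 10) priors on the means, simulated data, and a plain Riemann sum on a square grid. The restricted space is taken to be {µ1 < µ2}, with grid points on the diagonal given half weight.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data (illustrative): 15 points around 0 and 15 around 2.5
x = np.concatenate([rng.normal(0.0, 1.0, 15), rng.normal(2.5, 1.0, 15)])

grid = np.linspace(-6.0, 8.0, 281)
h = grid[1] - grid[0]
mu1, mu2 = np.meshgrid(grid, grid, indexing="ij")

def normal_pdf(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

# Likelihood of 0.5*N(mu1, 1) + 0.5*N(mu2, 1) at every grid point
mix = 0.5 * normal_pdf(x[:, None, None] - mu1) + 0.5 * normal_pdf(x[:, None, None] - mu2)
log_lik = np.log(mix).sum(axis=0)

# Exchangeable prior: mu1, mu2 independent N(0, tau2) (assumed)
tau2 = 10.0
log_prior = -np.log(2.0 * np.pi * tau2) - 0.5 * (mu1 ** 2 + mu2 ** 2) / tau2
integrand = np.exp(log_lik + log_prior)

# Integral over the whole grid versus over the half-plane mu1 < mu2
z_full = integrand.sum() * h * h
z_restricted = (integrand[mu1 < mu2].sum() + 0.5 * np.trace(integrand)) * h * h

print("full integral      :", z_full)
print("2! x restricted one:", 2 * z_restricted)
```

Because prior and likelihood are both invariant under swapping µ1 and µ2, the two printed values agree up to floating-point rounding.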





Chib’s representation



  Direct application of Bayes’ theorem: given x ∼ f_k(x|θ_k) and
  θ_k ∼ π_k(θ_k),

  $$Z_k = m_k(x) = \frac{f_k(x \mid \theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k \mid x)}$$

  Use of an approximation to the posterior

  $$\hat{Z}_k = \hat{m}_k(x) = \frac{f_k(x \mid \theta_k^*)\, \pi_k(\theta_k^*)}{\hat{\pi}_k(\theta_k^* \mid x)}.$$
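
A small worked check of the representation, in a setting chosen only because everything is available in closed form (it is not the mixture model of the talk): x_i ∼ N(θ, 1) with θ ∼ N(0, τ²). The posterior is then normal, so the identity can be evaluated exactly at any θ* and compared with the closed-form marginal likelihood. Function names and data are hypothetical.

```python
import math

def log_normal_pdf(value, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (value - mean) ** 2 / var

def chib_log_evidence(x, theta_star, tau2):
    """log f(x | theta*) + log pi(theta*) - log pi(theta* | x), conjugate normal case."""
    n = len(x)
    log_lik = sum(log_normal_pdf(xi, theta_star, 1.0) for xi in x)
    log_prior = log_normal_pdf(theta_star, 0.0, tau2)
    post_var = tau2 / (1.0 + n * tau2)
    post_mean = post_var * sum(x)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post

def direct_log_evidence(x, tau2):
    """Closed-form log marginal: x ~ N(0, I + tau2 * 11')."""
    n = len(x)
    s = sum(x)
    ss = sum(xi * xi for xi in x)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n * tau2)
            - 0.5 * (ss - tau2 * s * s / (1.0 + n * tau2)))

x = [0.3, 1.2, -0.4, 0.9, 1.7]  # made-up observations
tau2 = 4.0
for theta_star in (0.0, 0.74, 3.0):
    print(theta_star, chib_log_evidence(x, theta_star, tau2), direct_log_evidence(x, tau2))
```

With the exact posterior the value does not depend on θ*. In practice the denominator is only estimated, and a high-density point such as the MAP is the usual choice for θ*.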






Case of latent variables



  For missing variable z as in mixture models, natural Rao-Blackwell
  estimate

  $$\hat{\pi}_k(\theta_k^* \mid x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^* \mid x, z_k^{(t)}),$$

  where the $z_k^{(t)}$’s are Gibbs sampled latent variables
  But convergence impaired by lack of label switching
                (C.) Simple MCMC does not work






Compensation for label switching
  For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a
  balanced way, despite the symmetry predicted by the theory

  $$\pi_k(\theta_k \mid x) = \pi_k(\rho(\theta_k) \mid x) = \frac{1}{k!} \sum_{\rho \in S_k} \pi_k(\rho(\theta_k) \mid x)$$

  for all ρ’s in S_k, set of all permutations of {1, . . . , k}.
  Consequences on numerical approximation, biased by a factor of order k!
  Recover the theoretical symmetry by using

  $$\tilde{\pi}_k(\theta_k^* \mid x) = \frac{1}{T\, k!} \sum_{\rho \in S_k} \sum_{t=1}^{T} \pi_k(\rho(\theta_k^*) \mid x, z_k^{(t)}).$$

                                        [Berkhof, Mechelen, & Gelman, 2003]
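
The effect of the symmetrisation can be sketched for k = 2. Everything specific below is an assumption made for illustration: an equal-weight two-mean mixture with unit variances, independent N(0, τ²) priors, simulated data, and the chain average used as a stand-in for θ*. The helper `conditional_density` evaluates the full conditional of the means given the allocations, which is what the Rao-Blackwell average needs.

```python
from itertools import permutations

import numpy as np

rng = np.random.default_rng(0)
n, tau2, n_iter, burn_in = 300, 100.0, 500, 100
x = np.where(rng.uniform(size=n) < 0.5,
             rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))

def conditional_density(theta, z, x, tau2):
    """pi(theta | x, z): product of the normal full conditionals of both means."""
    dens = 1.0
    for j in range(2):
        members = x[z == j]
        var = 1.0 / (members.size + 1.0 / tau2)
        mean = var * members.sum()
        dens *= np.exp(-0.5 * (theta[j] - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return dens

# Plain Gibbs sampler; the start fixes the labelling mu[0] < mu[1]
mu = np.array([x.min(), x.max()])
allocations, draws = [], []
for t in range(n_iter):
    log_odds = 0.5 * (x - mu[0]) ** 2 - 0.5 * (x - mu[1]) ** 2  # log P(z=1)/P(z=0)
    z = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-log_odds))).astype(int)
    for j in range(2):
        members = x[z == j]
        var = 1.0 / (members.size + 1.0 / tau2)
        mu[j] = rng.normal(var * members.sum(), np.sqrt(var))
    allocations.append(z)
    draws.append(mu.copy())

theta_star = np.mean(draws[burn_in:], axis=0)  # stand-in for the MAP
kept = allocations[burn_in:]

# Rao-Blackwell estimate without and with averaging over permutations
plain = np.mean([conditional_density(theta_star, z, x, tau2) for z in kept])
symmetrised = np.mean([conditional_density(theta_star[list(rho)], z, x, tau2)
                       for rho in permutations(range(2)) for z in kept])

print("log(plain / symmetrised) =", np.log(plain / symmetrised))
print("log(2!)                  =", np.log(2.0))
```

When the chain never switches labels, the permuted terms are negligible, so the symmetrised density estimate is the plain one divided by k! and the resulting evidence estimate moves up by log(k!).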



Galaxy dataset (k)
  Using only the original estimate, with $\theta_k^*$ as the MAP estimator,

  $$\log(\hat{m}_k(x)) = -105.1396$$

  for k = 3 (based on 10^3 simulations), while introducing the
  permutations leads to

  $$\log(\hat{m}_k(x)) = -103.3479 = -105.1396 + \log(3!)$$

   k             2          3          4          5          6          7          8
   log m_k(x)  -115.68    -103.35    -102.66    -101.93    -102.88    -105.48    -108.44

  Estimations of the log marginal likelihoods by the symmetrised Chib’s
  approximation (based on 10^5 Gibbs iterations and, for k > 5, 100
  permutations selected at random in S_k).
                            [Lee, Marin, Mengersen & Robert, 2008]
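
The arithmetic behind the log(3!) correction, and a reading of the table, can be reproduced in a few lines. The numbers are those quoted above; the uniform prior on k used to turn evidences into posterior probabilities is an assumption added here for illustration.

```python
import math

# Uncorrected Chib estimate for k = 3, as quoted above
log_m_plain = -105.1396
corrected = log_m_plain + math.log(math.factorial(3))
print("corrected log evidence for k = 3:", round(corrected, 4))

# Symmetrised estimates from the table
log_evidence = {2: -115.68, 3: -103.35, 4: -102.66, 5: -101.93,
                6: -102.88, 7: -105.48, 8: -108.44}
best_k = max(log_evidence, key=log_evidence.get)

# Posterior probabilities of k under a uniform prior on k (assumed)
shift = max(log_evidence.values())
weights = {k: math.exp(v - shift) for k, v in log_evidence.items()}
total = sum(weights.values())
post_prob = {k: w / total for k, w in weights.items()}

print("k with the largest estimated evidence:", best_k)
for k in sorted(post_prob):
    print(k, round(post_prob[k], 3))
```

The corrected value matches the quoted −103.3479 up to the rounding of the inputs.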


Comparison between evidence approximations


   1   Nested sampling: M = 1000 points, with 10 random walk
       moves at each step, simulations from the constrained prior and a
       stopping rule at 95% of the observed maximum likelihood
   2   T = 10^4 MCMC (=Gibbs) simulations producing
       non-parametric estimates ϕ
   3   Monte Carlo estimates Z1 , Z2 , Z3 using product of two
       Gaussian kernels
   4   numerical integration based on 850 × 950 grid [reference
       value, confirmed by Chib’s]
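
A toy sketch of item 1, nested sampling with random-walk moves inside the constrained prior. It departs from the setting above in ways that should be read as assumptions: the model is a conjugate normal mean (x_i ∼ N(θ, 1), θ ∼ N(0, τ²)) so that the exact evidence is available for comparison, the number of live points is smaller, and a fixed number of iterations replaces the 95% stopping rule. All names are hypothetical.

```python
import numpy as np

def log_lik(theta, x):
    return -0.5 * x.size * np.log(2.0 * np.pi) - 0.5 * np.sum((x - theta) ** 2)

def log_prior(theta, tau2):
    return -0.5 * np.log(2.0 * np.pi * tau2) - 0.5 * theta ** 2 / tau2

def log_sum_exp(values):
    values = np.asarray(values)
    top = values.max()
    return top + np.log(np.sum(np.exp(values - top)))

def exact_log_evidence(x, tau2):
    n, s, ss = x.size, x.sum(), np.sum(x ** 2)
    return (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log(1.0 + n * tau2)
            - 0.5 * (ss - tau2 * s ** 2 / (1.0 + n * tau2)))

def nested_sampling(x, tau2, n_live=400, n_iter=4000, n_moves=10, seed=0):
    rng = np.random.default_rng(seed)
    live = rng.normal(0.0, np.sqrt(tau2), size=n_live)  # draws from the prior
    live_ll = np.array([log_lik(t, x) for t in live])
    log_width = np.log1p(-np.exp(-1.0 / n_live))        # log(1 - e^{-1/M})
    log_terms = []
    for i in range(n_iter):
        worst = int(np.argmin(live_ll))
        ll_min = live_ll[worst]
        # Weight X_i - X_{i+1}, using the deterministic X_i = exp(-i / M)
        log_terms.append(ll_min - i / n_live + log_width)
        # Replace the worst point: start from another live point and make
        # random-walk Metropolis moves targeting the prior restricted to
        # {likelihood > ll_min}
        step = live.std()
        start = (worst + 1 + rng.integers(n_live - 1)) % n_live
        theta, ll = live[start], live_ll[start]
        for _ in range(n_moves):
            prop = theta + step * rng.normal()
            ll_prop = log_lik(prop, x)
            log_ratio = log_prior(prop, tau2) - log_prior(theta, tau2)
            if ll_prop > ll_min and np.log(rng.uniform()) < log_ratio:
                theta, ll = prop, ll_prop
        live[worst], live_ll[worst] = theta, ll
    # Contribution of the remaining live points
    log_terms.append(-n_iter / n_live - np.log(n_live) + log_sum_exp(live_ll))
    return log_sum_exp(log_terms)

x = np.random.default_rng(1).normal(1.0, 1.0, size=20)  # simulated data
tau2 = 4.0
estimate = nested_sampling(x, tau2)
exact = exact_log_evidence(x, tau2)
print("nested sampling log evidence:", estimate)
print("exact log evidence          :", exact)
```

The estimate is random; its spread shrinks roughly like the inverse square root of the number of live points, which is one reason comparisons of this kind are run over many replicas.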






Comparison (cont’d)




  Graph based on a sample of 10 observations for µ = 2 and
  σ = 3/2 (150 replicas) V1=Nested sampling, V2=importance
  sampling, V3=harmonic mean, V4=bridge sampling.
                                          [Chopin & Robert, 2010]




Comparison (cont’d)




  Graph based on a sample of 50 observations for µ = 2 and
  σ = 3/2 (150 replicas) V1=Nested sampling, V2=importance
  sampling, V3=harmonic mean, V4=bridge sampling.
                                          [Chopin & Robert, 2010]




Comparison (cont’d)




  Graph based on a sample of 100 observations for µ = 2 and
  σ = 3/2 (150 replicas) V1=Nested sampling, V2=importance
  sampling, V3=harmonic mean, V4=bridge sampling.
                                          [Chopin & Robert, 2010]




Comparison (cont’d)




                               [Lee, Marin, Mengersen & Robert, 2010]




ICMS Discussion, March 2010

  • 1. Does MCMC converge? Postprocessing MCMC output Multimodality and label switching: a discussion Christian P. Robert Universit´ Paris-Dauphine and CREST, INSEE e http://www.ceremade.dauphine.fr/~xian Workshop on mixtures, ICMS February 28, 2010 Christian P. Robert Multimodality and label switching: a discussion
  • 2. Does MCMC converge? Postprocessing MCMC output Outline 1 Does MCMC converge? 2 Postprocessing MCMC output Christian P. Robert Multimodality and label switching: a discussion
  • 3-7. Monte Carlo perspective. When, given a target π, an MCMC sampler never visits more than 50% of the support of π, it can be argued that the sampler does not converge! Two-component normal mixture and Gibbs sampler: case when both means µi are the only unknowns, with different weights and same variance: identifiable model. [Figure: plot in the (µ1, µ2) plane, both axes running from −1 to 4.] (C.) Simple MCMC does not work
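The Gibbs sampler mentioned on this slide can be sketched as follows. This is a minimal illustration, not the code behind the slide's figure: the N(0, tau2) priors on the means, the weight p = 0.7, the true means (0, 2.5), the sample size and the two starting points are all assumptions chosen for the example. Running the sampler from two different starting points is one way to check whether the chain ever leaves the region it started in.

```python
import math
import random


def gibbs_two_means(x, p=0.7, tau2=10.0, n_iter=1000, init=(0.0, 0.0), seed=0):
    """Gibbs sampler for p*N(mu1, 1) + (1-p)*N(mu2, 1) with unknown means.

    Assumption (not from the slides): independent N(0, tau2) priors on the means.
    Returns the list of sampled (mu1, mu2) pairs.
    """
    rng = random.Random(seed)
    weights = (p, 1.0 - p)
    mu = list(init)
    chain = []
    for _ in range(n_iter):
        counts = [0, 0]
        sums = [0.0, 0.0]
        # Step 1: allocate each observation to a component given the means.
        for xi in x:
            a = weights[0] * math.exp(-0.5 * (xi - mu[0]) ** 2)
            b = weights[1] * math.exp(-0.5 * (xi - mu[1]) ** 2)
            j = 0 if rng.random() * (a + b) < a else 1
            counts[j] += 1
            sums[j] += xi
        # Step 2: draw each mean from its conjugate normal conditional.
        for j in range(2):
            var = 1.0 / (counts[j] + 1.0 / tau2)
            mu[j] = rng.gauss(var * sums[j], math.sqrt(var))
        chain.append((mu[0], mu[1]))
    return chain


def simulate(n, p, mu1, mu2, seed=1):
    """Draw n observations from p*N(mu1, 1) + (1-p)*N(mu2, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(mu1 if rng.random() < p else mu2, 1.0) for _ in range(n)]


data = simulate(200, 0.7, 0.0, 2.5)
for start in [(0.0, 2.5), (2.5, 0.0)]:
    chain = gibbs_two_means(data, p=0.7, init=start)
    tail = chain[len(chain) // 2:]
    avg = [sum(c[j] for c in tail) / len(tail) for j in range(2)]
    print("start", start, "-> average over second half:", [round(a, 2) for a in avg])
```

If the two runs report clearly different averages, the chains have settled in different regions of the (µ1, µ2) plane, which is the behaviour the slide warns about. Whether this happens depends on the data and on the starting points.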
  • 8-9. Imposed permutations may miss the mark. While duplicating the MCMC sampler according to all permutations ρ in $S_k$ produces perfect exchangeability [nice!], it does not bring additional energy to the MCMC sampler; it does not identify other modes (under- or over-fitting); it does not apply in nearly-but-not-exchangeable settings
  • 10. Illustrations. Example (Two-mean Gaussian mixture): case of $p\,\mathcal{N}(\mu_1, 1) + (1 - p)\,\mathcal{N}(\mu_2, 1)$ with $p = 0.5$. [Figure: plot in the (µ1, µ2) plane, both axes running from −1 to 4.]
  • 11. Illustrations (2). Example (Two-mean Gaussian mixture and outliers): same model, but data from a 5-component mixture
  • 12-13. Illustrations (3). Example (Outlier Gaussian mixture): case of $p\,\mathcal{N}(0, 1) + (1 - p)\,\mathcal{N}(\mu, \sigma^2)$ with p known
  • 14-16. Postprocessing issues. When assessing the number k of components via the evidence $Z_k = \int_{\Theta_k} \pi_k(\theta_k) L_k(\theta_k)\,\mathrm{d}\theta_k$, aka the marginal likelihood, label switching is a liability and an uninteresting phenomenon. Indeed, $\int_{\Theta_k} \pi_k(\theta_k) L_k(\theta_k)\,\mathrm{d}\theta_k = k! \int_{\Theta_k/S_k} \pi_k(\theta_k) L_k(\theta_k)\,\mathrm{d}\theta_k$ means that integrating over the restricted space is [more than] ok!
  • 17-18. Chib's representation. Direct application of Bayes' theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$, $Z_k = m_k(x) = \dfrac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}$. Use of an approximation to the posterior: $\hat{Z}_k = \hat{m}_k(x) = \dfrac{f_k(x|\theta_k^*)\,\pi_k(\theta_k^*)}{\hat{\pi}_k(\theta_k^*|x)}$.
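Chib's identity holds for any value of θ, which can be checked numerically in a model where every term is available in closed form. The sketch below uses a toy conjugate model that is not in the slides (x_i ~ N(θ, 1) with θ ~ N(0, tau2)); the function names and the data are illustrative assumptions. In this model the posterior density is exact, so no approximation error enters and the identity can be compared with the closed-form evidence.

```python
import math


def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var


def chib_log_evidence(x, theta_star, tau2=4.0):
    """log m(x) = log f(x|theta*) + log pi(theta*) - log pi(theta*|x)."""
    n = len(x)
    log_lik = sum(log_normal_pdf(xi, theta_star, 1.0) for xi in x)
    log_prior = log_normal_pdf(theta_star, 0.0, tau2)
    # Conjugate posterior of theta: N(post_mean, post_var).
    post_var = 1.0 / (n + 1.0 / tau2)
    post_mean = post_var * sum(x)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post


def exact_log_evidence(x, tau2=4.0):
    """Closed form: x ~ N(0, I + tau2 * 1 1^T) after integrating theta out."""
    n = len(x)
    s = sum(x)
    ss = sum(xi * xi for xi in x)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n * tau2)
            - 0.5 * (ss - tau2 * s * s / (1.0 + n * tau2)))


obs = [0.3, -1.2, 0.8, 1.9, 0.1]
for theta_star in (0.0, 0.4, 2.0):
    print(theta_star, chib_log_evidence(obs, theta_star), exact_log_evidence(obs))
```

The three printed Chib values should agree with each other and with the closed form up to floating-point error. In mixture models the posterior density is not available exactly, which is where the choice of the approximation $\hat{\pi}_k$ matters.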
  • 19-21. Case of latent variables. For missing variable z as in mixture models, natural Rao-Blackwell estimate $\hat{\pi}_k(\theta_k^*|x) = \dfrac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)})$, where the $z_k^{(t)}$'s are Gibbs sampled latent variables. But convergence impaired by lack of label switching. (C.) Simple MCMC does not work
  • 22-23. Compensation for label switching. For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory, $\pi_k(\theta_k|x) = \pi_k(\rho(\theta_k)|x) = \dfrac{1}{k!} \sum_{\rho \in S_k} \pi_k(\rho(\theta_k)|x)$ for all ρ's in $S_k$, set of all permutations of {1, . . . , k}. Consequences on numerical approximation, biased by an order k! Recover the theoretical symmetry by using $\tilde{\pi}_k(\theta_k^*|x) = \dfrac{1}{T\,k!} \sum_{\rho \in S_k} \sum_{t=1}^{T} \pi_k(\rho(\theta_k^*)|x, z_k^{(t)})$. [Berkhof, Mechelen, & Gelman, 2003]
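The symmetrised Rao-Blackwell average can be sketched for a mixture of unit-variance normals with unknown means. The N(0, tau2) priors, the function names and the toy allocations below are assumptions made for illustration, and the allocation vectors would normally come from a Gibbs run rather than being typed in by hand.

```python
import itertools
import math


def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var


def cond_post_density(theta, x, z, tau2):
    """pi(theta | x, z): product of conjugate normal densities, one per mean.

    Assumes unit component variances and N(0, tau2) priors on the means.
    """
    log_dens = 0.0
    for j, theta_j in enumerate(theta):
        n_j = sum(1 for zi in z if zi == j)
        s_j = sum(xi for xi, zi in zip(x, z) if zi == j)
        var = 1.0 / (n_j + 1.0 / tau2)
        log_dens += log_normal_pdf(theta_j, var * s_j, var)
    return math.exp(log_dens)


def rao_blackwell(theta_star, x, allocations, tau2=10.0, symmetrise=False):
    """Average of pi(rho(theta*) | x, z^(t)) over t and, optionally, over all rho."""
    k = len(theta_star)
    if symmetrise:
        perms = list(itertools.permutations(range(k)))
    else:
        perms = [tuple(range(k))]
    total = 0.0
    for z in allocations:
        for rho in perms:
            permuted = [theta_star[rho[j]] for j in range(k)]
            total += cond_post_density(permuted, x, z, tau2)
    return total / (len(allocations) * len(perms))


x = [-3.1, -2.9, -3.0, 3.0, 3.2, 2.8]
stuck = [[0, 0, 0, 1, 1, 1]] * 4          # allocations that never switch labels
theta_star = (-3.0, 3.0)
plain = rao_blackwell(theta_star, x, stuck)
sym = rao_blackwell(theta_star, x, stuck, symmetrise=True)
print("plain:", plain, "symmetrised:", sym, "ratio:", plain / sym)
```

With allocations that never switch labels and well-separated components, the plain average is close to k! times the symmetrised one. The evidence estimate built on the plain average is therefore too small by about the same factor, that is, by log(k!) on the log scale.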
  • 24-25. Galaxy dataset. Using only the original estimate, with $\theta_k^*$ as the MAP estimator, $\log(\hat{m}_k(x)) = -105.1396$ for k = 3 (based on $10^3$ simulations), while introducing the permutations leads to $\log(\hat{m}_k(x)) = -103.3479 = -105.1396 + \log(3!)$.

      k                     2        3        4        5        6        7        8
      $\log \hat{m}_k(x)$  -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44

    Estimations of the marginal likelihoods by the symmetrised Chib's approximation (based on $10^5$ Gibbs iterations and, for k > 5, 100 permutations selected at random in $S_k$). [Lee, Marin, Mengersen & Robert, 2008]
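The log(3!) correction and the use of the tabulated values can be checked with a few lines of arithmetic. The uniform prior over k = 2, ..., 8 used below to turn evidences into posterior probabilities is an assumption made for illustration; the slides report only the evidences.

```python
import math

# Uncorrected estimate for k = 3 and its correction by log(3!).
uncorrected = -105.1396
corrected = uncorrected + math.log(math.factorial(3))
print("corrected log evidence for k = 3:", round(corrected, 4))

# Log evidences as reported in the table on the slide.
log_evidence = {2: -115.68, 3: -103.35, 4: -102.66, 5: -101.93,
                6: -102.88, 7: -105.48, 8: -108.44}

# Posterior probabilities of k under an assumed uniform prior on k,
# computed with the log-sum-exp trick to avoid underflow.
top = max(log_evidence.values())
norm = top + math.log(sum(math.exp(v - top) for v in log_evidence.values()))
posterior = {k: math.exp(v - norm) for k, v in log_evidence.items()}

for k in sorted(posterior):
    print(k, round(posterior[k], 3))
```

Because the evidences are compared across k, a missing log(k!) term would penalise larger k increasingly, so the correction matters for model choice and not only for the absolute value.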
  • 26. Comparison between evidence approximations
      1. Nested sampling: M = 1000 points, with 10 random walk moves at each step, simulations from the constrained prior and a stopping rule at 95% of the observed maximum likelihood
      2. T = $10^4$ MCMC (=Gibbs) simulations producing non-parametric estimates ϕ
      3. Monte Carlo estimates $Z_1$, $Z_2$, $Z_3$ using product of two Gaussian kernels
      4. Numerical integration based on an 850 × 950 grid [reference value, confirmed by Chib's]
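The nested sampling step in this comparison can be sketched in one dimension. This is a generic illustration rather than the setup of the comparison: the uniform prior on [lo, hi], the standard normal likelihood, the number of live points, the random walk scale and the stopping rule (remaining mass below a fraction tol of the current estimate) are all assumptions, and the stopping rule in particular differs from the one quoted on the slide.

```python
import math
import random


def nested_sampling(log_lik, lo, hi, n_live=100, n_moves=20, tol=1e-3, seed=1):
    """Estimate log Z = log of the integral of L(theta) under a U(lo, hi) prior."""
    rng = random.Random(seed)
    live = [rng.uniform(lo, hi) for _ in range(n_live)]
    logl = [log_lik(t) for t in live]
    z = 0.0
    x_prev = 1.0
    i = 0
    while True:
        i += 1
        # The worst live point defines the current likelihood contour.
        worst = min(range(n_live), key=logl.__getitem__)
        l_min = logl[worst]
        x_i = math.exp(-i / n_live)          # expected remaining prior mass
        z += math.exp(l_min) * (x_prev - x_i)
        x_prev = x_i
        # Replace it by a draw from the prior constrained to L > l_min,
        # using random walk moves started from another live point.
        scale = max(live) - min(live)
        start = rng.choice([j for j in range(n_live) if j != worst])
        theta, ll = live[start], logl[start]
        for _ in range(n_moves):
            prop = theta + scale * rng.gauss(0.0, 0.5)
            if lo <= prop <= hi:
                ll_prop = log_lik(prop)
                if ll_prop > l_min:
                    theta, ll = prop, ll_prop
        live[worst], logl[worst] = theta, ll
        # Stop once the live points can no longer change z noticeably.
        if math.exp(max(logl)) * x_i < tol * z:
            break
    # Add the contribution of the remaining live points.
    z += x_i * sum(math.exp(l) for l in logl) / n_live
    return math.log(z)


def std_normal_log_lik(theta):
    """Log of the N(0, 1) density, used here as a toy likelihood."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * theta * theta


estimate = nested_sampling(std_normal_log_lik, -5.0, 5.0)
print("nested sampling log Z:", estimate, " reference:", math.log(0.1))
```

For this toy problem the evidence is very close to 1/10, since the prior density is 1/10 on [−5, 5] and the likelihood integrates to almost 1 there, which gives a reference value to compare against. The estimate is random, so agreement is only expected up to Monte Carlo error.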
  • 27. Comparison (cont'd). Graph based on a sample of 10 observations for µ = 2 and σ = 3/2 (150 replicas). V1=Nested sampling, V2=importance sampling, V3=harmonic mean, V4=bridge sampling. [Chopin & Robert, 2010]
  • 28. Comparison (cont'd). Graph based on a sample of 50 observations for µ = 2 and σ = 3/2 (150 replicas). V1=Nested sampling, V2=importance sampling, V3=harmonic mean, V4=bridge sampling. [Chopin & Robert, 2010]
  • 29. Comparison (cont'd). Graph based on a sample of 100 observations for µ = 2 and σ = 3/2 (150 replicas). V1=Nested sampling, V2=importance sampling, V3=harmonic mean, V4=bridge sampling. [Chopin & Robert, 2010]
  • 30. Comparison (cont'd) [Lee, Marin, Mengersen & Robert, 2010]