1. My life as a mixture
Christian P. Robert
Université Paris-Dauphine, Paris & University of Warwick, Coventry
September 17, 2014
bayesianstatistics@gmail.com
2. Your next Valencia meeting:
I Objective Bayes section of ISBA major meeting:
I O-Bayes 2015 in Valencia, Spain, June 1-4(+1), 2015
I in memory of our friend Susie Bayarri
I objective Bayes, limited information, partly defined and approximate models, etc.
I all flavours of Bayesian analysis welcomed!
I Spain in June, what else...?!
4. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
5. birthdate: May 1989, Ottawa Civic Hospital
Distribution of grey levels in an unprocessed chest radiograph
[X, 1994]
6. Mixture models
Structure of mixtures of distributions:
x ∼ f_j with probability p_j ,
for j = 1, 2, . . . , k, with overall density
p_1 f_1(x) + · · · + p_k f_k(x) .
Usual case: parameterised components
∑_{i=1}^k p_i f(x|θ_i) , ∑_{i=1}^k p_i = 1
where the weights p_i are distinguished from the other parameters
8. Motivations
I Dataset made of several latent/missing/unobserved groups/strata/subpopulations. Mixture structure due to the missing origin/allocation of each observation to a specific subpopulation/stratum. Inference on either the allocations (clustering) or on the parameters (θ_i , p_i) or on the number of groups
I Semiparametric perspective where mixtures are functional basis approximations of unknown distributions
12. License
Dataset derived from [my] license plate image
Grey levels concentrated on 256 values [later jittered]
[Marin X, 2007]
13. Likelihood
For a sample of independent random variables (x_1, . . . , x_n), likelihood
∏_{i=1}^n { p_1 f_1(x_i) + · · · + p_k f_k(x_i) } .
Expanding this product involves k^n elementary terms: prohibitive to compute in large samples.
But likelihood still computable [pointwise] in O(kn) time.
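To make the O(kn) point concrete, here is a minimal Python sketch of pointwise evaluation of the mixture log-likelihood; the normal components and the parameter values are illustrative assumptions, and log-sum-exp is used for numerical stability.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_loglik(x, weights, means, sds):
    # (n, k) array of log p_j + log f_j(x_i): O(kn) work in total,
    # versus the k^n terms of the expanded product
    comp = np.log(weights) + norm.logpdf(x[:, None], means, sds)
    return logsumexp(comp, axis=1).sum()

x = np.random.default_rng(0).normal(size=100)
print(mixture_loglik(x, [0.3, 0.7], [0.0, 2.0], [1.0, 1.0]))
```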
15. Normal mean benchmark
Normal mixture
p N(μ_1, 1) + (1 − p) N(μ_2, 1)
with only means unknown (2-D representation possible)
Identifiable: μ_1 cannot be confused with μ_2 when p is different from 0.5.
Presence of a spurious mode, understood by letting p go to 0.5
21. Bayesian inference on mixtures
For any prior π(θ, p), posterior distribution of (θ, p) available up to a multiplicative constant
π(θ, p|x) ∝ [ ∏_{i=1}^n ∑_{j=1}^k p_j f(x_i|θ_j) ] π(θ, p)
at a cost of order O(kn)
Difficulty
Despite this, derivation of posterior characteristics like posterior expectations only possible in an exponential time of order O(k^n)!
23. Missing variable representation
Associate to each x_i a missing/latent variable z_i that indicates its component:
z_i|p ∼ M_k(p_1, . . . , p_k)
and
x_i|z_i, θ ∼ f(·|θ_{z_i}) .
Completed likelihood
ℓ(θ, p|x, z) = ∏_{i=1}^n p_{z_i} f(x_i|θ_{z_i}) ,
and
π(θ, p|x, z) ∝ [ ∏_{i=1}^n p_{z_i} f(x_i|θ_{z_i}) ] π(θ, p)
where z = (z_1, . . . , z_n)
25. Gibbs sampling for mixture models
Take advantage of the missing data structure:
Algorithm
I Initialization: choose p^(0) and θ^(0) arbitrarily
I Step t. For t = 1, . . .
1. Generate z_i^(t) (i = 1, . . . , n) from (j = 1, . . . , k)
P( z_i^(t) = j | p_j^(t−1), θ_j^(t−1), x_i ) ∝ p_j^(t−1) f( x_i | θ_j^(t−1) )
2. Generate p^(t) from π(p|z^(t)),
3. Generate θ^(t) from π(θ|z^(t), x).
[Brooks & Gelman, 1990; Diebolt X, 1990, 1994; Escobar & West, 1991]
26. Normal mean example (cont'd)
Algorithm
I Initialization. Choose μ_1^(0) and μ_2^(0),
I Step t. For t = 1, . . .
1. Generate z_i^(t) (i = 1, . . . , n) from
P( z_i^(t) = 1 ) = 1 − P( z_i^(t) = 2 ) ∝ p exp( −(1/2)( x_i − μ_1^(t−1) )² )
2. Compute n_j^(t) = ∑_{i=1}^n I_{z_i^(t)=j} and (s_x^j)^(t) = ∑_{i=1}^n I_{z_i^(t)=j} x_i
3. Generate μ_j^(t) (j = 1, 2) from
N( (λδ + (s_x^j)^(t)) / (λ + n_j^(t)) , 1/(λ + n_j^(t)) )
[under a N(δ, 1/λ) prior on the μ_j's].
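A runnable Python sketch of this two-component Gibbs sampler follows; the weight p is taken as known, and the hyperparameter values λ = 1, δ = 0 together with the simulated dataset are assumptions made for illustration.

```python
import numpy as np

def gibbs_two_means(x, p=0.5, iters=5000, lam=1.0, delta=0.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=2)                  # random initialisation
    trace = np.empty((iters, 2))
    for t in range(iters):
        # 1. allocations: P(z_i = 1) proportional to p exp(-(x_i - mu_1)^2 / 2)
        w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        z = (rng.random(n) * (w1 + w2) > w1).astype(int)
        # 2. sufficient statistics n_j and (s_x^j)
        nj = np.array([np.sum(z == 0), np.sum(z == 1)])
        sj = np.array([x[z == 0].sum(), x[z == 1].sum()])
        # 3. means from N((lam*delta + s_j)/(lam + n_j), 1/(lam + n_j))
        mu = rng.normal((lam * delta + sj) / (lam + nj), 1 / np.sqrt(lam + nj))
        trace[t] = mu
    return trace

# usage: data simulated from 0.7 N(0,1) + 0.3 N(2.5,1)
rng = np.random.default_rng(1)
x = np.where(rng.random(500) < 0.7, rng.normal(0, 1, 500), rng.normal(2.5, 1, 500))
trace = gibbs_two_means(x, p=0.7)
```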
27. Normal mean example (cont'd)
[Figures: Gibbs samples in the (m1, m2) plane, (a) initialised at random; (b) initialised close to the lower mode]
[X & Casella, 2009]
29. License
Consider k = 3 components, a D_3(1/2, 1/2, 1/2) prior for the weights, a N(x̄, σ̂²/3) prior on the means μ_i and a Ga(10, σ̂²) prior on the precisions σ_i^{−2}, where x̄ and σ̂² are the empirical mean and variance of License
[Empirical Bayes]
[Marin X, 2007]
30. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
31. weakly informative priors
I possible symmetric empirical Bayes priors
p ∼ D(γ, . . . , γ), θ_i ∼ N(μ̂, ω̂_i²), σ_i^{−2} ∼ Ga(α, β̂)
which can be replaced with hierarchical priors
[Diebolt X, 1990; Richardson & Green, 1997]
I independent improper priors on the θ_j's prohibited, thus standard flat and Jeffreys priors impossible to use (except with the exclude-empty-component trick)
[Diebolt X, 1990; Wasserman, 1999]
32. weakly informative priors
I Reparameterization to compact set for use of uniform priors
μ_i → e^{μ_i}/(1 + e^{μ_i}) , σ_i → σ_i/(1 + σ_i)
[Chopin, 2000]
I dependent weakly informative priors
p ∼ D(k, . . . , 1), θ_i ∼ N(θ_{i−1}, σ_{i−1}²), σ_i ∼ U([0, σ_{i−1}])
[Mengersen X, 1996; X & Titterington, 1998]
I reference priors
p ∼ D(1, . . . , 1), θ_i ∼ N(0, (σ_i² + σ_0²)/2), σ_i² ∼ C⁺(0, σ_0²)
[Moreno & Liseo, 1999]
33. Re-ban on improper priors
Difficult to use improper priors in the setting of mixtures because independent improper priors,
π(θ) = ∏_{i=1}^k π_i(θ_i) , with ∫ π_i(θ_i) dθ_i = ∞
end up, for all n's, with the property
∫ π(θ, p|x) dθ dp = ∞
Reason
There are (k − 1)^n terms among the k^n terms in the expansion that allocate no observation at all to the i-th component
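The reason can be written out in one display; the expansion below is a reconstruction consistent with the completed-likelihood notation of the earlier slides.

```latex
\int \pi(\theta,p\mid x)\,\mathrm{d}\theta\,\mathrm{d}p
  \;\propto\; \sum_{z\in\{1,\dots,k\}^n} \int \prod_{i=1}^{n}
  p_{z_i}\, f(x_i\mid\theta_{z_i})\;\pi(\theta,p)\,\mathrm{d}\theta\,\mathrm{d}p
% For any allocation z leaving component i empty, the factor
% \int \pi_i(\theta_i)\,\mathrm{d}\theta_i = \infty is left untouched by the
% likelihood, so that term, and hence the whole sum, is infinite.
```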
35. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
36. Connected difficulties
1. Number of modes of the likelihood of order O(k!):
⇒ Maximization and even [MCMC] exploration of the posterior surface harder
2. Under exchangeable priors on (θ, p) [prior invariant under permutation of the indices], all posterior marginals are identical:
⇒ Posterior expectation of θ_1 equal to posterior expectation of θ_2
38. License
When the Gibbs output does not (re)produce exchangeability, the Gibbs sampler has failed to explore the whole parameter space: not enough energy to switch enough component allocations simultaneously
[Marin X, 2007]
39. Label switching paradox
I We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler.
I If we observe it, then we do not know how to estimate the parameters.
I If we do not, then we are uncertain about the convergence!!!
[Celeux, Hurn X, 2000]
[Frühwirth-Schnatter, 2001, 2004]
[Holmes, Jasra & Stephens, 2005]
43. Identifiability: impose constraints like
θ_1 ≤ . . . ≤ θ_k
in the prior
Mostly incompatible with the topology of the posterior surface: posterior expectations then depend on the choice of the constraints.
Computational detail
The constraint need not be imposed during the simulation but can instead be imposed after simulation, by reordering the MCMC output according to the constraints. [This avoids possible negative effects on convergence]
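As a post-processing sketch, sorting each stored draw enforces the ordering constraint on the MCMC output without touching the sampler; `trace` is assumed to be a (T, k) array of simulated means, e.g. from the earlier Gibbs sketch.

```python
import numpy as np

order = np.argsort(trace, axis=1)                  # per-draw ranking of the means
trace_sorted = np.take_along_axis(trace, order, axis=1)
# any component-specific parameters (weights, variances) must be permuted
# with the same 'order' so that labels stay consistent across parameters
```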
48. Relabeling towards the mode
Selection of one of the k! modal regions of the posterior, post-simulation, by computing the approximate MAP
(θ, p)^(i*) with i* = arg max_{i=1,...,M} π( (θ, p)^(i) | x )
Pivotal Reordering
At iteration i ∈ {1, . . . , M},
1. Compute the optimal permutation
τ_i = arg min_{τ∈S_k} d( τ((θ^(i), p^(i))) , (θ^(i*), p^(i*)) )
where d(·, ·) is a distance in the parameter space.
2. Set (θ^(i), p^(i)) = τ_i((θ^(i), p^(i))).
[Celeux, 1998; Stephens, 2000; Celeux, Hurn X, 2000]
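A Python sketch of pivotal reordering, using Euclidean distance on the means as the (assumed) choice of d(·, ·); `trace` holds the draws and `logpost` their unnormalised log-posterior values.

```python
import numpy as np
from itertools import permutations

def pivotal_reorder(trace, logpost):
    pivot = trace[np.argmax(logpost)]              # approximate MAP draw
    k = trace.shape[1]
    perms = [list(s) for s in permutations(range(k))]
    out = np.empty_like(trace)
    for i, draw in enumerate(trace):
        # permutation bringing the draw closest to the pivot
        best = min(perms, key=lambda s: np.sum((draw[s] - pivot) ** 2))
        out[i] = draw[best]
    return out
```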
50. Loss functions for mixture estimation
Global loss function that considers distance between predictives
L(θ, θ̂) = ∫_X f_θ(x) log( f_θ(x)/f_θ̂(x) ) dx
eliminates the labelling effect
Similar solution for estimating clusters through allocation variables
L(z, ẑ) = ∑_{i<j} ( I_{z_i=z_j}(1 − I_{ẑ_i=ẑ_j}) + I_{ẑ_i=ẑ_j}(1 − I_{z_i=z_j}) ) .
[Celeux, Hurn X, 2000]
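The allocation loss is directly computable; the sketch below counts pairs clustered together in one of z, ẑ but not in the other, and is invariant to relabelling of either partition.

```python
import numpy as np

def allocation_loss(z, zhat):
    same = z[:, None] == z[None, :]            # true co-clustering matrix
    same_hat = zhat[:, None] == zhat[None, :]  # estimated co-clustering matrix
    # XOR marks disagreeing pairs; upper triangle counts each pair once
    return int(np.triu(same ^ same_hat, k=1).sum())

print(allocation_loss(np.array([0, 0, 1, 1]), np.array([1, 1, 1, 0])))  # -> 3
```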
51. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
52. MAP estimation
For high-dimensional parameter spaces, difficulty with marginal MAP (MMAP) estimates because nuisance parameters must be integrated out
θ_1^MMAP = arg max_{θ_1} p(θ_1|y)
where
p(θ_1|y) = ∫_{Θ_2} p(θ_1, θ_2|y) dθ_2
53. MAP estimation
SAME stands for State Augmentation for Marginal Estimation
[Doucet, Godsill X, 2001]
Artificially augmented probability model whose marginal distribution is
p_α(θ_1|y) ∝ p(θ_1|y)^α
via replications of the nuisance parameters:
I Replace θ_2 with α artificial replications θ_2(1), . . . , θ_2(α)
I Treat the θ_2(j)'s as distinct random variables:
q_α(θ_1, θ_2(1), . . . , θ_2(α)|y) ∝ ∏_{k=1}^α p(θ_1, θ_2(k)|y)
I Use corresponding marginal for θ_1
q_α(θ_1|y) = ∫ q_α(θ_1, θ_2(1), . . . , θ_2(α)|y) dθ_2(1) · · · dθ_2(α)
∝ ∫ ∏_{k=1}^α p(θ_1, θ_2(k)|y) dθ_2(1) · · · dθ_2(α) = p_α(θ_1|y)
I Build an MCMC algorithm in the augmented space, with invariant distribution q_α(θ_1, θ_2(1), . . . , θ_2(α)|y)
I Use simulated subsequence {θ_1^(i); i ∈ ℕ} as drawn from marginal posterior p_α(θ_1|y)
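A minimal SAME sketch for the two-component normal-mean mixture of the earlier slides: α replications of the allocation vector sharpen the conditional of the means towards the marginal MAP. The fixed weight p, the N(δ, 1/λ) prior and a constant α (rather than the usual increasing, annealing-style schedule) are simplifying assumptions.

```python
import numpy as np

def same_two_means(x, p=0.7, alpha=10, iters=2000, lam=1.0, delta=0.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = np.array([x.min(), x.max()])           # crude initialisation
    for _ in range(iters):
        counts, sums = np.zeros(2), np.zeros(2)
        for _ in range(alpha):                  # alpha independent replications of z
            w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
            w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
            z = (rng.random(n) * (w1 + w2) > w1).astype(int)
            for j in (0, 1):
                counts[j] += np.sum(z == j)
                sums[j] += x[z == j].sum()
        # conditional of mu given all replications: the prior enters alpha
        # times, so precision = alpha*lam + sum_k n_j(k)
        prec = alpha * lam + counts
        mu = rng.normal((alpha * lam * delta + sums) / prec, 1 / np.sqrt(prec))
    return mu   # concentrates near the marginal MAP as alpha grows
```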
64. example: Galaxy dataset benchmark
82 observations of galaxy velocities from 3 (?) groups
Algorithm                 EM     MCEM   SAME
Mean log-posterior        65.47  60.73  66.22
Std dev of log-posterior  2.31   4.48   0.02
[Doucet X, 2002]
65. Really the SAME?!
SAME algorithm re-invented in many guises:
I Gaetan & Yao, 2003, Biometrika
I Jacquier, Johannes & Polson, 2007, J. Econometrics
I Lele, Dennis & Lutscher, 2007, Ecology Letters [data cloning]
I ...
66. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
67. Propp and Wilson's perfect sampler
Difficulty devising MCMC stopping rules:
when should one stop an MCMC algorithm?!
Principle: Coupling from the past
rather than start at t = 0 and wait till t = +∞, start at t = −∞ and wait till t = 0
[Propp & Wilson, 1996]
⇒ Outcome at time t = 0 is stationary
69. CFTP Algorithm
Algorithm (Coupling from the past)
1. Start from the m possible values at time −t
2. Run the m chains till time 0 (coupling allowed)
3. Check if the chains are equal at time 0
4. If not, start further back: t ← 2t, using the same random numbers at times already simulated
I requires a finite state space
I probability of merging chains must be high enough
I hard to implement w/o a monotonicity in both state space and transition
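A minimal CFTP sketch for a monotone chain, a reflecting ±1 random walk on {0, . . . , m}: the top and bottom chains are driven by the same stored random numbers, and coalescence at time 0 delivers an exact stationary draw. The chain itself is an illustrative assumption, not taken from the slides.

```python
import numpy as np

def cftp_reflecting_walk(m=10, seed=0):
    rng = np.random.default_rng(seed)
    us = []                                  # us[j] drives the move at time -(j+1)
    t = 1
    while True:
        while len(us) < t:
            us.append(rng.random())          # extend, never regenerate, randomness
        lo, hi = 0, m                        # sandwiching chains started at time -t
        for s in range(t - 1, -1, -1):
            step = 1 if us[s] < 0.5 else -1  # common random number for both chains
            lo = min(max(lo + step, 0), m)   # monotone update preserves lo <= hi
            hi = min(max(hi + step, 0), m)
        if lo == hi:                         # coalescence: all starting points agree
            return lo
        t *= 2                               # otherwise restart further back in time

print(cftp_reflecting_walk())
```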
73. Mixture models
Simplest possible mixture structure
p f_0(x) + (1 − p) f_1(x),
with uniform prior on p.
Algorithm (Data Augmentation Gibbs sampler)
At iteration t:
1. Generate n iid U(0, 1) rv's u_1^(t), . . . , u_n^(t).
2. Derive the indicator variables z_i^(t) as z_i^(t) = 0 iff
u_i^(t) ≤ q_i^(t−1) = p^(t−1) f_0(x_i) / ( p^(t−1) f_0(x_i) + (1 − p^(t−1)) f_1(x_i) )
and compute
m^(t) = ∑_{i=1}^n z_i^(t).
3. Simulate p^(t) ∼ Be(n + 1 − m^(t), 1 + m^(t)).
74. Mixture models
Algorithm (CFTP Gibbs sampler)
At iteration −t:
1. Generate n iid uniform rv's u_1^(−t), . . . , u_n^(−t).
2. Partition [0, 1) into intervals [q_[j], q_[j+1]).
3. For each [q_[j]^(−t), q_[j+1]^(−t)), generate p_j^(−t) ∼ Be(n − j + 1, j + 1).
4. For each j = 0, 1, . . . , n, set r_j^(−t) = p_j^(−t)
5. For (ℓ = 1; ℓ ≤ T; ℓ++) set r_j^(−t+ℓ) = p_k^(−t+ℓ) with k such that
r_j^(−t+ℓ−1) ∈ [ q_[k]^(−t+ℓ), q_[k+1]^(−t+ℓ) )
6. Stop if the r_j^(0)'s (0 ≤ j ≤ n) are all equal. Otherwise, t ← 2t.
[Hobert et al., 1999]
75. Mixture models
Extension to the case k = 3:
Sample of n = 35 observations from
.23N(2.2, 1.44) + .62N(1.4, 0.49) + .15N(0.6, 0.64)
[Hobert et al., 1999]
76. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
77. Bayesian model choice
Comparison of models M_i by Bayesian means:
probabilise the entire model/parameter space
I allocate probabilities p_i to all models M_i
I define priors π_i(θ_i) for each parameter space Θ_i
I compute
π(M_i|x) = p_i ∫_{Θ_i} f_i(x|θ_i) π_i(θ_i) dθ_i / ∑_j p_j ∫_{Θ_j} f_j(x|θ_j) π_j(θ_j) dθ_j
79. Bayesian model choice
Comparison of models M_i by Bayesian means:
Relies on a central notion: the evidence
Z_k = ∫_{Θ_k} π_k(θ_k) L_k(θ_k) dθ_k ,
aka the marginal likelihood.
80. Chib's representation
Direct application of Bayes' theorem: given x ∼ f_k(x|θ_k) and θ_k ∼ π_k(θ_k),
Z_k = m_k(x) = f_k(x|θ_k) π_k(θ_k) / π_k(θ_k|x)
Replace with an approximation to the posterior
Ẑ_k = m̂_k(x) = f_k(x|θ_k*) π_k(θ_k*) / π̂_k(θ_k*|x) .
[Chib, 1995]
82. Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell estimate
π̂_k(θ_k*|x) = (1/T) ∑_{t=1}^T π_k(θ_k*|x, z_k^(t)) ,
where the z_k^(t)'s are Gibbs sampled latent variables
[Diebolt & Robert, 1990; Chib, 1995]
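Putting the last two slides together for the two-component normal-mean benchmark, a hedged Python sketch of Chib's estimate: the Rao-Blackwell average over Gibbs-sampled allocations supplies the posterior ordinate at θ* = μ*. The known weight p and the N(δ, 1/λ) prior are carried over from the earlier sketches.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_chib_evidence(x, z_draws, mu_star, p, lam=1.0, delta=0.0):
    mu_star = np.asarray(mu_star)
    # log f(x | mu*) for the mixture and the N(delta, 1/lam) log-prior
    comp = np.log([p, 1 - p]) + norm.logpdf(x[:, None], mu_star, 1.0)
    loglik = logsumexp(comp, axis=1).sum()
    logprior = norm.logpdf(mu_star, delta, 1 / np.sqrt(lam)).sum()
    # Rao-Blackwell: pi(mu* | x, z) is a product of two normal ordinates
    ords = np.empty(len(z_draws))
    for t, z in enumerate(z_draws):
        nj = np.array([np.sum(z == 0), np.sum(z == 1)])
        sj = np.array([x[z == 0].sum(), x[z == 1].sum()])
        ords[t] = norm.logpdf(mu_star, (lam * delta + sj) / (lam + nj),
                              1 / np.sqrt(lam + nj)).sum()
    log_post_ord = logsumexp(ords) - np.log(len(z_draws))
    return loglik + logprior - log_post_ord  # log Z = log f + log prior - log ordinate
```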
83. Compensation for label switching
For mixture models, z_k^(t) usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
π̃_k(θ_k*|x) = (1/(T k!)) ∑_{σ∈S_k} ∑_{t=1}^T π_k(σ(θ_k*)|x, z_k^(t))
for all σ's in S_k, set of all permutations of {1, . . . , k}
[Berkhof, Mechelen & Gelman, 2003]
87. Galaxy dataset (k)
Using Chib's estimate, with θ_k* as MAP estimator,
log(Ẑ_k(x)) = −105.1396
for k = 3, while introducing permutations leads to
log(Ẑ_k(x)) = −103.3479
Note that
−105.1396 + log(3!) = −103.3479
k        2        3        4        5        6        7        8
Z_k(x)   −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44
Estimates of the marginal likelihoods by the symmetrised Chib's approximation (based on 10⁵ Gibbs iterations and, for k ≥ 5, 100 permutations selected at random in S_k).
[Lee et al., 2008]
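A sketch of the symmetrised ordinate in the same normal-mean setting: averaging the Rao-Blackwell density over all k! relabellings of μ* restores the exchangeability that the Gibbs allocations fail to reproduce, replacing the denominator in the Chib sketch above.

```python
import numpy as np
from itertools import permutations
from scipy.special import logsumexp
from scipy.stats import norm

def log_symmetrised_ordinate(x, z_draws, mu_star, lam=1.0, delta=0.0):
    mu_star = np.asarray(mu_star)
    k = len(mu_star)
    terms = []
    for sigma in permutations(range(k)):   # the k! relabellings of theta*
        mu_perm = mu_star[list(sigma)]
        for z in z_draws:
            nj = np.array([np.sum(z == j) for j in range(k)])
            sj = np.array([x[z == j].sum() for j in range(k)])
            terms.append(norm.logpdf(mu_perm, (lam * delta + sj) / (lam + nj),
                                     1 / np.sqrt(lam + nj)).sum())
    # (1 / (T k!)) sum over permutations and Gibbs draws, on the log scale
    return logsumexp(terms) - np.log(len(terms))
```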
90. More efficient sampling
Difficulty with the explosive number of terms in
π̃_k(θ_k*|x) = (1/(T k!)) ∑_{σ∈S_k} ∑_{t=1}^T π_k(σ(θ_k*)|x, z_k^(t))
when most terms are equal to zero...
Iterative bridge sampling:
Ê^(t)(k) = Ê^(t−1)(k) × [ M_1^{−1} ∑_{l=1}^{M_1} π̂(θ̃_l|x) / (M_1 q(θ̃_l) + M_2 π̂(θ̃_l|x)) ] / [ M_2^{−1} ∑_{m=1}^{M_2} q(θ̂_m) / (M_1 q(θ̂_m) + M_2 π̂(θ̂_m|x)) ]
[Frühwirth-Schnatter, 2004]
where
q(θ) = (1/J_1) ∑_{j=1}^{J_1} p(·|z^(j)) ∏_{i=1}^k p(θ_i|·, z^(j), x)
or, symmetrised over labels, where
q(θ) = (1/k!) ∑_{σ∈S(k)} p(·|σ(z°)) ∏_{i=1}^k p(θ_i|·, σ(z°), x)
93. Further efficiency
After de-switching (un-switching?), representation of importance function as
q(θ) = (1/(J k!)) ∑_{j=1}^J ∑_{σ∈S_k} π(θ|σ(φ^(j)), x) = (1/k!) ∑_{σ∈S_k} h_σ(θ)
where h_σ is associated with a particular mode of q
Assuming generations
(θ^(1), . . . , θ^(T)) ∼ h_c(θ)
how many of the h_σ(θ^(t)) are non-zero?
94. Sparsity for the sum
Contribution of each term relative to q(θ)
ψ_σ(θ) = h_σ(θ) / (k! q(θ)) = h_σ(θ) / ∑_{σ'∈S_k} h_{σ'}(θ)
and importance of permutation σ evaluated by
Ê_{h_c}[ψ_σ(θ)] = (1/M) ∑_{l=1}^M ψ_σ(θ^(l)) , θ^(l) ∼ h_c(θ)
Approximate set A(k) ⊂ S(k) consists of [σ_1, · · · , σ_n] for the smallest n that satisfies the stopping rule ρ̂_n ≈ 1 of Step 3.3 below
101. dual importance sampling with approximation
DIS2A
1 Randomly select {z^(j), θ^(j)}_{j=1}^J from the Gibbs sample and un-switch
Construct q(θ)
2 Choose h_c(θ) and generate particles {θ^(t)}_{t=1}^T ∼ h_c(θ)
3 Construction of approximation q̃(θ) using first M-sample
3.1 Compute Ê_{h_c}[ψ_{σ_1}(θ)], · · · , Ê_{h_c}[ψ_{σ_{k!}}(θ)]
3.2 Reorder the σ's such that Ê_{h_c}[ψ_{σ_1}(θ)] ≥ · · · ≥ Ê_{h_c}[ψ_{σ_{k!}}(θ)].
3.3 Initially set n = 1 and compute the q̃_n(θ^(t))'s and ρ̂_n. If ρ̂_n ≈ 1, go to Step 4. Otherwise increase n = n + 1
4 Replace q(θ^(1)), . . . , q(θ^(T)) with q̃(θ^(1)), . . . , q̃(θ^(T)) to estimate Ê
[Lee X, 2014]
103. illustrations
k   k!   |A(k)|  reduction rate
3   6    1.0000  0.1675
4   24   2.7333  0.1148
Fishery data
k   k!   |A(k)|    reduction rate
3   6    1.000     0.1675
4   24   15.7000   0.6545
6   720  298.1200  0.4146
Galaxy data
Table: Mean estimates of approximate set sizes |A(k)| and of the reduction rate of the number of evaluated h-terms, for (a) the fishery data and (b) the galaxy data
111. I O(k) matrix
I unavailable in closed form except in special cases
I unidimensional integrals approximated by Monte Carlo tools
[Grazian [talk tomorrow] et al., 2014+]
113. I significant computing requirement (reduced by delayed acceptance)
[Banterle et al., 2014]
I differs from component-wise Jeffreys
[Diebolt X, 1990; Stoneking, 2014]
I when is the posterior proper?
I how to check properness via MCMC outputs?
114. Outline
Gibbs sampling
weakly informative priors
imperfect sampling
SAME algorithm
perfect sampling
Bayes factor
less informative prior
no Bayes factor
115. Difficulties with Bayes factors
I delicate calibration towards supporting a given hypothesis or model
I long-lasting impact of prior modelling, despite overall consistency
I discontinuity in the use of improper priors in most settings
I binary outcome more suited for immediate decision than for model evaluation
I related impossibility to ascertain misfit or outliers
I missing assessment of uncertainty associated with the decision
I difficult computation of marginal likelihoods in most settings
117. Reformulation
I Representation of the test problem as a two-component mixture estimation problem where the weights are formally equal to 0 or 1
I Mixture model thus contains both models under comparison as extreme cases
I Inspired by consistency result of Rousseau and Mengersen (2011) on overfitting mixtures
I Use of posterior distribution of the weight of a model instead of single-digit posterior probability
[Kamary [see poster] et al., 2014+]
119. Construction of Bayes tests
Given two statistical models,
M_1 : x ∼ f_1(x|θ_1) , θ_1 ∈ Θ_1 and M_2 : x ∼ f_2(x|θ_2) , θ_2 ∈ Θ_2 ,
embed both models within an encompassing mixture model
M_α : x ∼ α f_1(x|θ_1) + (1 − α) f_2(x|θ_2) , 0 ≤ α ≤ 1 . (1)
Both models as special cases of the mixture model, one for α = 1 and the other for α = 0
⇒ Test as inference on α
120. Arguments
I substituting estimate of the weight α for the posterior probability of model M_1 produces an equally convergent indicator of which model is true while removing the need of often artificial prior probabilities on model indices
I interpretation at least as natural as for the posterior probability, while avoiding the zero-one loss setting
I highly problematic computation of marginal likelihoods bypassed by standard algorithms for mixture estimation
I straightforward extension to a collection of models allows one to consider all models at once
I posterior on α evaluates thoroughly strength of support for a given model, compared with a single-digit Bayes factor
I mixture model acknowledges possibility that both models [or none] could be acceptable
122. Arguments
I standard prior modelling can be reproduced here but improper priors now acceptable, when both models reparameterised towards common-meaning parameters, e.g. location and scale
I using same parameters on both components is essential: opposition between components is not an issue with different parameter values
I parameters of components, θ_1 and θ_2, integrated out by Monte Carlo
I contrary to common testing settings, data signal lack of agreement with either model when posterior on α away from both 0 and 1
I in most settings, approach easily calibrated by parametric bootstrap providing posterior of α under each model and prior predictive error
123. Toy examples (1)
Test of a Poisson P(λ) versus a geometric Geo(p) [as a number of failures, starting at zero]
Same parameter λ used in Poisson P(λ) and geometric Geo(p) with p = 1/(1 + λ)
Improper noninformative prior π(λ) = 1/λ is valid
Posterior on λ conditional on allocation vector ζ
π(λ | x, ζ) ∝ exp(−n_1(ζ)λ) λ^{∑_{i=1}^n x_i − 1} (λ + 1)^{−(n_2(ζ)+s_2(ζ))}
and α ∼ Be(n_1 + a_0, n_2 + a_0)
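A hedged Gibbs sketch for this Poisson-versus-geometric test: allocations and α have the closed-form conditionals above, while the non-standard conditional of λ is updated by one random-walk Metropolis step on log λ (the proposal scale 0.1 and the a_0 = .5 default are illustrative assumptions).

```python
import numpy as np
from scipy.stats import poisson, geom

def log_cond_lambda(lam, n1, n2, sx, s2):
    # log of exp(-n1*lam) * lam^(sum_i x_i - 1) * (lam+1)^-(n2+s2)
    return -n1 * lam + (sx - 1) * np.log(lam) - (n2 + s2) * np.log(lam + 1)

def mixture_test(x, a0=0.5, iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    n, sx = len(x), x.sum()
    lam, alpha = max(x.mean(), 0.1), 0.5
    alphas = np.empty(iters)
    for t in range(iters):
        # allocations: component 1 = Poisson, 2 = geometric (failures from 0)
        w1 = alpha * poisson.pmf(x, lam)
        w2 = (1 - alpha) * geom.pmf(x + 1, 1 / (1 + lam))  # scipy's geom starts at 1
        z1 = rng.random(n) * (w1 + w2) < w1
        n1, s2 = int(z1.sum()), x[~z1].sum()
        n2 = n - n1
        # weight: alpha ~ Be(n1 + a0, n2 + a0)
        alpha = rng.beta(n1 + a0, n2 + a0)
        alphas[t] = alpha
        # lambda: log-normal random walk with Jacobian correction
        prop = lam * np.exp(0.1 * rng.standard_normal())
        logr = (log_cond_lambda(prop, n1, n2, sx, s2)
                - log_cond_lambda(lam, n1, n2, sx, s2) + np.log(prop / lam))
        if np.log(rng.random()) < logr:
            lam = prop
    return alphas   # posterior sample of the Poisson weight
```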
124. Toy examples (1)
[Figure: posterior of the Poisson weight α when a_0 = .1, .2, .3, .4, .5, 1, for a sample of 10⁵ observations from a geometric Geo(0.5); lBF = −509684]
125. Toy examples (2)
Normal N(θ, 1) model versus double-exponential L(θ, √2) [Scale √2 is intentionally chosen to make both distributions share the same variance]
Location parameter θ can be shared by both models with a single flat prior π(θ). Beta distributions B(a_0, a_0) are compared wrt their hyperparameter a_0
126. Toy examples (2)
Posterior of the double-exponential weight for L(0, √2) data, with 5, . . . , 10³ observations and 10⁵ Gibbs iterations
127. Toy examples (2)
Posterior of the normal weight for N(0, .7²) data with 10³ observations and 10⁴ Gibbs iterations
128. Toy examples (2)
Posterior of the normal weight for N(0, 1) data with 10³ observations and 10⁴ Gibbs iterations
129. Toy examples (2)
Posterior of the normal weight for double-exponential data, with 10³ observations and 10⁴ Gibbs iterations