SlideShare una empresa de Scribd logo
1 de 46
Descargar para leer sin conexión
Computing Bayesian posterior with empirical 
likelihood in population genetics 
Pierre Pudlo 
INRA & U. Montpellier 2 
SSB, 18 sept. 2012 
Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 1 / 29
Table of contents 
1 Likelihood free methods 
2 Empirical likelihood 
3 Population genetics 
4 Numerical experiments 
Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 2 / 29
Table of contents 
1 Likelihood free methods 
2 Empirical likelihood 
3 Population genetics 
4 Numerical experiments 
Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 3 / 29
When the likelihood is not completely known 
Likelihood intractable: `(yj) = 
Z 
U 
`(y,uj)du 
where u is a latent process 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29
When the likelihood is not completely known 
Likelihood intractable: `(yj) = 
Z 
U 
`(y,uj)du 
where u is a latent process 
I Hidden Markov and other dynamic models: latent process 
which is not observed 
,! Classical answer: Markov chain Monte Carlo,. . . 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29
When the likelihood is not completely known 
Likelihood intractable: `(yj) = 
Z 
U 
`(y,uj)du 
where u is a latent process 
I Hidden Markov and other dynamic models: latent process 
which is not observed 
,! Classical answer: Markov chain Monte Carlo,. . . 
I Population genetics: the whole gene genealogy is unobserved 
Likelihood is an integral over 
I all possible gene genealogies 
I all possible mutations along the genealogies 
,! Classical answer: Approximate Bayesian computation (ABC) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29
ABC in a nutshell 
Posterior distribution is the conditional distribution of 
()`(xj) () 
knowing that x = xobs 
Methodology 
Draw a (large) set of particles (i, xi) from () and use a 
nonparametric estimate of the conditional density 
(jxobs) / ()`(xobsj) 
Seminal papers 
I Tavar ´ e, Balding, Griffith and Donnelly (1997, Genetics) 
I Pritchard, Seielstad, Perez-Lezuan, Feldman (1999, Molecular 
Biology and Evolution) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29
ABC in a nutshell 
Posterior distribution is the conditional distribution of 
()`(xj) () 
knowing that x = xobs 
Methodology 
Draw a (large) set of particles (i, xi) from () and use a 
nonparametric estimate of the conditional density 
(jxobs) / ()`(xobsj) 
Shortcomings. 
I time consuming – If simulation of the latent process is not 
straightforward 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29
ABC in a nutshell 
Posterior distribution is the conditional distribution of 
()`(xj) () 
knowing that x = xobs 
Methodology 
Draw a (large) set of particles (i, xi) from () and use a 
nonparametric estimate of the conditional density 
(jxobs) / ()`(xobsj) 
Shortcomings. 
I time consuming – If simulation of the latent process is not 
straightforward 
I curse of dimensionality vs. lost of information – 
I If x lies in a high dimensional space X (often), we are unable to 
estimate of the conditional density 
I Hence, we project the (observed and simulated) datasets on a 
space with smaller dimension (trough summary statistics) 
 : X ! Rd (summary statistics) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29
You went to school to learn, girl (. . . ) 
Why 2 plus 2 makes four 
Now, now, now, I’m gonna teach you (. . . ) 
All you gotta do is repeat after me! 
A, B, C! 
It’s easy as 1, 2, 3! 
Or simple as Do, re, mi! (. . . ) 
Surveys 
I Beaumont (2010, Annual Review of Ecology, Evolution, and 
Systematics) 
I Lopes, Beaumont (2010, Genetics and Evolution) 
I Csill ` ery, Blum, Gaggiotti, Franc¸ois (2010, Trends in Ecology and 
Evolution) 
I Marin, Pudlo, Robert, Ryder (to appear, Statis. and Comput.) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 6 / 29
Table of contents 
1 Likelihood free methods 
2 Empirical likelihood 
3 Population genetics 
4 Numerical experiments 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 7 / 29
Empirical likelihood (EL) 
Owen (1988, Biometrika), 
Owen (2001, Chapman  
Hall) 
Data 
Empirical 
Likelihood 
Parameters' 
definition in 
term of a 
moment 
constraint 
x = (x1, ..., xK) 
Profiled likelihood 
x 
xi 
Black box 
Input: 
I Independent data 
x = (x1, . . . , xK) 
I Definition of the parameters 
E(h(xi,)) = 0 
Output: 
I A profiled “likelihood function” 
Lel(jx) 
EL do not require a fully defined and 
often complex (hence debatable) 
parametric model 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 8 / 29
Empirical likelihood (EL) 
Owen (1988, Biometrika), Owen (2001, Chapman  Hall) 
Assume that the dataset x is composed of n independent replicates 
x = (x1, . . . , xn) of some X  F 
Generalized moment condition model 
The law F of X satisfy 
EF 
 
h(X,) 
 
= 0, 
where h is a known function, and  an unknown parameter 
Empirical likelihood 
Lel(jx) = max 
p 
Yn 
i=1 
pi () 
for all p such that 0 6 pi 6 1, 
P 
pi = 1, 
P 
i pih(xi,) = 0. 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 9 / 29
Empirical likelihood (EL) 
Empirical likelihood 
Lel(jx) = max 
p 
Yn 
i=1 
pi () 
for all p such that 0 6 pi 6 1, 
P 
pi = 1, 
P 
i pih(xi,) = 0. 
Why ()? 
(1) L(p) := 
Yn 
i=1 
pi is the likehood of 
discrete distribution p = 
P 
i pixi 
(2) 
X 
i 
pih(xi,) = 0 implies that p 
fulfils the moment condition 
How to compute? 
Newton-Lagrange scheme. 
The constraint is linear (in p) 
See Owen (2001) chap. 12 
 package emplik in R 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 10 / 29
EL examples 
Data 
Empirical 
Likelihood 
Parameters' 
definition in 
term of a 
moment 
constraint 
x = (x1, ..., xK) 
Profiled likelihood 
x 
xi 
Simple case:  = mean 
h(xi,) = xi -  
I Lel(jx) is maximal at 
^ 
= 1 
K(x1 +    + xK) 
I if enough data (K large), 
 if data from a true parametric 
model, 
then Lel  true likelihood 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 11 / 29
EL examples 
Data 
Empirical 
Likelihood 
Parameters' 
definition in 
term of a 
moment 
constraint 
x = (x1, ..., xK) 
Profiled likelihood 
x 
xi 
h(xi,) = xi -  
red = param. likelihood 
black = Lel 
Example 1: Gaussian model 
K = 20 
1.0 1.5 2.0 2.5 3.0 
0.0 0.2 0.4 0.6 0.8 1.0 
theta 
el.normal 
K = 200 
1.0 1.5 2.0 2.5 3.0 
0.0 0.2 0.4 0.6 0.8 1.0 
theta 
el.normal 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 12 / 29
EL examples 
Data 
Empirical 
Likelihood 
Parameters' 
definition in 
term of a 
moment 
constraint 
x = (x1, ..., xK) 
Profiled likelihood 
x 
xi 
red = true likelihood 
blue = Gaussian likelihood 
black = Lel 
Example 2: non Gaussian model 
K = 200 
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 
0.0 0.2 0.4 0.6 0.8 1.0 
theta 
el.normal 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 13 / 29
EL examples 
Data 
Empirical 
Likelihood 
Parameters' 
definition in 
term of a 
moment 
constraint 
x = (x1, ..., xK) 
Profiled likelihood 
x 
xi 
red = true likelihood 
blue = Gaussian likelihood 
black = Lel 
Example 2: non Gaussian model 
K = 200 
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 
0.0 0.2 0.4 0.6 0.8 1.0 
theta 
el.normal 
I Lel is better than a false model. 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 13 / 29
Table of contents 
1 Likelihood free methods 
2 Empirical likelihood 
3 Population genetics 
4 Numerical experiments 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 14 / 29
Neutral model at a given microsatellite locus, in a 
closed panmictic population at equilibrium 
Sample of 8 genes 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
Neutral model at a given microsatellite locus, in a 
closed panmictic population at equilibrium 
Kingman’s genealogy 
When time axis is 
normalized, 
T(k)  Exp(k(k - 1)=2) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
Neutral model at a given microsatellite locus, in a 
closed panmictic population at equilibrium 
Kingman’s genealogy 
When time axis is 
normalized, 
T(k)  Exp(k(k - 1)=2) 
Mutations according to 
the Simple stepwise 
Mutation Model (SMM) 
 date of the mutations  
Poisson process with 
intensity =2 over the 
branches 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
Neutral model at a given microsatellite locus, in a 
closed panmictic population at equilibrium 
Observations: leafs of the tree 
^ = ? 
Kingman’s genealogy 
When time axis is 
normalized, 
T(k)  Exp(k(k - 1)=2) 
Mutations according to 
the Simple stepwise 
Mutation Model (SMM) 
 date of the mutations  
Poisson process with 
intensity =2 over the 
branches 
 MRCA = 100 
 independent mutations: 
1 with pr. 1=2 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
Much more interesting models. . . 
I several independent locus 
Independent gene genealogies and mutations 
I different populations 
linked by an evolutionary scenario made of divergences, 
admixtures, migrations between populations, etc. 
I larger sample size 
usually between 50 and 100 genes 
A typical evolutionary scenario: 
MRCA 
POP 0 POP 1 POP 2 
2 
1 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 16 / 29
Moment condition in population genetics? 
Main difficulty 
Derive a constraint 
EF 
 
h(X,) 
 
= 0, 
on the parameters of interest  when X is the genotypes of the sample 
of individuals at a given locus 
E.g., in phylogeography,  is composed of 
I dates of divergence between populations, 
I ratio of population sizes, 
I mutation rates, etc. 
None of them are moments of the distribution of the allelic states of the 
sample 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 17 / 29
Moment condition in population genetics? 
Main difficulty 
Derive a constraint 
EF 
 
h(X,) 
 
= 0, 
on the parameters of interest  when X is the genotypes of the sample 
of individuals at a given locus 
E.g., in phylogeography,  is composed of 
I dates of divergence between populations, 
I ratio of population sizes, 
I mutation rates, etc. 
None of them are moments of the distribution of the allelic states of the 
sample 
,! h = pairwise composite scores whose zero is the pairwise 
maximum likelihood estimator 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 17 / 29
Pairwise composite likelihood? 
The intra-locus pairwise likelihood 
`2(xkj) = 
Y 
ij 
k, xj 
`2(xi 
kj) 
with x1 
k, . . . , xnk 
: allelic states of the gene sample at the k-th locus 
Sample of size 2 
) coalescent tree is a single line joining xi 
k to xj 
k 
k, xj 
) closed form of `2(xi 
kj) 
The pairwise score function 
r log `2(xkj) = 
X 
ij 
k, xj 
r log `2(xi 
kj) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 18 / 29
Composite lik , approximation of lik 
Composite likelihoods are often much more narrow than the 
distribution of the model (See, e.g., Cooley et al., 2012) 
Hence, 
I produces too narrow posterior distributions 
I confidence intervals or credibility intervals are not reliable 
But, 
E(r log `2(xj)) = 
Z 
r 
 
log `2(xj) 
 
`(xj)dx 
= 
X 
ij 
Z 
r 
 
log `2(xi, xjj) 
 
`2(xi, xjj)dxidxj 
= 
X 
ij 
Z 
r`2(xi, xjj) 
`2(xi, xjj) 
`2(xi, xjj)dxidxj 
= 
X 
ij 
r 
 Z 
`2(xi, xjj)dxidxj 
 
= 0 
SPaufdelo (wINRitAh UE. MLonbtpeellicera2)use we only AuBsCeel GpenoPospition of its mode SSB 09/2012 19 / 29
Pairwise likelihood: a simple case 
Assumptions 
I sample  closed, panmictic 
population at equilibrium 
I marker: microsatellite 
I mutation rate: =2 
k et xj 
if xi 
k are two genes of the 
sample, 
`2(xi 
k, xj 
kj) depends only on 
k - xj 
 = xi 
k 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29
Pairwise likelihood: a simple case 
Assumptions 
I sample  closed, panmictic 
population at equilibrium 
I marker: microsatellite 
I mutation rate: =2 
k et xj 
if xi 
k are two genes of the 
sample, 
`2(xi 
k, xj 
kj) depends only on 
k - xj 
 = xi 
k 
`2(j) = 
1 
p 
1 + 2 
 ()jj 
with 
() = 
 
1 +  + 
p 
1 + 2 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29
Pairwise likelihood: a simple case 
Assumptions 
I sample  closed, panmictic 
population at equilibrium 
I marker: microsatellite 
I mutation rate: =2 
k et xj 
if xi 
k are two genes of the 
sample, 
`2(xi 
k, xj 
kj) depends only on 
k - xj 
 = xi 
k 
`2(j) = 
1 
p 
1 + 2 
 ()jj 
with 
() = 
 
1 +  + 
p 
1 + 2 
Pairwise score function 
@ log `2(j) = 
1 
- 
1 + 2 
+ 
jj 
 
p 
1 + 2 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29
Pairwise likelihood: 2 diverging populations 
MRCA 
POP a POP b 
 
Assumptions 
I : divergence date of pop. 
a and b 
I =2: mutation rate 
Let xi 
k and xj 
k be two genes 
coming resp. from pop. a and b 
Set  = xi 
k - xj 
k. 
Then `2(j, ) = 
e- 
p 
1 + 2 
X+1 
k=-1 
()jkjI-k(). 
where 
In(z) nth-order modified Bessel 
function of the first kind 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 21 / 29
Pairwise likelihood: 2 diverging populations 
MRCA 
POP a POP b 
 
Assumptions 
I : divergence date of pop. 
a and b 
I =2: mutation rate 
Let xi 
k and xj 
k be two genes 
coming resp. from pop. a and b 
Set  = xi 
k - xj 
k. 
A 2-dim score function 
@ log `2(j, ) = 
- + 
 
2 
`2( - 1j, ) + `2( + 1j, ) 
`2(j, ) 
@ log `2(j, ) = 
1 
- - 
1 + 2 
+ 
 
2 
`2( - 1j, ) + `2( + 1j, ) 
`2(j, ) 
+ 
q(j, ) 
`2(j, ) 
where 
q(j, ) := 
e- 
p 
1 + 2 
0() 
() 
1X 
k=-1 
jkj()jkjI-k() 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 21 / 29
Table of contents 
1 Likelihood free methods 
2 Empirical likelihood 
3 Population genetics 
4 Numerical experiments 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 22 / 29
A first experiment 
Evolutionary scenario: 
MRCA 
POP 0 POP 1 
 
Dataset: 
I 50 genes per populations, 
I 100 microsat. loci 
Assumptions: 
I Ne identical over all 
populations 
I  = (log10 , log10 ) 
I uniform prior over 
(-1., 1.5)  (-1., 1.) 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 23 / 29
A first experiment 
Evolutionary scenario: 
MRCA 
POP 0 POP 1 
 
Dataset: 
I 50 genes per populations, 
I 100 microsat. loci 
Assumptions: 
I Ne identical over all 
populations 
I  = (log10 , log10 ) 
I uniform prior over 
(-1., 1.5)  (-1., 1.) 
Comparison of the original 
ABC with ABCel 
ESS=7034 
log(theta) 
Density 
0.00 0.05 0.10 0.15 0.20 0.25 
0 5 10 15 
log(tau1) 
Density 
−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3 
0 1 2 3 4 5 6 7 
histogram = ABCel 
curve = original ABC 
vertical line = “true” parameter 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 23 / 29
ABC vs. ABCel on 100 replicates of the 1st experiment 
Accuracy: 
log10  log10  
ABC ABCel ABC ABCel 
(1) 0.097 0.094 0.315 0.117 
(2) 0.071 0.059 0.272 0.077 
(3) 0.68 0.81 1.0 0.80 
(1) Root Mean Square Error of the posterior mean 
(2) Median Absolute Deviation of the posterior median 
(3) Coverage of the credibility interval of probability 0.8 
Computation time: on a recent 6-core computer (C++/OpenMP) 
I ABC  4 hours 
I ABCel 2 minutes 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 24 / 29
Second experiment 
Evolutionary scenario: 
MRCA 
POP 0 POP 1 POP 2 
2 
1 
Dataset: 
I 50 genes per populations, 
I 100 microsat. loci 
Assumptions: 
I Ne identical over all 
populations 
I  = log10(, 1, 2) 
I non-informative uniform prior 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
Second experiment 
Evolutionary scenario: 
MRCA 
POP 0 POP 1 POP 2 
2 
1 
Dataset: 
I 50 genes per populations, 
I 100 microsat. loci 
Assumptions: 
I Ne identical over all 
populations 
I  = log10(, 1, 2) 
I non-informative uniform prior 
Comparison of the original ABC 
with ABCel 
histogram = ABCel 
curve = original ABC 
vertical line = “true” parameter 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
Second experiment 
Evolutionary scenario: 
MRCA 
POP 0 POP 1 POP 2 
2 
1 
Dataset: 
I 50 genes per populations, 
I 100 microsat. loci 
Assumptions: 
I Ne identical over all 
populations 
I  = log10(, 1, 2) 
I non-informative uniform prior 
Comparison of the original ABC 
with ABCel 
histogram = ABCel 
curve = original ABC 
vertical line = “true” parameter 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
Second experiment 
Evolutionary scenario: 
MRCA 
POP 0 POP 1 POP 2 
2 
1 
Dataset: 
I 50 genes per populations, 
I 100 microsat. loci 
Assumptions: 
I Ne identical over all 
populations 
I  = log10(, 1, 2) 
I non-informative uniform prior 
Comparison of the original ABC 
with ABCel 
histogram = ABCel 
curve = original ABC 
vertical line = “true” parameter 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
ABC vs. ABCel on 100 replicates of the 2nd 
experiment 
Accuracy: 
log10  log10 1 log10 2 
ABC ABCel ABC ABCel ABC ABCel 
(1) 0.0059 0.0794 0.472 0.483 29.3 4.76 
(2) 0.048 0.053 0.32 0.28 4.13 3.36 
(3) 0.79 0.76 0.88 0.76 0.89 0.79 
(1) Root Mean Square Error of the posterior mean 
(2) Median Absolute Deviation of the posterior median 
(3) Coverage of the credibility interval of probability 0.8 
Computation time: on a recent 6-core computer (C++/OpenMP) 
I ABC  6 hours 
I ABCel 8 minutes 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 26 / 29
Why? 
On large datasets, ABCel gives more accurate results than ABC 
ABC simplifies the dataset through summary statistics 
Due to the large dimension of x, the original ABC algorithm estimates
(xobs) 
 
, 
where (xobs) is some (non-linear) projection of the observed dataset 
on a space with smaller dimension 
,! Some information is lost 
ABCelsimplifies the model through a generalized moment condition 
model. 
,! Here, the moment condition model is based on pairwise 
composition likelihood 
Pudlo (INRA  U. Montpellier 2) ABCel GenPop SSB 09/2012 27 / 29

Más contenido relacionado

Destacado

Revista giro rura 2ª edição
Revista giro rura 2ª ediçãoRevista giro rura 2ª edição
Revista giro rura 2ª edição
Revistagirorural
 
Destaque ABAG/RP 2012
Destaque ABAG/RP 2012Destaque ABAG/RP 2012
Destaque ABAG/RP 2012
blog2012
 
Visita da natura em nossa escola!
Visita da natura em nossa escola!Visita da natura em nossa escola!
Visita da natura em nossa escola!
thayscler
 
II Prêmio Professor ABAG/RP
II Prêmio Professor ABAG/RPII Prêmio Professor ABAG/RP
II Prêmio Professor ABAG/RP
blog2012
 
Dialogos entre a arte e a realidade
Dialogos entre a arte e a realidadeDialogos entre a arte e a realidade
Dialogos entre a arte e a realidade
Editora Moderna
 
Sistema babilonico
Sistema babilonicoSistema babilonico
Sistema babilonico
erikapoh
 
Hard and soft sounds of ‘c’ and 'g'
Hard and soft sounds of ‘c’ and 'g'Hard and soft sounds of ‘c’ and 'g'
Hard and soft sounds of ‘c’ and 'g'
Rita Simons Santiago
 

Destacado (20)

Revista giro rura 2ª edição
Revista giro rura 2ª ediçãoRevista giro rura 2ª edição
Revista giro rura 2ª edição
 
Destaque ABAG/RP 2012
Destaque ABAG/RP 2012Destaque ABAG/RP 2012
Destaque ABAG/RP 2012
 
Visita da natura em nossa escola!
Visita da natura em nossa escola!Visita da natura em nossa escola!
Visita da natura em nossa escola!
 
Informativo mensal da Associação Brasileira de Advogados Trabalhistas
Informativo mensal da Associação Brasileira de Advogados TrabalhistasInformativo mensal da Associação Brasileira de Advogados Trabalhistas
Informativo mensal da Associação Brasileira de Advogados Trabalhistas
 
Sopro divino
Sopro divinoSopro divino
Sopro divino
 
Clipagem da Sociedade Gaúcha de Nefrologia de agosto de 2011
Clipagem da Sociedade Gaúcha de Nefrologia de agosto de 2011Clipagem da Sociedade Gaúcha de Nefrologia de agosto de 2011
Clipagem da Sociedade Gaúcha de Nefrologia de agosto de 2011
 
Criacaode peixes
Criacaode peixesCriacaode peixes
Criacaode peixes
 
Sites de busca
Sites de buscaSites de busca
Sites de busca
 
David Aaker - Managing brand equity
David Aaker - Managing brand equityDavid Aaker - Managing brand equity
David Aaker - Managing brand equity
 
Construção de Warehouse
Construção de WarehouseConstrução de Warehouse
Construção de Warehouse
 
II Prêmio Professor ABAG/RP
II Prêmio Professor ABAG/RPII Prêmio Professor ABAG/RP
II Prêmio Professor ABAG/RP
 
Dialogos entre a arte e a realidade
Dialogos entre a arte e a realidadeDialogos entre a arte e a realidade
Dialogos entre a arte e a realidade
 
Oficina de Bacia Hidrográfica Manuelzão
Oficina de Bacia Hidrográfica ManuelzãoOficina de Bacia Hidrográfica Manuelzão
Oficina de Bacia Hidrográfica Manuelzão
 
Controle básico e administração de estoques
Controle básico e administração de estoquesControle básico e administração de estoques
Controle básico e administração de estoques
 
Como funciona Fuel Age
Como funciona Fuel AgeComo funciona Fuel Age
Como funciona Fuel Age
 
Visita virtual - Associação Apostolado do Sagrado Coração de Jesus
Visita virtual - Associação Apostolado do Sagrado Coração de Jesus Visita virtual - Associação Apostolado do Sagrado Coração de Jesus
Visita virtual - Associação Apostolado do Sagrado Coração de Jesus
 
Sistema babilonico
Sistema babilonicoSistema babilonico
Sistema babilonico
 
Feiras promocionais - check list
Feiras promocionais - check listFeiras promocionais - check list
Feiras promocionais - check list
 
Pranchas dos Arquitetos
Pranchas dos Arquitetos Pranchas dos Arquitetos
Pranchas dos Arquitetos
 
Hard and soft sounds of ‘c’ and 'g'
Hard and soft sounds of ‘c’ and 'g'Hard and soft sounds of ‘c’ and 'g'
Hard and soft sounds of ‘c’ and 'g'
 

Similar a Computing Bayesian posterior with empirical likelihood in population genetics

Similar a Computing Bayesian posterior with empirical likelihood in population genetics (20)

Approximate Bayesian computation and machine learning (BigMC 2014)
Approximate Bayesian computation and machine learning (BigMC 2014)Approximate Bayesian computation and machine learning (BigMC 2014)
Approximate Bayesian computation and machine learning (BigMC 2014)
 
BAYSM'14, Wien, Austria
BAYSM'14, Wien, AustriaBAYSM'14, Wien, Austria
BAYSM'14, Wien, Austria
 
ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
 
Seminaire ihp
Seminaire ihpSeminaire ihp
Seminaire ihp
 
Individual Heterogeneity in Capture-Recapture Models
Individual Heterogeneity in Capture-Recapture ModelsIndividual Heterogeneity in Capture-Recapture Models
Individual Heterogeneity in Capture-Recapture Models
 
BA4101 Question Bank.pdf
BA4101 Question Bank.pdfBA4101 Question Bank.pdf
BA4101 Question Bank.pdf
 
Habilitation à diriger des recherches
Habilitation à diriger des recherchesHabilitation à diriger des recherches
Habilitation à diriger des recherches
 
Entropy Nucleus a nd Use i n Waste Disposal Policies
Entropy Nucleus  a nd Use  i n Waste Disposal  PoliciesEntropy Nucleus  a nd Use  i n Waste Disposal  Policies
Entropy Nucleus a nd Use i n Waste Disposal Policies
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
Performance of Spiked Population Models for Spectrum Sensing
Performance of Spiked Population Models for Spectrum SensingPerformance of Spiked Population Models for Spectrum Sensing
Performance of Spiked Population Models for Spectrum Sensing
 
Spike sorting: What is it? Why do we need it? Where does it come from? How is...
Spike sorting: What is it? Why do we need it? Where does it come from? How is...Spike sorting: What is it? Why do we need it? Where does it come from? How is...
Spike sorting: What is it? Why do we need it? Where does it come from? How is...
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
Statistical analysis by iswar
Statistical analysis by iswarStatistical analysis by iswar
Statistical analysis by iswar
 
Lecture17 xing fei-fei
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
 
Poster_PingPong
Poster_PingPongPoster_PingPong
Poster_PingPong
 
Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation
 
Upm etsiccp-seminar-vf
Upm etsiccp-seminar-vfUpm etsiccp-seminar-vf
Upm etsiccp-seminar-vf
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
F0742328
F0742328F0742328
F0742328
 
Otter 2017-12-18-01-ss
Otter 2017-12-18-01-ssOtter 2017-12-18-01-ss
Otter 2017-12-18-01-ss
 

Último

Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Silpa
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
Silpa
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Silpa
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 

Último (20)

GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mapping
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 

Computing Bayesian posterior with empirical likelihood in population genetics

  • 1. Computing Bayesian posterior with empirical likelihood in population genetics Pierre Pudlo INRA & U. Montpellier 2 SSB, 18 sept. 2012 Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 1 / 29
  • 2. Table of contents 1 Likelihood free methods 2 Empirical likelihood 3 Population genetics 4 Numerical experiments Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 2 / 29
  • 3. Table of contents 1 Likelihood free methods 2 Empirical likelihood 3 Population genetics 4 Numerical experiments Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 3 / 29
  • 4. When the likelihood is not completely known Likelihood intractable: `(yj) = Z U `(y,uj)du where u is a latent process Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29
  • 5. When the likelihood is not completely known Likelihood intractable: `(yj) = Z U `(y,uj)du where u is a latent process I Hidden Markov and other dynamic models: latent process which is not observed ,! Classical answer: Markov chain Monte Carlo,. . . Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29
  • 6. When the likelihood is not completely known Likelihood intractable: `(yj) = Z U `(y,uj)du where u is a latent process I Hidden Markov and other dynamic models: latent process which is not observed ,! Classical answer: Markov chain Monte Carlo,. . . I Population genetics: the whole gene genealogy is unobserved Likelihood is an integral over I all possible gene genealogies I all possible mutations along the genealogies ,! Classical answer: Approximate Bayesian computation (ABC) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29
  • 7. ABC in a nutshell Posterior distribution is the conditional distribution of ()`(xj) () knowing that x = xobs Methodology Draw a (large) set of particles (i, xi) from () and use a nonparametric estimate of the conditional density (jxobs) / ()`(xobsj) Seminal papers I Tavar ´ e, Balding, Griffith and Donnelly (1997, Genetics) I Pritchard, Seielstad, Perez-Lezuan, Feldman (1999, Molecular Biology and Evolution) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29
  • 8. ABC in a nutshell Posterior distribution is the conditional distribution of ()`(xj) () knowing that x = xobs Methodology Draw a (large) set of particles (i, xi) from () and use a nonparametric estimate of the conditional density (jxobs) / ()`(xobsj) Shortcomings. I time consuming – If simulation of the latent process is not straightforward Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29
  • 9. ABC in a nutshell Posterior distribution is the conditional distribution of ()`(xj) () knowing that x = xobs Methodology Draw a (large) set of particles (i, xi) from () and use a nonparametric estimate of the conditional density (jxobs) / ()`(xobsj) Shortcomings. I time consuming – If simulation of the latent process is not straightforward I curse of dimensionality vs. lost of information – I If x lies in a high dimensional space X (often), we are unable to estimate of the conditional density I Hence, we project the (observed and simulated) datasets on a space with smaller dimension (trough summary statistics) : X ! Rd (summary statistics) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29
  • 10. You went to school to learn, girl (. . . ) Why 2 plus 2 makes four Now, now, now, I’m gonna teach you (. . . ) All you gotta do is repeat after me! A, B, C! It’s easy as 1, 2, 3! Or simple as Do, re, mi! (. . . ) Surveys I Beaumont (2010, Annual Review of Ecology, Evolution, and Systematics) I Lopes, Beaumont (2010, Genetics and Evolution) I Csill ` ery, Blum, Gaggiotti, Franc¸ois (2010, Trends in Ecology and Evolution) I Marin, Pudlo, Robert, Ryder (to appear, Statis. and Comput.) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 6 / 29
  • 11. Table of contents 1 Likelihood free methods 2 Empirical likelihood 3 Population genetics 4 Numerical experiments Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 7 / 29
  • 12. Empirical likelihood (EL) Owen (1988, Biometrika), Owen (2001, Chapman Hall) Data Empirical Likelihood Parameters' definition in term of a moment constraint x = (x1, ..., xK) Profiled likelihood x xi Black box Input: I Independent data x = (x1, . . . , xK) I Definition of the parameters E(h(xi,)) = 0 Output: I A profiled “likelihood function” Lel(jx) EL do not require a fully defined and often complex (hence debatable) parametric model Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 8 / 29
  • 13. Empirical likelihood (EL) Owen (1988, Biometrika), Owen (2001, Chapman Hall) Assume that the dataset x is composed of n independent replicates x = (x1, . . . , xn) of some X F Generalized moment condition model The law F of X satisfy EF h(X,) = 0, where h is a known function, and an unknown parameter Empirical likelihood Lel(jx) = max p Yn i=1 pi () for all p such that 0 6 pi 6 1, P pi = 1, P i pih(xi,) = 0. Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 9 / 29
  • 14. Empirical likelihood (EL) Empirical likelihood Lel(jx) = max p Yn i=1 pi () for all p such that 0 6 pi 6 1, P pi = 1, P i pih(xi,) = 0. Why ()? (1) L(p) := Yn i=1 pi is the likehood of discrete distribution p = P i pixi (2) X i pih(xi,) = 0 implies that p fulfils the moment condition How to compute? Newton-Lagrange scheme. The constraint is linear (in p) See Owen (2001) chap. 12 package emplik in R Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 10 / 29
  • 15. EL examples Data Empirical Likelihood Parameters' definition in term of a moment constraint x = (x1, ..., xK) Profiled likelihood x xi Simple case: = mean h(xi,) = xi - I Lel(jx) is maximal at ^ = 1 K(x1 + + xK) I if enough data (K large), if data from a true parametric model, then Lel true likelihood Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 11 / 29
  • 16. EL examples Data Empirical Likelihood Parameters' definition in term of a moment constraint x = (x1, ..., xK) Profiled likelihood x xi h(xi,) = xi - red = param. likelihood black = Lel Example 1: Gaussian model K = 20 1.0 1.5 2.0 2.5 3.0 0.0 0.2 0.4 0.6 0.8 1.0 theta el.normal K = 200 1.0 1.5 2.0 2.5 3.0 0.0 0.2 0.4 0.6 0.8 1.0 theta el.normal Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 12 / 29
  • 17. EL examples Data Empirical Likelihood Parameters' definition in term of a moment constraint x = (x1, ..., xK) Profiled likelihood x xi red = true likelihood blue = Gaussian likelihood black = Lel Example 2: non Gaussian model K = 200 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 0.0 0.2 0.4 0.6 0.8 1.0 theta el.normal Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 13 / 29
  • 18. EL examples Data Empirical Likelihood Parameters' definition in term of a moment constraint x = (x1, ..., xK) Profiled likelihood x xi red = true likelihood blue = Gaussian likelihood black = Lel Example 2: non Gaussian model K = 200 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 0.0 0.2 0.4 0.6 0.8 1.0 theta el.normal I Lel is better than a false model. Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 13 / 29
  • 19. Table of contents 1 Likelihood free methods 2 Empirical likelihood 3 Population genetics 4 Numerical experiments Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 14 / 29
  • 20. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Sample of 8 genes Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
  • 21. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T(k) Exp(k(k - 1)=2) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
  • 22. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Kingman’s genealogy When time axis is normalized, T(k) Exp(k(k - 1)=2) Mutations according to the Simple stepwise Mutation Model (SMM) date of the mutations Poisson process with intensity =2 over the branches Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
  • 23. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium Observations: leafs of the tree ^ = ? Kingman’s genealogy When time axis is normalized, T(k) Exp(k(k - 1)=2) Mutations according to the Simple stepwise Mutation Model (SMM) date of the mutations Poisson process with intensity =2 over the branches MRCA = 100 independent mutations: 1 with pr. 1=2 Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29
  • 24. Much more interesting models. . . I several independent locus Independent gene genealogies and mutations I different populations linked by an evolutionary scenario made of divergences, admixtures, migrations between populations, etc. I larger sample size usually between 50 and 100 genes A typical evolutionary scenario: MRCA POP 0 POP 1 POP 2 2 1 Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 16 / 29
  • 25. Moment condition in population genetics? Main difficulty Derive a constraint EF h(X,) = 0, on the parameters of interest when X is the genotypes of the sample of individuals at a given locus E.g., in phylogeography, is composed of I dates of divergence between populations, I ratio of population sizes, I mutation rates, etc. None of them are moments of the distribution of the allelic states of the sample Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 17 / 29
  • 26. Moment condition in population genetics? Main difficulty Derive a constraint EF h(X,) = 0, on the parameters of interest when X is the genotypes of the sample of individuals at a given locus E.g., in phylogeography, is composed of I dates of divergence between populations, I ratio of population sizes, I mutation rates, etc. None of them are moments of the distribution of the allelic states of the sample ,! h = pairwise composite scores whose zero is the pairwise maximum likelihood estimator Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 17 / 29
  • 27. Pairwise composite likelihood? The intra-locus pairwise likelihood `2(xkj) = Y ij k, xj `2(xi kj) with x1 k, . . . , xnk : allelic states of the gene sample at the k-th locus Sample of size 2 ) coalescent tree is a single line joining xi k to xj k k, xj ) closed form of `2(xi kj) The pairwise score function r log `2(xkj) = X ij k, xj r log `2(xi kj) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 18 / 29
  • 28. Composite lik , approximation of lik Composite likelihoods are often much more narrow than the distribution of the model (See, e.g., Cooley et al., 2012) Hence, I produces too narrow posterior distributions I confidence intervals or credibility intervals are not reliable But, E(r log `2(xj)) = Z r log `2(xj) `(xj)dx = X ij Z r log `2(xi, xjj) `2(xi, xjj)dxidxj = X ij Z r`2(xi, xjj) `2(xi, xjj) `2(xi, xjj)dxidxj = X ij r Z `2(xi, xjj)dxidxj = 0 SPaufdelo (wINRitAh UE. MLonbtpeellicera2)use we only AuBsCeel GpenoPospition of its mode SSB 09/2012 19 / 29
  • 29. Pairwise likelihood: a simple case Assumptions I sample closed, panmictic population at equilibrium I marker: microsatellite I mutation rate: =2 k et xj if xi k are two genes of the sample, `2(xi k, xj kj) depends only on k - xj = xi k Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29
  • 30. Pairwise likelihood: a simple case Assumptions I sample closed, panmictic population at equilibrium I marker: microsatellite I mutation rate: =2 k et xj if xi k are two genes of the sample, `2(xi k, xj kj) depends only on k - xj = xi k `2(j) = 1 p 1 + 2 ()jj with () = 1 + + p 1 + 2 Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29
  • 31. Pairwise likelihood: a simple case Assumptions I sample closed, panmictic population at equilibrium I marker: microsatellite I mutation rate: =2 k et xj if xi k are two genes of the sample, `2(xi k, xj kj) depends only on k - xj = xi k `2(j) = 1 p 1 + 2 ()jj with () = 1 + + p 1 + 2 Pairwise score function @ log `2(j) = 1 - 1 + 2 + jj p 1 + 2 Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29
  • 32. Pairwise likelihood: 2 diverging populations MRCA POP a POP b Assumptions I : divergence date of pop. a and b I =2: mutation rate Let xi k and xj k be two genes coming resp. from pop. a and b Set = xi k - xj k. Then `2(j, ) = e- p 1 + 2 X+1 k=-1 ()jkjI-k(). where In(z) nth-order modified Bessel function of the first kind Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 21 / 29
  • 33. Pairwise likelihood: 2 diverging populations MRCA POP a POP b Assumptions I : divergence date of pop. a and b I =2: mutation rate Let xi k and xj k be two genes coming resp. from pop. a and b Set = xi k - xj k. A 2-dim score function @ log `2(j, ) = - + 2 `2( - 1j, ) + `2( + 1j, ) `2(j, ) @ log `2(j, ) = 1 - - 1 + 2 + 2 `2( - 1j, ) + `2( + 1j, ) `2(j, ) + q(j, ) `2(j, ) where q(j, ) := e- p 1 + 2 0() () 1X k=-1 jkj()jkjI-k() Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 21 / 29
  • 34. Table of contents 1 Likelihood free methods 2 Empirical likelihood 3 Population genetics 4 Numerical experiments Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 22 / 29
  • 35. A first experiment Evolutionary scenario: MRCA POP 0 POP 1 Dataset: I 50 genes per populations, I 100 microsat. loci Assumptions: I Ne identical over all populations I = (log10 , log10 ) I uniform prior over (-1., 1.5) (-1., 1.) Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 23 / 29
  • 36. A first experiment Evolutionary scenario: MRCA POP 0 POP 1 Dataset: I 50 genes per populations, I 100 microsat. loci Assumptions: I Ne identical over all populations I = (log10 , log10 ) I uniform prior over (-1., 1.5) (-1., 1.) Comparison of the original ABC with ABCel ESS=7034 log(theta) Density 0.00 0.05 0.10 0.15 0.20 0.25 0 5 10 15 log(tau1) Density −0.3 −0.2 −0.1 0.0 0.1 0.2 0.3 0 1 2 3 4 5 6 7 histogram = ABCel curve = original ABC vertical line = “true” parameter Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 23 / 29
  • 37. ABC vs. ABCel on 100 replicates of the 1st experiment Accuracy: log10 log10 ABC ABCel ABC ABCel (1) 0.097 0.094 0.315 0.117 (2) 0.071 0.059 0.272 0.077 (3) 0.68 0.81 1.0 0.80 (1) Root Mean Square Error of the posterior mean (2) Median Absolute Deviation of the posterior median (3) Coverage of the credibility interval of probability 0.8 Computation time: on a recent 6-core computer (C++/OpenMP) I ABC 4 hours I ABCel 2 minutes Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 24 / 29
  • 38. Second experiment Evolutionary scenario: MRCA POP 0 POP 1 POP 2 2 1 Dataset: I 50 genes per populations, I 100 microsat. loci Assumptions: I Ne identical over all populations I = log10(, 1, 2) I non-informative uniform prior Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
  • 39. Second experiment Evolutionary scenario: MRCA POP 0 POP 1 POP 2 2 1 Dataset: I 50 genes per populations, I 100 microsat. loci Assumptions: I Ne identical over all populations I = log10(, 1, 2) I non-informative uniform prior Comparison of the original ABC with ABCel histogram = ABCel curve = original ABC vertical line = “true” parameter Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
  • 40. Second experiment Evolutionary scenario: MRCA POP 0 POP 1 POP 2 2 1 Dataset: I 50 genes per populations, I 100 microsat. loci Assumptions: I Ne identical over all populations I = log10(, 1, 2) I non-informative uniform prior Comparison of the original ABC with ABCel histogram = ABCel curve = original ABC vertical line = “true” parameter Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
  • 41. Second experiment Evolutionary scenario: MRCA POP 0 POP 1 POP 2 2 1 Dataset: I 50 genes per populations, I 100 microsat. loci Assumptions: I Ne identical over all populations I = log10(, 1, 2) I non-informative uniform prior Comparison of the original ABC with ABCel histogram = ABCel curve = original ABC vertical line = “true” parameter Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29
  • 42. ABC vs. ABCel on 100 replicates of the 2nd experiment Accuracy: log10 log10 1 log10 2 ABC ABCel ABC ABCel ABC ABCel (1) 0.0059 0.0794 0.472 0.483 29.3 4.76 (2) 0.048 0.053 0.32 0.28 4.13 3.36 (3) 0.79 0.76 0.88 0.76 0.89 0.79 (1) Root Mean Square Error of the posterior mean (2) Median Absolute Deviation of the posterior median (3) Coverage of the credibility interval of probability 0.8 Computation time: on a recent 6-core computer (C++/OpenMP) I ABC 6 hours I ABCel 8 minutes Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 26 / 29
  • 43. Why? On large datasets, ABCel gives more accurate results than ABC ABC simplifies the dataset through summary statistics Due to the large dimension of x, the original ABC algorithm estimates
  • 44.
  • 45.
  • 46. (xobs) , where (xobs) is some (non-linear) projection of the observed dataset on a space with smaller dimension ,! Some information is lost ABCelsimplifies the model through a generalized moment condition model. ,! Here, the moment condition model is based on pairwise composition likelihood Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 27 / 29
  • 47. Joint work with I Christian P. Robert (U. Dauphine IUF) I Kerrie Mengersen (QUT, Australia) I Rapha¨ el Leblois (INRA CBGP, Montpellier) Grant from ANR through Project “Emile” First preprint on arXiv Approximate Bayesian computation via empirical likelihood Coming soon: population genetic models which are too slow to simulate Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 28 / 29
  • 48. Joint work with I Christian P. Robert (U. Dauphine IUF) I Kerrie Mengersen (QUT, Australia) I Rapha¨ el Leblois (INRA CBGP, Montpellier) Grant from ANR through Project “Emile” First preprint on arXiv Approximate Bayesian computation via empirical likelihood Coming soon: population genetic models which are too slow to simulate Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 28 / 29
  • 49. Empirical likelihood Wilk’s theorem to EL likelihood ratio. Theorem (Owen, 1991) Let X, X1, . . . ,Xn be iid random vector with common distribution F0. For 2 , x 2 X , let h(x,) 2 Rs. Let 0 2 be such that Var(h(X, 0)) is finite and has rank q 0. If 0 satisfies E(h(X,0)) = 0, then -2 log Lel(0jX1:n) n-n ! 2 (q) Note that I n-n is the maximum value of Lel I no condition to ensure a unique value for 0 I no condition to make ^ el = argmax Lel a good estimate of 0 I provides tests and confidence intervals Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 29 / 29