phycas lightning talk iEvoBio 2011

Estimating marginal likelihoods for phylogenetic models
in Phycas

Phycas is a software package for Bayesian phylogenetic
inference (support for maximum-likelihood searching is planned).

Paul Lewis is the primary author. Mark Holder and Dave
Swofford are co-authors.

Written in C++ and Python (using Boost.Python to create
Python bindings to the C++ code).

Compiled versions and manual: http://www.phycas.org

Source: https://github.com/mtholder/Phycas

Bayesian model selection

• Use model averaging if we can “jump” between models, or
• Compare their marginal likelihoods.

The Bayes factor between two models:

$$B_{10} = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_0)}$$

with

$$\Pr(D \mid M_1) = \int \Pr(D \mid \theta, M_1)\,\Pr(\theta \mid M_1)\,d\theta$$

where θ is the set of parameters in the model.
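To make the integral concrete, here is a minimal sketch on a hypothetical one-parameter toy problem that is not from the talk: under M1 the observations are y_i ~ N(θ, 1) with prior θ ~ N(0, 1), and under M0 θ is fixed at 0. Because θ is one-dimensional, Pr(D|M1) can be approximated by brute-force quadrature, which is not an option over tree space. The data values and function names are illustrative assumptions.

```python
import math

# Hypothetical toy data: five observations, y_i ~ N(θ, 1) under M1.
DATA = [0.8, 1.3, 0.2, 1.1, 0.6]

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def log_likelihood(theta):
    """log Pr(D | θ): independent N(θ, 1) observations."""
    return sum(normal_logpdf(y, theta, 1.0) for y in DATA)

def marginal_likelihood_m1(prior_sd=1.0, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid-rule approximation of ∫ Pr(D|θ,M1) Pr(θ|M1) dθ, θ ~ N(0, prior_sd²)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        integrand = math.exp(log_likelihood(theta) + normal_logpdf(theta, 0.0, prior_sd))
        total += integrand if 0 < i < steps else 0.5 * integrand  # end points get weight 1/2
    return total * h

# M0 fixes θ = 0: it has no free parameters, so Pr(D|M0) is just the likelihood at 0.
ml1 = marginal_likelihood_m1()
ml0 = math.exp(log_likelihood(0.0))
print(f"Pr(D|M1) ~ {ml1:.4g}   Pr(D|M0) = {ml0:.4g}   B10 ~ {ml1 / ml0:.3f}")
```

This toy model also has a closed-form marginal likelihood (the data are multivariate normal with covariance I + 11ᵀ), giving log Pr(D|M1) ≈ −6.127, which is useful for checking Monte Carlo estimators.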

Two simple estimators of the marginal likelihood

1. the mean of the likelihood evaluated at parameter values drawn
at random from the prior.

2. the harmonic mean of the likelihood evaluated at parameter values
drawn at random from the posterior (Newton and Raftery, 1994).
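Both estimators take only a few lines. The sketch below uses a hypothetical toy model (y_i ~ N(θ, 1), θ ~ N(0, 1), five made-up observations). Because this model is conjugate, posterior draws come straight from the closed-form normal posterior instead of from MCMC; that shortcut is an assumption of the sketch, not how Phycas samples.

```python
import math
import random

DATA = [0.8, 1.3, 0.2, 1.1, 0.6]  # hypothetical data, y_i ~ N(θ, 1)
PRIOR_SD = 1.0                    # hypothetical prior, θ ~ N(0, PRIOR_SD²)

def log_likelihood(theta):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in DATA)

def log_mean_exp(values):
    """log(mean(exp(values))), shifted by the max to avoid underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def prior_mean_estimate(n, rng):
    """Estimator 1: arithmetic mean of the likelihood over draws from the prior."""
    draws = [rng.gauss(0.0, PRIOR_SD) for _ in range(n)]
    return log_mean_exp([log_likelihood(t) for t in draws])

def harmonic_mean_estimate(n, rng):
    """Estimator 2: harmonic mean of the likelihood over draws from the posterior."""
    # Conjugate toy model (prior mean 0, unit data variance): the posterior is
    # N(mean, sd²) in closed form, so we sample it exactly instead of running MCMC.
    precision = len(DATA) + 1.0 / PRIOR_SD ** 2
    mean, sd = sum(DATA) / precision, precision ** -0.5
    draws = [rng.gauss(mean, sd) for _ in range(n)]
    # 1 / mean(1 / L)  is  exp(-log_mean_exp(-log L))
    return -log_mean_exp([-log_likelihood(t) for t in draws])

rng = random.Random(1)
print("prior-mean estimate of log Pr(D|M1):   ", prior_mean_estimate(20000, rng))
print("harmonic-mean estimate of log Pr(D|M1):", harmonic_mean_estimate(20000, rng))
```

For these data the exact log Pr(D|M1) is about −6.127. The prior-mean estimate lands close to it because the prior here is not much wider than the posterior. The harmonic-mean estimate shifts noticeably from seed to seed: under this posterior E[1/L²] is infinite, so the estimator has infinite variance.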

Sharp posterior (black) and prior (red)
[Figure: density plotted against x, for x from −2 to 2]

From Dr. Radford Neal’s blog

The Harmonic Mean of the Likelihood: Worst Monte Carlo Method Ever

“The total unsuitability of the harmonic mean
estimator should have been apparent within an hour
of its discovery.”

Steppingstone sampling (Xie et al., 2010; Fan et al., 2010)
blends two distributions:
• the unnormalized posterior, Pr(D|θ, M1) Pr(θ|M1)
• a tractable reference distribution, π(θ)

$$p_\beta(\theta \mid D, M_1) = \frac{[\Pr(D \mid \theta, M_1)\,\Pr(\theta \mid M_1)]^{\beta}\,[\pi(\theta)]^{1-\beta}}{c_\beta}$$

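Because π(θ) is a normalized density, c_0 = 1.0, while c_1 is the marginal likelihood. A chain of intermediate β values turns one hard ratio into several easier ones:

$$\Pr(D \mid M_1) = \frac{c_1}{c_0} = \frac{c_1}{c_{0.38}} \cdot \frac{c_{0.38}}{c_{0.1}} \cdot \frac{c_{0.1}}{c_{0.01}} \cdot \frac{c_{0.01}}{c_0}$$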

Photo by Johan Nobel http://www.flickr.com/photos/43147325@N08/4326713557/ downloaded from Wikimedia
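A minimal sketch of a steppingstone estimator with a reference distribution, on a hypothetical Gaussian toy model (y_i ~ N(θ, 1), θ ~ N(0, 1), five made-up observations), using the β values 0, 0.01, 0.1, 0.38 and 1. Each ratio c_{β_k}/c_{β_{k−1}} is estimated by averaging [Pr(D|θ,M1) Pr(θ|M1) / π(θ)]^(β_k − β_{k−1}) over draws from p_{β_{k−1}}. The reference N(0.6, 0.6²) is an assumed, hand-picked choice near the posterior, and draws from each p_β are exact only because every factor is Gaussian. In a phylogenetic analysis they would come from MCMC, and the reference would have to be fitted, for example to a pilot posterior sample.

```python
import math
import random

DATA = [0.8, 1.3, 0.2, 1.1, 0.6]  # hypothetical data, y_i ~ N(θ, 1)
PRIOR_SD = 1.0                    # hypothetical prior, θ ~ N(0, PRIOR_SD²)
REF_MEAN, REF_SD = 0.6, 0.6       # hypothetical reference π(θ) = N(REF_MEAN, REF_SD²)

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def log_unnorm_posterior(theta):
    """log[Pr(D|θ,M1) Pr(θ|M1)]."""
    return (sum(normal_logpdf(y, theta, 1.0) for y in DATA)
            + normal_logpdf(theta, 0.0, PRIOR_SD))

def sample_p_beta(beta, n, rng):
    """Exact draws from p_β: all factors are Gaussian, so precisions add (prior mean 0)."""
    prec = beta * (len(DATA) + 1.0 / PRIOR_SD ** 2) + (1.0 - beta) / REF_SD ** 2
    mean = (beta * sum(DATA) + (1.0 - beta) * REF_MEAN / REF_SD ** 2) / prec
    return [rng.gauss(mean, prec ** -0.5) for _ in range(n)]

def log_mean_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def steppingstone_log_ml(betas, n, rng):
    """Sum of log(c_{β_k} / c_{β_{k-1}}); betas must run from 0.0 up to 1.0."""
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        draws = sample_p_beta(b_prev, n, rng)
        # c_next / c_prev = E_{p_prev}[ (Pr(D|θ)Pr(θ) / π(θ)) ** (b_next - b_prev) ]
        total += log_mean_exp([
            (b_next - b_prev) * (log_unnorm_posterior(t) - normal_logpdf(t, REF_MEAN, REF_SD))
            for t in draws])
    return total  # log c_1 - log c_0 = log Pr(D|M1), since c_0 = 1

betas = [0.0, 0.01, 0.1, 0.38, 1.0]
print(steppingstone_log_ml(betas, 5000, random.Random(1)))
```

Because this reference sits close to the posterior, the four steps already land within a few hundredths of the exact value for these data (about −6.127).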

Typically, steppingstone sampling uses a series of successively (slightly) vaguer
distributions to estimate the ratios of normalizing constants:
Steppingstone densities
[Figure: density plotted against x, for x from −2 to 2]
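In the Gaussian case the widening of the steppingstone densities is easy to compute: the precision of p_β is a β-weighted mix of the posterior and reference precisions. This sketch assumes a posterior with precision 6 and uses an N(0, 1) prior as the reference. The schedule helper is a hypothetical choice that places β values at evenly spaced quantiles of a Beta(α, 1) distribution so that most steps fall near β = 0; α = 0.3 is an assumed value.

```python
# Hypothetical Gaussian toy: the unnormalized posterior has precision 6
# (e.g., five unit-variance observations plus a N(0, 1) prior), and the
# reference π(θ) is taken to be that N(0, 1) prior.
POST_PREC = 6.0
REF_PREC = 1.0

def power_posterior_sd(beta):
    """sd of p_β ∝ [posterior]^β [π]^(1-β) when both are Gaussian (precisions interpolate)."""
    return (beta * POST_PREC + (1.0 - beta) * REF_PREC) ** -0.5

def beta_schedule(k, alpha=0.3):
    """k+1 values from 0 to 1, packed near 0: quantiles i/k of Beta(alpha, 1)."""
    # The Beta(alpha, 1) CDF is x**alpha, so its quantile function is q**(1/alpha).
    return [(i / k) ** (1.0 / alpha) for i in range(k + 1)]

for beta in reversed(beta_schedule(5)):
    print(f"beta = {beta:.4f}   sd of p_beta = {power_posterior_sd(beta):.3f}")
```

Reading the output from β = 1 down to β = 0, each density is a little wider than the one before, ending at the reference itself.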

A reference distribution over tree topologies

We must be able to:

1. calculate the probability for any tree topology,

2. center the distribution on the posterior,

3. control the “vagueness” of the distribution,

4. efficiently sample trees from the distribution.

Tree-Centered Independent-Split-Probability (TCISP)
distribution

Argument: a tree with probabilities for each split.

Result: a probability distribution over all tree topologies.

[Figure: worked example of building the distribution from a focal tree. Panels: “Input: a focal tree, with split probabilities, to center the distribution”; “We will keep the blue branches and avoid the red ones”; “One of the many resolutions which avoid the red branches”.]

Counting trees:
Bryant and Steel (2009) provide an O(n^5) algorithm for
counting the number of trees that share no splits with another
tree.
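A brute-force sketch of a TCISP-style distribution for tiny trees, following one reading of the slides: each focal-tree split is kept independently with its probability (blue) or avoided (red), and the tree is then chosen uniformly among all binary trees with exactly that keep/avoid pattern. Instead of the Bryant and Steel (2009) algorithm, the sketch counts matching trees by enumerating every binary tree, which is only feasible for about six leaves. All function names, the focal tree and its split probabilities are hypothetical.

```python
import random
from itertools import combinations
from math import isclose, prod

def normalize(side, leaves):
    """Store a split as the side that omits leaves[0]."""
    side = frozenset(side)
    return frozenset(leaves) - side if leaves[0] in side else side

def compatible(a, b, leaves):
    """Two splits can coexist in one tree iff some pair of their sides is disjoint."""
    full = frozenset(leaves)
    return any(not (x & y) for x in (a, full - a) for y in (b, full - b))

def all_trees(leaves):
    """Brute force: every set of n-3 pairwise-compatible nontrivial splits is a binary tree."""
    n = len(leaves)
    splits = [frozenset(c) for k in range(2, n - 1) for c in combinations(leaves[1:], k)]
    return [frozenset(combo) for combo in combinations(splits, n - 3)
            if all(compatible(a, b, leaves) for a, b in combinations(combo, 2))]

def kept_splits(tree, focal):
    return frozenset(s for s in focal if s in tree)

def tcisp_probability(tree, focal, trees):
    """Pr(keep/avoid pattern of `tree`) divided by the number of trees sharing that pattern."""
    pattern = kept_splits(tree, focal)
    weight = prod(p if s in pattern else 1.0 - p for s, p in focal.items())
    return weight / sum(1 for t in trees if kept_splits(t, focal) == pattern)

def tcisp_sample(focal, trees, rng):
    """Keep each focal split with its probability, then pick uniformly among matching trees."""
    while True:  # retry if no tree realizes the drawn pattern
        pattern = frozenset(s for s, p in focal.items() if rng.random() < p)
        matches = [t for t in trees if kept_splits(t, focal) == pattern]
        if matches:
            return rng.choice(matches)

leaves = tuple("ABCDEF")
trees = all_trees(leaves)                      # all unrooted binary trees on six leaves
focal = {normalize("AB", leaves): 0.9,         # hypothetical focal tree ((A,B),C,(D,(E,F)))
         normalize("ABC", leaves): 0.6,        # with hypothetical split probabilities
         normalize("EF", leaves): 0.8}
probs = [tcisp_probability(t, focal, trees) for t in trees]
print(len(trees), sum(probs))
print(sorted(sorted(s) for s in tcisp_sample(focal, trees, random.Random(0))))
```

For this focal tree every keep/avoid pattern is realized by at least one of the 105 trees, so the probabilities sum to 1 and the sampler never needs to retry.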

Multitree steppingstone:

• Works on tiny trees (≤ 6 leaves) with no tuning;

• We are working on more efficient MCMC for larger trees;

• Code on: https://github.com/mtholder/Phycas/tree/sampling_ref_dist

Conclusions

• Do not trust the harmonic mean estimator of the marginal
likelihood.

• Take a look at Phycas: http://www.phycas.org (under
GPLv2.0; source on GitHub).

• Watch for multitree steppingstone in a more generic, usable
form soon.

• Tree-Centered Independent-Split-Probability (TCISP) distribution
may be useful in other contexts: likelihood-based supertrees,
or MCMC proposals.

Thanks: NSF AToL and iEvoBio
See: Xie et al. (2010); Fan et al. (2010); Lartillot
and Philippe (2006) for more discussion of estimating
marginal likelihoods.

References

Bryant, D. and Steel, M. (2009). Computing the distribution of a tree
metric. IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 6(3):420–426.

Fan, Y., Wu, R., Chen, M.-H., Kuo, L., and Lewis, P. O. (2010). Choosing
among partition models in Bayesian phylogenetics. Molecular Biology and
Evolution, advance access.

Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using
thermodynamic integration. Systematic Biology, 55(2):195–207.

Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference
with the weighted likelihood bootstrap. Journal of the Royal Statistical
Society, Series B (Methodological), 56(1):3–48.

Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2010). Improving
marginal likelihood estimation for Bayesian phylogenetic model selection.
Systematic Biology, 60(2):150–160.
