Phylogenetic Analyses in R
Klaus Schliep
Universidad de Vigo
Porto, 15–16 July 2013
Outline
Getting started
Data Structures
Distance based methods
Maximum Parsimony
Maximum likelihood
Section 1
Getting started
About
These slides give a short introduction to phylogenetic
reconstruction in R. They focus mostly on the packages ape and
phangorn. I have to thank Emmanuel Paradis for his work on ape.
The slides are produced with literate programming using LaTeX,
Beamer, Sweave and R, so all the code and graphics are "real"!
Help
To install an R package it is good to have administrator rights.
Download R from www.cran.r-project.org. You can easily install
packages from within R:
> install.packages("phangorn")
> install.packages("phytools")
> install.packages("pegas")
> install.packages("seqLogo")
> q()
Then you can load the packages you need:
> library("phangorn")
> library("seqLogo")
Help
The R homepage provides lots of general documentation, faqs, etc.
There are help pages for all the functions and most of them
contain examples.
> library(help="phangorn")
> help.start()
> ?pml
> help(pml)
> example(pml)
> vignette("Ancestral")
Copying and pasting parts of the code in the examples is a good
start. If you prefer reading a book (even though books become outdated quickly):
Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R
(Second Edition) New York: Springer
There is a mailing list, stat.ethz.ch/mailman/listinfo/r-sig-phylo,
where you can ask questions after browsing through the archive.
Section 2
Data Structures
Data Structures
Reminder:
1. Data in R are made of vector + attribute(s) (and
combinations of these). Vector: a series of elements all of the
same kind (a list is a vector of pointers).
2. The class is the attribute determining the action of generic
functions (plot, summary, etc.)
We will make heavy use of the following 3 data structures:
1. phyDat: sequences (DNA, AA, codons, user defined) in
phangorn
2. DNAbin: DNA sequences (ape format)
3. phylo: phylogenetic trees
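The reminder above (vector + attributes, class drives the generics) can be tried out in base R. This is only a small sketch: the class name edgeLengths, the attribute unit and the method summary.edgeLengths are made up for illustration and are not part of ape or phangorn.

```r
# A plain numeric vector with one extra attribute and a class attribute.
# "edgeLengths" and "unit" are hypothetical names used only for this example.
x <- structure(c(0.1, 0.2, 0.3), unit = "subst/site", class = "edgeLengths")

# A method for the generic summary(); R selects it because of class(x).
summary.edgeLengths <- function(object, ...) {
  paste(length(object), "values, total length",
        sum(unclass(object)), attr(object, "unit"))
}

attributes(x)      # the vector carries its attributes along
res <- summary(x)  # dispatches to summary.edgeLengths
print(res)
```

Objects of class phyDat, DNAbin and phylo work the same way: plot(), print() or summary() look at the class attribute to decide which method to call.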
Class phylo
This class represents phylogenetic trees. Tip labels may be
duplicated; node labels are optional. Input:
1. read.tree: Newick files
2. read.nexus: NEXUS files
If the file contains several trees, these two functions return an
object of class multiPhylo which is a list of trees of class phylo.
And you can write objects of class phylo using write.tree or
write.nexus.
Plotting trees
ape has great plotting capabilities.
> help(plot.phylo)
Some simple examples:
> tree <- rtree(10)
> par(mfrow=c(2,2), mar=rep(0,4))
> plot(tree)
> plot(tree, type="fan")
> plot(tree, type="unrooted")
> plot(tree, type="cladogram")
[Figure: the same random 10-tip tree (tips t1 to t10) drawn in four panels: default phylogram, fan, unrooted and cladogram]
Transforming trees
There are many functions in ape and phangorn to transform trees
(i.e. objects of class phylo)
> root(tree, outgroup)
> drop.tip(tree, "t1")
> extract.clade(tree, node)
> bind.tree(tree1, tree2)
> unroot(tree)
> multi2di(tree)
> di2multi(tree)
> nni(tree)
> rSPR(tree)
Class phyDat
The starting point for phylogenetic reconstruction is a sequence
alignment. ape can call clustal, tcoffee and muscle, and
phyloch can call mafft, prank and gblocks.
More frequently you will just read in an alignment:
> align1 <- read.phyDat("myfile")
phangorn (phyDat) and ape (DNAbin) use different formats to
represent alignments, but it is easy to convert formats.
> align2 <- read.dna("myfile") # ape format
> align3 <- as.phyDat(align2) # phangorn format
Section 3
Distance based methods
Distance based methods
Distance methods take a distance or dissimilarity matrix as input.
Ultrametric   Additive
upgma (a)     fastme.ols
wpgma (a)     fastme.bal
              nj
              UNJ (a)
              bionj
(a) in phangorn, the rest in ape.
Fast methods, O(n^2) or O(n^3) → big data sets can be
analysed.
Distances can be calculated for different kinds of data.
In phylogenetics they are often used to compute starting trees for ML
and MP, or inside species tree methods.
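To make the "distance matrix in, tree out" idea concrete, here is a minimal base R sketch. It assumes four made-up sequences and uses the proportion of differing sites (p-distance) as dissimilarity. The clustering uses hclust with method = "average", which is average linkage clustering, i.e. UPGMA; it stands in for the ape and phangorn functions listed above rather than reproducing them.

```r
# Four hypothetical aligned sequences of length 10.
seqs <- c(s1 = "acgtacgtac", s2 = "acgtacgtaa",
          s3 = "acgaacgtta", s4 = "ttgaccgtta")
chars <- strsplit(seqs, "")

# p-distance: proportion of sites at which two sequences differ.
n <- length(seqs)
d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(chars[[i]] != chars[[j]])
  }
}

# Average linkage clustering (UPGMA) on the distance matrix.
hc <- hclust(as.dist(d), method = "average")
plot(hc, main = "UPGMA on p-distances")
```

The resulting dendrogram is ultrametric by construction, which is why UPGMA sits in the "Ultrametric" column of the table.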
Distance based methods
> set.seed(1)
> bs <- bootstrap.phyDat(Laurasiatherian, FUN = function(x
> class(bs) <- 'multiPhylo'
> cnet = consensusNet(bs, .3)
> plot(cnet, show.tip.label=FALSE, show.nodes=TRUE)
[Figure: consensus network of the bootstrap trees]
Section 4
Maximum Parsimony
Maximum parsimony
In contrast to the distance methods, (maximum) parsimony uses
sequence alignments as input. The target is to minimize an
optimality criterion, i.e. a score assigned to a tree given the data. For the
parsimony method the score is the minimal number of substitutions
needed to account for the data on a phylogeny.
> data(Laurasiatherian)
> tree = nj(dist.ml(Laurasiatherian))
> parsimony(tree, Laurasiatherian)
[1] 9776
> tree2 = optim.parsimony(tree, Laurasiatherian,
trace=FALSE, rearrangement="SPR")
> parsimony(tree2, Laurasiatherian)
[1] 9713
> tree3 = pratchet(Laurasiatherian, rearrangement="SPR", t
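How such a score can be computed for a single site is sketched below with Fitch's algorithm in base R. The slides do not say how parsimony() works internally, so this is an illustration of the criterion, not of phangorn's implementation. The nested-list tree, the states and the function name fitch are assumptions made for the example.

```r
# Fitch's algorithm for one site on a rooted binary tree.
# A tree is a nested list; a tip is a character string (hypothetical encoding).
fitch <- function(node, states) {
  if (is.character(node)) {
    return(list(set = states[[node]], cost = 0))  # tip: observed state, no cost
  }
  left  <- fitch(node[[1]], states)
  right <- fitch(node[[2]], states)
  common <- intersect(left$set, right$set)
  if (length(common) > 0) {
    # children agree on at least one state: no extra substitution needed
    list(set = common, cost = left$cost + right$cost)
  } else {
    # children disagree: take the union and count one substitution
    list(set = union(left$set, right$set), cost = left$cost + right$cost + 1)
  }
}

tree   <- list(list("human", "chimp"), list("gorilla", "orangutan"))
states <- c(human = "a", chimp = "a", gorilla = "g", orangutan = "t")
fitch_result <- fitch(tree, states)
fitch_result$cost   # minimal number of substitutions for this site
```

The parsimony score of an alignment is the sum of these per-site costs over all sites.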
Branch and bound
Normally it is not possible to evaluate an optimality criterion for all
trees, as there are just too many trees.
> sapply(3:10, howmanytrees, FALSE)
[1] 1 3 15 105 945 10395
[7] 135135 2027025
> howmanytrees(20, FALSE)
[1] 2.216431e+20
For small datasets it is possible to find all most parsimonious trees
using a branch and bound algorithm. For datasets with more than
10 taxa this can take a long time and depends strongly on how
tree-like the data are.
> besttree <- bab(subset(Laurasiatherian,1:10), trace=0)
> parsimony(besttree, Laurasiatherian)
[1] 2695
Ancestral reconstruction
To reconstruct ancestral sequences we first load some data and
reconstruct a tree:
> primates = read.phyDat("primates.dna")
> tree = pratchet(primates, trace=0)
> tree = acctran(tree, primates)
> parsimony(tree, primates)
[1] 746
In parsimony analysis the edge lengths represent the observed
number of changes. Reconstructing ancestral states therefore
also defines the edge lengths of a tree. However, there can exist
several equally parsimonious reconstructions, or states can be
ambiguous, and therefore edge lengths can differ (e.g. "MPR" or
"ACCTRAN").
> anc.acctran = ancestral.pars(tree, primates, "ACCTRAN")
> anc.mpr = ancestral.pars(tree, primates, "MPR")
Ancestral reconstruction
> seqLogo( t(subset(anc.mpr, getRoot(tree), 1:20)[[1]]), i
[Figure: sequence logo of the reconstructed root sequence, positions 1 to 20; x axis: Position, y axis: Probability]
Ancestral reconstruction MPR
> plotAnc(tree, anc.mpr, 17)
> title("MPR")
[Figure: primate tree (Mouse, Bovine, Lemur, Tarsier, Squir Monk, Jpn Macaq, Rhesus Mac, Crab-E.Mac, BarbMacaq, Gibbon, Orang, Gorilla, Chimp, Human) with the reconstructed states a, c, g, t at site 17; title: MPR]
Ancestral reconstruction ACCTRAN
> plotAnc(tree, anc.acctran, 17)
> title("ACCTRAN")
[Figure: the same primate tree with the reconstructed states a, c, g, t at site 17; title: ACCTRAN]
Section 5
Maximum likelihood
Maximum Likelihood
"[In 1961] I had visions of evolutionary tree estimation being much
the same [as linkage estimation] but with the addition of the
need to estimate the form of the tree itself, surely a fatal
complexity: my intuition was that there would be insufficient data
for the task."
—A.W.F. Edwards (2009)
Phylogenetic likelihood is the probability $f(x \mid \theta, \tau)$ of observing an
alignment $x$ given a model of (nucleotide) substitution with
parameters $\theta$ and phylogenetic tree $\tau$.
\[ L(\theta, \tau, x) = \prod_{i=1}^{N} f(x_i \mid \theta, \tau) \]
where $N$ is the number of sites in the alignment. It is common to
maximise the log-likelihood function
\[ \ell(\theta, \tau, x) = \sum_{i=1}^{N} \log f(x_i \mid \theta, \tau), \]
which also maximises $L(\theta, \tau, x)$.
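One practical reason for working with the log-likelihood can be seen in a few lines of base R. The per-site likelihoods below are made-up numbers, chosen only to show the numerical effect: the product of many small probabilities underflows in double precision, while the sum of their logarithms stays finite.

```r
# Hypothetical per-site likelihoods f(x_i | theta, tau) for N = 100 sites.
site_lik <- rep(1e-5, 100)

L    <- prod(site_lik)       # product of the site likelihoods
logL <- sum(log(site_lik))   # sum of the log site likelihoods

L      # underflows to 0
logL   # finite, equal to 100 * log(1e-5)
```

Since the logarithm is monotone, the parameters and tree that maximise the sum also maximise the product.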
Applications in phylogenetics
Felsenstein (1981) introduced the pruning algorithm, which made
the computation of the likelihood feasible. Let nodes j and k have
a direct ancestor h. We can compute the conditional likelihood
\[ L_h(x_h) = \left[ \sum_{x_j} L_j(x_j)\, p_{x_j, x_h}(t_j) \right] \times \left[ \sum_{x_k} L_k(x_k)\, p_{x_k, x_h}(t_k) \right] \]
The likelihood of the tree is evaluated by traversing the tree in
postorder fashion from the tips towards the root. For unrooted
trees, a root can be chosen arbitrarily as our models are
time-reversible. We get the likelihood of the tree if we multiply the
conditional likelihood of the root node r with the base composition
π, as
\[ f_h(x \mid \theta, \tau) = \sum_{x_r} \pi_{x_r} L_r(x_r). \]
These formulas can be adapted to estimate ancestral sequences.
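The recursion can be written down in a few lines of base R. The sketch below assumes the Jukes-Cantor model with equal base frequencies, a caterpillar tree (((human, chimp), gorilla), orangutan) and all edge lengths equal to 0.1; the function names and the node labels 5, 6, 7 are choices made for this example, and it will not reproduce the numbers of any particular figure.

```r
bases <- c("a", "c", "g", "t")

# Jukes-Cantor transition probabilities for an edge of length t.
jc69 <- function(t) {
  e <- exp(-4 * t / 3)
  P <- matrix(0.25 - 0.25 * e, 4, 4, dimnames = list(bases, bases))
  diag(P) <- 0.25 + 0.75 * e
  P
}

# Likelihood vector of a tip: 1 for the observed base, 0 otherwise.
tip <- function(b) as.numeric(bases == b)

# Conditional likelihood of ancestor h from its two children j and k.
cond_lik <- function(Lj, tj, Lk, tk) {
  as.vector(jc69(tj) %*% Lj) * as.vector(jc69(tk) %*% Lk)
}

# Site likelihood for the pattern x = (human, chimp, gorilla, orangutan).
site_lik <- function(x, t = 0.1) {
  L7 <- cond_lik(tip(x[1]), t, tip(x[2]), t)  # ancestor of human and chimp
  L6 <- cond_lik(L7, t, tip(x[3]), t)         # ... joined with gorilla
  L5 <- cond_lik(L6, t, tip(x[4]), t)         # root, joined with orangutan
  sum(0.25 * L5)                              # weight by base frequencies
}

lik <- site_lik(c("a", "a", "g", "t"))
lik
```

A quick sanity check of such code is that the site likelihoods of all 4^4 possible patterns sum to one.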
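ML in phylogenetics
[Figure, built up over four slides: a tree with tips human, chimp, gorilla, orangutan and internal nodes 5, 6, 7]
Observed states at one site: human a, chimp a, gorilla g, orangutan t
Tip likelihood vectors (a|c|g|t): 1|0|0|0, 1|0|0|0, 0|0|1|0, 0|0|0|1
Conditional likelihood vectors at the internal nodes, in the order shown in the figure:
0.000988|0.000031|0.000595|0.000744
0.027161|0.000559|0.016240|0.000559
0.923613|0.000168|0.000168|0.000169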
Finding the best topology
A binary unrooted tree with 4 tips has 5 edges and 3 distinct topologies. Here
are the general formulas for binary unrooted trees with n tips:
2n − 3 edges
(2n − 5)!! = 1 × 3 × 5 × · · · × (2n − 5) topologies
Rooted binary trees have 2n − 2 edges and (2n − 3)!! topologies.
A function exists for this:
> howmanytrees(4, rooted=FALSE)
[1] 3
> howmanytrees(10, rooted=FALSE)
[1] 2027025
> howmanytrees(20, rooted=FALSE)
[1] 2.216431e+20
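The counts can also be checked by hand in base R. The helper names below are made up for this sketch; they use the double factorial (2n − 5)!! for unrooted and (2n − 3)!! for rooted binary trees with n labelled tips.

```r
# Number of unrooted binary topologies: (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
n_unrooted <- function(n) {
  stopifnot(n >= 3)
  prod(seq(1, 2 * n - 5, by = 2))
}

# A rooted tree with n tips corresponds to an unrooted tree with n + 1 tips,
# which gives (2n - 3)!! rooted topologies.
n_rooted <- function(n) n_unrooted(n + 1)

# Number of edges of an unrooted binary tree.
n_edges_unrooted <- function(n) 2 * n - 3

sapply(3:10, n_unrooted)
n_unrooted(20)
n_rooted(4)
```

The first call should agree with the sapply(3:10, howmanytrees, FALSE) output shown on the branch and bound slide.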
Finding the best trees
The strategy of evaluating the likelihood criterion for all trees in
order to find the best tree topology is in most cases highly
impractical. Instead, local tree rearrangements are used to
search locally within the tree space. The idea behind such a
heuristic is to use a starting tree and search locally for improved
scores (parsimony, maximum likelihood, least squares), until no
further rearrangements can lead to a tree with a better score.
Nearest neighbor interchange
For any internal edge of a binary tree there exist three different
ways to connect its four subtrees, one of which is the current tree.
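[Figure: the three arrangements of the subtrees A, B, C, D around an internal edge: (A,B | C,D), (A,C | B,D) and (A,D | B,C)]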
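As a small illustration in base R, the two alternative arrangements around one internal edge can be written as Newick strings. The function names are hypothetical, and the subtrees are passed as labels or Newick fragments. Because an unrooted binary tree with n tips has n − 3 internal edges, it has 2(n − 3) NNI neighbours.

```r
# Current arrangement is ((A,B),(C,D)); return the two NNI alternatives.
nni_alternatives <- function(A, B, C, D) {
  c(sprintf("((%s,%s),(%s,%s));", A, C, B, D),
    sprintf("((%s,%s),(%s,%s));", A, D, B, C))
}

# Number of NNI neighbours of an unrooted binary tree with n tips.
n_nni_neighbours <- function(n) 2 * (n - 3)

nni_alternatives("A", "B", "C", "D")
n_nni_neighbours(47)   # e.g. for a data set with 47 taxa
```

The neighbourhood grows only linearly with the number of tips, which is what makes NNI-based local search cheap compared with enumerating all topologies.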
Modelling rate variation
We assume that the substitution rate varies between different sites
(intron vs. exon, codon positions, etc.). Two approaches are
commonly used:
- define different partitions
- model rate variation with different rate categories, with a
(discrete) Γ distribution and/or a proportion of invariable sites
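What "discrete Γ" means can be sketched in base R. The version below assumes a Γ distribution with mean 1 (shape = rate = alpha), cuts it into k equally probable categories and represents each category by its mean rate. The function name gamma_rates is made up for this example and is not claimed to be how any package computes its categories.

```r
# Mean rate of each of k equally probable categories of a Gamma(alpha, alpha)
# distribution (mean 1).
gamma_rates <- function(alpha, k = 4) {
  # category boundaries: the 1/k, 2/k, ... quantiles
  cuts <- qgamma(seq_len(k - 1) / k, shape = alpha, rate = alpha)
  # the integral of x * density up to a cut is a Gamma(alpha + 1) cdf;
  # multiplying by k turns it into the conditional mean of the category
  k * diff(c(0, pgamma(cuts * alpha, shape = alpha + 1), 1))
}

rates <- gamma_rates(alpha = 0.5, k = 4)
rates
mean(rates)   # the categories average to 1
```

A small alpha gives strongly different rates (most sites slow, a few fast); a large alpha gives rates close to 1, i.e. little rate variation.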
Comparing trees and models
The phylogenetic likelihood allows us to compare many different
models or trees. There is often a bias vs. variance trade-off.
Simple models are easy to interpret but can often be biased.
[Figure: MSE, variance and squared bias plotted against the number of parameters]
- If two models are nested, that is, one model can be described
as a special case of the other, then we can directly compare
their likelihoods under their ML parameter estimates for a
fixed tree using a likelihood ratio test (LRT).
- For non-nested models we can use the Akaike Information
Criterion (AIC) or the Bayesian Information Criterion (BIC):
\[ \mathrm{AIC} = -2\,\ell(\theta, \tau, x) + 2\,\mathrm{df} \]
\[ \mathrm{BIC} = -2\,\ell(\theta, \tau, x) + \ln(n)\,\mathrm{df} \]
where df is the number of parameters of the model and n the
number of sites.
- Or use the Shimodaira-Hasegawa test or similar bootstrap
approaches.
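These criteria are simple enough to compute by hand in base R. The sketch below uses AIC = −2 log-likelihood + 2 df and BIC = −2 log-likelihood + ln(n) df, and a likelihood ratio statistic of twice the difference in log-likelihood. The log-likelihoods and df are taken from the JC row of the model selection table and from the first two rows of the anova output elsewhere in these slides; n = 3179 sites for the Laurasiatherian alignment is an assumption, and the function names are made up.

```r
aic <- function(logLik, df) -2 * logLik + 2 * df
bic <- function(logLik, df, n) -2 * logLik + log(n) * df

# Likelihood ratio test for nested models (model 0 nested in model 1).
lrt <- function(logLik0, logLik1, df_change) {
  stat <- 2 * (logLik1 - logLik0)
  c(statistic = stat,
    p.value = pchisq(stat, df = df_change, lower.tail = FALSE))
}

aic_jc <- aic(-54303.67, df = 91)            # JC model
bic_jc <- bic(-54303.67, df = 91, n = 3179)  # n is an assumed number of sites
jc_vs_gtr <- lrt(-54113, -50603, df_change = 8)

aic_jc
bic_jc
jc_vs_gtr
```

Up to rounding of the log-likelihood, the AIC and BIC values should match the JC row of the modelTest table, and the test statistic should match the "Diff log lik." column of the anova output.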
Detection of molecular adaptation
We look at each triplet of nucleotides and assume that only one
nucleotide can be replaced at a time. Then we can distinguish
between nucleotide substitutions that result in the same amino
acid (synonymous substitutions) or a different amino acid
(non-synonymous substitutions). The ratio dN/dS of
non-synonymous to synonymous substitutions can be an indication
of the kind of selective pressure acting on the codon site. Under
negative selection, we expect that non-synonymous substitutions
will accumulate more slowly than synonymous ones. Under
positive or diversifying selection, we expect more amino acid
changing replacements.
Applications with phangorn
The two main functions are pml to set up the model and
optim.pml for optimising parameters and the tree with ML.
Example session for the Jukes-Cantor, GTR and GTR+Γ+I models:
> data(Laurasiatherian)
> tr <- nj(dist.ml(Laurasiatherian))
> m0 <- pml(tr, Laurasiatherian)
> m.jc69 <- optim.pml(m0, optNni=TRUE)
> m.gtr <- optim.pml(m0, optNni=TRUE, model="GTR")
> m.gtr.G.I <- optim.pml(update(m.gtr, k=4), model=
"GTR", optNni=TRUE, optGamma=TRUE, optInv=TRUE)
By default, only the edge lengths are optimized. Currently
phangorn only supports NNI tree rearrangements (equivalent to
PhyML vers. 2)
There exist several useful generic functions like update, anova or
AIC for objects of class pml.
> methods(class="pml")
[1] anova.pml logLik.pml plot.pml print.pml
[5] update.pml vcov.pml
For example, as the models are nested, we can compare them with a
likelihood ratio test:
> anova(m.jc69, m.gtr, m.gtr.G.I)
Likelihood Ratio Test Table
  Log lik.  Df Df change Diff log lik. Pr(>|Chi|)
1   -54113  91
2   -50603  99         8          7020  < 2.2e-16 ***
3   -44527 101         2         12151  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘
Partition models
pmlPart(global ~ local, object, model)
global   local
bf       bf
Q        Q
inv      inv
shape    shape
edge     edge
         rate
         nni
Each component can be used only once in the formula.
Partition models
There are two different ways to set up partition models.
1. Setting up partition models for different genes.
> fit1 <- pml(tree, g1)
> fit2 <- pml(tree, g2)
> fit3 <- pml(tree, g3)
> fit4 <- pml(tree, g4)
> genePart <- pmlPart(Q + bf ~ edge,
list(fit1, fit2, fit3, fit4), optRooted=TRUE)
> trees <- lapply(genePart$fits, function(x)x$tree)
> class(trees) <- "multiPhylo"
> densiTree(trees, type="phylogram", col="red")
where g1, g2, g3 and g4 are objects of class phyDat.
[Figure: densiTree plot of the gene trees; tips Scer, Spar, Smik, Skud, Sbay, Scas, Sklu, Calb]
Partition models
2. Partitioning via a weight matrix.
> woody <- phyDat(woodmouse)
> tree <- nj(dist.ml(woody))
> fit <- pml(tree, woody)
> w <- attr(woody, "index")
> weight <- table(w, rep(c(1,2,3), length=length(w)))
> codonPart <- pmlPart(edge ~ rate, fit,
model=c("JC", "JC", "GTR"), weight=weight)
Model / tree comparison
Alternatively we can use the Shimodaira-Hasegawa test to check
for differences between models:
> SH.test(m.jc69, m.gtr, m.gtr.G.I)
     Trees      ln L Diff ln L p-value
[1,]     1 -54112.74  9585.685  0.0000
[2,]     2 -50602.74  6075.683  0.0000
[3,]     3 -44527.06     0.000  0.5911
Model selection
Two possibilities
ape: phymltest
> write.phyDat(woody, "woody.phy")
> out <- phymltest("woody.phy", execname =
"~/phyml")
phangorn: modelTest
> mt <- modelTest(Laurasiatherian, model=c("JC",
"F81", "HKY", "GTR"))
modelTest also works for amino acid models, similar to ProtTest.
> mt <- modelTest(myAAData, model=c("WAG", "JTT",
"LG","Dayhoff"))
Model Selection
Model df logLik AIC BIC
1 JC 91.00 -54303.67 108789.35 109341.20
2 JC+I 92.00 -50673.32 101530.63 102088.55
3 JC+G 92.00 -48684.10 97552.19 98110.11
4 JC+G+I 93.00 -48605.03 97396.06 97960.05
5 F81 94.00 -54212.64 108613.27 109183.32
6 F81+I 95.00 -50549.53 101289.06 101865.17
7 F81+G 95.00 -48500.49 97190.99 97767.10
8 F81+G+I 96.00 -48416.26 97024.51 97606.69
9 HKY 95.00 -51275.86 102741.72 103317.83
10 HKY+I 96.00 -47451.73 95095.45 95677.63
11 HKY+G 96.00 -44893.11 89978.23 90560.40
12 HKY+G+I 97.00 -44770.18 89734.36 90322.60
13 GTR 99.00 -50759.89 101717.79 102318.16
14 GTR+I 100.00 -47081.77 94363.55 94969.98
15 GTR+G 100.00 -44759.49 89718.99 90325.42
16 GTR+G+I 101.00 -44624.02 89450.04 90062.54
Bootstrap
> bs <- bootstrap.pml(m.gtr, bs=100, optNni=TRUE)
> plotBS(m.gtr$tree, bs, type="phylo", bs.adj=c(.5,0))
[Figure: ML tree of the Laurasiatherian data (m.gtr$tree, 47 taxa from Platypus to GraySeal) with bootstrap support values at the internal nodes]
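The resampling step behind a nonparametric bootstrap can be sketched in base R: sites (alignment columns) are drawn with replacement until the replicate has as many sites as the original alignment. The random toy alignment and the function name boot_alignment are assumptions for this example; the tree estimation that would follow on every replicate is left out.

```r
set.seed(1)  # for a reproducible illustration
bases <- c("a", "c", "g", "t")

# Toy alignment: 4 taxa (rows) and 12 sites (columns), random for illustration.
aln <- matrix(sample(bases, 4 * 12, replace = TRUE), nrow = 4,
              dimnames = list(paste0("t", 1:4), NULL))

# One bootstrap replicate: resample the columns with replacement.
boot_alignment <- function(x) {
  x[, sample(ncol(x), replace = TRUE), drop = FALSE]
}

replicates <- replicate(100, boot_alignment(aln), simplify = FALSE)
dim(replicates[[1]])   # same dimensions as the original alignment
```

The support value of a branch is then the proportion of replicate trees that contain the corresponding split.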
Codon Models
\[ q_{ij} = \begin{cases}
0 & \text{if } i \text{ and } j \text{ differ in more than one position} \\
\pi_j & \text{for synonymous transversion} \\
\pi_j \kappa & \text{for synonymous transition} \\
\pi_j \omega & \text{for non-synonymous transversion} \\
\pi_j \omega \kappa & \text{for non-synonymous transition}
\end{cases} \]
or, if we make abstraction of $\pi_j$ (the frequency of codon j):
\[ q_{ij} = \begin{cases}
0 & \text{if } i \text{ and } j \text{ differ in more than one position} \\
1 & \text{for synonymous transversion} \\
\kappa & \text{for synonymous transition} \\
\omega & \text{for non-synonymous transversion} \\
\omega \kappa & \text{for non-synonymous transition}
\end{cases} \]
where ω is the dN/dS ratio, κ the transition/transversion ratio and
π_j the equilibrium frequency of codon j.
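The second form of the rate definition (without the codon frequencies) translates almost literally into base R. The sketch below assumes the standard genetic code, written as a 64-letter string with the bases ordered t, c, a, g at each codon position. The function name codon_rate is made up, stop codons are not excluded and the diagonal of the rate matrix is not computed.

```r
bases <- c("t", "c", "a", "g")

# Standard genetic code, third codon position varying fastest.
aa <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                      "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
g <- expand.grid(p3 = bases, p2 = bases, p1 = bases, stringsAsFactors = FALSE)
names(aa) <- paste0(g$p1, g$p2, g$p3)

# Relative rate of change from codon `from` to codon `to`.
codon_rate <- function(from, to, kappa, omega) {
  x <- strsplit(from, "")[[1]]
  y <- strsplit(to, "")[[1]]
  d <- which(x != y)
  if (length(d) != 1) return(0)        # identical, or more than one difference
  pair <- paste(sort(c(x[d], y[d])), collapse = "")
  rate <- 1
  if (pair %in% c("ag", "ct")) rate <- rate * kappa   # transition
  if (aa[[from]] != aa[[to]]) rate <- rate * omega    # non-synonymous change
  rate
}

codon_rate("ttt", "ttc", kappa = 2, omega = 0.1)  # synonymous transition
codon_rate("ttt", "tta", kappa = 2, omega = 0.1)  # non-synonymous transversion
codon_rate("ttt", "ccc", kappa = 2, omega = 0.1)  # more than one difference
```

Multiplying each entry by the frequency of the target codon gives the first form of the definition.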
Codon Models
> (dat <- phyDat(as.character(yeast), "CODON"))
> tree <- nj(dist.ml(yeast))
> fit <- pml(tree, dat)
> ctr <- pml.control(trace=0)
> fit0 <- optim.pml(fit, control = ctr)
> fit1 <- optim.pml(fit0, model="codon1", control=ctr)
> fit2 <- optim.pml(fit0, model="codon2", control=ctr)
> fit3 <- optim.pml(fit0, model="codon3", control=ctr)
Model κ ω
codon0 1 1
codon1 free free
codon2 1 free
codon3 free 1
Additionally, the equilibrium frequencies of the codons πj can be
estimated by setting the parameter optBf=TRUE.
Codon Models
> anova(fit0, fit2, fit1)
Likelihood Ratio Test Table
  Log lik. Df Df change Diff log lik. Pr(>|Chi|)
1 -1054762 13
2  -648282 14         1        812961  < 2.2e-16 ***
3  -642807 15         1         10949  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘
> anova(fit0, fit3, fit1)
Likelihood Ratio Test Table
  Log lik. Df Df change Diff log lik. Pr(>|Chi|)
1 -1054762 13
2  -708674 14         1        692176  < 2.2e-16 ***
3  -642807 15         1        131735  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘

Phylogenetics Analysis in R

  • 1. Phylogenetic Analyses in R Klaus Schliep Universidad de Vigo Porto, 15–16 July 2013
  • 2. Outline Getting started Data Structures Distance based methods Maximum Parsimony Maximum likelihood
  • 4. About This slides should give a short introduction into phylogenetic reconstruction in R. It focuses mostly on the packages ape and phangorn. I have to thank Emmanuel Paradis for his work on ape. The slides are produced with literate programming using Latex, Beamer, Sweave and R. So all the code and graphics are ”real”!
  • 5. Help To install an R package it is good to have administrator rights. Download R from www.cran.r-project.org. You can easily install packges from within R: > install.packages("phangorn") > install.packages("phytools") > install.packages("pegas") > install.packages("seqLogo") > q() Then you can load the packages you need: > library("phangorn") > library("seqLogo")
  • 6. Help The R homepage provides lots of general documentation, faqs, etc. There are help pages for all the functions and most of them contain examples. > library(help="phangorn") > help.start() > ?pml > help(pml) > example(pml) > vignette("Ancestral") Copy and paste the parts of the code in the examples is a good start. If you prefer reading a book (even they are fast outdated): Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition) New York: Springer There is a mailing list stat.ethz.ch/mailman/listinfo/r-sig-phylo where you can ask questions, after browsing through the archive.
  • 8. Data Structures Reminder: 1. Data in R are made of vector + attribute(s) (and combinations of these). Vector: a series of elements all of the same kind (a list is a vector of pointers). 2. The class is the attribute determining the action of generic functions (plot, summary, etc.) We will make heavily use of the following 3 data structures: 1. phyDat: sequences (DNA, AA, codons, user defined) in phangorn 2. DNAbin: DNA sequences (ape format) 3. phylo: phylogenetic trees
  • 9. Class phylo This class represents phylogenetic trees. The tip labels may be replicated, the node labels (which may be absent). Input: 1. read.tree: Newick files 2. read.nexus: NEXUS files If the file contains several trees, these two functions return an object of class multiPhylo which is a list of trees of class phylo. And you can write objects of class phylo using write.tree or write.nexus.
  • 10. Plotting trees ape has great plotting capabilities. > help(plot.phylo) Some simple example > tree <- rtree(10) > par(mfrow=c(2,2), mar=rep(0,4)) > plot(tree) > plot(tree, type="fan") > plot(tree, type="unrooted") > plot(tree, type="cladogram")
  • 12. Transforming trees There are many functions in ape and phangorn to transform trees (i.e. objects of class phylo) > root(tree, outgroup) > drop.tip(tree, "t1") > extract.clade(phy, 1) > bind.tree(tree1, tree2) > unroot(tree) > multi2di(tree) > di2multi(tree) > nni(tree) > rSPR(tree)
  • 13. Class phyDat The starting point for phylogenetic reconstruction are sequence alignments. ape can call clustal,tcoffee and muscle and phyloch can call mafft, prank and gblocks. More frequently you will just read in an alignment > align1 <- read.phyDat("myfile") phangorn (phyDat) and ape (DNAbin) use different formats to represent alignments, but it is easy to convert formats. > align2 <- read.dna("myfile") # ape format > align3 <- as.phyDat(align1) # phangorn format
  • 15. Distance based methods Distance methods take a distance or dissimilarity matrix as input. Ultrametric Additive upgmaa fastme.ols wpgmaa fastme.bal nj UNJa bionj a in phangorn the rest in ape. Fast methods O(n2) or O(n3) → big data sets can be analysed. Distances can be calculated for different kinds of data. In phylogenetics often used to compute starting trees for ML, MP or inside species tree methods.
  • 16. Distance based methods > set.seed(1) > bs <- bootstrap.phyDat(Laurasiatherian, FUN = function(x > class(bs) <- 'multiPhylo' > cnet = consensusNet(bs, .3) > plot(cnet, show.tip.label=FALSE, show.nodes=TRUE)
  • 19. Maximum parsimony In contrast to the distance methods (maximum) parsimony uses sequence alignments as input. The target is to minimize an optimality criterion, i.e. a score to a tree, given the data. For the parsimony method the score is the minimal number of substitutions needed to account for the data on a phylogeny. > data(Laurasiatherian) > tree = nj(dist.ml(Laurasiatherian)) > parsimony(tree, Laurasiatherian) [1] 9776 > tree2 = optim.parsimony(tree, Laurasiatherian, trace=FALSE, rearrangement="SPR") > parsimony(tree2, Laurasiatherian) [1] 9713 > tree3 = pratchet(Laurasiatherian, rearrangement="SPR", t
  • 20. Branch and bound
    Normally it is not possible to evaluate an optimality criterion for all trees, as there are just too many of them.
    > sapply(3:10, howmanytrees, FALSE)
    [1]       1       3      15     105     945   10395
    [7]  135135 2027025
    > howmanytrees(20, FALSE)
    [1] 2.216431e+20
    For small datasets it is possible to find all most parsimonious trees using a branch and bound algorithm. For datasets with more than 10 taxa this can take a long time, and the running time depends strongly on how tree-like the data are.
    > besttree <- bab(subset(Laurasiatherian,1:10), trace=0)
    > parsimony(besttree, Laurasiatherian)
    [1] 2695
  • 21. Ancestral reconstruction
    To reconstruct ancestral sequences we first load some data and reconstruct a tree:
    > primates = read.phyDat("primates.dna")
    > tree = pratchet(primates, trace=0)
    > tree = acctran(tree, primates)
    > parsimony(tree, primates)
    [1] 746
    In parsimony analysis the edge lengths represent the observed number of changes. Reconstructing ancestral states therefore also defines the edge lengths of a tree. However, there can be several equally parsimonious reconstructions, or states can be ambiguous, and therefore edge lengths can differ between methods (e.g. "MPR" or "ACCTRAN").
    > anc.acctran = ancestral.pars(tree, primates, "ACCTRAN")
    > anc.mpr = ancestral.pars(tree, primates, "MPR")
  • 22. Ancestral reconstruction
    > seqLogo( t(subset(anc.mpr, getRoot(tree), 1:20)[[1]]), i
    [Figure: sequence logo of the MPR reconstruction at the root; x-axis: Position (1 to 20), y-axis: Probability (0 to 1)]
  • 23. Ancestral reconstruction MPR
    > plotAnc(tree, anc.mpr, 17)
    > title("MPR")
    [Figure: primate tree (tips Mouse to Human) showing the reconstructed states at site 17; legend: a, c, g, t; title: MPR]
  • 24. Ancestral reconstruction ACCTRAN
    > plotAnc(tree, anc.acctran, 17)
    > title("ACCTRAN")
    [Figure: primate tree (tips Mouse to Human) showing the reconstructed states at site 17; legend: a, c, g, t; title: ACCTRAN]
  • 26. Maximum Likelihood
    "[In 1961] I had visions of evolutionary tree estimation being much the same [than linkage estimation] but with the addition of the need to estimate the form of the tree itself, surely a fatal complexity: my intuition was that there would be insufficient data for the task." (A.W.F. Edwards, 2009)
    Phylogenetic likelihood is the probability $f(x \mid \theta, \tau)$ of observing an alignment $x$ given a model of (nucleotide) substitution with parameters $\theta$ and phylogenetic tree $\tau$:
    $$L(\theta, \tau, x) = \prod_{i=1}^{N} f(x_i \mid \theta, \tau)$$
    where $N$ is the number of sites in the alignment. It is common to maximise the log-likelihood function
    $$\ell(\theta, \tau, x) = \sum_{i=1}^{N} \log f(x_i \mid \theta, \tau)$$
    which also maximises $L(\theta, \tau, x)$.
  • 27. Applications in phylogenetics
    Felsenstein (1981) introduced the pruning algorithm, which made the computation of the likelihood feasible. Let nodes $j$ and $k$ have a direct ancestor $h$. We can compute the conditional likelihood
    $$L_h(x_h) = \Big[ \sum_{x_j} L_j(x_j)\, p_{x_j, x_h}(t_j) \Big] \times \Big[ \sum_{x_k} L_k(x_k)\, p_{x_k, x_h}(t_k) \Big]$$
    The likelihood of the tree is evaluated by traversing the tree in postorder fashion from the tips towards the root. For unrooted trees, a root can be chosen arbitrarily as our models are time-reversible. We get the likelihood of the tree if we multiply the conditional likelihood of the root node $r$ with the base composition $\pi$:
    $$f_h(x \mid \theta, \tau) = \sum_{x_r} \pi_{x_r} L_r(x_r)$$
    These formulas can be adapted to estimate ancestral sequences.
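The pruning recursion can be sketched in a few lines of base R. This is only an illustration, not phangorn's implementation: the Jukes-Cantor model, the topology ((human, chimp), gorilla, orangutan), the branch lengths and the helper names `jc_P`, `tip` and `cond_lik` are all assumptions made for the example.

```r
# Hypothetical helper: Jukes-Cantor transition probabilities for a branch
# of length t (expected substitutions per site).
jc_P <- function(t) {
  e <- exp(-4 * t / 3)
  P <- matrix(0.25 - 0.25 * e, 4, 4)  # probability of a specific change
  diag(P) <- 0.25 + 0.75 * e          # probability of no net change
  P
}

states <- c("a", "c", "g", "t")

# A tip is list(state = ...); an internal node is
# list(children = list(...), lengths = c(...)).
tip <- function(s) list(state = s)

# Conditional likelihood vector of a node (postorder recursion).
cond_lik <- function(node) {
  if (!is.null(node$state)) {
    return(as.numeric(states == node$state))  # e.g. 1|0|0|0 for "a"
  }
  L <- rep(1, 4)
  for (i in seq_along(node$children)) {
    Lc <- cond_lik(node$children[[i]])
    # sum over the child's states, one factor per child
    L <- L * as.vector(jc_P(node$lengths[i]) %*% Lc)
  }
  L
}

# Assumed example: one site with human = a, chimp = a, gorilla = g,
# orangutan = t, and made-up branch lengths.
inner <- list(children = list(tip("a"), tip("a")), lengths = c(0.1, 0.1))
root  <- list(children = list(inner, tip("g"), tip("t")),
              lengths = c(0.05, 0.2, 0.3))

L_root   <- cond_lik(root)
site_lik <- sum(0.25 * L_root)  # weight by the base composition (1/4 each)
print(L_root)
print(site_lik)
```

Summing `log(site_lik)` over all sites of an alignment would give the log-likelihood of the tree under these assumptions.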
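  • 28. ML in phylogenetics
    [Figure: example tree with tips human, chimp, gorilla, orangutan and internal nodes 5, 6, 7]
  • 30. ML in phylogenetics
    [Figure: the same tree with conditional likelihood vectors at the tips: 1|0|0|0, 1|0|0|0, 0|0|1|0, 0|0|0|1]
  • 31. ML in phylogenetics
    [Figure: the same tree with conditional likelihood vectors at the tips (1|0|0|0, 1|0|0|0, 0|0|1|0, 0|0|0|1) and at the internal nodes (0.000988|0.000031|0.000595|0.000744, 0.027161|0.000559|0.016240|0.000559, 0.923613|0.000168|0.000168|0.000169)]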
  • 32. Finding the best topology
    A binary unrooted tree with 4 tips has 5 edges and 3 distinct topologies. In general, a binary unrooted tree with n tips has
      2n - 3 edges
      (2n - 5)!! = 1 × 3 × 5 × ... × (2n - 5) topologies
    Rooted binary trees have 2n - 2 edges and (2n - 3)!! topologies. A function exists for this:
    > howmanytrees(4, rooted=FALSE)
    [1] 3
    > howmanytrees(10, rooted=FALSE)
    [1] 2027025
    > howmanytrees(20, rooted=FALSE)
    [1] 2.216431e+20
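The counts printed by howmanytrees for unrooted binary trees can be reproduced in base R as the product of the odd numbers up to 2n - 5. The function name `n_unrooted` is made up for this sketch.

```r
# Number of unrooted binary topologies with n labelled tips: (2n - 5)!!
n_unrooted <- function(n) {
  stopifnot(n >= 3)
  prod(seq(1, 2 * n - 5, by = 2))  # 1 * 3 * 5 * ... * (2n - 5)
}

counts <- sapply(3:10, n_unrooted)
print(counts)
print(n_unrooted(20))
```

The number of topologies grows faster than exponentially with n, which is why exhaustive search is limited to small data sets.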
  • 33. Finding the best trees
    Evaluating the likelihood criterion for all trees in order to find the best tree topology is in most cases highly impractical. Instead, local tree rearrangements are used to search locally within the tree space. The idea behind such a heuristic is to start from a given tree and search its neighbourhood for improved scores (parsimony, maximum likelihood, least squares) until no further rearrangement leads to a tree with a better score.
  • 34. Nearest neighbor interchange
    For any internal edge of a binary tree there exist three different ways to connect its four subtrees, one of which is the current tree.
    [Figure: the three arrangements of the subtrees A, B, C, D around an internal edge: AB|CD, AC|BD, AD|BC]
  • 35. Modelling rate variation
    We assume that the substitution rate varies between different sites (intron vs. exon, codon positions, etc.). Two approaches are commonly used:
      - define different partitions
      - model rate variation with different rate categories, using a (discrete) Γ distribution and/or a proportion of invariable sites
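As a rough illustration of the discrete Γ approach, the following base R sketch computes k rate categories from a gamma distribution with mean 1. It uses the median of each equal-probability class as the category rate and then rescales to mean 1; this is one common approximation and not necessarily what phangorn does internally. The function name `discrete_gamma_median` and the value of alpha are assumptions for the example.

```r
# k equally probable rate categories from a Gamma(shape = alpha, rate = alpha)
# distribution (mean 1), represented by the class medians.
discrete_gamma_median <- function(alpha, k = 4) {
  mid <- (seq_len(k) - 0.5) / k               # midpoints of the k probability classes
  r <- qgamma(mid, shape = alpha, rate = alpha)
  r / mean(r)                                 # rescale so the average rate is 1
}

rates <- discrete_gamma_median(alpha = 0.5, k = 4)
print(rates)
```

The likelihood of a site is then the average of the site likelihoods computed with the edge lengths multiplied by each of the k rates. A small alpha gives strongly unequal rates, a large alpha gives rates close to 1.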
  • 36. Comparing trees and models
    The phylogenetic likelihood allows us to compare many different models or trees. There is often a bias vs. variance trade-off. Simple models are easy to interpret but can often be biased.
    [Figure: MSE, variance and squared bias plotted against the number of parameters]
  • 37. Comparing trees and models
    The phylogenetic likelihood allows us to compare many different models or trees. If two models are nested, that is, one model can be described as a special case of the other, then we can directly compare their likelihoods under their ML parameter estimates for a fixed tree using a likelihood ratio test (LRT).
    For non-nested models we can use the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC):
    $$\mathrm{AIC} = -2\,\ell(\theta, \tau, x) + 2 \cdot df$$
    $$\mathrm{BIC} = -2\,\ell(\theta, \tau, x) + \ln(n) \cdot df$$
    where $df$ is the number of parameters of the model and $n$ the number of sites. Alternatively, use the Shimodaira-Hasegawa test or similar bootstrap approaches.
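The two criteria are easy to compute by hand. The base R sketch below uses the usual definitions, AIC = -2 logLik + 2 df and BIC = -2 logLik + ln(n) df, with the log-likelihoods and df reported for JC and GTR+G+I in the model selection table later in these slides. The number of sites n = 3179 is an assumption (it is the value that reproduces the BIC column of that table), and the function name `info_criteria` is made up.

```r
# AIC and BIC from a log-likelihood, the number of free parameters (df)
# and the number of sites n.
info_criteria <- function(logLik, df, n) {
  c(AIC = -2 * logLik + 2 * df,
    BIC = -2 * logLik + log(n) * df)
}

n_sites <- 3179  # assumed alignment length
jc     <- info_criteria(-54303.67,  91, n_sites)
gtr_gi <- info_criteria(-44624.02, 101, n_sites)
print(round(rbind(JC = jc, "GTR+G+I" = gtr_gi), 2))
```

Smaller values are better for both criteria, so here the parameter-rich GTR+G+I model is preferred over JC despite its ten additional parameters.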
  • 38. Detection of molecular adaptation
    We look at each triplet of nucleotides and assume that only one nucleotide can be replaced at a time. Then we can distinguish between nucleotide substitutions that result in the same amino acid (synonymous substitutions) and those that result in a different amino acid (non-synonymous substitutions). The ratio dN/dS of non-synonymous to synonymous substitutions can indicate the kind of selective pressure acting on the codon site. Under negative selection, we expect non-synonymous substitutions to accumulate more slowly than synonymous ones. Under positive or diversifying selection, we expect more amino acid changing replacements.
  • 39. Applications with phangorn
    The two main functions are pml to set up the model and optim.pml for optimising the parameters and the tree with ML. Example session for the Jukes-Cantor, GTR and GTR+Γ+I models:
    > data(Laurasiatherian)
    > tr <- nj(dist.ml(Laurasiatherian))
    > m0 <- pml(tr, Laurasiatherian)
    > m.jc69 <- optim.pml(m0, optNni=TRUE)
    > m.gtr <- optim.pml(m0, optNni=TRUE, model="GTR")
    > m.gtr.G.I <- optim.pml(update(m.gtr, k=4), model="GTR", optNni=TRUE, optGamma=TRUE, optInv=TRUE)
    By default, only the edge lengths are optimized. Currently phangorn only supports NNI tree rearrangements (equivalent to PhyML vers. 2).
  • 40. There exist several useful generic functions like update, anova or AIC for objects of class pml.
    > methods(class="pml")
    [1] anova.pml  logLik.pml plot.pml   print.pml
    [5] update.pml vcov.pml
    For example, as the different models are nested, we can compare them with a likelihood ratio test:
    > anova(m.jc69, m.gtr, m.gtr.G.I)
    Likelihood Ratio Test Table
      Log lik.  Df Df change Diff log lik. Pr(>|Chi|)
    1   -54113  91
    2   -50603  99         8          7020  < 2.2e-16 ***
    3   -44527 101         2         12151  < 2.2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘
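The first comparison in the table (JC69 against GTR) can be checked by hand in base R: the test statistic is twice the difference of the log-likelihoods, compared with a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters. The function name `lrt` is made up for this sketch, and the inputs are the rounded values printed in the table.

```r
# Likelihood ratio test for two nested models.
# ll0, df0: log-likelihood and number of parameters of the simpler model
# ll1, df1: the same for the more general model
lrt <- function(ll0, ll1, df0, df1) {
  stat <- 2 * (ll1 - ll0)
  ddf  <- df1 - df0
  c(statistic = stat, df = ddf,
    p.value = pchisq(stat, df = ddf, lower.tail = FALSE))
}

res <- lrt(ll0 = -54113, ll1 = -50603, df0 = 91, df1 = 99)
print(res)
```

With a statistic of 7020 on 8 degrees of freedom the p-value is numerically zero, in line with the "< 2.2e-16" shown by anova.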
  • 41. Partition models
    pmlPart(global ~ local, object, model)
      global   local
      bf       bf
      Q        Q
      inv      inv
      shape    shape
      edge     edge
    Additional components: rate, nni.
    Each component can be used only once in the formula.
  • 42. Partition models
    There are two different ways to set up partition models.
    1. Setting up partition models for different genes.
    > fit1 <- pml(tree, g1)
    > fit2 <- pml(tree, g2)
    > fit3 <- pml(tree, g3)
    > fit4 <- pml(tree, g4)
    > genePart <- pmlPart(Q + bf ~ edge, list(fit1, fit2, fit3, fit4), optRooted=TRUE)
    > trees <- lapply(genePart$fits, function(x) x$tree)
    > class(trees) <- "multiPhylo"
    > densiTree(trees, type="phylogram", col="red")
    where g1, g2, g3 and g4 are objects of class phyDat.
  • 44. Partition models
    2. Partitioning via a weight matrix.
    > woody <- phyDat(woodmouse)
    > tree <- nj(dist.ml(woody))
    > fit <- pml(tree, woody)
    > w <- attr(woody, "index")
    > weight <- table(w, rep(c(1,2,3), length=length(w)))
    > codonPart <- pmlPart(edge ~ rate, fit, model=c("JC", "JC", "GTR"), weight=weight)
  • 45. Model / tree comparison
    Alternatively we can use the Shimodaira-Hasegawa test to check for differences between models:
    > SH.test(m.jc69, m.gtr, m.gtr.G.I)
         Trees      ln L Diff ln L p-value
    [1,]     1 -54112.74  9585.685  0.0000
    [2,]     2 -50602.74  6075.683  0.0000
    [3,]     3 -44527.06     0.000  0.5911
  • 46. Model selection
    Two possibilities:
    ape: phymltest
    > write.phyDat(woody, "woody.phy")
    > out <- phymltest("woody.phy", execname = "~/phyml")
    phangorn: modelTest
    > mt <- modelTest(Laurasiatherian, model=c("JC", "F81", "HKY", "GTR"))
    modelTest also works for amino acid models, similar to ProtTest.
    > mt <- modelTest(myAAData, model=c("WAG", "JTT", "LG", "Dayhoff"))
  • 47. Model Selection
         Model      df    logLik       AIC       BIC
     1   JC      91.00 -54303.67 108789.35 109341.20
     2   JC+I    92.00 -50673.32 101530.63 102088.55
     3   JC+G    92.00 -48684.10  97552.19  98110.11
     4   JC+G+I  93.00 -48605.03  97396.06  97960.05
     5   F81     94.00 -54212.64 108613.27 109183.32
     6   F81+I   95.00 -50549.53 101289.06 101865.17
     7   F81+G   95.00 -48500.49  97190.99  97767.10
     8   F81+G+I 96.00 -48416.26  97024.51  97606.69
     9   HKY     95.00 -51275.86 102741.72 103317.83
    10   HKY+I   96.00 -47451.73  95095.45  95677.63
    11   HKY+G   96.00 -44893.11  89978.23  90560.40
    12   HKY+G+I 97.00 -44770.18  89734.36  90322.60
    13   GTR     99.00 -50759.89 101717.79 102318.16
    14   GTR+I  100.00 -47081.77  94363.55  94969.98
    15   GTR+G  100.00 -44759.49  89718.99  90325.42
    16   GTR+G+I 101.00 -44624.02  89450.04  90062.54
  • 48. Bootstrap
    > bs <- bootstrap.pml(m.gtr, bs=100, optNni=TRUE)
    > plotBS(m.gtr$tree, bs, type="phylo", bs.adj=c(.5,0))
    [Figure: ML tree of the Laurasiatherian data (tips Platypus to GraySeal) with bootstrap support values on the edges]
  • 49. Codon Models
    $$q_{ij} = \begin{cases} 0 & \text{if } i \text{ and } j \text{ differ in more than one position} \\ \pi_j & \text{for synonymous transversion} \\ \pi_j \kappa & \text{for synonymous transition} \\ \pi_j \omega & \text{for non-synonymous transversion} \\ \pi_j \omega \kappa & \text{for non-synonymous transition} \end{cases}$$
    or, if we leave out $\pi_j$ (the frequency of codon $j$):
    $$q_{ij} = \begin{cases} 0 & \text{if } i \text{ and } j \text{ differ in more than one position} \\ 1 & \text{for synonymous transversion} \\ \kappa & \text{for synonymous transition} \\ \omega & \text{for non-synonymous transversion} \\ \omega \kappa & \text{for non-synonymous transition} \end{cases}$$
    where $\omega$ is the dN/dS ratio, $\kappa$ the transition/transversion ratio and $\pi_j$ the equilibrium frequency of codon $j$.
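The case distinction for the rate between two distinct codons translates directly into a small base R function. This is a sketch only: the function name `codon_rate` and its arguments are made up, and the caller is assumed to already know how many positions differ and whether the change is synonymous and a transition.

```r
# Rate q_ij between two distinct codons i and j.
# n_diff:     number of positions in which the codons differ (>= 1)
# synonymous: TRUE if both codons encode the same amino acid
# transition: TRUE if the single nucleotide change is a transition
# pi_j:       equilibrium frequency of codon j (1 leaves it out)
codon_rate <- function(n_diff, synonymous, transition,
                       kappa, omega, pi_j = 1) {
  if (n_diff > 1) return(0)            # only single nucleotide changes allowed
  rate <- pi_j
  if (transition)  rate <- rate * kappa
  if (!synonymous) rate <- rate * omega
  rate
}

# Example with kappa = 2 and omega = 0.5
print(codon_rate(1, synonymous = TRUE,  transition = TRUE,  kappa = 2, omega = 0.5))
print(codon_rate(1, synonymous = FALSE, transition = TRUE,  kappa = 2, omega = 0.5))
print(codon_rate(1, synonymous = FALSE, transition = FALSE, kappa = 2, omega = 0.5))
print(codon_rate(2, synonymous = FALSE, transition = FALSE, kappa = 2, omega = 0.5))
```

Setting kappa = 1 and omega = 1 makes every single nucleotide change equally likely (up to the codon frequencies).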
  • 50. Codon Models
    > (dat <- phyDat(as.character(yeast), "CODON"))
    > tree <- nj(dist.ml(yeast))
    > fit <- pml(tree, dat)
    > ctr <- pml.control(trace=0)
    > fit0 <- optim.pml(fit, control = ctr)
    > fit1 <- optim.pml(fit0, model="codon1", control=ctr)
    > fit2 <- optim.pml(fit0, model="codon2", control=ctr)
    > fit3 <- optim.pml(fit0, model="codon3", control=ctr)
      Model    κ     ω
      codon0   1     1
      codon1   free  free
      codon2   1     free
      codon3   free  1
    Additionally, the equilibrium frequencies of the codons πj can be estimated by setting the parameter optBf=TRUE.
  • 51. Codon Models
    > anova(fit0, fit2, fit1)
    Likelihood Ratio Test Table
      Log lik. Df Df change Diff log lik. Pr(>|Chi|)
    1 -1054762 13
    2  -648282 14         1        812961  < 2.2e-16 ***
    3  -642807 15         1         10949  < 2.2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘
    > anova(fit0, fit3, fit1)
    Likelihood Ratio Test Table
      Log lik. Df Df change Diff log lik. Pr(>|Chi|)
    1 -1054762 13
    2  -708674 14         1        692176  < 2.2e-16 ***
    3  -642807 15         1        131735  < 2.2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘