Bastien Boussau
LBBE, CNRS, Université de Lyon
Models of gene duplication, transfer and loss to study genome evolution
Collaborators
Lyon collaborators:
• Adrián Arellano Davín
• Gergely Szöllősi (Budapest)
• Vincent Daubin
• Eric Tannier
• Thomas Bigot
• Magali Semeria
• Manolo Gouy
• Laurent Duret
• Nicolas Lartillot
Austin/Illinois collaborators:
• Siavash Mirarab
• Md. Shamsuzzoha Bayzid
• Tandy Warnow
RevBayes collaborators:
• Sebastian Hoehna
• Michael Landis
• Tracy Heath
• Fredrik Ronquist
• Brian Moore
• John Huelsenbeck
• …
Plan
1. Gene duplications and losses
• Mammalian genomes
2. Gene duplications, losses and transfers
• Fungi and Cyanobacteria
3. A fast approach to dealing with incomplete
lineage sorting
• Birds
4. Two vignettes
To study genome evolution:
1. One species tree
2. Thousands of gene trees
[Figure: species tree of species A, B, C, D along a time axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) at the tips]
Why our current pipeline can be improved
• Gene alignments:
  • Error prone (genes are short)
  • Point estimates
• Gene trees:
  • Based on alignments
  • Point estimates
• Species trees:
  • Based on gene trees
[Figure: species tree (species A, B, C, D along a time axis) and gene trees that depart from it through duplication and loss (D, DL), lateral gene transfer (LGT) and incomplete lineage sorting (ILS)]
DL: Boussau et al., Genome Research 2013
DL+T: Szöllősi et al., PNAS 2013
ILS: Mirarab et al., Science 2014
PHYLDOG
Input: all gene families (thousands of alignments)
Output: rooted species tree, rooted gene trees, and numbers of duplications and losses
Joint reconstruction of the species tree, gene trees, and numbers of duplications and losses
Probabilistic models of:
• sequence evolution
• gene family evolution
[Figure: species tree of species A, B, C, D along a time axis, with duplication counts D1 to D6 and loss counts L1 to L6 on its branches]
Boussau et al., Genome Research 2013
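Because each gene family is assumed to evolve independently, the joint objective can be sketched as a sum over families of a sequence term and a gene-family term. The notation is ours, not the paper's: $A_f$ is the alignment of family $f$, $G_f$ its rooted gene tree, $S$ the species tree, and $\delta$, $\lambda$ the duplication and loss rates (substitution-model parameters are omitted for brevity).

```latex
\log L\big(S, \{G_f\}_{f=1}^{F}, \delta, \lambda\big)
  = \sum_{f=1}^{F} \Big[
      \underbrace{\log P(A_f \mid G_f)}_{\text{sequence evolution}}
    + \underbrace{\log P(G_f \mid S, \delta, \lambda)}_{\text{gene family evolution}}
    \Big]
```

In this schematic form, "joint reconstruction" means searching over $S$ and every $G_f$ at once, so a gene tree can move away from its sequence-only optimum when that makes it far more plausible under the duplication/loss model.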
PHYLDOG: a model of gene duplication and loss
Assumptions:
• Genes evolve along the species tree:
  • birth events: duplications (rate of duplication)
  • death events: losses (rate of loss)
• Each gene family is independent of other genes
• Each gene copy is independent of other copies
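Under these assumptions, copy number along a single branch behaves like a linear birth-death process: every copy duplicates or is lost independently at constant rates. The sketch below simulates that process with the Gillespie algorithm and gives the standard closed-form probability that one copy leaves no descendant, a building block of duplication/loss likelihoods. It illustrates the stated assumptions only; it is not PHYLDOG's code, and the function names and rate values are hypothetical.

```python
import math
import random


def simulate_copies(n0: int, dup_rate: float, loss_rate: float,
                    branch_length: float, rng: random.Random) -> int:
    """Simulate gene copy number along one branch (linear birth-death).

    Each copy independently duplicates at rate dup_rate and is lost at
    rate loss_rate, so with n copies the next event arrives at rate
    n * (dup_rate + loss_rate).
    """
    n, t = n0, 0.0
    per_copy = dup_rate + loss_rate
    while n > 0 and per_copy > 0:
        t += rng.expovariate(n * per_copy)  # waiting time to the next event
        if t > branch_length:
            break  # no further event before the end of the branch
        if rng.random() < dup_rate / per_copy:
            n += 1  # duplication: one copy becomes two
        else:
            n -= 1  # loss: one copy disappears
    return n


def extinction_probability(dup_rate: float, loss_rate: float, t: float) -> float:
    """P(one copy leaves no descendant after time t), linear birth-death process."""
    if math.isclose(dup_rate, loss_rate):
        return loss_rate * t / (1.0 + loss_rate * t)
    e = math.exp((dup_rate - loss_rate) * t)
    return loss_rate * (e - 1.0) / (dup_rate * e - loss_rate)


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [simulate_copies(1, 0.5, 0.3, 1.0, rng) for _ in range(20000)]
    print("simulated:", sum(r == 0 for r in runs) / len(runs))
    print("analytic: ", extinction_probability(0.5, 0.3, 1.0))
```

The two printed values should agree up to Monte Carlo noise, which is a quick sanity check that the simulator and the formula describe the same process.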
Study of mammalian genome evolution
• Challenging but well-studied phylogeny
• 36 mammalian genomes available in Ensembl v. 57
• About 7,000 gene families
• Correction for poorly sequenced genomes
PHYLDOG finds a good species tree
[Figure: species tree reconstructed by PHYLDOG for 36 mammals, with the clades Primates, Glires, Laurasiatheria, Afrotheria, Xenarthra and Marsupials labeled]
Species: Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus
Quality of the gene trees
Comparison between:
• PhyML (used for the PhylomeDB and Homolens databases)
• TreeBeST (used for the Ensembl-Compara database)
• PHYLDOG
Two approaches:
• Looking at ancestral genome sizes
• Assessing how well one can recover ancestral syntenies using reconstructed gene trees (Bérard et al., Bioinformatics 2012)
PHYLDOG: better trees for better ancestral genomes
[Figure: ancestral genome sizes (scale 0 to 10000) at the nodes of the mammalian species tree, reconstructed from PHYLDOG, TreeBeST and PhyML gene trees]
An example gene family
[Figure: the same gene family (Ornithorhynchus anatinus, Monodelphis domestica, Mus musculus, Cavia porcellus, Oryctolagus cuniculus, Spermophilus tridecemlineatus, Canis familiaris, Bos taurus, Equus caballus, Homo sapiens, Pongo pygmaeus, Callithrix jacchus) reconstructed by TreeBeST and by PHYLDOG]
Boussau et al., Genome Research 2013
Recent improvements to PHYLDOG
• Easier installation using CMake or a virtual machine
• Better algorithms for gene tree inference
• Better algorithm for the starting species tree
• Faster computations using the Phylogenetic Likelihood Library (PLL, A. Stamatakis group)
• Python scripts to help run the program
Plan, part 2: Gene duplications, losses and transfers (Fungi and Cyanobacteria)
Gene transfers and the quixotic pursuit of the Tree of Life (TOL)
Doolittle WF, Science 1999
“The monistic concept of a single universal tree appears […]
increasingly obsolete. […][It is] no longer the most
scientifically productive position to hold […] [It] accounts for
only a minority of observations from genomes.”
Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall,
Lapointe, Dupré, Dagan, Boucher, Martin,
Biology Direct 2009.
exODT: a model of gene duplication, transfer, and loss
Assumptions:
• Genes evolve along the species tree:
  • birth events: duplications (rate of duplication) and transfers (rate of receiving a gene)
  • death events: losses (rate of loss)
• Each gene family is independent of other genes
• Each gene copy is independent of other copies
• Transfers can go through unsampled/extinct species
exODT: a model of gene duplication, transfer, and loss
Szöllősi et al., Syst. Biol. 2013a
Better gene trees, fewer transfers
[Figure: usual approach vs. ALE + DTL; panels: Robinson-Foulds (RF) distance to the real tree, and transfer events per family]
Szöllősi et al., Syst. Biol. 2013b
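The RF distance used in this comparison counts the bipartitions (branches) found in one tree but not the other. The sketch below computes it for two trees on the same leaf set, written as nested tuples; rooting is ignored, so the same unrooted tree drawn with different roots has distance 0. It is a plain illustration of the metric, not the evaluation code of the study, and the function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree as nested tuples of any arity; leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def splits(tree: Tree, leaves: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of a tree, ignoring where it is rooted.

    Each bipartition is stored as the side that does not contain a fixed
    reference leaf, so both sides of the same edge compare equal.
    """
    ref = min(leaves)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else leaves - below
        if 1 < len(side) < len(leaves) - 1:  # skip trivial (single-leaf) splits
            found.add(side)
        return below

    walk(tree)
    return found


def rf_distance(t1: Tree, t2: Tree, leaves: FrozenSet[str]) -> int:
    """Robinson-Foulds distance: bipartitions present in exactly one of the trees."""
    return len(splits(t1, leaves) ^ splits(t2, leaves))
```

For example, with leaves A to E, `((("A","B"),"C"),("D","E"))` and `((("A","C"),"B"),("D","E"))` differ by one branch on each side, giving an RF distance of 2.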
Application to real data: Cyanobacteria and Fungi
Cyanobacteria: transfers are expected
• > 2.4 billion years old
• 40 species
• 1,200 to 4,500 protein-coding genes
• 7,410 gene families
Fungi (Dikarya): transfers should be less frequent
• ~1 billion years old
• 28 species
• 5,200 to 10,000 protein-coding genes
• 11,387 gene families
Both cases:
• Fixed species tree; gene trees inferred using the Duplication, Transfer and Loss model
Szöllősi et al., under review
Cyanobacteria
0.18 transfers per gene
Szöllősi et al., under review
Fungi
0.07 transfers per gene
Szöllősi et al., under review
Comparing transfer rates
• Cyanobacteria and Fungi differ in their age:
  we can compare normalized numbers of events, T/(T+D), where T and D are the numbers of transfers and duplications
• The Cyanobacteria and Fungi data sets differ in their number of species:
  we can perform rarefaction studies
Szöllősi et al., under review
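Both corrections are simple to express. Dividing transfers by all gene-gain events cancels, to a first approximation, the overall excess of events that an older clade accumulates, and rarefaction compares the two data sets at equal numbers of species by averaging a statistic over random species subsets. In the sketch below, `statistic` is a hypothetical callback; in a real analysis it would re-run DTL inference on the subset, which is far more expensive than anything shown here. No numbers from the study are used.

```python
import random
from typing import Callable, Dict, Sequence


def transfer_fraction(transfers: float, duplications: float) -> float:
    """T / (T + D): share of gene-gain events that are transfers."""
    total = transfers + duplications
    if total <= 0:
        raise ValueError("need at least one duplication or transfer")
    return transfers / total


def rarefaction_curve(species: Sequence[str],
                      statistic: Callable[[list], float],
                      sizes: Sequence[int], replicates: int,
                      rng: random.Random) -> Dict[int, float]:
    """Mean of `statistic` over random species subsets of each size."""
    curve = {}
    for k in sizes:
        # draw `replicates` random subsets of k species without replacement
        values = [statistic(rng.sample(list(species), k)) for _ in range(replicates)]
        curve[k] = sum(values) / len(values)
    return curve
```

For instance, `transfer_fraction(3, 1)` returns 0.75. Comparing the Cyanobacteria and Fungi curves at the same subset size (at most 28 species, the size of the smaller data set) is what makes the two rates comparable.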
Comparing transfer rates
Similar transfer rates in Fungi and Cyanobacteria
Szöllősi et al., under review
Using transfers to date clades
[Figure, built over several slides: species tree along a time axis, with the relative ages of its nodes unknown (?)]
Because we can identify gene transfers, we have information for
ordering the nodes of a species tree
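The reasoning is that a donor and a recipient lineage must have coexisted, so their branches must overlap in time. The brute-force sketch below enumerates relative orderings of internal nodes that respect the tree and keeps those compatible with a list of transfers. It illustrates the idea only; it is not the STRALE algorithm, which works probabilistically over many gene trees. Branches are named by their lower node, and all names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def coexist(ages: Dict[str, int], parent: Dict[str, str],
            donor: str, recipient: str) -> bool:
    """Branches (named by their child node) overlap in time iff each one's
    upper end is older than the other one's lower end."""
    return (ages[parent[donor]] > ages[recipient]
            and ages[parent[recipient]] > ages[donor])


def compatible_orderings(internal: List[str], leaves: List[str],
                         parent: Dict[str, str],
                         transfers: List[Tuple[str, str]]) -> List[Dict[str, int]]:
    """Enumerate node rankings consistent with the tree and with every transfer."""
    result = []
    for ranks in permutations(range(1, len(internal) + 1)):
        ages = {leaf: 0 for leaf in leaves}  # sampled species are present-day
        ages.update(zip(internal, ranks))
        if any(ages[p] <= ages[c] for c, p in parent.items()):
            continue  # a parent must be older than its child
        if all(coexist(ages, parent, d, r) for d, r in transfers):
            result.append(ages)
    return result
```

On the tree ((A,B),(C,D)), the two internal nodes AB and CD can a priori be ordered either way. A single transfer from the branch leading to A into the branch leading to CD forces the ancestor AB to be older than CD, leaving one ordering.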
Bayesian species tree inference
accounting for DTL events
• STRALE:
• A Bayesian probabilistic method that can interpret thousands of
gene trees in terms of:
• speciation events
• duplication events (D)
• transfer events (T)
• loss events (L)
• A method able to estimate the DTL rates
• A method able to reconstruct the species tree
• A method able to order the nodes of the species tree
Simulation to test the species tree reconstruction
• 20 species
• 200 gene families
[Figure: simulated and inferred species trees, leaves labeled 0 to 19]
Conclusion on DTL models
• The use of DTL models shows that the number of gene
transfers has so far been overestimated
• DTL models can be used to study genome evolution
and in particular rates of gene transfer
• DTL models can be used to date the nodes of a species
phylogeny
• DTL models should provide a powerful tool to infer an
accurate account of the history of life
Plan, part 3: A fast approach to dealing with incomplete lineage sorting (Birds)
The multispecies coalescent
Rannala and Yang, Genetics 2003
• Divergence times in the species tree
• Divergence times in the gene trees
• Effective population sizes in the species tree
Faster alternatives to the multispecies coalescent use fixed gene trees
E.g.: MP-EST (Liu, Yu and Edwards, 2010)
Input: fixed gene trees
Output: species tree with branch lengths in coalescent units
MP-EST has been shown to be statistically consistent under one notable assumption: the gene trees are correct.
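Summary methods such as MP-EST work from the frequencies of rooted triplets across gene trees. For a single triplet with species tree ((A,B),C) and internal branch length t in coalescent units, the multispecies coalescent gives the matching gene tree probability 1 - (2/3)e^(-t). The sketch below evaluates that formula and inverts it; it is a one-triplet illustration, not MP-EST's pseudo-likelihood over all triplets, and the function names are hypothetical.

```python
import math
from typing import Tuple


def triplet_probabilities(t: float) -> Tuple[float, float, float]:
    """Gene tree topology probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units. Returns
    (P(((A,B),C)), P(((A,C),B)), P(((B,C),A))).
    """
    discordant = math.exp(-t) / 3.0  # each of the two discordant topologies
    return 1.0 - 2.0 * discordant, discordant, discordant


def branch_length_from_concordance(p_match: float) -> float:
    """Invert P(match) = 1 - (2/3) exp(-t); requires 1/3 < p_match < 1."""
    if not 1.0 / 3.0 < p_match < 1.0:
        raise ValueError("concordance must lie strictly between 1/3 and 1")
    return -math.log(1.5 * (1.0 - p_match))
```

This also shows why gene tree errors matter: random errors push the observed concordance toward 1/3, and the inversion then reads that as a shorter branch, i.e. more apparent incomplete lineage sorting than there really is.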
Errors in gene trees decrease the accuracy of estimated species trees
Mirarab, Bayzid and Warnow, Syst. Biol. 2014
Statistical binning
[Figure: statistical binning pipeline, with MP-EST used to estimate the species tree]
Mirarab et al., Science 2014
Statistical binning improves species tree inference
Mirarab et al., Science 2014
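The core of binning is deciding which genes can be pooled: two genes go into different bins if a well-supported branch in one contradicts a well-supported branch in the other. The sketch below encodes each gene as a map from bipartitions to bootstrap support and greedily places genes into the smallest compatible bin. Mirarab et al. describe a balanced graph-coloring heuristic; this greedy version is a simplified stand-in, and the support threshold default and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Split = FrozenSet[str]  # one side of a bipartition


def compatible(a: Split, b: Split, leaves: FrozenSet[str]) -> bool:
    """Two bipartitions can co-occur in one tree iff some pair of sides is disjoint."""
    a2, b2 = leaves - a, leaves - b
    return not (a & b) or not (a & b2) or not (a2 & b) or not (a2 & b2)


def conflict(g1: Dict[Split, float], g2: Dict[Split, float],
             leaves: FrozenSet[str], threshold: float) -> bool:
    """Genes conflict if a well-supported branch of one contradicts one of the other."""
    strong1 = [s for s, support in g1.items() if support >= threshold]
    strong2 = [s for s, support in g2.items() if support >= threshold]
    return any(not compatible(a, b, leaves) for a in strong1 for b in strong2)


def bin_genes(genes: Dict[str, Dict[Split, float]], leaves: FrozenSet[str],
              threshold: float = 75.0) -> List[List[str]]:
    """Greedily place each gene into the smallest bin it does not conflict with."""
    bins: List[List[str]] = []
    for name in sorted(genes):  # fixed order keeps the result deterministic
        ok = [b for b in bins
              if not any(conflict(genes[name], genes[other], leaves, threshold)
                         for other in b)]
        if ok:
            min(ok, key=len).append(name)
        else:
            bins.append([name])
    return bins
```

Weakly supported branches never cause a conflict, so genes with uninformative trees can join any bin. This is what lets binning pool their signal instead of trusting their poorly resolved individual trees.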
Statistical binning also improves the estimation of the gene tree distribution
Mirarab et al., Science 2014
Statistical binning and birds
Jarvis et al., Science 2014
Improving statistical binning: weighted statistical binning
Mirarab et al., PLoS One, accepted
Practice: weighted and unweighted binning have about the same accuracy
Theory: weighted statistical binning can be shown to be consistent; unweighted statistical binning is not.
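The difference between the two variants is in what the summary method receives. In weighted binning, as we understand it, each supergene tree is counted once per gene in its bin, so that large bins are not under-represented relative to singleton bins. A minimal sketch of that input transformation, with hypothetical names:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def weighted_input(bins: Sequence[Sequence[str]], supergene_trees: Sequence[T]) -> List[T]:
    """Repeat each bin's supergene tree once per gene in that bin."""
    if len(bins) != len(supergene_trees):
        raise ValueError("need exactly one supergene tree per bin")
    return [tree for genes, tree in zip(bins, supergene_trees) for _ in genes]
```

Unweighted binning would pass `supergene_trees` as is, one tree per bin, regardless of bin size.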
Plan, part 4: Two vignettes
RevBayes
• R-like language
• Model-based phylogenetics
• Many models of sequence evolution
• Models for dating
• Models for phylogeography
• Models for continuous traits
• Models for gene tree/species tree inference
• http://revbayes.net
• Sebastian Hoehna
• Michael Landis
• Tracy Heath
• Fredrik Ronquist
• Nicolas Lartillot
• Brian Moore
• John Huelsenbeck
• …
One more thing...
Conclusions
• We develop methods for gene tree and species
tree inference
• Improvement of gene trees and species trees in the
presence of:
• duplications and losses,
• transfers,
• incomplete lineage sorting
• Parallel algorithms applicable to genome-scale data
• We study the evolution of life, ancient and recent
Thanks!

Más contenido relacionado

La actualidad más candente

A. meiosis check your learning
A. meiosis   check your learningA. meiosis   check your learning
A. meiosis check your learning
kcangial
 
Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2
Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2
Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2
Ellinor Michel
 

La actualidad más candente (11)

ContentMine (EMBL-EBI Industry Programme)
ContentMine (EMBL-EBI Industry Programme)ContentMine (EMBL-EBI Industry Programme)
ContentMine (EMBL-EBI Industry Programme)
 
Talk at Institut Jean Nicod on 6 October 2010
Talk at Institut Jean Nicod on 6 October 2010Talk at Institut Jean Nicod on 6 October 2010
Talk at Institut Jean Nicod on 6 October 2010
 
A. meiosis check your learning
A. meiosis   check your learningA. meiosis   check your learning
A. meiosis check your learning
 
So many different kinds of mistakes: Or why systematic error is the 21st cent...
So many different kinds of mistakes: Or why systematic error is the 21st cent...So many different kinds of mistakes: Or why systematic error is the 21st cent...
So many different kinds of mistakes: Or why systematic error is the 21st cent...
 
D. genes and protein check your learning
D. genes and protein   check your learningD. genes and protein   check your learning
D. genes and protein check your learning
 
Improving Interoperability of Text Mining Tools with BioC
Improving Interoperability of Text Mining Tools with BioCImproving Interoperability of Text Mining Tools with BioC
Improving Interoperability of Text Mining Tools with BioC
 
Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2
Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2
Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2
 
Surfacing the deep data of taxonomy
Surfacing the deep data of taxonomySurfacing the deep data of taxonomy
Surfacing the deep data of taxonomy
 
PaulaTataruCSHL
PaulaTataruCSHLPaulaTataruCSHL
PaulaTataruCSHL
 
What are we DOIng about the missing links? Connecting taxonomic names to the ...
What are we DOIng about the missing links? Connecting taxonomic names to the ...What are we DOIng about the missing links? Connecting taxonomic names to the ...
What are we DOIng about the missing links? Connecting taxonomic names to the ...
 
Microbial Phylogenomics (EVE161) Class 4
Microbial Phylogenomics (EVE161) Class 4Microbial Phylogenomics (EVE161) Class 4
Microbial Phylogenomics (EVE161) Class 4
 

Similar a Models of gene duplication, transfer and loss to study genome evolution

Nemes and Price 2015
Nemes and Price 2015Nemes and Price 2015
Nemes and Price 2015
Simone Nemes
 
Whole genome duplication and diversification of plant genomes
Whole genome duplication and diversification of plant genomesWhole genome duplication and diversification of plant genomes
Whole genome duplication and diversification of plant genomes
SimonRB
 

Similar a Models of gene duplication, transfer and loss to study genome evolution (20)

U1 and U2 Exam Review from 28May
U1 and U2 Exam Review from 28MayU1 and U2 Exam Review from 28May
U1 and U2 Exam Review from 28May
 
When models mislead
When models misleadWhen models mislead
When models mislead
 
Nemes and Price 2015
Nemes and Price 2015Nemes and Price 2015
Nemes and Price 2015
 
So many different kinds of mistakes
So many different kinds of mistakesSo many different kinds of mistakes
So many different kinds of mistakes
 
2016 10-27 timbers
2016 10-27 timbers2016 10-27 timbers
2016 10-27 timbers
 
Dna barcoding
Dna  barcoding Dna  barcoding
Dna barcoding
 
Bioinformatica 24-11-2011-t6-phylogenetics
Bioinformatica 24-11-2011-t6-phylogeneticsBioinformatica 24-11-2011-t6-phylogenetics
Bioinformatica 24-11-2011-t6-phylogenetics
 
Taxonomy
TaxonomyTaxonomy
Taxonomy
 
"Searching for Novel Forms of Life" talk by Jonathan Eisen 12/16/15
"Searching for Novel Forms of Life" talk by Jonathan Eisen 12/16/15"Searching for Novel Forms of Life" talk by Jonathan Eisen 12/16/15
"Searching for Novel Forms of Life" talk by Jonathan Eisen 12/16/15
 
The Right Answers to the Wrong Questions
The Right Answers to the Wrong QuestionsThe Right Answers to the Wrong Questions
The Right Answers to the Wrong Questions
 
Sally Adamowicz - Invertebrates Plenary
Sally Adamowicz - Invertebrates PlenarySally Adamowicz - Invertebrates Plenary
Sally Adamowicz - Invertebrates Plenary
 
K.A. Seifert - Algae, Protists & Fungi Plenary
K.A. Seifert - Algae, Protists & Fungi PlenaryK.A. Seifert - Algae, Protists & Fungi Plenary
K.A. Seifert - Algae, Protists & Fungi Plenary
 
Cave animals at the dawn of speleogenomics
Cave animals at the dawn of speleogenomicsCave animals at the dawn of speleogenomics
Cave animals at the dawn of speleogenomics
 
Hereditas
HereditasHereditas
Hereditas
 
Mitochondrial DNA in Taxonomy and Phylogeny
Mitochondrial DNA in Taxonomy and PhylogenyMitochondrial DNA in Taxonomy and Phylogeny
Mitochondrial DNA in Taxonomy and Phylogeny
 
FOR CO EVIDENCE OF EVOLUTION.ppt
FOR CO EVIDENCE OF EVOLUTION.pptFOR CO EVIDENCE OF EVOLUTION.ppt
FOR CO EVIDENCE OF EVOLUTION.ppt
 
Ch10 molevo
Ch10 molevoCh10 molevo
Ch10 molevo
 
Whole genome duplication and diversification of plant genomes
Whole genome duplication and diversification of plant genomesWhole genome duplication and diversification of plant genomes
Whole genome duplication and diversification of plant genomes
 
Biol102 chp26-pp-spr10-100312094514-phpapp02
Biol102 chp26-pp-spr10-100312094514-phpapp02Biol102 chp26-pp-spr10-100312094514-phpapp02
Biol102 chp26-pp-spr10-100312094514-phpapp02
 
Biol102 chp26-pp-spr10-100312094514-phpapp02
Biol102 chp26-pp-spr10-100312094514-phpapp02Biol102 chp26-pp-spr10-100312094514-phpapp02
Biol102 chp26-pp-spr10-100312094514-phpapp02
 

Último

SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 

Último (20)

SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 

Models of gene duplication, transfer and loss to study genome evolution

  • 1. Bastien Boussau LBBE, CNRS, Université de Lyon Models of gene duplication, transfer and loss to study genome evolution
  • 2. Collaborators Lyon collaborators: • Adrián Arellano Davín • Gergely Szöllősi (Budapest) • Vincent Daubin • Eric Tannier • Thomas Bigot • Magali Semeria • Manolo Gouy • Laurent Duret • Nicolas Lartillot Austin/Illinois collaborators: • Siavash Mirarab • Md. Shamsuzzoha Bayzid • Tandy Warnow RevBayes collaborators: • Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Brian Moore • John Huelsenbeck • …
  • 3. Plan 1. Gene duplications and losses • Mammalian genomes 2. Gene duplications, losses and transfers • Fungi and Cyanobacteria 3. A fast approach to dealing with incomplete lineage sorting • Birds 4. 2 vignettes
  • 4. To study genome evolution: 1. One species tree: ! ! ! 2. Thousands of gene trees: Species: A B C D Discrete character: Continuous character: a a b a 0.1 0.2 0.2 0.4 T I M E
  • 5. To study genome evolution: 1. One species tree: ! ! ! 2. Thousands of gene trees: Species: A B C D Discrete character: Continuous character: a a b a 0.1 0.2 0.2 0.4 T I M E
  • 6. Why  our  current  pipeline  can  be  improved
  • 7. Why  our  current  pipeline  can  be  improved •Gene  alignments:   •Error  prone  (Genes  are   short)   •Point  es:mates  
  • 8. Why  our  current  pipeline  can  be  improved •Gene  trees:   •based  on  alignments   •Point  es:mates   •Gene  alignments:   •Error  prone  (Genes  are   short)   •Point  es:mates  
  • 9. Why  our  current  pipeline  can  be  improved •Gene  trees:   •based  on  alignments   •Point  es:mates   •Species  trees:   •based  on  gene  trees   •Gene  alignments:   •Error  prone  (Genes  are   short)   •Point  es:mates  
  • 10. Why  our  current  pipeline  can  be  improved •Gene  trees:   •based  on  alignments   •Point  es:mates   •Species  trees:   •based  on  gene  trees   •Gene  alignments:   •Error  prone  (Genes  are   short)   •Point  es:mates  
  • 11-20. [Figure: the species tree of species A, B, C and D along a time axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) at the tips; successive builds add the events that make gene trees differ from it: duplication (D), duplication and loss (DL), lateral gene transfer (LGT) and incomplete lineage sorting (ILS).] DL: Boussau et al., Genome Research 2013; DL+T: Szöllősi et al., PNAS 2013; ILS: Mirarab et al., Science 2014
  • 21. PHYLDOG: joint reconstruction of the species tree, gene trees, and numbers of duplications and losses • Input: all gene families (thousands of alignments) • Output: rooted species tree, numbers of duplications and losses per branch (D1 to D6 and L1 to L6 in the figure), rooted gene trees. Boussau et al., Genome Research 2013
  • 22. PHYLDOG relies on probabilistic models of: • sequence evolution • gene family evolution. Boussau et al., Genome Research 2013
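PHYLDOG scores duplications and losses with a probabilistic model. As a simpler, swapped-in illustration of what it means to interpret a gene tree in terms of duplications and losses, the sketch below uses parsimony-based LCA (least common ancestor) reconciliation, which is not PHYLDOG's algorithm. Trees are written as nested tuples, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def index_species_tree(species_tree: Tree) -> Dict[Clade, int]:
    """Map every species-tree clade (set of species below a node) to its depth (root = 0)."""
    depth: Dict[Clade, int] = {}

    def walk(node: Tree, d: int) -> Clade:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = walk(node[0], d + 1) | walk(node[1], d + 1)
        depth[clade] = d
        return clade

    walk(species_tree, 0)
    return depth


def lca(species: Clade, depth: Dict[Clade, int]) -> Clade:
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in depth if species <= c), key=len)


def count_dup_loss(gene_tree: Tree, gene_to_species: Dict[str, str],
                   depth: Dict[Clade, int]) -> Tuple[int, int]:
    """Parsimony (LCA) reconciliation of a gene tree: return (duplications, losses)."""
    dups = losses = 0

    def walk(node: Tree) -> Clade:
        nonlocal dups, losses
        if isinstance(node, str):
            return frozenset([gene_to_species[node]])
        left, right = walk(node[0]), walk(node[1])
        here = lca(left | right, depth)
        # A node is a duplication if a child maps to the same species-tree node.
        is_dup = here in (left, right)
        if is_dup:
            dups += 1
        for child in (left, right):
            # Each species node skipped between this node and the child implies one loss;
            # a speciation already accounts for one step down the species tree.
            losses += depth[child] - depth[here] - (0 if is_dup else 1)
        return here

    walk(gene_tree)
    return dups, losses


if __name__ == "__main__":
    depth = index_species_tree((("A", "B"), "C"))
    genes = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
    print(count_dup_loss((("a1", "b1"), ("a2", "c1")), genes, depth))
```

For the species tree ((A,B),C), the gene tree ((a1,b1),(a2,c1)) is explained by one duplication at the root followed by two losses (one copy lost in C, the other in B).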
  • 23. PHYLDOG: a model of gene duplication and loss. Assumptions: • Genes evolve along the species tree: • birth events: duplications (rate of duplication) • death events: losses (rate of loss) • Each gene family is independent of other genes • Each gene copy is independent of other copies
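Under assumptions like these, the number of copies of a gene along a single branch follows a linear birth-death process. The sketch below assumes constant per-copy duplication and loss rates on one branch; PHYLDOG's actual parameterization may differ. It compares the closed-form probability that one copy leaves no descendant after time t, p0(t) = μ(e^{(λ−μ)t} − 1) / (λe^{(λ−μ)t} − μ), with λ the duplication rate and μ the loss rate, against a Gillespie simulation. The function names are hypothetical.

```python
import math
import random


def extinction_probability(dup_rate: float, loss_rate: float, t: float) -> float:
    """P(a single gene copy leaves no descendant after time t), linear birth-death process."""
    if math.isclose(dup_rate, loss_rate):
        # Critical case (equal rates): p0(t) = mu * t / (1 + mu * t).
        return loss_rate * t / (1.0 + loss_rate * t)
    e = math.exp((dup_rate - loss_rate) * t)
    return loss_rate * (e - 1.0) / (dup_rate * e - loss_rate)


def simulate_copies(dup_rate: float, loss_rate: float, t: float,
                    rng: random.Random) -> int:
    """Gillespie simulation of one gene copy along a branch of length t.

    Returns the final number of copies. Assumes dup_rate + loss_rate > 0.
    """
    n, clock = 1, 0.0
    while n > 0:
        # With n independent copies, the next event arrives at total rate n * (dup + loss).
        clock += rng.expovariate(n * (dup_rate + loss_rate))
        if clock > t:
            break
        # The event is a duplication with probability dup / (dup + loss), else a loss.
        if rng.random() < dup_rate / (dup_rate + loss_rate):
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [simulate_copies(0.2, 0.3, 2.0, rng) for _ in range(20000)]
    print("simulated P(extinct):", sum(r == 0 for r in runs) / len(runs))
    print("analytic  P(extinct):", extinction_probability(0.2, 0.3, 2.0))
```

When duplications outpace losses, p0(t) tends to μ/λ as t grows, so a gene family can still go extinct even with a net tendency to expand.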
  • 24. Study of mammalian genome evolution • Challenging but well-studied phylogeny • 36 mammalian genomes available in Ensembl v. 57 • About 7000 gene families • Correction for poorly sequenced genomes
  • 25. PHYLDOG finds a good species tree. [Figure: the PHYLDOG species tree of the 36 mammals, from Homo sapiens to Ornithorhynchus anatinus, with Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates and Glires labelled.]
  • 26. Quality of the gene trees. Comparison between: • PhyML (used for the PhylomeDB and Homolens databases) • TreeBeST (used for the Ensembl-Compara database) • PHYLDOG. Two approaches: • looking at ancestral genome sizes • assessing how well one can recover ancestral syntenies using the reconstructed gene trees (Bérard et al., Bioinformatics 2012)
  • 27. PHYLDOG: better trees for better ancestral genomes. [Figure: reconstructed ancestral genome sizes (scale 0 to 10000) at the nodes of the mammalian species tree, obtained with PHYLDOG, TreeBeST and PhyML gene trees.]
  • 28. An example gene family. [Figure: the gene tree of one family as reconstructed by TreeBeST and by PHYLDOG, with copies from Ornithorhynchus anatinus, Monodelphis domestica, Mus musculus, Homo sapiens and other mammals.] Boussau et al., Genome Research 2013
  • 29. Recent improvements to PHYLDOG • Easier installation using CMake or a virtual machine • Better algorithms for gene tree inference • Better algorithm for the starting species tree • Faster computations using the Phylogenetic Likelihood Library (PLL, A. Stamatakis group) • Python scripts to help run the program
  • 31-32. [Figure: species tree of species A to D along a time axis, with duplication (D), duplication and loss (DL), lateral gene transfer (LGT) and incomplete lineage sorting (ILS) events.] DL: Boussau et al., Genome Research 2013; DL+T: Szöllősi et al., PNAS 2013; ILS: Mirarab et al., Science 2014
  • 33. Gene transfers and the quixotic pursuit of the TOL (tree of life). Doolittle WF, Science 1999
  • 35. Gene transfers and the quixotic pursuit of the TOL. Doolittle WF, Science 1999. “The monistic concept of a single universal tree appears […] increasingly obsolete. […] [It is] no longer the most scientifically productive position to hold […] [It] accounts for only a minority of observations from genomes.” Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.
  • 36. exODT: a model of gene duplication, transfer, and loss. Assumptions: • Genes evolve along the species tree: • birth events: duplications (rate of duplication); transfers (rate of receiving a gene) • death events: losses (rate of loss) • Each gene family is independent of other genes • Each gene copy is independent of other copies • Transfers can go through unsampled/extinct species
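The birth and death events listed above can be illustrated with a Gillespie-style simulation. This is a minimal sketch, not the exODT implementation: it assumes constant per-copy rates, a fixed set of species lineages that coexist during an interval of length t (no speciation), and, for simplicity, attaches the transfer rate to the donor copy; transfers through unsampled or extinct lineages are not modelled. The function name and its arguments are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def simulate_dtl(copies: List[int], dup: float, transfer: float, loss: float,
                 t: float, rng: random.Random) -> Tuple[List[int], Counter]:
    """Simulate duplication (D), transfer (T) and loss (L) among coexisting lineages.

    copies[i] is the number of gene copies in species lineage i. Every copy
    independently duplicates (rate dup), donates a transfer (rate transfer) or is
    lost (rate loss). A transfer adds a copy to a uniformly chosen *other* lineage;
    the donor keeps its copy. Assumes dup + transfer + loss > 0.
    """
    copies = list(copies)
    events: Counter = Counter()
    per_copy = dup + transfer + loss
    clock = 0.0
    while sum(copies) > 0:
        clock += rng.expovariate(sum(copies) * per_copy)
        if clock > t:
            break
        # The affected copy is chosen uniformly, so its lineage is chosen by copy number.
        donor = rng.choices(range(len(copies)), weights=copies)[0]
        u = rng.random() * per_copy
        if u < dup:
            copies[donor] += 1
            events["D"] += 1
        elif u < dup + transfer:
            others = [i for i in range(len(copies)) if i != donor]
            if others:  # with a single lineage there is no possible recipient
                copies[rng.choice(others)] += 1
                events["T"] += 1
        else:
            copies[donor] -= 1
            events["L"] += 1
    return copies, events


if __name__ == "__main__":
    final, events = simulate_dtl([1, 1, 1, 1], 0.3, 0.2, 0.25, 3.0, random.Random(42))
    print(final, dict(events))
```

Because each duplication or transfer adds one copy and each loss removes one, the final total always equals the initial total plus D + T − L, which is a quick sanity check on any such simulation.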
  • 37. exODT: a model of gene duplication, transfer, and loss. Szöllősi et al., Syst. Biol. 2013a
  • 39-40. Better gene trees, fewer transfers. [Figure: RF distance to the real tree and number of transfer events per family, for the usual approach vs. ALE + DTL.] Szöllősi et al., Syst. Biol. 2013b
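The RF (Robinson-Foulds) distance used to compare gene trees with the real tree counts the bipartitions (splits) present in one tree but not in the other. A minimal sketch, assuming both trees are written as nested tuples over the same taxa (a hypothetical representation); rooting is ignored, as for the unrooted RF distance.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def splits(tree: Tree, taxa: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of a tree, each stored as the side without a reference taxon."""
    ref = min(taxa)
    out: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        # Normalize so that the same split is stored identically whatever the rooting.
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-leaf) splits
            out.add(side)
        return clade

    walk(tree)
    return out


def rf_distance(t1: Tree, t2: Tree, taxa: FrozenSet[str]) -> int:
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(t1, taxa) ^ splits(t2, taxa))


if __name__ == "__main__":
    taxa = frozenset("ABCDE")
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    estimate = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(true_tree, estimate, taxa))
```

For binary unrooted trees on n taxa the distance ranges from 0 to 2(n − 3), so dividing by 2(n − 3) gives a normalized value between 0 and 1.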
  • 41-42. Application to real data: Cyanobacteria and Fungi • Cyanobacteria: > 2.4 billion years old; 40 species; 1,200 to 4,500 protein-coding genes; 7,410 gene families; transfers are expected • Fungi (Dikarya): ~1 billion years old; 28 species; 5,200 to 10,000 protein-coding genes; 11,387 gene families; transfers should be less frequent • Both cases: fixed species tree, gene trees inferred using the Duplication, Transfer and Loss model. Szöllősi et al., under review
  • 45. Cyanobacteria: 0.18 transfers per gene. Szöllősi et al., under review
  • 46-48. Fungi: 0.07 transfers per gene. Szöllősi et al., under review
  • 49. Comparing transfer rates • Cyanobacteria and Fungi differ in their age: we can compare normalized numbers of events, T/(T+D), where T and D are the numbers of transfers and duplications • The Cyanobacteria and Fungi data sets differ in their number of species: we can perform rarefaction studies. Szöllősi et al., under review
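Both corrections can be sketched in a few lines, assuming event counts are already available from the reconciliations. Reading "rarefaction" as repeatedly subsampling the 40 cyanobacterial species down to the 28 fungal species is an assumption here, and both function names, as well as the `statistic` callback, are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def transfer_fraction(n_transfers: float, n_duplications: float) -> float:
    """Normalized transfer rate T/(T+D), which removes the effect of clade age."""
    total = n_transfers + n_duplications
    if total == 0:
        raise ValueError("no duplications or transfers to normalize by")
    return n_transfers / total


def rarefy(species: Sequence[str], k: int,
           statistic: Callable[[Sequence[str]], float],
           n_reps: int = 100, seed: int = 0) -> float:
    """Average a statistic over random subsets of k species.

    `statistic` is a hypothetical callback that would re-estimate the quantity of
    interest (e.g. T/(T+D)) on gene and species trees pruned to the sampled species.
    """
    rng = random.Random(seed)
    return mean(statistic(rng.sample(list(species), k)) for _ in range(n_reps))


if __name__ == "__main__":
    print(transfer_fraction(3, 1))
    cyanobacteria = [f"cyano_{i}" for i in range(40)]
    print(rarefy(cyanobacteria, 28, len))  # placeholder statistic: subset size
```

Using the ratio T/(T+D) rather than raw counts means that older clades, which simply had more time to accumulate both kinds of events, are not automatically assigned higher transfer rates.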
  • 50. Comparing transfer rates Szöllősi et al., under review
  • 51. Similar transfer rates in Fungi and Cyanobacteria Szöllősi et al., under review
  • 52-58. Using transfers to date clades? [Figure: a species tree along a time axis, with gene transfers between branches.] A transfer can only occur between lineages that exist at the same time. Because we can identify gene transfers, we have information for ordering the nodes of a species tree.
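The reasoning can be written as a constraint problem: since donor and recipient lineages must coexist, the upper end of each branch involved in a transfer must be older than the lower end of the other. The sketch below assumes transfers have already been identified as (donor branch, recipient branch) pairs on a rooted species tree, with each branch named by the node at its lower end. It combines these constraints with parent-child constraints and uses a topological sort (Kahn's algorithm) to return one compatible order of nodes, or None when the transfers contradict each other. It illustrates the idea only and does not reproduce STRALE's Bayesian treatment; all names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple


def node_order(parent: Dict[str, str],
               transfers: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Order species-tree nodes from oldest to youngest, or return None if impossible.

    parent maps each non-root node to its parent. A branch is named by the node at
    its lower end; each transfer is (donor_branch, recipient_branch), root branch excluded.
    """
    younger: Dict[str, Set[str]] = defaultdict(set)  # younger[u]: nodes younger than u
    for child, par in parent.items():
        younger[par].add(child)  # a parent is older than its child
    for donor, recipient in transfers:
        # Donor and recipient branches overlap in time: the top of each branch
        # is older than the bottom of the other.
        younger[parent[donor]].add(recipient)
        younger[parent[recipient]].add(donor)

    nodes = set(parent) | set(parent.values())
    indegree = {n: 0 for n in nodes}
    for u in list(younger):
        for v in younger[u]:
            indegree[v] += 1

    # Kahn's topological sort; sorting the candidates makes the output deterministic.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(younger[u]):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Leftover nodes mean a cycle, i.e. contradictory transfers.
    return order if len(order) == len(nodes) else None


if __name__ == "__main__":
    parent = {"X": "R", "Y": "R", "A": "X", "B": "X",
              "C": "Y", "Y2": "Y", "D": "Y2", "E": "Y2"}
    # A transfer from the branch above X into the branch above D forces Y2 to be older than X.
    print(node_order(parent, [("X", "D")]))
```

Without transfers, nodes X and Y2 in the example cannot be ordered; a single transfer between their lineages is enough to decide which is older.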
  • 59. Bayesian species tree inference accounting for DTL events • STRALE, a Bayesian probabilistic method that: • interprets thousands of gene trees in terms of speciation, duplication (D), transfer (T) and loss (L) events • estimates the DTL rates • reconstructs the species tree • orders the nodes of the species tree
  • 60. Simulation to test the species tree reconstruction • 20 species • 200 gene families. [Figure: simulated vs. inferred species trees, with numbered nodes.]
  • 61. Conclusion on DTL models • The use of DTL models shows that the number of gene transfers has so far been overestimated • DTL models can be used to study genome evolution and in particular rates of gene transfer • DTL models can be used to date the nodes of a species phylogeny • DTL models should provide a powerful tool to infer an accurate account of the history of life
  • 63-64. [Figure: species tree of species A to D along a time axis, with duplication (D), duplication and loss (DL), lateral gene transfer (LGT) and incomplete lineage sorting (ILS) events.] DL: Boussau et al., Genome Research 2013; DL+T: Szöllősi et al., PNAS 2013; ILS: Mirarab et al., Science 2014
  • 65. The multispecies coalescent (Rannala and Yang, Genetics 2003) • Divergence times in the species tree • Divergence times in the gene trees • Effective population sizes in the species tree
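To make these parameters concrete, take three species with species tree ((A,B),C) and an internal branch of length T in coalescent units. The standard multispecies coalescent result is that a gene tree matches the species tree with probability 1 − (2/3)e^{−T}, and has each of the two discordant topologies with probability (1/3)e^{−T}. The sketch below, with hypothetical function names, computes these probabilities and checks them against a direct simulation.

```python
import math
import random
from collections import Counter
from typing import Tuple


def triplet_probabilities(T: float) -> Tuple[float, float, float]:
    """P(((a,b),c)), P(((a,c),b)), P(((b,c),a)) for species tree ((A,B),C).

    T is the internal branch length in coalescent units.
    """
    discordant = math.exp(-T) / 3.0
    return 1.0 - 2.0 * discordant, discordant, discordant


def simulate_triplet(T: float, rng: random.Random) -> str:
    """Draw one gene-tree topology under the multispecies coalescent."""
    # Lineages a and b coalesce on the internal branch with probability 1 - exp(-T):
    # their waiting time is exponential with rate 1 in coalescent units.
    if rng.expovariate(1.0) < T:
        return "((a,b),c)"
    # Otherwise three lineages enter the ancestral population, and any pair
    # is equally likely to coalesce first (incomplete lineage sorting).
    return rng.choice(["((a,b),c)", "((a,c),b)", "((b,c),a)"])


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(simulate_triplet(1.0, rng) for _ in range(20000))
    print({topology: n / 20000 for topology, n in counts.items()})
    print(triplet_probabilities(1.0))
```

Short internal branches (small T, for example rapid radiations or large effective population sizes) push all three topologies towards 1/3, which is why incomplete lineage sorting matters most there.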
  • 66. Faster alternatives to the multispecies coalescent use fixed gene trees • E.g. MP-EST (Liu, Yu and Edwards, 2010): input: fixed gene trees; output: species tree with branch lengths in coalescent units • Has been shown to be consistent, under one notable assumption: the gene trees are correct.
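Why the correctness of the input gene trees matters can be seen from a simplified, swapped-in estimator, which is not MP-EST itself (MP-EST maximizes a pseudo-likelihood over all rooted triplets). The sketch below assumes rooted gene trees written as nested tuples. It counts the rooted triplet each gene tree displays and inverts P(major triplet) = 1 − (2/3)e^{−T} to estimate the internal branch length T in coalescent units. Any gene tree error that changes the displayed triplet shifts the observed frequencies, and therefore the estimate. All helper names are hypothetical.

```python
import math
from collections import Counter
from typing import FrozenSet, Iterable, List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """All clades (leaf sets of internal nodes) of a rooted tree."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        out.append(clade)
        return clade

    walk(tree)
    return out


def rooted_triplet(tree: Tree, trio: Iterable[str]) -> Optional[FrozenSet[str]]:
    """Sister pair of the three taxa in this rooted tree, or None if unresolved."""
    trio = frozenset(trio)
    for clade in clades(tree):
        pair = clade & trio
        if len(pair) == 2:  # a clade holding exactly two of the three taxa
            return pair
    return None


def internal_branch_length(freq_major: float) -> float:
    """Invert P(major triplet) = 1 - (2/3) exp(-T) to estimate T (coalescent units)."""
    if freq_major <= 1.0 / 3.0:
        return 0.0
    if freq_major >= 1.0:
        return math.inf
    return -math.log(1.5 * (1.0 - freq_major))


if __name__ == "__main__":
    gene_trees = [(("a", "b"), "c"), (("a", "b"), "c"),
                  (("a", "c"), "b"), ((("a", "b"), "c"), "d")]
    counts = Counter(rooted_triplet(t, ("a", "b", "c")) for t in gene_trees)
    freq = counts[frozenset({"a", "b"})] / len(gene_trees)
    print(freq, internal_branch_length(freq))
```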
  • 67. Errors in gene trees decrease the accuracy of estimated species trees. Mirarab, Bayzid and Warnow, Syst. Biol. 2014
  • 69-71. Statistical binning. [Figure: the statistical binning pipeline, with MP-EST as the species tree method.] Mirarab et al., Science 2014
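In statistical binning, genes are placed in the same bin only if their gene trees have no conflicting branches that are both supported above a threshold. The genes of a bin are then concatenated into a "supergene", whose tree is passed to MP-EST. The sketch below covers only the binning step and assumes per-gene splits with support values as input (a hypothetical format). It builds the incompatibility graph and bins genes with a plain greedy graph coloring; the published method uses a heuristic that also balances bin sizes, which this sketch omits.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Split = FrozenSet[str]


def compatible(s1: Split, s2: Split, taxa: FrozenSet[str]) -> bool:
    """Two bipartitions are compatible unless all four pairwise intersections are non-empty."""
    a, b = s1, taxa - s1
    c, d = s2, taxa - s2
    return not (a & c and a & d and b & c and b & d)


def incompatibility_edges(gene_splits: Dict[str, Dict[Split, float]],
                          taxa: FrozenSet[str], threshold: float) -> Set[Tuple[str, str]]:
    """Pairs of genes whose strongly supported splits conflict."""
    strong = {g: [s for s, support in splits.items() if support >= threshold]
              for g, splits in gene_splits.items()}
    edges = set()
    for g1, g2 in combinations(sorted(strong), 2):
        if any(not compatible(s1, s2, taxa) for s1 in strong[g1] for s2 in strong[g2]):
            edges.add((g1, g2))
    return edges


def greedy_bins(genes: List[str], edges: Set[Tuple[str, str]]) -> List[Set[str]]:
    """Greedy vertex coloring: each bin (color) holds mutually compatible genes."""
    neighbours: Dict[str, Set[str]] = {g: set() for g in genes}
    for g1, g2 in edges:
        neighbours[g1].add(g2)
        neighbours[g2].add(g1)
    bins: List[Set[str]] = []
    # Place the most conflicted genes first; ties broken by name for determinism.
    for g in sorted(genes, key=lambda x: (-len(neighbours[x]), x)):
        for b in bins:
            if not b & neighbours[g]:
                b.add(g)
                break
        else:
            bins.append({g})
    return bins


if __name__ == "__main__":
    taxa = frozenset("ABCD")
    AB, AC = frozenset("AB"), frozenset("AC")
    gene_splits = {"g1": {AB: 90}, "g2": {AC: 95}, "g3": {AB: 80}, "g4": {AC: 40}}
    edges = incompatibility_edges(gene_splits, taxa, threshold=75)
    print(greedy_bins(sorted(gene_splits), edges))
```

In the toy input, g4's conflicting split is weakly supported, so g4 may share a bin with g2 or with the genes it nominally disagrees with.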
  • 72. Statistical binning improves species tree inference. Mirarab et al., Science 2014
  • 73. Statistical binning also improves the estimation of the gene tree distribution. Mirarab et al., Science 2014
  • 74. Statistical binning and birds. Jarvis et al., Science 2014
  • 75. Improving statistical binning: weighted statistical binning. Mirarab et al., PLoS One, accepted
  • 76. Improving statistical binning: weighted statistical binning. Mirarab et al., PLoS One, accepted • Practice: weighted and unweighted binning have about the same accuracy • Theory: weighted statistical binning can be shown to be consistent, whereas unweighted statistical binning is not.
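The two variants differ in how supergene trees are handed to the summary method. The sketch below assumes each bin is given as a (supergene tree, number of genes) pair, a hypothetical format. Weighted binning repeats each supergene tree once per gene in its bin, so large bins count proportionally more, whereas unweighted binning counts every bin once.

```python
from typing import List, Tuple, TypeVar

TreeT = TypeVar("TreeT")


def weighted_input(bins: List[Tuple[TreeT, int]]) -> List[TreeT]:
    """Repeat each supergene tree once per gene in its bin (weighted binning)."""
    return [tree for tree, size in bins for _ in range(size)]


def unweighted_input(bins: List[Tuple[TreeT, int]]) -> List[TreeT]:
    """Use each supergene tree once, whatever the bin size (unweighted binning)."""
    return [tree for tree, _ in bins]


if __name__ == "__main__":
    bins = [("((a,b),c)", 3), ("((a,c),b)", 1)]
    print(weighted_input(bins))
    print(unweighted_input(bins))
```

In this toy input, the ((a,b),c) topology makes up 3/4 of the weighted input but only 1/2 of the unweighted one. The two inputs can therefore present different topology frequencies to MP-EST.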
  • 78. RevBayes • R-like language • Model-based phylogenetics • Many models of sequence evolution • Models for dating • Models for phylogeography • Models for continuous traits • Models for gene tree/species tree inference • http://revbayes.net • Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Nicolas Lartillot • Brian Moore • John Huelsenbeck • …
  • 82. Conclusions • We develop methods for gene tree and species tree inference • Improvement of gene trees and species trees in the presence of: • duplications and losses, • transfers, • incomplete lineage sorting • Parallel algorithms applicable to genome-scale data • We study the evolution of life, ancient and recent
  • 83. Thanks!