1. A Phylogeny-Driven Genomic
Encyclopedia of Bacteria and Archaea
Jonathan A. Eisen
Talk for ASBMB
April 25, 2010
Sunday, April 25, 2010
2. Eisen Lab - Phylogenomics of Novelty
Origin of New Functions and Processes
•New genes
•Changes in old genes
•Changes in pathways
Genome Dynamics
•Evolvability
•Repair and recombination processes
•Intragenomic variation
Species Evolution
•Phylogenetic history
•Vertical vs. horizontal descent
•Needed to track gain/loss of
processes, infer convergence
7. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
8. The Tree is not Happy
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
9. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla. Branch labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
Based on Hugenholtz, 2002
10. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002
11. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002
12. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002
13. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002
14. Filling in the Genomic Phylogenetic Gaps
• Common approach within some eukaryotic
groups
• Many small projects funded to fill in some
bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal
projects commonly lamented in literature
15. [Figure: tree of bacterial phyla, as on slide 9]
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
16. The Tree of Life is Still Angry
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
19. [Figure: tree of bacterial phyla, as on slide 9; legend: Well sampled phyla]
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
20. Why Increase Phylogenetic Coverage?
• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Mechanisms of diversification
• Species phylogeny and classification
22. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify a cultured representative for each
group
• Grow > 200 of these and prep. DNA
• Sequence and finish 100
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• Paper published in Nature Dec. 2009.
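The selection steps above can be illustrated with a small sketch. It assumes a rooted tree with branch lengths and picks, one at a time, the unsequenced organism that adds the most new branch length (phylogenetic diversity) to what is already covered. The tree, the organism names, and the greedy rule are hypothetical illustrations, not the procedure used by the GEBA project.

```python
# Toy rooted tree: child -> (parent, branch length).
# All node names and branch lengths are hypothetical.
TREE = {
    "n1": ("root", 0.1),
    "n2": ("root", 0.3),
    "A": ("n1", 0.2),
    "B": ("n1", 0.25),
    "C": ("n2", 0.4),
    "D": ("n2", 0.1),
    "E": ("root", 0.9),
}


def path_to_root(leaf, tree):
    """Return the branches (named by their child node) from a leaf up to the root."""
    branches = set()
    node = leaf
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def pd_gain(leaf, covered, tree):
    """Branch length a leaf would add beyond the branches already covered."""
    new_branches = path_to_root(leaf, tree) - covered
    return sum(tree[b][1] for b in new_branches)


def pick_targets(sequenced, candidates, tree, n):
    """Greedily choose up to n candidates, each maximizing the added branch length."""
    covered = set()
    for leaf in sequenced:
        covered |= path_to_root(leaf, tree)
    remaining = set(candidates)
    chosen = []
    while remaining and len(chosen) < n:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=lambda leaf: pd_gain(leaf, covered, tree))
        chosen.append(best)
        covered |= path_to_root(best, tree)
        remaining.remove(best)
    return chosen


targets = pick_targets(sequenced=["A"], candidates=["B", "C", "D", "E"], tree=TREE, n=2)
print(targets)
```

In this toy tree, a close relative of an already sequenced organism (B, next to A) adds little new branch length, while organisms on long, unsampled branches are chosen first.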
23. GEBA Pilot Target List
[Figure: bar chart of # of Genomes (0 to 35) by Phyla, with bacterial (B:) and archaeal (A:) groups on the x-axis]
24. GEBA and Openness
• All data being released as quickly
as possible with no restrictions to
IMG-GEBA, GenBank, etc.
• Data also available in Biotorrents
(http://biotorrents.net)
• Individual genome reports being
published in new Open Access
journal “Standards in Genome
Sciences (SIGS)”
• Main GEBA paper in Nature
freely available and published
using Creative Commons License
25. Assess Benefits of GEBA
• All genomes have some value
• But what, if any, is the benefit of tree-guided
sequencing over other selection methods?
• Lessons for other large scale microbial
genome projects?
26. GEBA Lesson 1
rRNA Tree is Useful for Identifying
Phylogenetically Novel Genomes
rRNA Tree topology is not perfect;
Genome-based trees better
27. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
29. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
30. Whole genome tree
built using AMPHORA
by Martin Wu and Dongying Wu
31. PD of rRNA, Genome Trees Similar
From Wu et al. 2009
34. Predicting Function
• Key step in genome projects
• More accurate predictions help guide
experimental and computational analyses
• Many diverse approaches
• Comparative and evolutionary analysis
greatly improves most predictions
35. Most/All Functional Prediction Improves
w/ Better Phylogenetic Sampling
• Better definition of protein family sequence
“patterns”
• Conversion of hypothetical into conserved
hypotheticals
• Greatly improves “comparative” and
“evolutionary” based predictions
• Linking distantly related members of protein
families
• Improved non-homology prediction
41. Shotgun Sequencing Allows Use of
Alternative Anchors (e.g., RecA)
Venter et al., 2004
42. Shotgun Sequencing Allows Use of Other Markers
[Figure: Sargasso Phylotypes. Bar chart of Weighted % of Clones (0 to 0.5) by Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., 2004
43. Shotgun Sequencing Allows Use of Other Markers
Cannot be done without good sampling of genomes
[Figure: Sargasso Phylotypes bar chart, as on slide 42]
Venter et al., 2004
44. Binning challenge
A T
B U
C V
D W
E X
F Y
G Z
45. Binning challenge
A T
B U
C V
D W
E X
F Y
G Z
Best binning method: reference genomes
46. Binning challenge
A T
B U
C V
D W
E X
F Y
G Z
No reference genome? What do you do?
47. Binning challenge
A T
B U
C V
D W
E X
F Y
G Z
No reference genome? What do you do?
Phylogeny ....
48. Phylogenetic Binning Using AMPHORA
AMPHORA - each read on its own tree
[Figure: bar chart (y-axis 0 to 0.7) by phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria) for the marker genes frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB]
49. Phylogenetic Binning Using AMPHORA
AMPHORA - each read on its own tree
Cannot be done without good sampling of genomes
[Figure: bar chart by phylogenetic group and marker gene, as on slide 48]
50. GEBA Phylogenomic Lesson 5
We have still only scratched the
surface of microbial diversity
51. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
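The three steps above can be sketched in a few lines. This example assumes the clustering step has already been run, so each genome is represented by the set of protein family IDs found in it; the genome names, family IDs, and the averaging over random genome orders are hypothetical illustrations rather than the analysis from the talk.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding 1..N genomes.

    genome_families maps a genome name to the set of family IDs it contains
    (for example, cluster IDs produced by a tool such as MCL).
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)  # random order in which genomes are added
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical input: three genomes and the protein families found in each.
GENOME_FAMILIES = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam2", "fam3", "fam4"},
    "genome3": {"fam1", "fam5"},
}

curve = rarefaction_curve(GENOME_FAMILIES)
for n_genomes, mean_families in enumerate(curve, start=1):
    print(n_genomes, round(mean_families, 2))
```

Plotting the genome count against the mean family count gives the curve; a curve that keeps rising as genomes are added suggests that new protein families are still being discovered.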
59. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
60. Phylogenetic Diversity:
Sequenced Bacteria & Archaea
From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html
61. Phylogenetic Diversity with GEBA
From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html
63. Phylogenetic Diversity: All
From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html
64. [Figure: tree of bacterial phyla, as on slide 9; legend: Well sampled phyla, Poorly sampled, No cultured taxa]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
65. Uncultured Lineages:
Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
66. GEBA Phylogenomic Lesson 6
Need Experiments from Across the
Tree of Life too
67. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002
68. As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002
69. As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla
[Figure: tree of bacterial phyla, as on slide 9]
Based on Hugenholtz, 2002