The document discusses the known knowns from the past GEBA project. It notes that as of 2002, genome sequences were mostly from three bacterial phyla, while at least 40 phyla of bacteria were known to exist based on rRNA studies. Some other phyla had only sparse sampling, and the same trend occurred in Archaea and Eukaryotes. More genome sequences were needed from the underrepresented phyla to gain a more comprehensive view of microbial diversity.
Jonathan Eisen talk at #ievobio 2010
1. Phylogenomics of microbes:
the dark matter of biology
Jonathan A. Eisen
UC Davis
Talk for iEVOBIO
June 29, 2010
Tuesday, June 29, 2010
2. Eisen Lab - Phylogenomics of Novelty
Origin of New Functions and Processes
•New genes
•Changes in old genes
•Changes in pathways
Genome Dynamics
•Evolvability
•Repair and recombination processes
•Intragenomic variation
Species Evolution
•Phylogenetic history
•Vertical vs. horizontal descent
•Needed to track gain/loss of processes, infer convergence
6. An homage to Donald Rumsfeld
• There are known knowns. These are things we
know that we know.
• There are known unknowns. That is to say,
there are things that we know we don't know.
• But there are also unknown unknowns. There
are things we don't know we don't know.
7. Outline
• Known knowns (background)
–rRNA Tree of Life
–Genomics
–rRNA PCR
–Metagenomics
• Known unknowns
–GEBA project - past
–GEBA project - present
–GEBA project - future
• Unknown unknowns?
10. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
11. The Tree of Life: Three Main Domains
Bacteria
Archaea
Eukaryotes
Unrooted Tree of Life from Barton et al. Evolution
12. Known Knowns 2:
Genomics and Phylogenomics
14. Microbial genomes
From http://genomesonline.org
15. Genome Sequences Have
Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional
studies
• New enzymes and materials for engineering
and synthetic biology
21. Great Plate Count Anomaly
Culturing Count    Microscope Count
22. Great Plate Count Anomaly
Culturing Count <<<< Microscope Count
23. Great Plate Count Anomaly
DNA
Culturing Count <<<< Microscope Count
24. PCR Revolution
Extract DNA
PCR w/ Universal rDNA Primers
Sequence
Align and compare to other rDNAs
Phylogenetic classification
OTUs
Ecology
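The last steps of this workflow (compare aligned sequences, group them into OTUs) can be sketched in a few lines. This is only an illustration, not the method used in the talk: it assumes the sequences are already aligned to equal length, uses a simple greedy centroid scheme, and the 97% identity cutoff and the function names `identity` and `greedy_otus` are assumptions chosen for the example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_otus(seqs: dict, threshold: float = 0.97) -> list:
    """Assign each sequence to the first OTU whose centroid it matches at or
    above the threshold; otherwise start a new OTU with it as the centroid."""
    centroids = []  # name of the first sequence placed in each OTU
    otus = []       # list of lists of sequence names
    for name, seq in seqs.items():
        for i, centroid in enumerate(centroids):
            if identity(seq, seqs[centroid]) >= threshold:
                otus[i].append(name)
                break
        else:
            centroids.append(name)
            otus.append([name])
    return otus


if __name__ == "__main__":
    # Hypothetical toy alignment, far shorter than a real rRNA gene.
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAC", "c": "TTTTACGTAC"}
    print(greedy_otus(toy, threshold=0.9))
```

Greedy clustering depends on input order, which is one reason the slide's "align and compare" step is usually followed by a proper phylogenetic classification.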
25. Uses of rDNA PCR
Bohannan and Hughes 2003
Hugenholtz 2002
26. rRNA challenges
• Massive amounts of data from next-gen
• Need for full automation but
–Non overlapping
–Alignments not always straightforward
–BLAST insufficient
–Phylogenetic methods that have been automated
still need work
• Tree of everything might be useful
35. Shotgun Sequencing Allows Use of
Alternative Anchors (e.g., RecA)
Venter et al., 2004
36. Shotgun Sequencing Allows Use of Other Markers
[Bar chart: Weighted % of Clones (axis 0 to 0.5) by Major Phylogenetic Group, Sargasso Phylotypes. Groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota. Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., 2004
37. Functional Inference from
Metagenomics
• Can work well for individual genes
• Predicting “community” function is
challenging because treating community as a
bag of genes does not work well
• Better to “compartmentalize” data ...
38. Binning challenge
[Figure: two columns of labels, A to G and T to Z]
39. Binning challenge
[Figure: two columns of labels, A to G and T to Z]
Best binning method: reference genomes
41. Binning challenge
[Figure: two columns of labels, A to G and T to Z]
No reference genome? What do you do?
42. Binning challenge
[Figure: two columns of labels, A to G and T to Z]
No reference genome? What do you do?
Assembly? Composition? Get more references?
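Of the options listed, composition is the easiest to illustrate in code. The sketch below builds tetranucleotide frequency profiles and assigns a fragment to the bin with the closest profile. The choice of k = 4, the Euclidean distance, and the names `kmer_profile` and `nearest_bin` are assumptions made for this example, not something the slides specify.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 4  # tetranucleotides; an assumed, commonly used word size
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq: str) -> list:
    """Relative frequency of every k-mer over A/C/G/T in the sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS)  # skips words with N or other symbols
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def distance(p: list, q: list) -> float:
    """Euclidean distance between two frequency profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_bin(fragment: str, bins: dict) -> str:
    """Label of the bin whose composition profile is closest to the fragment."""
    fp = kmer_profile(fragment)
    return min(bins, key=lambda label: distance(fp, kmer_profile(bins[label])))


if __name__ == "__main__":
    # Hypothetical bins built from toy repeat sequences.
    bins = {"x": "ACGT" * 10, "y": "GGCC" * 10}
    print(nearest_bin("ACGTACGTACGT", bins))
```

Composition signals weaken on short fragments, which is part of why the slides go on to suggest phylogeny as another route.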
43. Binning challenge
[Figure: two columns of labels, A to G and T to Z]
No reference genome? What do you do?
Phylogeny ....
44. Metagenomic challenges
• Massive amounts of data from next-gen
• Need for full automation but
–Data fragmentary
–BLAST insufficient
–Automation of phylogenetic methods a bit better for
protein coding genes b/c alignments better
–Reference databases incomplete
46. Microbial genomes
From http://genomesonline.org
47. 2002
• At least 40 phyla of bacteria
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002. Tip labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
48. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
49. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
50. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
51. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
52. The Tree is not Happy
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
53. [Figure: phylum-level tree of Bacteria]
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
Eisen & Ward, PIs
55. The Tree of Life is Still Angry
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
56. [Figure: phylum-level tree of Bacteria, with legend entry "Well sampled phyla"]
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
58. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify branches with a cultured
representative in DSMZ
• Grow > 200 of these and prep. DNA
• Sequence and finish 100 (covering breadth of
bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
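The idea of tree-guided selection can be sketched as a greedy search: repeatedly pick the candidate organism that adds the most phylogenetic diversity (PD) to what is already sequenced. This is a toy illustration under stated assumptions, not the GEBA selection procedure: the tree, its branch lengths, and the function names are hypothetical, and PD is taken here as the total length of all branches on the paths from the chosen tips to the root.

```python
# Toy rooted tree: node -> (parent, branch length to parent). Hypothetical values.
TREE = {
    "A": ("n1", 0.1), "B": ("n1", 0.1),
    "n1": ("root", 0.2),
    "C": ("n2", 0.5), "D": ("n2", 0.05),
    "n2": ("root", 0.3),
}


def path_to_root(tip, tree):
    """Branches (named by their child node) between a tip and the root."""
    branches = set()
    node = tip
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def phylogenetic_diversity(tips, tree):
    """Sum of branch lengths covered by the root paths of the given tips."""
    covered = set()
    for tip in tips:
        covered |= path_to_root(tip, tree)
    return sum(tree[b][1] for b in covered)


def greedy_select(sequenced, candidates, n, tree):
    """Pick up to n candidates, each time taking the largest PD gain."""
    current = set(sequenced)
    remaining = set(candidates)
    chosen = []
    for _ in range(min(n, len(remaining))):
        base = phylogenetic_diversity(current, tree)
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining),
                   key=lambda c: phylogenetic_diversity(current | {c}, tree) - base)
        chosen.append(best)
        current.add(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    print(greedy_select(["A"], ["B", "C", "D"], 2, TREE))
```

With "A" already sequenced, the long isolated branch leading to "C" is chosen before the close relative "B", which is the behaviour a tree-guided project is after.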
59. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen,
Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat
Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer,
Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova,
Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
60. GEBA and Openness
• All data released as quickly as possible w/ no restrictions to IMG-GEBA, GenBank, etc.
• Data also available in Biotorrents (http://biotorrents.net)
• Individual genome reports published in OA “Standards in Genomic Sciences (SIGS)”
• 1st GEBA paper in Nature freely available and published using Creative Commons License
62. GEBA Lesson 1
rRNA Tree is Useful for Identifying
Phylogenetically Novel Genomes
63. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
64. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
65. “Whole Genome” Concatenation
Tree w/ AMPHORA
See Wu and Eisen, Genome Biology 2008 9: R151
http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
67. PD of rRNA, Genome Trees Similar
From Wu et al. 2009 Nature 462, 1056-1060
68. GEBA Lesson 2
Phylogeny-driven genome selection
helps discover new genetic diversity
69. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
70. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
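The three steps above can be sketched as follows, assuming the family assignment (for example by MCL) has already been done and each genome is represented as a set of family identifiers. The function name `rarefaction_curve`, the number of permutations, and the toy data are assumptions for illustration only.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average number of distinct protein families seen after adding
    1, 2, ..., N genomes, over random orderings of the genomes.

    genome_families: dict mapping genome name -> set of family IDs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


if __name__ == "__main__":
    # Hypothetical toy input: three genomes, four families in total.
    toy = {
        "g1": {"f1", "f2"},
        "g2": {"f2", "f3"},
        "g3": {"f1", "f2", "f3", "f4"},
    }
    for n, families in enumerate(rarefaction_curve(toy), start=1):
        print(n, families)
```

The returned list gives the y-values for the plot of number of genomes against number of protein families; a curve that keeps rising indicates that additional genomes are still contributing new families.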
78. Predicting Function
• Key step in genome projects
• More accurate predictions help guide
experimental and computational analyses
• Many diverse approaches
• Comparative and evolutionary analysis greatly
improves most predictions
79. Phylogeny vs. Blast
Many methods focus on “top blast hits”
But much better to build phylogenetic trees of genes and compare to relatives
Allows better integration of evolutionary history (e.g., orthologs and paralogs)
[Figure: phylogenomic method for functional prediction, shown for Example A and Example B. Steps: choose gene(s) of interest; identify homologs; align sequences; calculate gene tree; overlay known functions onto tree; infer likely function of gene(s) of interest. Shown alongside the actual evolution (assumed to be unknown) across Species 1, 2 and 3, with duplication events and ambiguous cases marked. Based on Eisen, 1998 Genome Res 8: 163-167.]
80. Wu et al. 2005 PLoS Genetics 1: e65.
81. Most/All Functional Prediction Improves
w/ Better Phylogenetic Sampling
• Conversion of hypothetical into
conserved hypotheticals
• Improved phylogenomics
• Linking distantly related members of
protein families
• Improved non-homology prediction
83. GEBA Future 1
How much further should we go?
84. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
85. Phylogenetic Diversity:
Sequenced Bacteria & Archaea
From Wu et al. 2009
89. [Figure: phylum-level tree of Bacteria. Legend: Well sampled phyla; Poorly sampled; No cultured taxa]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
90. [Figure: phylum-level tree of Bacteria. Legend: Well sampled phyla; Poorly sampled; No cultured taxa]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
91. Uncultured Lineages:
Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
92. GEBA Future 2
How many gene families are there?
96. Gene Families vs PD
[Scatter plot: PD vs. Gene Families (per genome). x-axis: Gene families / genome (0 to 1100); y-axis: PD/Genome (0 to 0.4)]
97. How many protein families?
GEBA Genomes
PD/Genome: ~0.1
PFAMs/Genome: ~1000
PFAMs/PD: ~10000
Total PFAMS: ~10,000,000
From Wu et al. 2009
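The chain of estimates on this slide is simple arithmetic, spelled out below. The reading of "PFAMs/Genome" as protein families per genome is an interpretation of the slide labels, and the total PD figure is not stated on the slide; it is only the value implied by dividing the slide's total by its families-per-PD rate. Function names are hypothetical.

```python
def families_per_pd(families_per_genome: float, pd_per_genome: float) -> float:
    """Protein families gained per unit of phylogenetic diversity (PD)."""
    return families_per_genome / pd_per_genome


def implied_total_pd(total_families: float, per_pd: float) -> float:
    """Total PD implied by a total family count and a families-per-PD rate.
    Derived here for illustration; this number is not given on the slide."""
    return total_families / per_pd


if __name__ == "__main__":
    rate = families_per_pd(families_per_genome=1000, pd_per_genome=0.1)
    print("families per PD unit:", rate)                              # slide: ~10000
    print("implied total PD:", implied_total_pd(10_000_000, rate))    # roughly 1000
```

All inputs are order-of-magnitude values marked with "~" on the slide, so the outputs should be read the same way.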
98. Caveats (of many)
• Novel protein families per genome likely taxon
specific
• Parameters other than PD clearly important
• Does not include viruses, eukaryotes
99. GEBA Future 3
Need to better leverage improved
phylogenetic sampling
100. Example 1: Protein Family Space
• Much less biased sampling of protein family
space now available
• Need to rebuild / reassess many protein family
databases (e.g., HMMs)
• Structural space
102. As of 2002
• At least 40 phyla of bacteria
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
103. As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
104. As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
105. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
106. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
[Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002]
107. [Figure: phylum-level tree of Bacteria, scale bar 0.1. Tree based on Hugenholtz (2002) with some modifications.]
Need experimental studies from across the tree too
108. Example 3: Improving the tree
• To make best use of GEBA data we need a
better tree
109. Concatenated alignment “whole genome tree” built using AMPHORA
110. Why does the tree matter?
Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu
114. Many Alternatives to Concatenation
• Gene presence/absence
• Supertrees / consensus methods
• Separate phylogeny of genes and then
integration of results (e.g., networks)
• Models that incorporate gain/loss as well as
gene phylogeny
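As a small illustration of the first alternative, gene presence/absence, the sketch below computes pairwise Jaccard distances between genomes represented as sets of gene families. The choice of Jaccard distance, the function names, and the toy gene sets are assumptions for the example; the slides do not name a specific measure.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 minus the shared fraction of the combined gene-family content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles: dict) -> dict:
    """Pairwise distances keyed by (genome_x, genome_y), names sorted."""
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical presence/absence profiles.
    profiles = {
        "g1": {"recA", "rpoB", "dnaK"},
        "g2": {"recA", "rpoB"},
        "g3": {"nifH"},
    }
    for pair, dist in distance_matrix(profiles).items():
        print(pair, round(dist, 3))
```

A matrix like this could feed a distance-based tree method, though gene content is shaped by gain and loss as well as descent, which is the motivation for the model-based option in the last bullet.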
116. Phylogeny for Typing and Binning
[Bar chart: Weighted % of Clones (axis 0 to 0.5) by Major Phylogenetic Group, Sargasso Phylotypes. Groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota. Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., 2004
117. Phylogeny for Typing and Binning
Should improve with better genomic sampling
[Bar chart: Weighted % of Clones (axis 0 to 0.5) by Major Phylogenetic Group, Sargasso Phylotypes. Groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota. Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., 2004