13. Once You Find Something Alive …
You find a CLE
Separate Origin from Known Life?
Common Origin with Known Life?
14. Once You Find Something Alive …
You find a CLE
Separate Origin from Known Life?
Common Origin with Known Life?
Homologies w/ Known Life?
15. Once You Find Something Alive …
You find a CLE
Separate Origin from Known Life?
Common Origin with Known Life?
Homologies w/ Known Life?
No
16. Once You Find Something Alive …
You find a CLE
Separate Origin from Known Life?
Common Origin with Known Life?
Homologies w/ Known Life?
Yes
How Novel Is It?
18. How Novel Is It?
• Novel form
• Novel function
• Novel phylogeny
21. Woese Classification of Cultured Taxa by rRNA
rRNA    rRNA    rRNA
ACUGC   ACCUAU  CGUUCG
ACUCC   AGCUAU  CGAUCG
ACCCC   AGCUCU  CGCUCG
Taxa  Characters
S     ACUGCACCUAUCGUUCG
R     ACUCCACCUAUCGUUCG
E     ACUCCAGCUAUCGAUCG
F     ACUCCAGGUAUCGAUCG
C     ACCCCAGCUCUCGCUCG
W     ACCCCAGCUCUGGCUCG
Eukaryotes  Bacteria  Archaea
Carl Woese
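A minimal sketch of how the aligned characters above can group taxa: count mismatched positions between every pair of sequences and report each taxon's closest neighbor. The mismatch count (Hamming distance) is a stand-in chosen for illustration, not necessarily the method used for the original trees, and the function names are hypothetical.

```python
from itertools import combinations

# Aligned rRNA characters for the six taxa listed on the slide.
taxa = {
    "S": "ACUGCACCUAUCGUUCG",
    "R": "ACUCCACCUAUCGUUCG",
    "E": "ACUCCAGCUAUCGAUCG",
    "F": "ACUCCAGGUAUCGAUCG",
    "C": "ACCCCAGCUCUCGCUCG",
    "W": "ACCCCAGCUCUGGCUCG",
}


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Pairwise distance table, keyed by the unordered pair of taxon names.
distances = {
    frozenset((p, q)): hamming(taxa[p], taxa[q])
    for p, q in combinations(taxa, 2)
}


def nearest(name):
    """Closest other taxon by mismatch count (illustrative helper)."""
    others = [t for t in taxa if t != name]
    return min(others, key=lambda t: distances[frozenset((name, t))])


for name in taxa:
    print(name, "->", nearest(name))
```

With these sequences the taxa pair up as S with R, E with F, and C with W, each pair differing at a single position.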
23. rRNA Phylotyping: One Taxon
DNA
ACTGC   ACCTAT  CGTTCG
ACTGC   ACCTAT  CGTTCG
ACTGC   ACCTAT  CGTTCG
Taxa  Characters
B1    ACTGCACCTATCGTTCG
B2    ACTCCACCTATCGTTCG
E1    ACTCCAGCTATCGATCG
E2    ACTCCAGGTATCGATCG
A1    ACCCCAGCTCTCGCTCG
A2    ACCCCAGCTCTGGCTCG
New1  ACTGCACCTATCGTTCG
Eukaryotes  Bacteria  Archaea
Many sequences from one sample all point to the same branch on the tree
Norm Pace
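The phylotyping idea on this slide can be sketched as assigning each environmental read to its closest reference sequence and tallying the assignments. This is a simplified illustration using mismatch counts; real phylotyping places reads on a tree, and the `place` helper and the three-read sample are assumptions made for the example.

```python
from collections import Counter

# Reference taxa and their aligned characters, as listed on the slide.
references = {
    "B1": "ACTGCACCTATCGTTCG",
    "B2": "ACTCCACCTATCGTTCG",
    "E1": "ACTCCAGCTATCGATCG",
    "E2": "ACTCCAGGTATCGATCG",
    "A1": "ACCCCAGCTCTCGCTCG",
    "A2": "ACCCCAGCTCTGGCTCG",
}


def mismatches(a, b):
    """Count aligned positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def place(read):
    """Return (closest reference name, mismatch count) for one read."""
    best = min(references, key=lambda name: mismatches(read, references[name]))
    return best, mismatches(read, references[best])


# Hypothetical sample: three identical reads, matching the New1 sequence.
sample_reads = ["ACTGCACCTATCGTTCG"] * 3

tally = Counter(place(read)[0] for read in sample_reads)
print(tally)
```

Every read in this sample lands on the same reference (B1, with zero mismatches), which is the "one taxon" outcome the slide describes.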
24. Expanded Tree (Pace 1997)
Archaea
Eukaryotes
Bacteria
Figure from Barton, Eisen et al., “Evolution”, CSHL Press, 2007. Based on tree from Pace 1997, Science 276:734-740.
25. Is There Anything Like This?
Archaea
Eukaryotes
Bacteria
??????
27. rRNA Tree of Life
Eukaryotes
??????
Archaea
Bacteria
Scanned through GOS data for rRNAs that fit this pattern
28. rRNA Tree of Life
Eukaryotes
??????
Archaea
Bacteria
??????????
31. RecA Tree of Life?
Archaea
Eukaryotes
Bacteria
???????????
32. Novel RecA Sequences in GOS Data
GOS 1
GOS 2
GOS 3
GOS 4
GOS 5
Wu et al. PLoS ONE 2011
34. Phylogenetic ID of Novel Lineages
GOS 1
GOS 2
GOS 3
GOS 4
GOS 5
Wu et al. PLoS ONE 2011
I am happy to welcome you as a new member of the 4th domain club. If by chance you are passing through Europe I will be delighted to invite you to give a seminar in Marseille and show you our strange bugs.
Kind regards
Didier
42. [Figure: tree of bacterial and archaeal lineages annotated with unusual metabolic features; only the legible labels are listed here]
BACTERIA: Chlorobi, Firmicutes, Tenericutes, Fusobacteria, Chrysiogenetes, Proteobacteria, Fibrobacteres, TG3, Spirochaetes, WWE1 (Cloacamonetes), ZB3, Deinococcus-Thermus, OP1 (Acetothermia), Bacteroidetes, TM7, GN02 (Gracilibacteria), SR1, BH1, OD1 (Parcubacteria), OP11 (Microgenomates)
ARCHAEA: Euryarchaeota, Micrarchaea, DSEG (Aenigmarchaea), Nanohaloarchaea, Nanoarchaea, Cren MCG, Thaumarchaeota, Cren C2, Aigarchaeota, Cren pISA7, Cren Thermoprotei, Korarchaeota, pMC2A384 (Diapherotrites)
Panel labels:
• archaeal toxins (Nanoarchaea): lytic murein transglycosylase; murein (peptidoglycan)
• stringent response (Diapherotrites, Nanoarchaea): ppGpp, SpoT, RelA, DksA; triggers shown are limiting amino acids and limiting phosphate, fatty acids, carbon, iron; outcome shown is expression of components for stress response
• sigma factor (Diapherotrites, Nanoarchaea): RNA polymerase, -35 and -10 promoter elements
• HGT from Eukaryotes (Nanoarchaea): oxidoreductase, NAD reduction and oxidation
• archaeal type purine synthesis (Microgenomates): PurF, PurD, PurN, PurL/Q, PurM, PurK, PurE, PurB, PurP; PRPP, AICAR, FAICAR, IMP; adenine, guanine
• translation schematic: growing AA chain (tRNA label truncated)
44. Microbial Dark Matter Part 2
• Ramunas Stepanauskas
• Tanja Woyke
• Jonathan Eisen
• Duane Moser
• Tullis Onstott
45. Challenge: Reference Information
• More accurate phylogeny
• Rooting
• Incorporating New and Fragmented Data
• Lateral gene transfer
• More biology about the “novel” lineages
54. Automated Genome Tree
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of
Bacterial and Archaeal Genomes Using Conserved
Genes: Supertrees and Supermatrices. PLoS ONE
8(4): e62510. doi:10.1371/journal.pone.0062510
Jenna Lang
55. Better Reference Data (e.g., PhyEco Markers)
Phylogenetic group     Genome Number  Gene Number  Marker Candidates
Archaea                62             145415       106
Actinobacteria         63             267783       136
Alphaproteobacteria    94             347287       121
Betaproteobacteria     56             266362       311
Gammaproteobacteria    126            483632       118
Deltaproteobacteria    25             102115       206
Epsilonproteobacteria  18             33416        455
Bacteroidetes          25             71531        286
Chlamydiae             13             13823        560
Chloroflexi            10             33577        323
Cyanobacteria          36             124080       590
Firmicutes             106            312309       87
Spirochaetes           18             38832        176
Thermi                 5              14160        974
Thermotogae            9              17037        684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families
for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological
Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE
8(10): e77033. doi:10.1371/journal.pone.0077033
56. Better Binning (e.g., Hi-C)
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA,
Darling AE. (2014) Strain- and plasmid-level deconvolution of a
synthetic metagenome by sequencing proximity ligation products.
PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415
Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the
synthetic microbial community are shown before and after filtering, along with the percent of total
constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon,
species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome
2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus,
K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
Sequence  Alignment   % of Total  Filtered    % of aligned  Length     GC     #R.S.
Lac0      10,603,204  26.17%      10,269,562  96.85%        2,291,220  0.462  629
Lac1      145,718     0.36%       145,478     99.84%        13,413     0.386  3
Lac2      691,723     1.71%       665,825     96.26%        35,595     0.385  16
Lac       11,440,645  28.23%      11,080,865  96.86%        2,340,228  0.46   648
Ped       2,084,595   5.14%       2,022,870   97.04%        1,832,387  0.373  863
BL21      12,882,177  31.79%      2,676,458   20.78%        4,558,953  0.508  508
K12       9,693,726   23.92%      1,218,281   12.57%        4,686,137  0.507  568
E. coli   22,575,903  55.71%      3,894,739   17.25%        9,245,090  0.51   1076
Bur1      1,886,054   4.65%       1,797,745   95.32%        2,914,771  0.68   144
Bur2      2,536,569   6.26%       2,464,534   97.16%        3,809,201  0.672  225
Bur       4,422,623   10.91%      4,262,279   96.37%        6,723,972  0.68   369
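As a check on how the percentage columns relate to the counts, the sketch below recomputes "% of Total" (a replicon's aligned reads over all aligned reads) and "% of aligned" (filtered reads over aligned reads) from the replicon-level rows. The read counts come from Table 1; the dictionary layout and function names are illustrative assumptions.

```python
# (aligned reads, reads kept after filtering) for each replicon in Table 1.
replicons = {
    "Lac0": (10_603_204, 10_269_562),
    "Lac1": (145_718, 145_478),
    "Lac2": (691_723, 665_825),
    "Ped": (2_084_595, 2_022_870),
    "BL21": (12_882_177, 2_676_458),
    "K12": (9_693_726, 1_218_281),
    "Bur1": (1_886_054, 1_797_745),
    "Bur2": (2_536_569, 2_464_534),
}

# Total aligned reads across the whole synthetic community.
total_aligned = sum(aligned for aligned, _ in replicons.values())


def pct_of_total(name):
    """Share of all aligned reads that map to this replicon."""
    return 100 * replicons[name][0] / total_aligned


def pct_retained(name):
    """Share of this replicon's aligned reads that survive filtering."""
    aligned, filtered = replicons[name]
    return 100 * filtered / aligned


for name in replicons:
    print(f"{name:5s} {pct_of_total(name):6.2f}% of total, "
          f"{pct_retained(name):6.2f}% retained")
```

The two E. coli strains stand out: only about 21% (BL21) and 13% (K12) of their aligned reads survive filtering, against roughly 95% or more for every other replicon.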
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is
shown for read pairs mapping to each chromosome. For each read pair the minimum path length on
the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded.
The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin
was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and
plotted.
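The binning procedure in the Figure 1 caption can be sketched as follows: take the shorter path between the two reads of a pair on the circular chromosome, discard pairs closer than 1000 bp, count the rest in 100 equal bins over 2.5 Mb, and normalize the counts to sum to 1. The function names and the example read pairs are hypothetical, and dropping distances at or beyond 2.5 Mb is an assumption made to keep the sketch safe for longer chromosomes.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Shortest path between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)


def insert_distribution(pairs, chrom_len, min_sep=1000,
                        max_range=2_500_000, n_bins=100):
    """Normalized histogram of read-pair separations for one chromosome."""
    bin_width = max_range / n_bins
    counts = [0] * n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        # Discard close pairs; out-of-range pairs are dropped by assumption.
        if d < min_sep or d >= max_range:
            continue
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    # Normalize so the bins for this chromosome sum to 1.
    return [c / total for c in counts] if total else counts


# Hypothetical read pairs on a chromosome the length of E. coli K12.
k12_len = 4_686_137
pairs = [
    (100, 4_686_000),        # wraps the origin: 237 bp apart, discarded
    (0, 50_000),             # 50 kb apart
    (1_000_000, 1_010_000),  # 10 kb apart
    (10, 2_000_010),         # 2 Mb apart
]
dist = insert_distribution(pairs, k12_len)
print([(i, round(v, 3)) for i, v in enumerate(dist) if v > 0])
```

Measuring the distance around the circle means a pair that straddles the linearization point is treated as close together rather than a full chromosome length apart.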
E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1;
Lieberman-Aiden et al., 2009). We observed a minor depletion of alignments spanning
the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137)
due to edge effects induced by BWA treating the sequence as a linear chromosome rather
than circular.
Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs
associating each genomic replicon in the synthetic community is shown as a heat map (see color scale,
blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome
1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2:
L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
reference assemblies of the members of our synthetic microbial community with the same
alignment parameters as were used in the top ranked clustering (described above). We first
counted the number of Hi-C reads associating each reference assembly replicon (Fig. 2;
Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges
depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof
depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend)
with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded.
Contig associations were normalized for variation in contig size.
typically represent the reads and variant sites as a variant graph wherein variant sites are
represented as nodes, and sequence reads define edges between variant sites observed in
the same read (or read pair). We reasoned that variant graphs constructed from Hi-C
data would have much greater connectivity (where connectivity is defined as the m…
path length between randomly sampled variant positions) than graphs constructed
Chris Beitel
@datscimed
Aaron Darling
@koadman
57. PhyloSift - Automated Bayesian Phylogenomics
[Workflow diagram; legible labels listed in order]
• Input Sequences: each input sequence scanned against both workflows (rRNA workflow and protein workflow)
• LAST: fast candidate search (search input against references; parallel option)
• 600 bp (label appears twice in the diagram)
• hmmalign, Infernal: multiple alignment (profile HMMs used to align candidates to reference alignment)
• pplacer: phylogenetic placement
• Taxonomic Summaries
• Sample Analysis and Comparison: Krona plots; number of reads placed for each marker gene; Edge PCA; tree visualization; Bayes factor tests
Aaron Darling @koadman
Erik Matsen @ematsen
Holly Bik @hollybik
Guillaume Jospin @guillaumejospin
Erik Lowe
Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
58. Normalizing Across Genes Tree OTU
Wu, D., Doroud, L, Eisen, JA 2013. arXiv. TreeOTU: Operational Taxonomic Unit Classification Based on Phylogenetic
Dongying Wu
60. The Rise of Citizen Microbiology
Darlene Cavalier
61. Eisen Lab Citizen Microbiology
Kitty Microbiome
Georgia Barguil
Jack Gilbert
Project MERCCURI
Phone and Shoes
tinyurl/kittybiome
Holly Ganz
David Coil
62. Acknowledgements
DOE JGI, Sloan, GBMF, NSF, DHS, DARPA
Aaron Darling
Lizzy Wilbanks
Jenna Lang
Russell Neches
Rob Knight
Jack Gilbert
Tanja Woyke
Rob Dunn
Katie Pollard
Jessica Green
Darlene Cavalier
Eddy Rubin
Wendy Brown
Dongying Wu
Phil Hugenholtz
DSMZ
Sundar
Srijak Bhatnagar
David Coil
Alex Alexiev
Hannah Holland-Moritz
Holly Bik
John Zhang
Holly Menninger
Guillaume Jospin
David Lang
Cassie Ettinger
Tim Harkins
Jennifer Gardy
Holly Ganz