Phylogeny-Driven Approaches to
Studies of Microbial and Microbiome
Diversity
Jonathan A. Eisen
University of California, Davis
@phylogenomics
February 7, 2015
UCSB EEMB Graduate Student Symposium
Some Lessons I
Think I Have
Learned
Lesson 1:
Go With Your
Obsessions
Open Science
X
Social Media & Science
X
• RedSox
X
Microbial Evolution
Lesson 2:
History Matters
Microbial Evolution
Lesson 2:
History (of
species, genes,
people, science)
Matters
Example I: Lost in Graduate School?
Get A Map
Tree from Woese. 1987.
Microbiological Reviews 51:221
Map for Graduate School
Carl Woese
Limited Sampling of RRR Studies
My Study Organisms
H. volcanii Excision Repair
[Figure: H. volcanii UV Repair Label 7 - 45 J/m2. x axis: Avg. Mol. Wt. (Base Pairs), 0 to 18000; y axis: 0 to 0.6. Series: 45 J/m2 Dark 24 Hours; 45 J/m2 Photoreac.; 45 J/m2 t0; 0 J/m2 t0]
By Grombo - from Wikipedia
[Figure: UV Survival E.coli vs H.volcanii. x axis: UV J/m2, 0 to 400; y axis: Relative Survival, log scale from 1E-07 to 1. Series: H.volcanii WFD11; E.coli NR10125 mfd+; E.coli NR10121 mfd-]
From Eisen 1998. PhD Thesis.
Lesson 3:
Go Fishing Where
Nobody Else Has
Example II: Rice Microbiomes and Phylogeny
Joseph
Edwards
@Bulk_Soil
Sundar
@sundarlab
Cameron
Johnson
Srijak
Bhatnagar
@srijakbhatnagar
Edwards et al. 2015. Structure, variation,
and assembly of the root-associated
microbiomes of rice. PNAS
Supplementary Figures
Fig. S1 Map depicting soil collection locations for greenhouse experiment.
Fig. S2. Sampling and collection of the rhizocompartments. Roots are collected from rice
plants and soil is shaken off the roots to leave ~1 mm of soil around the roots. The ~1 mm of soil
PCR and phylogenetic analysis of rRNA genes
Workflow: DNA extraction -> PCR -> Sequence rRNA genes -> Sequence alignment = Data matrix -> Phylogenetic tree
PCR: Makes lots of copies of the rRNA genes in sample
rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
Data matrix:
E. coli A G A C A G
Humans  T A T A G T
Yeast   T A C A G T
rRNA1   A C A C A C
rRNA2   T A C A G T
rRNA3   C A C T G T
rRNA4   C A C A G T
[Figure: Phylogeny relating rRNA1, rRNA2, rRNA3, and rRNA4 to E. coli, Humans, and Yeast]
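The step from "data matrix" to "phylogeny" on this slide can be illustrated with a deliberately crude stand-in: counting mismatched columns between each environmental rRNA row and each reference row, then reporting the closest reference. This is only a sketch. Real phylotyping builds a tree from the alignment rather than picking a nearest neighbor, and the E. coli and Humans rows below are read from the flattened slide matrix, so treat them as an assumption.

```python
def hamming(a, b):
    """Count alignment columns at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    return sum(1 for x, y in zip(a, b) if x != y)


# Reference rows (E. coli and Humans are assumed readings of the slide matrix).
references = {"E. coli": "AGACAG", "Humans": "TATAGT", "Yeast": "TACAGT"}

# Rows for the sequences amplified from the sample.
unknowns = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
}


def nearest_reference(row, refs):
    """Return the reference name with the fewest mismatches (ties broken by name)."""
    return min(sorted(refs), key=lambda name: hamming(row, refs[name]))


nearest = {name: nearest_reference(row, references) for name, row in unknowns.items()}

for name in sorted(nearest):
    print(name, "->", nearest[name])
```

A distance count like this ignores shared ancestry, which is exactly the limitation the tree-based approach in the talk is meant to address.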
STAP
An Automated Phylogenetic Tree-Based Small Subunit
rRNA Taxonomy and Alignment Pipeline (STAP)
Dongying Wu (1,*), Amber Hartman (1,6), Naomi Ward (4,5), Jonathan A. Eisen (1,2,3)
1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences,
University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of
California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America,
5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United
States of America
Abstract
Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know
about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline
and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of
data has opened many new windows into microbial diversity and evolution, and at the same time has created significant
methodological challenges. Those processes which commonly require time-consuming human intervention, such as the
preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated
methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though
computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple
sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-
automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments
and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic
assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages
(PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly,
this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that
are unattainable by manual efforts.
Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS
ONE 3(7): e2566. doi:10.1371/journal.pone.0002566
multiple alignment and phylogeny was deemed unfeasible.
However, this we believe can compromise the value of the results.
For example, the delineation of OTUs has also been automated
via tools that do not make use of alignments or phylogenetic trees
(e.g., Greengenes). This is usually done by carrying out pairwise
comparisons of sequences and then clustering of sequences that
have better than some cutoff threshold of similarity with each
other). This approach can be powerful (and reasonably efficient)
but it too has limitations. In particular, since multiple sequence
alignments are not used, one cannot carry out standard
phylogenetic analyses. In addition, without multiple sequence
alignments one might end up comparing and contrasting different
regions of a sequence depending on what it is paired with.
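The pairwise-similarity clustering described above can be sketched in a few lines. This is a hypothetical greedy scheme, not the algorithm of any particular tool: it assumes the sequences are already the same length (that is, pre-aligned), and the function names and the 0.9 cutoff in the usage lines are illustrative choices.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otus(seqs, cutoff=0.97):
    """Assign each sequence to the first OTU whose representative it matches
    at or above the cutoff; otherwise start a new OTU with it as representative."""
    otus = []  # list of (representative sequence, member names)
    for name, seq in seqs.items():
        for rep_seq, members in otus:
            if identity(seq, rep_seq) >= cutoff:
                members.append(name)
                break
        else:
            otus.append((seq, [name]))
    return [members for _, members in otus]


toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",  # 1 mismatch to s1 (identity 0.9)
    "s3": "TTTTACGTAC",  # 3 mismatches to s1 (identity 0.7)
}
clusters = greedy_otus(toy, cutoff=0.9)
print(clusters)
```

Because each comparison is made only against a cluster representative, the result depends on input order, one reason such clusters do not substitute for a phylogenetic analysis.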
The limitations of avoiding multiple sequence alignments and
phylogenetic analysis are readily apparent in tools to classify
sequences. For example, the Ribosomal Database Project’s
Classifier program [29] focuses on composition characteristics of
each sequence (e.g., oligonucleotide frequency) and assigns
taxonomy based upon clustering genes by their composition.
Though this is fast and completely automatable, it can be misled in
cases where distantly related sequences have converged on similar
composition, something known to be a major problem in ss-rRNA
sequences [30]. Other taxonomy assignment systems focus
primarily on the similarity of sequences. The simplest of these is
classification tools it does have some limitations. For example,
the generation of new alignments for each sequence is both
computationally costly, and does not take advantage of available
curated alignments that make use of ss-RNA secondary structure
to guide the primary sequence alignment. Perhaps most
importantly however is that the tool is not fully automated. In
addition, it does not generate multiple sequence alignments for all
sequences in a dataset which would be necessary for doing many
analyses.
Automated methods for analyzing rRNA sequences are also
available at the web sites for multiple rRNA centric databases,
such as Greengenes and the Ribosomal Database Project (RDPII).
Though these and other web sites offer diverse powerful tools, they
do have some limitations. For example, not all provide multiple
sequence alignments as output and few use phylogenetic
approaches for taxonomy assignments or other analyses. More
importantly, all provide only web-based interfaces and their
integrated software, (e.g., alignment and taxonomy assignment),
cannot be locally installed by the user. Therefore, the user cannot
take advantage of the speed and computing power of parallel
processing such as is available on linux clusters, or locally alter and
potentially tailor these programs to their individual computing
needs (Table 1).
Given the limited automated tools that are available for
Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-RNA analysis tools.
Feature | STAP | ARB | Greengenes | RDP
Installed where? | Locally | Locally | Web only | Web only
User interface | Command line | GUI | Web portal | Web portal
Parallel processing | YES | NO | NO | NO
Manual curation for taxonomy assignment | NO | YES | NO | NO
Manual curation for alignment | NO | YES | NO* | NO
Open source | YES** | NO | NO | NO
Processing speed | Fast | Slow | Medium | Medium
It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is
more amenable to downstream code manipulation.
* Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment.
** The STAP program itself is open source; the programs it depends on are freely available but not open source.
doi:10.1371/journal.pone.0002566.t001
ss-rRNA Taxonomy Pipeline
Figure 1. A flow chart of the STAP pipeline.
doi:10.1371/journal.pone.0002566.g001
STAP database, and the query sequence is aligned to them using
the CLUSTALW profile alignment algorithm [40] as described
above for domain assignment. By adapting the profile alignment
algorithm, the alignments from the STAP database remain intact,
while gaps are inserted and nucleotides are trimmed for the query
sequence according to the profile defined by the previous
alignments from the databases. Thus the accuracy and quality of
the alignment generated at this step depends heavily on the quality
of the Bacterial/Archaeal ss-rRNA alignments from the
Greengenes project or the Eukaryotic ss-rRNA alignments from
the RDPII project.
Phylogenetic analysis using multiple sequence alignments rests on
the assumption that the residues (nucleotides or amino acids) at the
same position in every sequence in the alignment are homologous.
Thus, columns in the alignment for which "positional homology"
cannot be robustly determined must be excluded from subsequent
analyses. This process of evaluating homology and eliminating
questionable columns, known as masking, typically requires
time-consuming, skillful, human intervention. We designed an
automated masking method for ss-rRNA alignments, thus eliminating
this bottleneck in high-throughput processing.
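As a rough illustration of what automated masking does, the sketch below scores each alignment column and keeps only the columns that pass a threshold. It is not the STAP scoring scheme: the score here is simply the fraction of sequences sharing the most common non-gap character, and both the function names and the 0.5 threshold are assumptions made for the example.

```python
from collections import Counter


def column_scores(alignment):
    """Score each column as (count of most common non-gap character) / (number of rows)."""
    n_rows = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_rows)
    return scores


def mask_alignment(alignment, min_score=0.5):
    """Drop columns scoring below min_score; return masked rows and kept column indices."""
    scores = column_scores(alignment)
    keep = [i for i, s in enumerate(scores) if s >= min_score]
    masked = ["".join(seq[i] for i in keep) for seq in alignment]
    return masked, keep


toy = ["ACG-TA", "ACGTTA", "AC--CA", "ATG-GA"]
masked, kept = mask_alignment(toy)
print(kept)
print(masked)
```

In the toy alignment the gap-rich fourth column is removed, which is the kind of column for which positional homology is hard to establish.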
First, an alignment score is calculated for each aligned column
by a method similar to that used in the CLUSTALX package [42].
Specifically, an R-dimensional sequence space representing all the
possible nucleotide character states is defined. Then for each
aligned column, the nucleotide populating that column in each of
the aligned sequences is assigned a score in each of the R
dimensions (Sr) according to the IUB matrix [42]. The consensus
"nucleotide" for each column (X) also has R dimensions, with the
Figure 2. Domain assignment. In Step 1, STAP assigns a domain to
each query sequence based on its position in a maximum likelihood
tree of representative ss-rRNA sequences. Because the tree illustrated
here is not rooted, domain assignment would not be accurate and
Dongying Wu
Amber Hartman
Naomi Ward
WATERS
chimeric sequences generated during PCR identifying
closely related sets of sequences (also known as operational
taxonomic units or OTUs), removing redundant
sequences above a certain percent identity cutoff, assigning
putative taxonomic identifiers to each sequence or
representative of a group, inferring a phylogenetic tree of
the sequences, and comparing the phylogenetic structure
Figure 1 Overview of WATERS. Schema of WATERS where white
boxes indicate "behind the scenes" analyses that are performed in WATERS.
Quality control files are generated for white boxes, but not otherwise
routinely analyzed. Black arrows indicate that metadata (e.g.,
sample type) has been overlaid on the data for downstream interpretation.
Colored boxes indicate different types of results files that are
generated for the user for further use and biological interpretation.
Colors indicate different types of WATERS actors from Fig. 2 which
were used: green, Diversity metrics, WriteGraphCoordinates, Diversity
graphs; blue, Taxonomy, BuildTree, Rename Trees, Save Trees; CreateUnifrac;
yellow, CreateOtuTable, CreateCytoscape, CreateOTUFile;
white, remaining unnamed actors.
[Figure: WATERS workflow boxes: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table]
Hartman et al 2010. W.A.T.E.R.S.: a Workflow for the Alignment,
Taxonomy, and Ecology of Ribosomal Sequences. BMC Bioinformatics
2010, 11:317 doi:10.1186/1471-2105-11-317
default is 97% and 99%), and they are also generated for
every metadata variable comparison that the user
includes.
Data pruning
To assist in troubleshooting and quality control,
WATERS returns to the user three fasta files of sequences
that were removed at various steps in the workflow. A
short_sequences.fas file is created that contains all
Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar
to curves shown in Eckburg et al. Fig. 2; 70-72, indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic
tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing
the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al.
[Figure panels: (B) PCA plots with axes labeled "Percent variation explained" and sample points labeled A, B, C, D, E, F, S. (C) Tree labels: Bacteroidetes, Bacteroidales, Deltaproteobacteria, Actinobacteria, Verrucomicrobia, Epsilonproteobacteria, Firmicutes, Clostridia, Clostridiales, Gammaproteobacteria, Cyanobacteria, Alphaproteobacteria, Fusobacteria, Bacilli, Mollicutes]
Amber Hartman
Bertram Ludaescher
alignment used to build the profile, resulting in a multiple
sequence alignment of full-length reference sequences and
PD versus PID clustering, 2) to explore overlap between PhylOTU
clusters and recognized taxonomic designations, and 3) to quantify
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized
workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
Finding Metagenomic OTUs
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA,
Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial
Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput
Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
PhylOTU
Tom Sharpton
@tjsharpton
QIIME Phylotyping and Phylogenetic Ecology
Fig. S6. A set of 96 OTUs mainly consisting of Proteobacteria is
compartment in the greenhouse experiment. (A) Number of OTU
they belong to that are enriched across all rhizocompartments in the
A subset of the Proteobacteria and the classes and families they belo
enriched across all rhizocompartments in the greenhouse.
https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/
Lesson 4:
Accept When You
Are Defeated
Rice Microbiome: Variation w/in Plant
Joseph
Edwards
@Bulk_Soil
Sundar
@sundarlab
Cameron
Johnson
Srijak
Bhatnagar
@srijakbhatnagar
growth. For our study, the rhizosphere compartment was com-
Fig. 1. Root-associated microbial communities are separable by rhizocompartment
and soil type. (A) A representation of a rice root cross-section
depicting the locations of the microbial communities sampled. (B) Within-sample
diversity (α-diversity) measurements between rhizospheric compartments
indicate a decreasing gradient in microbial diversity from the rhizosphere
to the endosphere independent of soil type. Estimated species
Edwards et al. 2015. Structure, variation,
and assembly of the root-associated
microbiomes of rice. PNAS
Rice Genotype Affects Microbiome
rhizocompartments were analyzed as before. Unfortunately,
collection of bulk soil controls for the field experiment was not
Fig. 3. Host plant genotype significantly affects microbial communities in
the rhizospheric compartments. (A) Ordination of CAP analysis using the
WUF metric constrained to rice genotype. (B) Within-sample diversity
measurements of rhizosphere samples of each cultivar grown in each soil.
Estimated species richness was calculated as $e^{\text{Shannon entropy}}$. The horizontal
Edwards et al. 2015. Structure, variation,
and assembly of the root-associated
microbiomes of rice. PNAS
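The richness estimate named in the Fig. 3 caption on this slide, e raised to the Shannon entropy, can be sketched as follows. The function names and the toy OTU counts are illustrative assumptions; the natural logarithm is used so that exponentiating with e gives an "effective number of species".

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative OTU counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)
    return entropy


def estimated_richness(counts):
    """Estimated species richness as e ** (Shannon entropy)."""
    return math.exp(shannon_entropy(counts))


even_sample = [10, 10, 10, 10]   # four equally abundant OTUs
uneven_sample = [37, 1, 1, 1]    # four OTUs, one dominant

print(estimated_richness(even_sample))
print(estimated_richness(uneven_sample))
```

For a perfectly even community the estimate equals the number of OTUs; dominance by one OTU pulls it toward 1.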
Rice: Cultivation Site Effects
Edwards et al. 2015.
Structure, variation, and
assembly of the root-
associated
microbiomes of rice.
PNAS
the field plants again showed that the rhizosphere had the
highest microbial diversity, whereas the endosphere had the least
found to be enriche
greenhouse plants (S
OTUs were classifiabl
sisted of taxa in the fa
and Myxococcaceae, al
bidopsis root endosphe
Cultivation Practice Result
The rice fields that we
practices, organic farmi
tion called ecofarming
farming in that chemica
are all permitted but g
harvest fumigants are n
itself does significantly
partments overall (P =
a significant interaction
the rhizocompartments
indicating that the α-d
affected differentially by
the rhizosphere compa
practice, with the mean
zospheres than organic
Dataset S14), whereas
crobial communities (P
tests; Dataset S14). Un
practices are separable a
the WUF metric (Fig.
Rice: Functional Enrichment x Genotype
and mitochondrial) reads to analyze microbial abundance in
the endosphere over time (Fig. 6A). Using this technique, we
confirmed the sterility of seedling roots before transplantation.
(13 d) approach the endosphere and rhizoplane microbiome
compositions for plants that have been grown in the green-
house for 42 d.
Fig. 5. OTU coabundance network reveals modules of OTUs associated with methane cycling. (A) Subset of the entire network corresponding to 11
modules with methane cycling potential. Each node represents one OTU and an edge is drawn between OTUs if they share a Pearson correlation of
greater than or equal to 0.6. (B) Depiction of module 119 showing the relationship between methanogens, syntrophs, methanotrophs, and other
methane cycling taxonomies. Each node represents one OTU and is labeled by the presumed function of that OTU’s taxonomy in methane cycling. An
edge is drawn between two OTUs if they have a Pearson correlation of greater than or equal to 0.6. (C) Mean abundance profile for OTUs in module 119
across all rhizocompartments and field sites. The position along the x axis corresponds to a different field site. Error bars represent SE. The x and y axes
represent no particular scale.
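The edge rule in the Fig. 5 caption (draw an edge when two OTUs share a Pearson correlation of at least 0.6) can be sketched as below. The OTU names and abundance profiles are made up for illustration, and the module detection step of the paper is not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coabundance_edges(profiles, threshold=0.6):
    """Return (otu_a, otu_b, r) for every OTU pair with r >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical abundance of each OTU across five samples.
profiles = {
    "OTU_A": [1, 2, 3, 4, 5],
    "OTU_B": [2, 4, 6, 8, 10],
    "OTU_C": [5, 4, 3, 2, 1],
    "OTU_D": [3, 3, 3, 3, 3],
}
edges = coabundance_edges(profiles)
print(edges)
```

Only positively correlated pairs pass the threshold here, so the anticorrelated OTU_C stays unconnected.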
Edwards et al. 2015. Structure, variation, and assembly of the root-associated
microbiomes of rice. PNAS
Rice Developmental Time Series
of magnitude greater than in any single plant species
Under controlled greenhouse conditions, the rhizocomp
described the largest source of variation in the microb
munities sampled (Dataset S5A). The pattern of separ
tween the microbial communities in each compar
consistent with a spatial gradient from the bulk soil a
rhizosphere and rhizoplane into the endosphere (F
Similarly, microbial diversity patterns within samples
same pattern where there is a gradient in α-diversity
rhizosphere to the endosphere (Fig. 1B). Enrichment
pletion of certain microbes across the rhizocompartme
cates that microbial colonization of rice roots is not a
process and that plants have the ability to select for ce
crobial consortia or that some microbes are better at f
root colonizing niche. Similar to studies in Arabidopsis, w
that the relative abundance of Proteobacteria is increas
endosphere compared with soil, and that the relative abu
of Acidobacteria and Gemmatimonadetes decrease from
to the endosphere (9–11), suggesting that the distrib
different bacterial phyla inside the roots might be simil
land plants (Fig. 1D and Dataset S6). Under controlle
house conditions, soil type described the second large
of variation within the microbial communities of each
However, the soil source did not affect the pattern of se
between the rhizospheric compartments, suggesting
rhizocompartments exert a recruitment effect on micro
sortia independent of the microbiome source.
By using differential OTU abundance analysis in t
partments, we observed that the rhizosphere serves an
ment role for a subset of microbial OTUs relative to
(Fig. 2). Further, the majority of the OTUs enriche
rhizosphere are simultaneously enriched in the rhizoplan
endosphere of rice roots (Fig. 2B and SI Appendix, Fig
consistent with a recruitment model in which factors pro
the root attract taxa that can colonize the endosphere. W
that the rhizoplane, although enriched for OTUs that
enriched in the endosphere, is also uniquely enriched for
of OTUs, suggesting that the rhizoplane serves as a sp
Edwards et al. 2015.
Structure, variation, and
assembly of the root-
associated
microbiomes of rice.
PNAS
Example III: rRNA Not Perfect
Lesson 5:
Nothing is Perfect
Taxa Phylogeny III: rRNA Not Perfect
rRNA Copy # Correction by Phylogeny
Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates
of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
Jessica Green
@jessicaleegreen
Steven Kembel
@stevenkembel
Martin Wu
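The basic arithmetic behind copy number correction can be sketched as follows: divide each taxon's read count by its rRNA gene copy number, then renormalize. This is a simplified illustration with made-up taxa and copy numbers given as inputs. As the slide title says, the cited work uses phylogeny to estimate copy number, and that estimation step is not shown here.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert rRNA read counts to relative abundances of organisms.

    read_counts: dict taxon -> number of rRNA reads
    copy_numbers: dict taxon -> rRNA gene copies per genome (assumed known here)
    """
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = reads / copies
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical sample: taxon_A carries 7 rRNA copies per genome, taxon_B carries 1.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}

abundance = correct_for_copy_number(reads, copies)
print(abundance)
```

Uncorrected, taxon_A would appear to make up 87.5% of the community; after correction the two taxa are equally abundant.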
Phylogeny in Shotgun Metagenomics
Workflow: DNA extraction -> Shotgun -> Sequence all genes -> Phylogenetic tree -> Phylotyping
[Figure: tree placing GeneX sequences from the sample among GeneX from E. coli, Humans, and Yeast]
RecA vs. rRNA
Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
Lesson 6:
Keep Going Back
to Your Past
Phylotyping w/ Protein Markers
AMPHORA
Wu and Eisen. Genome Biology 2008, Volume 9, Issue 10, Article R151. http://genomebiology.com/2008/9/10/R151
[Figure: bar chart. y axis: Relative abundance, 0 to 0.8. x axis (taxonomic groups): Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria. Series (marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
Martin Wu
GOS 1
GOS 2
GOS 3
GOS 4
GOS 5
Phylogenetic ID of Novel Lineages
Wu et al PLoS One 2011
Dongying Wu
Phylogenetic Diversity of Metagenomes
typically used as a qualitative measure because duplicate sequences
are usually removed from the tree. However, the
test may be used in a semiquantitative manner if all clones,
even those with identical or near-identical sequences, are included
in the tree (13).
Here we describe a quantitative version of UniFrac that we
call "weighted UniFrac." We show that weighted UniFrac behaves
similarly to the FST test in situations where both a
FIG. 1. Calculation of the unweighted and the weighted UniFrac
measures. Squares and circles represent sequences from two different
environments. (a) In unweighted UniFrac, the distance between the
circle and square communities is calculated as the fraction of the
branch length that has descendants from either the square or the circle
environment (black) but not both (gray). (b) In weighted UniFrac,
branch lengths are weighted by the relative abundance of sequences in
the square and circle communities; square sequences are weighted
twice as much as circle sequences because there are twice as many total
circle sequences in the data set. The width of branches is proportional
to the degree to which each branch is weighted in the calculations, and
gray branches have no weight. Branches 1 and 2 have heavy weights
since the descendants are biased toward the square and circles, respectively.
Branch 3 contributes no value since it has an equal contribution
from circle and square sequences after normalization.
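The unweighted measure in panel (a) of the caption above can be sketched directly from its definition: the fraction of total branch length leading to descendants from only one environment. The branch list below describes a made-up tree, with each branch given as its length plus the set of environments found among its descendants. The representation and names are assumptions for this example, and the weighted variant is not shown.

```python
def unweighted_unifrac(branches):
    """Fraction of branch length with descendants from exactly one environment.

    branches: iterable of (length, set of environments among descendants)
    """
    unique = sum(length for length, envs in branches if len(envs) == 1)
    total = sum(length for length, envs in branches if envs)
    if total == 0:
        raise ValueError("tree has no branch length")
    return unique / total


# Hypothetical tree ((sq1:1, sq2:1):2, (c1:1, (c2:1, sq3:1):1):2)
branches = [
    (1.0, {"square"}),            # tip sq1
    (1.0, {"square"}),            # tip sq2
    (2.0, {"square"}),            # internal branch above sq1 + sq2
    (1.0, {"circle"}),            # tip c1
    (1.0, {"circle"}),            # tip c2
    (1.0, {"square"}),            # tip sq3
    (1.0, {"circle", "square"}),  # internal branch above c2 + sq3
    (2.0, {"circle", "square"}),  # internal branch above c1 + (c2, sq3)
]

distance = unweighted_unifrac(branches)
print(distance)
```

Two communities that share every branch score 0, and two that share none score 1.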
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of
Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
Jessica
Green
Steven
Kembel
Katie
Pollard
Phylosift/ pplacer Workflow
[Figure: workflow diagram, with the following labels]
Input Sequences: each input sequence scanned against both workflows (rRNA workflow, protein workflow); parallel option
LAST: fast candidate search (search input against references)
hmmalign: multiple alignment (profile HMMs used to align candidates to reference alignment)
Infernal: multiple alignment
Branch labels: <600 bp, >600 bp
pplacer: phylogenetic placement
Taxonomic Summaries: Krona plots, Number of reads placed for each marker gene
Sample Analysis & Comparison: Edge PCA, Tree visualization, Bayes factor tests
Aaron Darling
@koadman
Erik Matsen
@ematsen
Holly Bik
@hollybik
Guillaume Jospin
@guillaumejospin
Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
Erik Lowe
Whole Genome Tree of 2000 Taxa
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510
Jenna Lang
@jennnomics
Aaron Darling
@koadman
Phylosift Markers
• PMPROK – Dongying Wu’s Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria - protein-coding genes
• Viral Markers – Markov clustering on genomes
• Codon Subtrees – finer scale taxonomy
• Extended Markers – plastids, gene families
PhyEco Markers
Phylogenetic group | Genome Number | Gene Number | Marker Candidates
Archaea | 62 | 145415 | 106
Actinobacteria | 63 | 267783 | 136
Alphaproteobacteria | 94 | 347287 | 121
Betaproteobacteria | 56 | 266362 | 311
Gammaproteobacteria | 126 | 483632 | 118
Deltaproteobacteria | 25 | 102115 | 206
Epsilonproteobacteria | 18 | 33416 | 455
Bacteroides | 25 | 71531 | 286
Chlamydiae | 13 | 13823 | 560
Chloroflexi | 10 | 33577 | 323
Cyanobacteria | 36 | 124080 | 590
Firmicutes | 106 | 312309 | 87
Spirochaetes | 18 | 38832 | 176
Thermi | 5 | 14160 | 974
Thermotogae | 9 | 17037 | 684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families
for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological
Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE
8(10): e77033. doi:10.1371/journal.pone.0077033
Edge PCA: Identify
lineages that explain most
variation among samples
Edge PCA - Matsen and Evans 2013
Output: Edge PCA
QIIME Phylotyping and Phylogenetic Ecology
Fig. S6. A set of 96 OTUs mainly consisting of Proteobacteria is
compartment in the greenhouse experiment. (A) Number of OTU
they belong to that are enriched across all rhizocompartments in the
A subset of the Proteobacteria and the classes and families they belo
enriched across all rhizocompartments in the greenhouse.
https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/
Lesson 7:
Don’t Accept
When You Are
Defeated
Example IV: Functional Evolution
My Study Organisms
1st Genome Sequence
Fleischmann et al.
1995
TIGR Genome Projects
Lesson 8:
If you can’t beat
them, critique
them or join them
• Leveraging an understanding of the
evolution of function to better predict
functions
Function & Phylogeny
PHYLOGENETIC PREDICTION OF GENE FUNCTION
[Figure: flow chart of the method, worked through for EXAMPLE A and EXAMPLE B]
Steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE; OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
Panel labels: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN); Species 1, Species 2, Species 3; genes 1 to 6 and 1A, 2A, 3A, 1B, 2B, 3B; Duplication?; Ambiguous
Based on Eisen, 1998 Genome Res 8: 163-167.
Phylogenomics
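The last two steps of the flow chart (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched on a toy gene tree. This is a simplified reading of the idea, not the published procedure: the tree is given as nested tuples, the gene names follow the slide's 1A/2A/3A/1B/2B/3B labels, and the functions "repair" and "transport" are invented for illustration.

```python
def leaves(tree):
    """Return all leaf names under a nested-tuple tree (a leaf is a string)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def clades_containing(tree, query):
    """Return the clades containing query, ordered from the root down to the query leaf."""
    if isinstance(tree, str):
        return [tree] if tree == query else []
    for child in tree:
        path = clades_containing(child, query)
        if path:
            return [tree] + path
    return []


def predict_function(tree, query, known):
    """Use the smallest clade around the query that contains annotated genes.

    Returns the shared function, "ambiguous" if annotations conflict,
    or "unknown" if no annotated gene is in the tree.
    """
    path = clades_containing(tree, query)
    if not path:
        raise ValueError(f"{query} is not in the tree")
    for clade in reversed(path):
        functions = {known[g] for g in leaves(clade) if g != query and g in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Two paralogous subfamilies (A and B) produced by an ancient duplication.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "repair", "3A": "repair", "1B": "transport", "3B": "transport"}

print(predict_function(gene_tree, "2A", known))
print(predict_function(gene_tree, "2B", known))
print(predict_function(("q", ("1A", "1B")), "q", known))
```

The third call shows the ambiguous case: the query branches off before the two subfamilies diverged, so the tree alone cannot say which function it has.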
Lesson 9:
If you invent your
own omics word,
you are stuck with it
so use it for
branding
Phylogenomics ~~ Phylotyping
Eisen et al. 1992. J. Bact. 174: 3416
Lesson 10:
Stealing (with
acknowledgement)
is OK
Proteorhodopsin Functional Diversity
Venter et al., Science 304: 66. 2004
• Leveraging understanding of gene gain
and loss to better predict genome
functions
Lesson 11:
Who you hang out
with matters
Carboxydothermus hydrogenoformans
• Isolated from a Russian hotspring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (Carbon
Monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome Determined (Wu et al. 2005
PLoS Genetics 1: e65. )
Homologs of Sporulation Genes
Wu et al. 2005 PLoS
Genetics 1: e65.
Carboxydothermus sporulates
Wu et al. 2005 PLoS Genetics 1: e65.
Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other genomes
• Ask: Yes or No, is each gene
found in each other species
• Cluster genes by distribution
patterns (profiles)
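The three steps above can be sketched with a toy presence/absence table. The search itself (step 1) is assumed to have been run already; `hits` stands in for its output, and the gene and genome names are hypothetical.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search hits into yes/no profiles: 1 if the gene is found in a genome, else 0."""
    return {
        gene: tuple(int(genome in found_in) for genome in genomes)
        for gene, found_in in hits.items()
    }


def cluster_by_profile(profiles):
    """Group genes that share an identical distribution pattern."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)


genomes = ["sporeformer_1", "sporeformer_2", "nonsporeformer_1"]

# For each gene of interest, the set of other genomes in which a homolog was found.
hits = {
    "geneA": {"sporeformer_1", "sporeformer_2"},
    "geneB": {"sporeformer_1", "sporeformer_2"},
    "geneC": {"sporeformer_1", "sporeformer_2", "nonsporeformer_1"},
    "geneD": {"nonsporeformer_1"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)

for profile, genes in clusters.items():
    print(profile, genes)
```

Genes sharing the (1, 1, 0) profile are present only in the spore formers, which is the kind of pattern that flags candidate sporulation genes without relying on sequence similarity to known ones.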
Sporulation Gene Profile
Wu et al. 2005 PLoS Genetics 1: e65.
B. subtilis new sporulation genes
J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12
Bjorn Traag
Richard Losick
Example V: More Gaps
Lesson 12:
Keep Returning to
the Same Theme
Over and Over
and Over
Yet Another Map
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Genomes Poorly Sampled
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
TIGR Tree of Life Project
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Genomic Encyclopedia of Bacteria & Archaea
Wu et al. 2009 Nature 462, 1056-1060
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Family Diversity vs. PD
Wu et al. 2009 Nature 462, 1056-1060
GEBA Cyanobacteria
Shih et al. 2013. PNAS 10.1073/pnas.1217107110
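[Figure: phylogenetic tree of cyanobacteria and plastids (scale bar 0.3). Labeled clades: A, B1, B2, B3, C1, C2, C3, D, E, F, G; plastid lineages: Paulinella, Glaucophyte, Green, Red, Chromalveolates.]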
Haloarchaeal GEBA-like
Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
The Dark Matter of Biology
From Wu et al. 2009 Nature 462, 1056-1060
Number of SAGs from Candidate Phyla

Site                                     OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent                  4      1     -        -
Site B: Gold Mine                          6     13     2        -
Site C: Tropical gyres (Mesopelagic)       -      -     -        2
Site D: Tropical gyres (Photic zone)       1      -     -        -
Sample collections at 4 additional sites are underway.
Phil Hugenholtz
GEBA Uncultured
JGI Dark Matter Project
• environmental samples (n=9)
• isolation of single cells (n=9,600)
• whole genome amplification (n=3,300)
• SSU rRNA gene based identification (n=2,000)
• genome sequencing, assembly and QC (n=201)
• draft genomes (n=201)
[Figure: sampling sites (codes include SAK, HSM, HOT, GOM, GBS, EPR) grouped by habitat: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor.]
[Figure: phylogenetic tree of BACTERIA and ARCHAEA showing established phyla and candidate phyla. Candidate phyla labeled with proposed names: WS3 (Latescibacteria), OP3 (Omnitrophica), NKB19 (Hydrogenedentes), EM19 (Calescamantes), OctSpA1-106 (Fervidibacteria), CD12 (Aerophobetes), OP8 (Aminicenantes), OP9 (Atribacteria), SAR406 (Marinimicrobia), WWE1 (Cloacamonetes), OP1 (Acetothermia), GN02 (Gracilibacteria), OD1 (Parcubacteria), OP11 (Microgenomates), DSEG (Aenigmarchaea), pMC2A384 (Diapherotrites).]
[Figure panels:
• archaeal toxins (Nanoarchaea)
• lytic murein transglycosylase; murein (peptidoglycan)
• stringent response (Diapherotrites, Nanoarchaea)
• sigma factor (Diapherotrites, Nanoarchaea)
• oxidoreductase
• HGT from Eukaryotes (Nanoarchaea)
• archaeal type purine synthesis (Microgenomates)
• UGA recoded for Gly (Gracilibacteria)]
Woyke et al. Nature 2013.
A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Tetrahymena Genome Project
A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Tree from Woese. 1987.
Microbiological Reviews 51:221
Example VI: Beyond Sequence
Lesson 13:
Don’t Overdo It
With That Theme
DNA
extraction
PCR
Sequence
all genes
Shotgun
Shotgun Metagenomics
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Phylogenetic Binning
HiC Crosslinking  Sequencing
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore
RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-
level deconvolution of a synthetic metagenome by
sequencing proximity ligation products. PeerJ 2:e415
http://dx.doi.org/10.7717/peerj.415
Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the synthetic microbial community are shown before and after filtering, along with the percent of total constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon, species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.

Sequence   Alignment    % of Total   Filtered     % of aligned   Length      GC      #R.S.
Lac0       10,603,204   26.17%       10,269,562   96.85%         2,291,220   0.462   629
Lac1       145,718      0.36%        145,478      99.84%         13,413      0.386   3
Lac2       691,723      1.71%        665,825      96.26%         35,595      0.385   16
Lac        11,440,645   28.23%       11,080,865   96.86%         2,340,228   0.46    648
Ped        2,084,595    5.14%        2,022,870    97.04%         1,832,387   0.373   863
BL21       12,882,177   31.79%       2,676,458    20.78%         4,558,953   0.508   508
K12        9,693,726    23.92%       1,218,281    12.57%         4,686,137   0.507   568
E. coli    22,575,903   55.71%       3,894,739    17.25%         9,245,090   0.51    1076
Bur1       1,886,054    4.65%        1,797,745    95.32%         2,914,771   0.68    144
Bur2       2,536,569    6.26%        2,464,534    97.16%         3,809,201   0.672   225
Bur        4,422,623    10.91%       4,262,279    96.37%         6,723,972   0.68    369
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is
shown for read pairs mapping to each chromosome. For each read pair the minimum path length on
the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded.
The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin
was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and
plotted.
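The calculation described in the Figure 1 caption can be sketched as below. This is an illustration under stated assumptions rather than the code used in the paper: the function names and the example coordinates are made up, read pairs are assumed to be given as two mapped positions on the same circular chromosome, and pairs falling outside the 2.5 Mb range are simply dropped.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)


def insert_histogram(pairs, chrom_len, min_dist=1000,
                     max_dist=2_500_000, n_bins=100):
    """Bin read-pair distances into equal-size bins, normalized to sum to 1."""
    counts = [0] * n_bins
    bin_width = max_dist / n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        # Discard pairs closer than min_dist; dropping pairs beyond the
        # binned range is an assumption of this sketch.
        if d < min_dist or d >= max_dist:
            continue
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Chromosome length taken from the K12 row of Table 1; positions are made up.
k12_len = 4_686_137
example_pairs = [
    (0, 500),             # 500 bp apart: discarded
    (0, 30_000),          # bin 1
    (1_000, 4_680_137),   # wraps around the origin: 7,000 bp apart, bin 0
    (0, 60_000),          # bin 2
]
hist = insert_histogram(example_pairs, k12_len)
print(hist[:3])
```

Using the circular distance matters for pairs that straddle the linearization point of the assembly: measured linearly they would appear to be megabases apart.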
E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1;
Lieberman-Aiden et al., 2009). We observed a minor depletion of alignments spanning
the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137)
due to edge effects induced by BWA treating the sequence as a linear chromosome rather
than circular.
Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs
associating each genomic replicon in the synthetic community is shown as a heat map (see color scale,
blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome
1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2:
L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
reference assemblies of the members of our synthetic microbial community with the same
alignment parameters as were used in the top ranked clustering (described above). We first
Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges
depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof
depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend)
with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded.
Contig associations were normalized for variation in contig size.
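The graph described in the Figure 3 caption can be sketched as follows. This is a hedged illustration, not the paper's implementation: contig names and lengths are hypothetical, each Hi-C read pair is assumed to be reduced to the two contigs it aligns to, and the size normalization (pair count divided by the product of the two contig lengths in kb) is an assumption, since the caption does not state which normalization was used.

```python
from collections import Counter


def build_contig_graph(read_pairs, contig_len, min_len=5_000, min_weight=5):
    """Build a weighted contig graph from Hi-C read pairs.

    read_pairs: iterable of (contig_a, contig_b), one entry per read pair.
    contig_len: mapping from contig name to length in bp.
    Returns {(contig_a, contig_b): normalized weight} with sorted node pairs.
    """
    counts = Counter()
    for a, b in read_pairs:
        if a == b:
            continue  # pairs within one contig add no edge
        if contig_len[a] < min_len or contig_len[b] < min_len:
            continue  # exclude contigs below 5 kb
        counts[tuple(sorted((a, b)))] += 1

    graph = {}
    for (a, b), n in counts.items():
        if n < min_weight:
            continue  # exclude edges with weights less than 5
        # Assumed normalization: pairs per kb of contig a per kb of contig b.
        graph[(a, b)] = n / ((contig_len[a] / 1000) * (contig_len[b] / 1000))
    return graph


# Hypothetical contigs and read-pair assignments.
lengths = {"contig1": 10_000, "contig2": 20_000, "contig3": 3_000}
pairs = (
    [("contig1", "contig2")] * 4
    + [("contig2", "contig1")] * 2
    + [("contig1", "contig3")] * 10   # contig3 is below 5 kb
    + [("contig1", "contig1")] * 3    # within-contig pairs
)
graph = build_contig_graph(pairs, lengths)
print(graph)
```

Clusters of strongly connected contigs in such a graph are candidates for belonging to the same cell, which is what lets Hi-C data group chromosomes and plasmids by species.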
typically represent the reads and variant sites as a variant graph wherein variant sites are
represented as nodes, and sequence reads define edges between variant sites observed in
the same read (or read pair). We reasoned that variant graphs constructed from Hi-C
data would have much greater connectivity (where connectivity is defined as the mean
path length between randomly sampled variant positions) than graphs constructed from
mate-pair sequencing data, simply because Hi-C inserts span megabase distances. Such
Figure 4 Hi-C contact maps for replicons of Lactobacillus brevis. Contact maps show the number of
Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A),
Chris Beitel
@datscimed
Aaron Darling
@koadman
Sequence Isn’t Everything
PB-PSB1
(Purple sulfur bacteria)
PB-SRB1
(Sulfate reducing bacteria)
(sulfate)
(sulfide)
Wilbanks, E.G. et al (2014). Environmental Microbiology
Lizzy Wilbanks
@lizzywilbanks
12C, 12C14N, 32S
Biomass
(RGB composite)
0.044 0.080
34S-incorporation
(34S/32S ratio)
Wilbanks, E.G. et al (2014). Environmental Microbiology
Transfer of 34S from SRB to PSB
Long Reads Help, A Lot
Hiseq  Miseq
100-250 bp
Moleculo
2-20 kb
Pacbio RSII
2-20kb
Micky Kertesz,
Tim Blauwcamp
Meredith Ashby
Cheryl Heiner
Illumina-based “synthetic long reads”
Real-time single molecule
sequencing
(p4-c2, p5-c3)
295 Megabases 474 Megabases 61 Gigabases
Light-responsive sulfate reducer?
rhodopsin
w/ Susumu Yoshizawa
Lesson 14:
Asking for, and
getting, help, is a
good thing
Seagrass Microbiome
1000 samples collected.
Not a blade of seagrass touched.
YEAR ONE
ZEN (Zostera Experimental Network)
25 partner sites
leaves, roots, sediment, and water samples
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonathan Eisen at UCSB Feb 2015

  • 1. Phylogeny-Driven Approaches to Studies of Microbial and Microbiome Diversity Jonathan A. Eisen University of California, Davis @phylogenomics February 7, 2015 UCSB EEMB Graduate Student Symposium
  • 2. Some Lessons I Think I Have Learned
  • 3. Lesson 1: Go With Your Obsessions
  • 6. Social Media & Science
  • 7. Social Media & Science X
  • 12. Microbial Evolution Lesson 2: History (of species, genes, people, science) Matters
  • 13. Example I: Lost in Graduate School?
  • 14. Lost in Graduate School? Get A Map
  • 15. Tree from Woese. 1987. Microbiological Reviews 51:221 Map for Graduate School Carl Woese
  • 16. Limited Sampling of RRR Studies Tree from Woese. 1987. Microbiological Reviews 51:221
  • 17. My Study Organisms Tree from Woese. 1987. Microbiological Reviews 51:221
  • 18. H. volcanii Excision Repair. [Figure: H. volcanii UV repair at 45 J/m2, plotted against average molecular weight (base pairs); conditions: 45 J/m2 dark 24 hours, 45 J/m2 photoreactivated, 45 J/m2 t0, 0 J/m2 t0.] [Figure: UV survival, E. coli vs H. volcanii; relative survival (log scale) against UV dose (J/m2) for H. volcanii WFD11, E. coli NR10125 mfd+, and E. coli NR10121 mfd-. From Eisen 1998. PhD Thesis.] Image by Grombo, from Wikipedia.
  • 19. Tree from Woese. 1987. Microbiological Reviews 51:221 Map for Graduate School Lesson 3: Go Fishing Where Nobody Else Has
  • 20. Example II: Rice Microbiomes and Phylogeny. Joseph Edwards @Bulk_Soil, Sundar @sundarlab, Cameron Johnson, Srijak Bhatnagar @srijakbhatnagar. Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS. Supplementary Figures: Fig. S1. Map depicting soil collection locations for greenhouse experiment. Fig. S2. Sampling and collection of the rhizocompartments. Roots are collected from rice plants and soil is shaken off the roots to leave ~1 mm of soil around the roots. The ~1 mm of soil
  • 21. PCR and phylogenetic analysis of rRNA genes. DNA extraction; PCR (makes lots of copies of the rRNA genes in sample); sequence rRNA genes; sequence alignment = data matrix; phylogenetic tree. Example sequences: rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA...3’; rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA...3’; rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA...3’; rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA...3’. [Figure: rRNA1 to rRNA4 are aligned with E. coli, human, and yeast sequences into a data matrix and placed on a phylogeny.]
  • 22. STAP An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP) Dongying Wu1 *, Amber Hartman1,6 , Naomi Ward4,5 , Jonathan A. Eisen1,2,3 1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences, University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America, 5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United States of America Abstract Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully- automated ss-rRNA taxonomy and alignment pipeline (STAP). 
It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts. Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS ONE 3(7): e2566. doi:10.1371/journal.pone.0002566 multiple alignment and phylogeny was deemed unfeasible. However, this we believe can compromise the value of the results. For example, the delineation of OTUs has also been automated via tools that do not make use of alignments or phylogenetic trees (e.g., Greengenes). This is usually done by carrying out pairwise comparisons of sequences and then clustering of sequences that have better than some cutoff threshold of similarity with each other). This approach can be powerful (and reasonably efficient) but it too has limitations. In particular, since multiple sequence alignments are not used, one cannot carry out standard phylogenetic analyses. In addition, without multiple sequence alignments one might end up comparing and contrasting different regions of a sequence depending on what it is paired with. The limitations of avoiding multiple sequence alignments and phylogenetic analysis are readily apparent in tools to classify sequences. For example, the Ribosomal Database Project’s Classifier program [29] focuses on composition characteristics of each sequence (e.g., oligonucleotide frequency) and assigns taxonomy based upon clustering genes by their composition. 
Though this is fast and completely automatable, it can be misled in cases where distantly related sequences have converged on similar composition, something known to be a major problem in ss-rRNA sequences [30]. Other taxonomy assignment systems focus primarily on the similarity of sequences. The simplest of these is classification tools it does have some limitations. For example, the generation of new alignments for each sequence is both computationally costly, and does not take advantage of available curated alignments that make use of ss-RNA secondary structure to guide the primary sequence alignment. Perhaps most importantly however is that the tool is not fully automated. In addition, it does not generate multiple sequence alignments for all sequences in a dataset which would be necessary for doing many analyses. Automated methods for analyzing rRNA sequences are also available at the web sites for multiple rRNA centric databases, such as Greengenes and the Ribosomal Database Project (RDPII). Though these and other web sites offer diverse powerful tools, they do have some limitations. For example, not all provide multiple sequence alignments as output and few use phylogenetic approaches for taxonomy assignments or other analyses. More importantly, all provide only web-based interfaces and their integrated software, (e.g., alignment and taxonomy assignment), cannot be locally installed by the user. Therefore, the user cannot take advantage of the speed and computing power of parallel processing such as is available on linux clusters, or locally alter and potentially tailor these programs to their individual computing needs (Table 1). Given the limited automated tools that are available for
Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-RNA analysis tools.
Feature | STAP | ARB | Greengenes | RDP
Installed where? | Locally | Locally | Web only | Web only
User interface | Command line | GUI | Web portal | Web portal
Parallel processing | YES | NO | NO | NO
Manual curation for taxonomy assignment | NO | YES | NO | NO
Manual curation for alignment | NO | YES | NO* | NO
Open source | YES** | NO | NO | NO
Processing speed | Fast | Slow | Medium | Medium
It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is more amenable to downstream code manipulation. * Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment. ** The STAP program itself is open source, the programs it depends on are freely available but not open source. doi:10.1371/journal.pone.0002566.t001
Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001
STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment algorithm, the alignments from the STAP database remain intact, while gaps are inserted and nucleotides are trimmed for the query sequence according to the profile defined by the previous alignments from the databases. Thus the accuracy and quality of the alignment generated at this step depends heavily on the quality of the Bacterial/Archaeal ss-rRNA alignments from the Greengenes project or the Eukaryotic ss-rRNA alignments from the RDPII project. Phylogenetic analysis using multiple sequence alignments rests on the assumption that the residues (nucleotides or amino acids) at the same position in every sequence in the alignment are homologous. 
Thus, columns in the alignment for which ‘‘positional homology’’ cannot be robustly determined must be excluded from subsequent analyses. This process of evaluating homology and eliminating questionable columns, known as masking, typically requires time-consuming, skillful, human intervention. We designed an automated masking method for ss-rRNA alignments, thus eliminating this bottleneck in high-throughput processing. First, an alignment score is calculated for each aligned column by a method similar to that used in the CLUSTALX package [42]. Specifically, an R-dimensional sequence space representing all the possible nucleotide character states is defined. Then for each aligned column, the nucleotide populating that column in each of the aligned sequences is assigned a score in each of the R dimensions (Sr) according to the IUB matrix [42]. The consensus ‘‘nucleotide’’ for each column (X) also has R dimensions, with the Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and Dongying Wu Amber Hartman Naomi Ward
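The STAP excerpt above contrasts tree-based analysis with OTU delineation by pairwise comparison and a similarity cutoff. A minimal sketch of that cutoff-based clustering is given here for orientation only. It is not STAP, Greengenes, or any published tool: it assumes the sequences are already aligned to equal length, uses plain percent identity, and links clusters single-linkage style. The function names and the 97% default are illustrative assumptions.

```python
from itertools import combinations


def percent_identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, cutoff=0.97):
    """Single-linkage OTUs: any pair at or above the cutoff shares a cluster.

    seqs maps a sequence name to its aligned sequence string.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Because every pair is compared, the cost grows quadratically with the number of sequences, and nothing here yields a tree, which is the limitation the excerpt points to.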
  • 23. WATERS chimeric sequences generated during PCR identifying closely related sets of sequences (also known as operational taxonomic units or OTUs), removing redundant sequences above a certain percent identity cutoff, assigning putative taxonomic identifiers to each sequence or representative of a group, inferring a phylogenetic tree of the sequences, and comparing the phylogenetic structure Figure 1 Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in WATERS. Quality control files are generated for white boxes, but not otherwise routinely analyzed. Black arrows indicate that metadata (e.g., sample type) has been overlaid on the data for downstream interpretation. Colored boxes indicate different types of results files that are generated for the user for further use and biological interpretation. Colors indicate different types of WATERS actors from Fig. 2 which were used: green, Diversity metrics, WriteGraphCoordinates, Diversity graphs; blue, Taxonomy, BuildTree, Rename Trees, Save Trees, CreateUnifrac; yellow, CreateOtuTable, CreateCytoscape, CreateOTUFile; white, remaining unnamed actors. [Figure boxes: Align, Check chimeras, Cluster, Build Tree, Assign Taxonomy, Tree w/ Taxonomy, Diversity statistics & graphs, Unifrac files, Cytoscape network, OTU table] Hartman et al 2010. W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences. BMC Bioinformatics 2010, 11:317 doi:10.1186/1471-2105-11-317 http://www.biomedcentral.com/1471-2105/11/317 default is 97% and 99%), and they are also generated for every metadata variable comparison that the user includes. Data pruning To assist in troubleshooting and quality control, WATERS returns to the user three fasta files of sequences that were removed at various steps in the workflow. 
A short_sequences.fas file is created that contains all Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar to curves shown in Eckburg et al. Fig. 2; 70-72 indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al. [Figure labels: PCA axes "Percent variation explained"; tree clades BACTEROIDETES, BACTEROIDALES, DELTAPROTEOBACTERIA, ACTINOBACTERIA, VERRUCOMICROBIA, EPSILONPROTEOBACTERIA, FIRMICUTES, CLOSTRIDIA, CLOSTRIDIALES, GAMMAPROTEOBACTERIA, CYANOBACTERIA, ALPHAPROTEOBACTERIA, FUSOBACTERIA, FIRMICUTES, BACILLI, FIRMICUTES, MOLLICUTES] Amber Hartman Bertram Ludäscher
  • 24. alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalize workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 Finding Metagenomic OTUs Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061 PhylOTU Tom Sharpton @tjsharpton
  • 25. QIIME Phylotyping and Phylogenetic Ecology [Figure: Fig. S6, a set of 96 OTUs mainly consisting of Proteobacteria that are enriched across all rhizocompartments in the greenhouse experiment] https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/
  • 26. QIIME Phylotyping and Phylogenetic Ecology [Figure: Fig. S6, a set of 96 OTUs mainly consisting of Proteobacteria that are enriched across all rhizocompartments in the greenhouse experiment] https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/ Lesson 4: Accept When You Are Defeated
  • 27. Rice Microbiome: Variation w/in Plant Joseph Edwards @Bulk_Soil Sundar @sundarlab Cameron Johnson Srijak Bhatnagar @srijakbhatnagar Fig. 1. Root-associated microbial communities are separable by rhizocompartment and soil type. (A) A representation of a rice root cross-section depicting the locations of the microbial communities sampled. (B) Within-sample diversity (α-diversity) measurements between rhizospheric compartments indicate a decreasing gradient in microbial diversity from the rhizosphere to the endosphere independent of soil type. Estimated species [...] Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS
  • 28. Rice Genotype Affects Microbiome rhizocompartments were analyzed as before. Unfortunately, collection of bulk soil controls for the field experiment was not Fig. 3. Host plant genotype significantly affects microbial communities in the rhizospheric compartments. (A) Ordination of CAP analysis using the WUF metric constrained to rice genotype. (B) Within-sample diversity measurements of rhizosphere samples of each cultivar grown in each soil. Estimated species richness was calculated as e^(Shannon entropy). The horizontal Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS
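The Fig. 3 caption above estimates species richness as e raised to the Shannon entropy. A minimal sketch of that calculation follows, assuming OTU counts for one sample and the natural logarithm; the function names are illustrative and not taken from the paper's code.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of OTU counts for one sample."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in positive)


def estimated_richness(counts):
    """Effective number of species: e raised to the Shannon entropy."""
    return math.exp(shannon_entropy(counts))
```

For a perfectly even sample the value equals the number of OTUs; uneven samples score lower, which is why it behaves as an "effective" richness.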
  • 29. Rice: Cultivation Site Effects Edwards et al. 2015. Structure, variation, and assembly of the root- associated microbiomes of rice. PNAS the field plants again showed that the rhizosphere had the highest microbial diversity, whereas the endosphere had the least found to be enriche greenhouse plants (S OTUs were classifiabl sisted of taxa in the fa and Myxococcaceae, al bidopsis root endosphe Cultivation Practice Result The rice fields that we practices, organic farmi tion called ecofarming farming in that chemica are all permitted but g harvest fumigants are n itself does significantly partments overall (P = a significant interaction the rhizocompartments indicating that the α-d affected differentially by the rhizosphere compa practice, with the mean zospheres than organic Dataset S14), whereas crobial communities (P tests; Dataset S14). Un practices are separable a the WUF metric (Fig.
  • 30. Rice: Functional Enrichment x Genotype and mitochondrial) reads to analyze microbial abundance in the endosphere over time (Fig. 6A). Using this technique, we confirmed the sterility of seedling roots before transplantation. (13 d) approach the endosphere and rhizoplane microbiome compositions for plants that have been grown in the green- house for 42 d. Fig. 5. OTU coabundance network reveals modules of OTUs associated with methane cycling. (A) Subset of the entire network corresponding to 11 modules with methane cycling potential. Each node represents one OTU and an edge is drawn between OTUs if they share a Pearson correlation of greater than or equal to 0.6. (B) Depiction of module 119 showing the relationship between methanogens, syntrophs, methanotrophs, and other methane cycling taxonomies. Each node represents one OTU and is labeled by the presumed function of that OTU’s taxonomy in methane cycling. An edge is drawn between two OTUs if they have a Pearson correlation of greater than or equal to 0.6. (C) Mean abundance profile for OTUs in module 119 across all rhizocompartments and field sites. The position along the x axis corresponds to a different field site. Error bars represent SE. The x and y axes represent no particular scale. Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS
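The Fig. 5 caption above draws an edge between two OTUs when their Pearson correlation is at least 0.6. A minimal sketch of that edge rule is shown here, assuming each OTU has an abundance vector over the same ordered samples. The data layout and function names are assumptions for illustration, and module detection is not attempted.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return cov / math.sqrt(var_x * var_y)


def coabundance_edges(abundance, threshold=0.6):
    """Return (otu_a, otu_b, r) for every OTU pair with r >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        r = pearson(abundance[a], abundance[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges
```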
  • 31. Rice Developmental Time Series of magnitude greater than in any single plant species Under controlled greenhouse conditions, the rhizocomp described the largest source of variation in the microb munities sampled (Dataset S5A). The pattern of separ tween the microbial communities in each compar consistent with a spatial gradient from the bulk soil a rhizosphere and rhizoplane into the endosphere (F Similarly, microbial diversity patterns within samples same pattern where there is a gradient in α-diversity rhizosphere to the endosphere (Fig. 1B). Enrichment pletion of certain microbes across the rhizocompartme cates that microbial colonization of rice roots is not a process and that plants have the ability to select for ce crobial consortia or that some microbes are better at f root colonizing niche. Similar to studies in Arabidopsis, w that the relative abundance of Proteobacteria is increas endosphere compared with soil, and that the relative abu of Acidobacteria and Gemmatimonadetes decrease from to the endosphere (9–11), suggesting that the distrib different bacterial phyla inside the roots might be simil land plants (Fig. 1D and Dataset S6). Under controlle house conditions, soil type described the second large of variation within the microbial communities of each However, the soil source did not affect the pattern of se between the rhizospheric compartments, suggesting rhizocompartments exert a recruitment effect on micro sortia independent of the microbiome source. By using differential OTU abundance analysis in t partments, we observed that the rhizosphere serves an ment role for a subset of microbial OTUs relative to (Fig. 2). Further, the majority of the OTUs enriche rhizosphere are simultaneously enriched in the rhizoplan endosphere of rice roots (Fig. 2B and SI Appendix, Fig consistent with a recruitment model in which factors pro the root attract taxa that can colonize the endosphere. 
W that the rhizoplane, although enriched for OTUs that enriched in the endosphere, is also uniquely enriched for of OTUs, suggesting that the rhizoplane serves as a sp Edwards et al. 2015. Structure, variation, and assembly of the root- associated microbiomes of rice. PNAS
  • 32. Tree from Woese. 1987. Microbiological Reviews 51:221 Example III: rRNA Not Perfect Lesson 5: Nothing is Perfect
  • 33. Tree from Woese. 1987. Microbiological Reviews 51:221 Taxa Phylogeny III: rRNA Not Perfect
  • 34. rRNA Copy # Correction by Phylogeny Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743 Jessica Green @jessicaleegreen Steven Kembel @stevenkembel Martin Wu
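The slide above gives only the title of the copy-number correction. As a rough sketch of the general idea, assuming a 16S copy number has already been estimated for each taxon (for example from related sequenced genomes), read counts can be divided by copy number and renormalized. This is an illustration under those assumptions and not the procedure from Kembel et al.; the function name and inputs are hypothetical.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts to relative cell abundance estimates.

    read_counts: taxon -> number of 16S reads observed
    copy_numbers: taxon -> assumed 16S gene copies per genome
    """
    adjusted = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        adjusted[taxon] = reads / copies
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in adjusted.items()}
```

A taxon with seven rRNA operons and 700 reads ends up with the same estimated abundance as a single-copy taxon with 100 reads.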
  • 35. DNA extraction PCR Sequence all genes Phylogenetic tree Shotgun GeneX E. coli Humans GeneX Yeast GeneX GeneX Phylotyping Phylogeny in Shotgun Metagenomics
  • 36. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
  • 37. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123. Lesson 6: Keep Going Back to Your Past
  • 38. Phylotyping w/ Protein Markers AMPHORA http://genomebiology.com/2008/9/10/R151 Genome Biology 2008, Volume 9, Issue 10, Article R151 Wu and Eisen R151.7 [Figure: relative abundance (axis 0 to 0.8) by taxon: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria; marker genes: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf] Martin Wu
  • 39. GOS 1 GOS 2 GOS 3 GOS 4 GOS 5 Phylogenetic ID of Novel Lineages Wu et al PLoS One 2011 Dongying Wu
  • 40. Phylogenetic Diversity of Metagenomes typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214 Jessica Green Steven Kembel Katie Pollard
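The caption above defines unweighted UniFrac as the fraction of branch length whose descendants come from one environment but not both. A minimal sketch of that definition follows, assuming the tree is supplied as child-to-parent and branch-length dictionaries. This representation and the function name are illustrative choices, not the API of any UniFrac implementation.

```python
def unweighted_unifrac(parent, length, leaf_envs):
    """Fraction of observed branch length unique to a single environment.

    parent: node -> parent node (the root has no entry)
    length: node -> length of the branch leading to that node
    leaf_envs: leaf -> set of environments in which that sequence was seen
    """
    envs_below = {node: set() for node in length}
    for leaf, envs in leaf_envs.items():
        node = leaf
        while node in parent:  # walk from the leaf up to the root
            envs_below[node].update(envs)
            node = parent[node]
    observed = sum(length[n] for n, e in envs_below.items() if e)
    unique = sum(length[n] for n, e in envs_below.items() if len(e) == 1)
    if observed == 0:
        raise ValueError("no branch has sampled descendants")
    return unique / observed
```

With leaves a and b under an internal node X, and leaf c attached to the root, sampling a from squares and b and c from circles leaves only branch X shared, so four of five units of branch length are unique.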
  • 41. Phylosift/ pplacer Workflow [Figure: Input Sequences; each input sequence scanned against both workflows (rRNA workflow, protein workflow); LAST fast candidate search (search input against references); hmmalign and Infernal multiple alignment (profile HMMs used to align candidates to reference alignment; <600 bp and >600 bp branches; parallel option); pplacer phylogenetic placement; Taxonomic Summaries (Krona plots, Number of reads placed for each marker gene); Sample Analysis & Comparison (Edge PCA, Tree visualization, Bayes factor tests)] Aaron Darling @koadman Erik Matsen @ematsen Holly Bik @hollybik Guillaume Jospin @guillaumejospin Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243 Erik Lowe
  • 42. Whole Genome Tree of 2000 Taxa Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510 Jenna Lang @jennnomics Aaron Darling @koadman
  • 43. Phylosift Markers • PMPROK – Dongying Wu’s Bac/Arch markers • Eukaryotic Orthologs – Parfrey 2011 paper • 16S/18S rRNA • Mitochondria - protein-coding genes • Viral Markers – Markov clustering on genomes • Codon Subtrees – finer scale taxonomy • Extended Markers – plastids, gene families
  • 44. PhyEco Markers
Phylogenetic group | Genome Number | Gene Number | Marker Candidates
Archaea | 62 | 145415 | 106
Actinobacteria | 63 | 267783 | 136
Alphaproteobacteria | 94 | 347287 | 121
Betaproteobacteria | 56 | 266362 | 311
Gammaproteobacteria | 126 | 483632 | 118
Deltaproteobacteria | 25 | 102115 | 206
Epsilonproteobacteria | 18 | 33416 | 455
Bacteroides | 25 | 71531 | 286
Chlamydiae | 13 | 13823 | 560
Chloroflexi | 10 | 33577 | 323
Cyanobacteria | 36 | 124080 | 590
Firmicutes | 106 | 312309 | 87
Spirochaetes | 18 | 38832 | 176
Thermi | 5 | 14160 | 974
Thermotogae | 9 | 17037 | 684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033
  • 45. Edge PCA: Identify lineages that explain most variation among samples Edge PCA - Matsen and Evans 2013 Output: Edge PCA
  • 46. QIIME Phylotyping and Phylogenetic Ecology [Figure: Fig. S6, a set of 96 OTUs mainly consisting of Proteobacteria that are enriched across all rhizocompartments in the greenhouse experiment] https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/ Lesson 7: Don’t Accept When You Are Defeated
  • 48. My Study Organisms Tree from Woese. 1987. Microbiological Reviews 51:221
  • 50. TIGR Genome Projects Tree from Woese. 1987. Microbiological Reviews 51:221
  • 51. 1st Genome Sequence Fleischmann et al. 1995 Lesson 8: If you can’t beat them, critique them or join them
  • 52. • Leveraging an understanding of the evolution of function to better predict functions Function & Phylogeny
  • 53. PHYLOGENETIC PREDICTION OF GENE FUNCTION [Figure: METHOD steps: CHOOSE GENE(S) OF INTEREST, IDENTIFY HOMOLOGS, ALIGN SEQUENCES, CALCULATE GENE TREE, OVERLAY KNOWN FUNCTIONS ONTO TREE, INFER LIKELY FUNCTION OF GENE(S) OF INTEREST. EXAMPLE A and EXAMPLE B show genes 1 to 6 and 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2, and Species 3, with inferred duplications, one ambiguous inference, and the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN)] Based on Eisen, 1998 Genome Res 8: 163-167. Phylogenomics
  • 54. PHYLOGENETIC PREDICTION OF GENE FUNCTION [Figure: same method diagram as slide 53] Based on Eisen, 1998 Genome Res 8: 163-167. Phylogenomics Lesson 9: If you invent your own omics word, you are stuck with it so use it for branding
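The method diagram above overlays known functions onto a gene tree and infers the function of the gene of interest from its position. One simple way to sketch that last step is shown here, assuming the gene tree is already built and given as nested tuples of leaf names. The rule used (take the smallest clade around the query that contains annotated genes, and report "ambiguous" if they disagree) is an illustrative simplification, not the procedure from Eisen 1998, and the gene and function names in the usage note are hypothetical.

```python
def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def predict_function(tree, query, known):
    """Infer a function for `query` from annotated genes in its nearest clade.

    tree: nested tuples of leaf names, e.g. (("1A", "2A"), "3A")
    known: leaf name -> experimentally known function
    """
    def path_to_query(node):
        # Clades from the root down to the query leaf; empty if absent.
        if isinstance(node, str):
            return [node] if node == query else []
        for child in node:
            path = path_to_query(child)
            if path:
                return [node] + path
        return []

    path = path_to_query(tree)
    if not path:
        raise KeyError(f"{query} is not in the tree")
    for clade in reversed(path[:-1]):  # innermost enclosing clade first
        functions = {known[name] for name in leaves(clade)
                     if name != query and name in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"
```

For a tree with two paralog groups, `((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))`, an unannotated 2A takes the function of 1A rather than that of the B paralogs, which is the distinction a best-hit similarity search can miss.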
  • 55. Phylogenomics ~~ Phylotyping Eisen et al. 1992. J. Bact. 174: 3416
  • 56. Phylogenomics ~~ Phylotyping Eisen et al. 1992. J. Bact. 174: 3416 Lesson 10: Stealing (with acknowledgement) is OK
  • 57. Proteorhodopsin Functional Diversity Venter et al., Science 304: 66. 2004
  • 58. • Leveraging understanding of gene gain and loss to better predict genome functions Lesson 11: Who you hang out with matters
  • 59. Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)
  • 60. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  • 61. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • 62. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
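The profiling steps above (search every gene against every genome, record yes or no, then group genes by their distribution patterns) can be sketched as follows. The sketch assumes the homology searches are already done and summarized as a set of genomes per gene, and it groups only genes with identical profiles. The gene and genome names in the usage note are hypothetical placeholders.

```python
from collections import defaultdict


def build_profile(found_in, genomes):
    """Presence/absence profile of one gene across an ordered genome list."""
    return tuple(genome in found_in for genome in genomes)


def group_by_profile(presence, genomes):
    """Group genes that share exactly the same presence/absence profile.

    presence: gene -> set of genomes in which a homolog was found
    genomes: ordered list of genome names defining the profile columns
    """
    groups = defaultdict(list)
    for gene, found_in in presence.items():
        groups[build_profile(found_in, genomes)].append(gene)
    return dict(groups)
```

For example, with genomes `["sporulatorA", "sporulatorB", "nonsporulator"]`, an uncharacterized gene found only in the two sporulators falls into the same group as known sporulation genes with that pattern, which is what flags it as a candidate.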
  • 63. Sporulation Gene Profile Wu et al. 2005 PLoS Genetics 1: e65.
  • 64. B. subtilis new sporulation genes J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12 Bjorn Traag Richard Losick
  • 65. Tree from Woese. 1987. Microbiological Reviews 51:221 Example V: More Gaps Lesson 12: Keep Returning to the Same Theme Over and Over and Over
  • 66. Yet Another Map Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 67. Genomes Poorly Sampled Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 68. TIGR Tree of Life Project Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 69. Genomic Encyclopedia of Bacteria & Archaea Wu et al. 2009 Nature 462, 1056-1060 Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 70. Genomic Encyclopedia of Bacteria & Archaea Wu et al. 2009 Nature 462, 1056-1060 Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 71. Family Diversity vs. PD Wu et al. 2009 Nature 462, 1056-1060
  • 72. GEBA Cyanobacteria Shih et al. 2013. PNAS 10.1073/pnas.1217107110 [Figure: phylogenetic tree (scale bar 0.3) with clades A, B1, B2, B3, C1, C2, C3, D, E, F, G and plastid lineages labeled Paulinella, Glaucophyte, Green, Red, Chromalveolates]
  • 73. Haloarchaeal GEBA-like Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
  • 74. The Dark Matter of Biology From Wu et al. 2009 Nature 462, 1056-1060
  • 75. Number of SAGs from Candidate Phyla
Site | OD1 | OP11 | OP3 | SAR406
Site A: Hydrothermal vent | 4 | 1 | - | -
Site B: Gold Mine | 6 | 13 | 2 | -
Site C: Tropical gyres (Mesopelagic) | - | - | - | 2
Site D: Tropical gyres (Photic zone) | 1 | - | - | -
Sample collections at 4 additional sites are underway. Phil Hugenholtz GEBA Uncultured
  • 76. JGI Dark Matter Project [Figure: workflow from environmental samples (n=9), isolation of single cells (n=9,600), whole genome amplification (n=3,300), SSU rRNA gene based identification (n=2,000), genome sequencing, assembly and QC (n=201), to draft genomes (n=201); sample types: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor; phylogenetic tree of BACTERIA and ARCHAEA including candidate phyla such as WS3 (Latescibacteria), OP3 (Omnitrophica), NKB19 (Hydrogenedentes), EM19 (Calescamantes), CD12 (Aerophobetes), OP8 (Aminicenantes), OP9 (Atribacteria), WWE1 (Cloacamonetes), OP1 (Acetothermia), GN02 (Gracilibacteria), OD1 (Parcubacteria), OP11 (Microgenomates), DSEG (Aenigmarchaea), and pMC2A384 (Diapherotrites); annotated features: archaeal toxins (Nanoarchaea), lytic murein transglycosylase, stringent response (Diapherotrites, Nanoarchaea), sigma factor (Diapherotrites, Nanoarchaea), HGT from Eukaryotes (Nanoarchaea), archaeal type purine synthesis (Microgenomates), UGA recoded for Gly (Gracilibacteria)] Woyke et al. Nature 2013.
  • 80. A Genomic Encyclopedia of Microbes (GEM) Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 82. A Genomic Encyclopedia of Microbes (GEM) Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 83. Tree from Woese. 1987. Microbiological Reviews 51:221 Example VI: Beyond Sequence Lesson 13: Don’t Overdo It With That Theme
  • 85. Wu et al. 2006 PLoS Biology 4: e188. Baumannia makes vitamins and cofactors Sulcia makes amino acids Phylogenetic Binning
  • 86. HiC Crosslinking Sequencing Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415 Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the synthetic microbial community are shown before and after filtering, along with the percent of total constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon, species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
Sequence | Alignment | % of Total | Filtered | % of aligned | Length | GC | #R.S.
Lac0 | 10,603,204 | 26.17% | 10,269,562 | 96.85% | 2,291,220 | 0.462 | 629
Lac1 | 145,718 | 0.36% | 145,478 | 99.84% | 13,413 | 0.386 | 3
Lac2 | 691,723 | 1.71% | 665,825 | 96.26% | 35,595 | 0.385 | 16
Lac | 11,440,645 | 28.23% | 11,080,865 | 96.86% | 2,340,228 | 0.46 | 648
Ped | 2,084,595 | 5.14% | 2,022,870 | 97.04% | 1,832,387 | 0.373 | 863
BL21 | 12,882,177 | 31.79% | 2,676,458 | 20.78% | 4,558,953 | 0.508 | 508
K12 | 9,693,726 | 23.92% | 1,218,281 | 12.57% | 4,686,137 | 0.507 | 568
E. coli | 22,575,903 | 55.71% | 3,894,739 | 17.25% | 9,245,090 | 0.51 | 1076
Bur1 | 1,886,054 | 4.65% | 1,797,745 | 95.32% | 2,914,771 | 0.68 | 144
Bur2 | 2,536,569 | 6.26% | 2,464,534 | 97.16% | 3,809,201 | 0.672 | 225
Bur | 4,422,623 | 10.91% | 4,262,279 | 96.37% | 6,723,972 | 0.68 | 369
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. 
The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.
 E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1; (Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137) due to edge effects induced by BWA treating the sequence as a linear chromosome rather than circular.
 Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs associating each genomic replicon in the synthetic community is shown as a heat map (see color scale, blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
 reference assemblies of the members of our synthetic microbial community with the same alignment parameters as were used in the top ranked clustering (described above). We first
 Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend) with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded. Contig associations were normalized for variation in contig size.
 typically represent the reads and variant sites as a variant graph wherein variant sites are represented as nodes, and sequence reads define edges between variant sites observed in the same read (or read pair).
We reasoned that variant graphs constructed from Hi-C data would have much greater connectivity (where connectivity is defined as the mean path length between randomly sampled variant positions) than graphs constructed from mate-pair sequencing data, simply because Hi-C inserts span megabase distances. Such Figure 4 Hi-C contact maps for replicons of Lactobacillus brevis. Contact maps show the number of Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A), Chris Beitel @datscimed Aaron Darling @koadman
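The binning described in the Figure 1 caption on slide 86 (minimum path length on a circular chromosome, pairs closer than 1000 bp discarded, a 2.5 Mb range split into 100 equal bins, bins normalized to sum to 1) can be sketched roughly as below. This is only an illustration under stated assumptions, not the pipeline used in the paper: the function names `circular_distance` and `insert_distribution` are hypothetical, the input is assumed to be a list of already-aligned coordinate pairs on one chromosome, and the handling of distances at or beyond 2.5 Mb (dropped here) is a guess.

```python
def circular_distance(pos_a, pos_b, chrom_length):
    """Shortest path in bp between two coordinates on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_length
    return min(d, chrom_length - d)


def insert_distribution(read_pairs, chrom_length,
                        min_dist=1000, max_dist=2_500_000, n_bins=100):
    """Normalized histogram of circular distances between read-pair ends.

    read_pairs: iterable of (pos_a, pos_b) coordinates on one chromosome
                (hypothetical input format, assumed for this sketch).
    Returns a list of n_bins fractions that sum to 1, or all zeros if no
    pair passes the distance filters.
    """
    bin_width = max_dist / n_bins
    counts = [0] * n_bins
    for pos_a, pos_b in read_pairs:
        d = circular_distance(pos_a, pos_b, chrom_length)
        # Discard pairs closer than min_dist, as the caption states.
        # Dropping d >= max_dist is an assumption of this sketch.
        if d < min_dist or d >= max_dist:
            continue
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


if __name__ == "__main__":
    k12_length = 4_686_137  # E. coli K12 length from Table 1
    # Made-up coordinates, for illustration only.
    pairs = [(10, 4_686_100), (0, 30_000), (100_000, 110_000), (0, 4_636_137)]
    dist = insert_distribution(pairs, k12_length)
    print(dist[:3])
```

Computing the distance around the circle, rather than along the linearized reference, is what keeps pairs that span the linearization point (near coordinates 0 and 4686137) from being counted as multi-megabase inserts.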
  • 87. Sequence Isn’t Everything PB-PSB1 (Purple sulfur bacteria) PB-SRB1 (Sulfate reducing bacteria) (sulfate) (sulfide) Wilbanks, E.G. et al (2014). Environmental Microbiology Lizzy Wilbanks @lizzywilbanks
  • 88. Transfer of 34S from SRB to PSB. Figure panels: Biomass (12C, 12C14N, 32S; RGB composite) and 34S-incorporation (34S/32S ratio; scale values 0.044, 0.080). Wilbanks, E.G. et al (2014). Environmental Microbiology
  • 89. Long Reads Help, A Lot. HiSeq, MiSeq 100-250 bp. Moleculo 2-20 kb. PacBio RSII 2-20 kb. Micky Kertesz, Tim Blauwcamp, Meredith Ashby, Cheryl Heiner. Illumina-based synthetic long reads. Real-time single molecule sequencing (p4-c2, p5-c3). 295 Megabases, 474 Megabases, 61 Gigabases
  • 91. Lesson 14: Asking for, and getting, help is a good thing
  • 92. Seagrass Microbiome 1000 samples collected. Not a blade of seagrass touched. YEAR ONE
  • 94. ZEN (Zostera Experimental Network): 25 partner sites; leaves, roots, sediment, and water samples
  • 98. Acknowledgements
 GEBA ($$: DOE-JGI, DSMZ): Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke, Dongying Wu, Aaron Darling, Jenna Lang
 GEBA Cyanobacteria ($$: DOE-JGI): Cheryl Kerfeld, Dongying Wu, Patrick Shih
 Haloarchaea ($$$: NSF): Marc Facciotti, Aaron Darling, Erin Lynch
 Phylosift ($$$: DHS): Aaron Darling, Erik Matsen, Holly Bik, Guillaume Jospin
 iSEEM ($$: GBMF): Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin, Dongying Wu
 aTOL ($$: NSF): Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu
 Others, not mentioned in detail ($$: NSF, NIH, DOE, GBMF, DARPA, Sloan): Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz
 EisenLab: Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Holly Bik