2. High-throughput biodiversity research
• Oceanic sediments (covering >70% of the earth’s surface)
harbor the vast majority of the world’s biodiversity
• Microscopic eukaryotes (e.g. nematode
worms, protists, fungi) are diverse and abundant in these
environments
• The taxonomy and functional roles of these species (likely
to be significant in marine ecosystems) are not understood
• Informed mitigation and remediation REQUIRE prior
knowledge of biodiversity!
3. -Omic Dictionary
• Marker gene studies – amplification of a
conserved homologous gene (18S, 16S rRNA)
from environmental samples
• Metagenomics – shotgun sequencing of random
genomic fragments from environmental DNA
• Metatranscriptomics – expressed mRNA
transcripts from environmental samples
4. Extract Environmental DNA
Workflow: Diverse marine community → Extract environmental DNA (EASY) → Amplify rRNA (EASY) → High-throughput sequencing (EASY) → Community analysis (VERY Difficult!!)
5. Amplification of 18S rRNA
Region 1: F04/R22, 456 bp
Region 2: NF1/18Sr2b, ~400 bp
Base Conservation across Metazoa
Nematodes
Primer Sequence
SSU_F04 5’- G C T T G T C T C A A A G A T T A A G C C C -3’
% identity 99 98 98 98 98 98 98 98 99 99 99 100 100 99 99 99 99 99 99 99 99 98 100
SSU_R22 5’- G C C T G C T G C C T T C C T T G G A -3’
% identity 100 100 100 100 100 100 100 100 100 100 98 100 100 100 90 100 90 100 100 100
NF1 5’- G G T G G T G C A T G G C C G T T C T T A G T T -3’
% identity 99 100 100 100 99 100 100 100 100 100 99 97 100 99 99 100 100 98 100 100 98 98 100 100 100
18Sr2b 5’- T A C A A A G G G C A G G G A C G T A A T -3’
% identity 100 88 88 88 88 88 88 88 100 98 98 100 99 99 100 100 100 99 100 99 100 100
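The "% identity" rows give, for each primer position, the share of aligned sequences whose base matches the primer. A minimal sketch of that calculation follows. The function name `percent_identity_per_position` and the three `toy_sites` sequences are hypothetical and introduced only for illustration; they are not the Metazoa or nematode alignment behind the slide, and the sites are assumed to be given in primer orientation.

```python
def percent_identity_per_position(primer, aligned_sites):
    """For each primer position, return the % of sequences matching the primer base."""
    primer = primer.upper()
    for site in aligned_sites:
        if len(site) != len(primer):
            raise ValueError("every aligned site must have the primer's length")
    identities = []
    for i, base in enumerate(primer):
        matches = sum(1 for site in aligned_sites if site[i].upper() == base)
        identities.append(100.0 * matches / len(aligned_sites))
    return identities


# SSU_R22 as written on the slide, with the spaces removed.
ssu_r22 = "GCCTGCTGCCTTCCTTGGA"

# Hypothetical binding sites from three taxa (illustration only).
toy_sites = [
    "GCCTGCTGCCTTCCTTGGA",
    "GCCTGCTGCCTTCCTTGGA",
    "GCCTGCTGCCATCCTTGGA",  # one mismatch at position 11 (1-based)
]

identity = percent_identity_per_position(ssu_r22, toy_sites)
print([round(value) for value in identity])
```

Positions with low identity near the 3' end of a primer are usually the ones to worry about, since mismatches there are most likely to reduce amplification.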
6. Key Questions
1) How diverse are marine communities of
microscopic eukaryotes?
2) How structured are these communities in
marine sediments?
3) What has been the effect of anthropogenic
disturbance on these communities?
7. Environmental Taxonomy
(18S rRNA)
Deep sea and shallow water marine sediment
1.2 million reads, 454 GS FLX Titanium
Bik et al. (2012), Molecular Ecology
11. Introduction of Bias
• Sampling design (replicates, temporal, gear)
• Preservation and Extraction methods
• Primer bias (marker gene studies)
• PCR bias (template composition, inhibitors)
• Sequencing bias (depth of sequencing, platform
specific considerations)
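Uneven sequencing depth is one of the biases listed above. One common, though debated, way to compare samples is to subsample every sample to the same number of reads. The sketch below assumes this approach; it is not stated on the slide, and the `rarefy` helper and both toy samples are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Subsample reads without replacement down to `depth` reads."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


# Hypothetical read counts per OTU for two samples of unequal depth.
deep_sample = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 5}
shallow_sample = {"OTU_1": 60, "OTU_2": 40}

# Use the shallowest sample as the common depth.
depth = min(sum(deep_sample.values()), sum(shallow_sample.values()))
rarefied = {
    "deep": rarefy(deep_sample, depth),
    "shallow": rarefy(shallow_sample, depth),
}
```

Rare OTUs such as `OTU_3` may drop out of the subsample entirely, which is the cost of equalizing depth this way.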
12. Variation in Read Number
[Figure: bar chart of No. of Reads (axis 0–400) per nematode species; legend series 1r_08, 1r_A_09, 3r_A_09, 1r_B_09, 3r_B_09, 1r_C_09, 3r_C_09]
Artificial control community – 1 individual per nematode species
Porazinska et al. 2009 Molecular Ecology Resources
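Because the control community contains one individual per species, equal read counts per species would be the naive expectation, and any departure from it reflects the biases discussed earlier. The sketch below quantifies that departure as a fold difference from the mean and a coefficient of variation. The read counts in `toy_reads` are hypothetical (the figure's values are not recoverable from this transcript), and the function name is introduced here only for illustration.

```python
from statistics import mean, pstdev


def read_count_variation(reads_per_species):
    """Return per-species fold difference from the mean, and the coefficient of variation."""
    counts = list(reads_per_species.values())
    expected = mean(counts)  # naive expectation: equal reads per individual
    fold = {species: n / expected for species, n in reads_per_species.items()}
    cv = pstdev(counts) / expected
    return fold, cv


# Hypothetical read counts, one individual per species.
toy_reads = {"species_A": 400, "species_B": 100, "species_C": 50, "species_D": 250}

fold, cv = read_count_variation(toy_reads)
for species, ratio in fold.items():
    print(f"{species}: {ratio:.2f}x the mean")
print(f"coefficient of variation: {cv:.2f}")
```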
13. OTUs as ‘Clouds’
99% cutoff
97% cutoff
How to correlate OTUs
with biological species?
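The effect of the cutoff can be seen in a toy clustering run. This is a deliberately simplified sketch: it assumes equal-length reads compared position by position with no alignment, and it uses a greedy first-match centroid scheme rather than any specific OTU-picking tool. All names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff):
    """Assign each read to the first centroid it matches at >= cutoff, else start a new OTU."""
    centroids = []
    clusters = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= cutoff:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Hypothetical 100 bp reads: one reference plus variants with 1 and 2 mismatches.
base = "ACGT" * 25
one_off = "T" + base[1:]    # 99% identical to base
two_off = "TT" + base[2:]   # 98% identical to base
reads = [base, one_off, two_off]

print(len(greedy_otus(reads, 0.97)), "OTU(s) at the 97% cutoff")
print(len(greedy_otus(reads, 0.99)), "OTU(s) at the 99% cutoff")
```

The same three reads form one 'cloud' at 97% and split into two at 99%, which is why a single cutoff maps poorly onto biological species.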
14. Head-Tail Pattern in Nematode OTUs
OCTU   Reads   OCTU Length   Bit Score   E-Value    Match bp   Total bp   % Similarity   Chimera   DB match
27     63      266           525         e-146      265        265        100            -1        B. seani 175
12     9       265           500         e-138      261        264        98.86          -1        B. seani 175
170    8       264           496         e-137      261        264        98.86          0         B. seani 175
513    1       264           494         e-136      259        262        98.85          -2        B. seani 175
579    2       263           492         e-136      258        261        98.85          -2        B. seani 175
570    1       262           492         e-136      258        261        98.85          -1        B. seani 175
394    1       263           490         e-135      260        264        98.48          1         B. seani 175
19     2       269           488         e-135      264        269        98.14          0         B. seani 175
658    1       266           486         e-134      260        265        98.11          -1        B. seani 175
412    2       264           480         e-132      260        265        98.11          1         B. seani 175
465    9       254           478         e-132      251        254        98.82          0         B. seani 175
1164   1       268           478         e-132      261        267        97.75          -1        B. seani 175
304    1       261           474         e-130      255        260        98.08          -1        B. seani 175
868    1       244           460         e-126      242        245        98.78          1         B. seani 175
514    2       274           458         e-126      263        272        96.69          -2        B. seani 175
683    1       250           426         e-116      241        249        96.79          -1        B. seani 175
627    1       230           422         e-115      223        226        98.67          -4        B. seani 175
171    3       212           400         e-108      209        211        99.05          -1        B. seani 175
1223   1       202           355         5.00E-95   198        204        97.06          2         B. seani 175
Slide annotations: Head (top of the table), Tail (rows further down)
Porazinska et al. 2010 Zootaxa
Artificial control community containing known nematode
species, all with corresponding full length reference 18S sequences
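The head-tail pattern can be summarized directly from the OCTU rows listed on this slide for B. seani 175. The sketch below treats the most-read OCTU as the 'head' and everything else matching the same reference as the 'tail'; that rule and the helper names are assumptions made for illustration, not the authors' published procedure.

```python
# (OCTU id, reads, match bp, total bp) for OCTUs whose database match is B. seani 175.
octus = [
    (27, 63, 265, 265), (12, 9, 261, 264), (170, 8, 261, 264),
    (513, 1, 259, 262), (579, 2, 258, 261), (570, 1, 258, 261),
    (394, 1, 260, 264), (19, 2, 264, 269), (658, 1, 260, 265),
    (412, 2, 260, 265), (465, 9, 251, 254), (1164, 1, 261, 267),
    (304, 1, 255, 260), (868, 1, 242, 245), (514, 2, 263, 272),
    (683, 1, 241, 249), (627, 1, 223, 226), (171, 3, 209, 211),
    (1223, 1, 198, 204),
]


def percent_similarity(match_bp, total_bp):
    """Percent similarity as matched bases over aligned bases, rounded to 2 decimals."""
    return round(100.0 * match_bp / total_bp, 2)


def split_head_tail(rows):
    """Assumed rule: the most-read OCTU is the head, the rest form the tail."""
    head = max(rows, key=lambda row: row[1])
    tail = [row for row in rows if row is not head]
    return head, tail


head, tail = split_head_tail(octus)
total_reads = sum(row[1] for row in octus)
head_share = head[1] / total_reads

print(f"head OCTU {head[0]}: {head[1]} of {total_reads} reads ({head_share:.0%})")
print(f"tail: {len(tail)} OCTUs, {total_reads - head[1]} reads")
```

One head OCTU carries more than half of the reads, while 18 tail OCTUs share the remainder, mostly as singletons or doubletons.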
15. Assigning Taxonomy to OTUs
• BLAST approaches: accuracy is critically
dependent on reference databases
• Eukaryote sequence databases are patchy and
sparsely sampled
SILVA 108 Ref rRNA Database (16S/18S)
Bacteria 530,197
Archaea 25,658
Eukaryotes 62,587
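To put the SILVA 108 counts above in proportion, the short calculation below converts them to shares of the total. Only the three counts come from the slide; the variable names are illustrative.

```python
# Sequence counts per domain, as listed for the SILVA 108 Ref rRNA database.
silva_108 = {"Bacteria": 530197, "Archaea": 25658, "Eukaryotes": 62587}

total = sum(silva_108.values())
share = {domain: 100.0 * n / total for domain, n in silva_108.items()}

for domain, pct in share.items():
    print(f"{domain}: {pct:.1f}% of {total} sequences")
```

Eukaryotes account for roughly a tenth of the reference sequences, which is the sparse sampling the bullet above refers to.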
16. Errors vs. Rare Taxa
• Chimeras – hybrid sequences formed during PCR that
do not exist in nature
• ‘Jumping off points’ in conserved amplicon regions
• Mostly low-read OTUs restricted to single samples
How do we separate the ‘rare biosphere’ from erroneous
sequences?
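A first-pass screen for the pattern described above (low-read OTUs restricted to a single sample) might look like the following. This is only a sketch: the threshold `max_reads`, the function name and the toy table are hypothetical, and flagging an OTU this way does not show that it is a chimera, since genuinely rare taxa produce the same signature.

```python
def flag_suspect_otus(otu_table, max_reads=2):
    """Return OTUs found in exactly one sample with at most `max_reads` reads in total.

    otu_table maps OTU id -> {sample id -> read count}.
    """
    suspects = []
    for otu, per_sample in otu_table.items():
        samples_present = [s for s, n in per_sample.items() if n > 0]
        total_reads = sum(per_sample.values())
        if len(samples_present) == 1 and total_reads <= max_reads:
            suspects.append(otu)
    return suspects


# Hypothetical OTU table for three samples.
toy_table = {
    "OTU_1": {"site_A": 120, "site_B": 95, "site_C": 110},
    "OTU_2": {"site_A": 1, "site_B": 0, "site_C": 0},   # singleton, one sample
    "OTU_3": {"site_A": 0, "site_B": 1, "site_C": 1},   # rare but in two samples
    "OTU_4": {"site_A": 0, "site_B": 0, "site_C": 40},  # abundant, one sample
}

suspects = flag_suspect_otus(toy_table)
print(suspects)
```

Flagged OTUs are candidates for closer inspection, for example with a dedicated chimera checker or by looking for a head-tail pattern, rather than for automatic removal.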
17. Important Challenges
Phylogenetic
– rRNA data needs to be interpreted in a phylogenetic
context, but eukaryotic guide trees are not
comprehensive
– Phylogenetic placement of short sequences can help
you identify taxon sampling problems in the reference
dataset that would not be obvious by BLAST searches
19. Explicitly Phylogenetic Approaches
Diagram elements: Aligned OTU sequences; Guide Tree; Evolutionary Placement of short reads; Edge PCoA; Community ‘fingerprints’; Taxonomy assignment, Exploiting head-tail
21. Development of new tools
How does OTU picking affect biological interpretations of sequence data?
Shift towards Illumina… processing 10x as much data?!
22. Visualization
Visual tools for enabling novel scientific discovery
Figure axes: Sample Sites, OTUs / Species, Abundance (vertical)
23. Important Challenges
Metadata
– GenBank’s Short Read Archive is not accessible
– MOTUs (Molecular Operational Taxonomic Units)
are arbitrary constructions
Pressing need for open access database resources for metadata
analysis and comparative studies
25. Tools for Computational Analysis
QIIME is popular and easy to use – available on Amazon Cloud
if researchers don’t have local bioinformatic facilities
26. Acknowledgements
UC Davis
• Jonathan Eisen
• Aaron Darling
• Guillaume Jospin
Former Lab Members
• W. Kelley Thomas (Univ. of New Hampshire)
• Way Sung (Univ. of New Hampshire)
• Feseha Abebe-Akele (Univ. of New Hampshire)
Collaborators
• Simon Creer (Univ. of Wales, Bangor)
• Vera Fonseca (Univ. of Wales, Bangor)
• Dorota Porazinska (Univ. of Florida)
• Robin Giblin-Davis (Univ. of Florida)
• Jyotsna Sharma (University of Texas, San Antonio)
• Ken Halanych (Auburn University)
Editor's notes
These primers are HIGHLY CONSERVED. We use 18S because there is no other option for broad taxonomic coverage – universal COI primers don’t work for meiofauna
1) Looking at diversity and species assemblages (relative abundances)
2) Cosmopolitan or regionally restricted taxa? Using phylogeographic patterns to infer global biogeography for microbial eukaryote taxa
3) Work in the Gulf of Mexico following Deepwater Horizon oil spill
High-throughput sequencing has revolutionized studies of environments. But limits of BLAST (no match/environmental)
These patterns were consistent regardless of CLUSTERING cutoff or 18S LOCUS
Read patterns were consistent and replicable across PCRs
We can see individual specimens, so we know how sequence data relates to an … Concerted evolution is an incomplete process, and we have to deal with intragenomic variation across these copies. OTUs are arbitrary units, and one cutoff is not likely to be universally applicable across all taxa (vs. microbial protocols, where 97% = a species)
The second challenge is assigning accurate taxonomy to OTUs that we do define (assuming not everything is a species)
Chimeras typically show up as low-copy OTUs. A rare biosphere certainly does exist and it can be ecologically important
Head-tail patterns may help us to delimit species and separate out rare taxa (which will have head-tail patterns) from errors (no apparent pattern)
Marker genes across all domains – bacteria, archaea, eukaryotes & viruses. rRNA genes, protein-coding orthologs, lineage-specific gene families
For a deeper discussion of some of the things I’ve brushed on, I’ll refer you to our recent review in TREE
So with that I’d just like to thank my current and former lab members and collaborators. And I’ll take any questions.