SlideShare una empresa de Scribd logo
1 de 81
Descargar para leer sin conexión
C.Titus Brown
Assistant Professor
CSE, MMG, BEACON
Michigan State University
August 2013
ctb@msu.edu
Assembling metagenomes: a not so practical guide
Acknowledgements
Lab members involved Collaborators
—  Adina Howe (w/Tiedje)
—  Jason Pell
—  Arend Hintze
—  Rosangela Canino-Koning
—  Qingpeng Zhang
—  Elijah Lowe
—  Likit Preeyanon
—  Jiarong Guo
—  Tim Brom
—  Kanchan Pavangadkar
—  Eric McDonald
—  JimTiedje, MSU; Janet
Jansson, JGI; Susannah
Tringe, JGI.
Funding
USDA NIFA; NSF IOS;
BEACON, NIH.
Acknowledgements
Lab members involved Collaborators
—  Adina Howe (w/Tiedje)
—  Jason Pell
—  Arend Hintze
—  Rosangela Canino-Koning
—  Qingpeng Zhang
—  Elijah Lowe
—  Likit Preeyanon
—  Jiarong Guo
—  Tim Brom
—  Kanchan Pavangadkar
—  Eric McDonald
—  JimTiedje, MSU; Janet
Jansson, JGI; Susannah
Tringe, JGI.
Funding
USDA NIFA; NSF IOS;
BEACON, NIH.
Shotgun sequencing and coverage
Genome (unknown)
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Reads
(randomly chosen;
have errors)
X
XX
“Coverage” is simply the average number of reads that overlap
each true base in genome.
Here, the coverage is ~10 – just draw a line straight down from the top through all
of the reads.
Random sampling => deep sampling
needed
Typically 10-100x needed for robust recovery (300 Gbp for human)
Coverage distribution matters!
(MD amplified)
Assembly depends on high coverage
Shared low-level
transcripts may not
reach the threshold
for assembly.
K-mer based assemblers scale poorly
Why do big data sets require big machines??
Memory usage ~ “real” variation + number of errors
Number of errors ~ size of data set
GCGTCAGGTAGGAGACCGCCGCCATGGCGACGATG
GCGTCAGGTAGGAGACCACCGTCATGGCGACGATG
GCGTCAGGTAGCAGACCACCGCCATGGCGACGATG
GCGTTAGGTAGGAGACCACCGCCATGGCGACGATG
Conway T C , Bromage A J Bioinformatics 2011;27:479-486
© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions,
please email: journals.permissions@oup.com
De Bruijn graphs scale poorly with data size
Practical memory measurements
Velvet measurements (Adina Howe)
How much data do we need? (I)
(“More” is rather vague…)
Putting it in perspective:
Total equivalent of ~1200 bacterial genomes
Human genome ~3 billion bp
Assembly results for Iowa corn and prairie
(2x ~300 Gbp soil metagenomes)
Total
Assembly
Total Contigs
(> 300 bp)
% Reads
Assembled
Predicted
protein
coding
2.5 bill 4.5 mill 19% 5.3 mill
3.5 bill 5.9 mill 22% 6.8 mill
Adina Howe
Resulting contigs are low coverage.
Figure 11: Coverage (median basepair) distribution of assembled contigs from soil metagenomes.
How much? (II)
—  Suppose we need 10x coverage to assemble a microbial
genome, and microbial genomes average 5e6 bp of DNA.
—  Further suppose that we want to be able to assemble a
microbial species that is “1 in a 100000”, i.e. 1 in 1e5.
—  Shotgun sequencing samples randomly, so must sample deeply
to be sensitive.
10x coverage x 5e6 bp x 1e5 =~ 50e11, or 5 Tbp of sequence.
Currently this would cost approximately $100k, for 10 full
Illumina runs, but in a year we will be able to do it for much
less.
We can estimate sequencing req’d:
http://ivory.idyll.org/blog/how-much-sequencing-is-needed.html
“Whoa, that’s a lot of data…”
http://ivory.idyll.org/blog/how-much-sequencing-is-needed.html
Some approximate metagenome sizes
—  Deep carbon mine data set: 60 Mbp (x 10x => 600 Mbp)
—  Great Prairie soil: 12 Gbp (4x human genome)
—  Amazon Rain Forest Microbial Observatory soil: 26 Gbp
How can we scale assembly!?
—  We have developed two prefiltering approaches.
—  Essentially, we preprocess your reads and (a) normalize their
coverage and (b) subdivide them by graph partition.
“We take your reads and make them better! Satisfaction guaranteed or your
money back!*”
*Terms may not apply to NSF, NIH, and USDA funding bodies.
Approach I: Digital normalization
(a computational version of library normalization)
Species A
Species B
Ratio 10:1
Unnecessary data
81%
Suppose you have a dilution
factor ofA (10) to B(1). To get
10x of B you need to get 100x
ofA! Overkill!!
This 100x will consume disk
space and, because of errors,
memory.
We can discard it for you…
We only need ~5x at each point.
Genome (unknown)
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Reads
(randomly chosen;
have errors)
X
XX
“Coverage” is simply the average number of reads that overlap
each true base in genome.
Here, the coverage is ~10 – just draw a line straight down from the top through all
of the reads.
True sequence (unknown)
Reads
(randomly sequenced)
Digital normalization
True sequence (unknown)
Reads
(randomly sequenced)
X
Digital normalization
True sequence (unknown)
Reads
(randomly sequenced)
X
X
X
X
X
X
X
X
X
X
X
Digital normalization
True sequence (unknown)
Reads
(randomly sequenced)
X
X
X
X
X
X
X
X
X
X
X
Digital normalization
True sequence (unknown)
Reads
(randomly sequenced)
X
X
X
X
X
X
X
X
X
If next read is from a high
coverage region - discard
X
X
Digital normalization
Digital normalization
True sequence (unknown)
Reads
(randomly sequenced)
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Redundant reads
(not needed for assembly)
A read’s median k-mer count is a good
estimator of “coverage”.
This gives us a
reference-free
measure of coverage.
Digital normalization approach
A digital analog to cDNA library normalization, diginorm:
—  Is single pass: looks at each read only once;
—  Does not “collect” the majority of errors;
—  Keeps all low-coverage reads;
—  Smooths out coverage of regions.
Coverage before digital normalization:
(MD amplified)
Coverage after digital normalization:
Normalizes coverage
Discards redundancy
Eliminates majority of
errors
Scales assembly dramatically.
Assembly is 98% identical.
Digital normalization approach
A digital analog to cDNA library normalization, diginorm is a
read prefiltering approach that:
—  Is single pass: looks at each read only once;
—  Does not “collect” the majority of errors;
—  Keeps all low-coverage reads;
—  Smooths out coverage of regions.
Contig assembly is significantly more efficient and now
scales with underlying genome size
—  Transcriptomes, microbial genomes incl MDA, and most
metagenomes can be assembled in under 50 GB of RAM,
with identical or improved results.
Digital normalization retains information, while
discarding data and errors
http://en.wikipedia.org/wiki/JPEG
Lossy compression
http://en.wikipedia.org/wiki/JPEG
Lossy compression
http://en.wikipedia.org/wiki/JPEG
Lossy compression
http://en.wikipedia.org/wiki/JPEG
Lossy compression
http://en.wikipedia.org/wiki/JPEG
Lossy compression
Raw data
(~10-100 GB)
Analysis "Information"
~1 GB
"Information"
"Information"
"Information"
"Information"
Database &
integration
We can use lossy compression approaches to make downstream
analysis faster and better.
~2 GB – 2TB of single-chassis RAM
Metagenomes: Data partitioning
(a computational version of cell sorting)
Split reads into “bins”
belonging to different
source species.
Can do this based almost
entirely on connectivity of
sequences.
“Divide and conquer”
Memory-efficient
implementation helps to
scale assembly.
Pell et al., 2012, PNAS
Partitioning separates reads by genome.
When computationally spiking HMP mock data with one E.coli genome
(left) or multiple E.coli strains (right), majority of partitions contain reads
from only a single genome (blue) vs multi-genome partitions (green).
Partitions containing spiked data indicated with a * Adina Howe
**
Partitioning: Technical challenges met (and
defeated)
—  Novel data structure properties elucidated via percolation
theory analysis (Pell et al., 2012, PNAS).
—  Exhaustive in-memory traversal of graphs containing 5-15
billion nodes.
—  Sequencing technology introduces false sequences in graph
(Howe et al., submitted.)
—  Only 20x improvement in assembly scaling L.
Is your assembly good?
—  Truly reference-free assembly is hard to evaluate.
—  Traditional “genome” measures like N50 and NG50 simply do
not apply to metagenomes, because very often you don’t
know what the genome “size” is.
Evaluating assembly
Predicted genome.
X
X
X
X
X
X
X
X
XX
Reads - noisy observations
of some genome.
Assembler
(a Big Black Box)
Evaluating correctness of metagenomes is still undiscovered country.
Assembler overlap?
See tutorial.
What’s the best assembler?
-25
-20
-15
-10
-5
0
5
10
BCM* BCM ALLP NEWB SOAP** MERAC CBCB SOAP* SOAP SGA PHUS RAY MLK ABL
CumulativeZ-scorefromrankingmetrics
Bird assembly
Bradnam et al.,Assemblathon 2:
http://arxiv.org/pdf/1301.5406v1.pdf
What’s the best assembler?
-8
-6
-4
-2
0
2
4
6
8
BCM CSHL CSHL* SYM ALLP SGA SOAP* MERAC ABYSS RAY IOB CTD IOB* CSHL** CTD** CTD*
CumulativeZ-scorefromrankingmetrics
Fish assembly
Bradnam et al.,Assemblathon 2:
http://arxiv.org/pdf/1301.5406v1.pdf
What’s the best assembler?
Bradnam et al.,Assemblathon 2:
http://arxiv.org/pdf/1301.5406v1.pdf
-14
-10
-6
-2
2
6
10
SGA PHUS MERAC SYMB ABYSS CRACS BCM RAY SOAP GAM CURT
CumulativeZ-scorefromrankingmetrics
Snake assembly
Answer: for eukaryotic genomes, it
depends
—  Different assemblers perform differently, depending on
—  Repeat content
—  Heterozygosity
—  Generally the results are very good (est completeness, etc.)
but different between different assemblers (!)
—  There Is No OneAnswer.
Estimated completeness: CEGMA
Each assembler lost different ~5% CEGs
Bradnam et al.,Assemblathon 2:
http://arxiv.org/pdf/1301.5406v1.pdf
Tradeoffs in N 50 and % incl.
0
10
20
30
40
50
60
70
80
90
100
110
0
2,000,000
4,000,000
6,000,000
8,000,000
10,000,000
12,000,000
14,000,000
16,000,000
18,000,000
BCM ALLP SOAP NEWB MERAC SGA CBCB PHUS RAY MLK ABL
%ofestimatedgenomesizepresentinscaffolds>=25Kbp
NG50scaffoldlength(bp)
Bird assembly
Bradnam et al.,Assemblathon 2:
http://arxiv.org/pdf/1301.5406v1.pdf
Evaluating assemblies
—  Every assembly returns different results for eukaryotic genomes.
—  For metagenomes, it’s even worse.
—  No systematic exploration of precision, recall, etc.
—  Very little in the way of cross-comparable data sets
—  Often sequencing technology being evaluated is out of date
—  etc. etc.
Our experience
—  Our metagenome assemblies compare well with others, but
we have little in the way of ground truth with which to
evaluate.
—  Scaffold assembly is tricky; we believe in contig assembly for
metagenomes, but not scaffolding.
—  See arXiv paper,“Assembling large, complex metagenomes”,
for our suggested pipeline and statistics & references.
Metagenomic assemblies are highly variable
Adina Howe et al., arXiv 1212.0159
How to choose a metagenome
assembler
—  Try a few.
—  Use what seems to perform best (most bp > some
minimum)
—  I’ve heard/read good things about
—  MetaVelvet
—  Ray Meta
—  IDBA-UD
—  Our pipeline doesn’t specify an assembler.
Adapter trim &
quality filter
Diginorm to C=10
Trim high-
coverage reads at
low-abundance
k-mers
Diginorm to C=5
Partition
graph
Split into "groups"
Reinflate groups
(optional
Assemble!!!
Map reads to
assembly
Too big to
assemble?
Small enough to assemble?
Annotate contigs
with abundances
MG-RAST, etc.
The Battle Creek Metagenomics Pipeline
Adapter trim &
quality filter
Diginorm to C=10
Trim high-
coverage reads at
low-abundance
k-mers
Diginorm to C=5
Partition
graph
Split into "groups"
Reinflate groups
(optional
Assemble!!!
Map reads to
assembly
Too big to
assemble?
Small enough to assemble?
Annotate contigs
with abundances
MG-RAST, etc.
Diginorm
Adapter trim &
quality filter
Diginorm to C=10
Trim high-
coverage reads at
low-abundance
k-mers
Diginorm to C=5
Partition
graph
Split into "groups"
Reinflate groups
(optional
Assemble!!!
Map reads to
assembly
Too big to
assemble?
Small enough to assemble?
Annotate contigs
with abundances
MG-RAST, etc.
Diginorm Partitioning
Thoughts on our pipeline
—  Should work with any metagenome; very generic approach.
—  Diginorm can be decoupled from partitioning;
—  People report that diginorm “just works”;
—  Partitioning is trickier and only needed for REALLY BIG data sets.
—  Diginorm does interact with some assemblers in a funny way, so
suggest starting withVelvet and/or reinflating your partitions.
—  This pipeline, esp diginorm part, is faster and lower memory than any
other assembler out there (well, except maybe Minia).
Deep Carbon data set
—  Name: DCO_TCO_MM5
—  Masimong Gold Mine; microbial cells filtered from fracture
water from within a 1.9km borehole. (32,000 year old
water?!)
—  M.C.Y. Lau, C. Magnabosco, S. Grim, G. Lacrampe
Couloume, K.Wilkie, B. Sherwood Lollar, D.N. Simkus, G.F.
Slater, S. Hendrickson, M. Pullin,T.L. Kieft, O. Kuloyo, B.
Linage, G. Borgonie, E. van Heerden, J.Ackerman, C. van
Jaarsveld, andT.C. Onstott
DCO_TCO_MM520m reads / 2.1 Gbp
5.6m reads / 601.3 Mbp
Adapter trim &
quality filter
Diginorm to C=10
Trim high-
coverage reads at
low-abundance
k-mers
“Could you take a look at this? MG-
RAST is telling us we have a lot of
artificially duplicated reads, i.e. the
data is bad.”
Entire process took ~4 hours of
computation, or so.
(Minimum genome size est: 60.1 Mbp)
Assembly stats:
All	
  con'gs	
   Con'gs	
  >	
  1kb	
  
k	
   N	
  con'gs	
   Sum	
  BP	
   N	
  con'gs	
   Sum	
  BP	
  
Max	
  
con'g	
  
21	
   343263	
   63217837	
   6271	
   10537601	
   9987	
  
23	
   302613	
   63025311	
   7183	
   13867885	
   21348	
  
25	
   276261	
   62874727	
   7375	
   15303646	
   34272	
  
27	
   258073	
   62500739	
   7424	
   16078145	
   48742	
  
29	
   242552	
   62001315	
   7349	
   16426147	
   48746	
  
31	
   228043	
   61445912	
   7307	
   16864293	
   48750	
  
33	
   214559	
   60744478	
   7241	
   17133827	
   48768	
  
35	
   203292	
   60039871	
   7129	
   17249351	
   45446	
  
37	
   189948	
   58899828	
   7088	
   17527450	
   59437	
  
39	
   180754	
   58146806	
   7027	
   17610071	
   54112	
  
41	
   172209	
   57126650	
   6914	
   17551789	
   65207	
  
43	
   165563	
   56440648	
   6925	
   17654067	
   73231	
  
DCO_TCO_MM5
(Minimum genome size est: 60.1 Mbp)
Chose two:
—  A: k=43 (“long contigs”)
—  165563 contigs
—  56.4 Mbp
—  longest contig: 73231 bp
—  B: k=43 (“high recall”)
—  343263 contigs
—  63.2 Mbp
—  longest contig is 9987 bp
DCO_TCO_MM5
How to evaluate??
How many reads map back?
Mapped 3.8m paired-end reads (one subsample):
—  high-recall: 41% of pairs map
—  longer-contigs: 70% of pairs map
+ 150k single-end reads:
—  high-recall: 49% of sequences map
—  longer-contigs: 79% of sequences map
Annotation/exploration with MG-RAST
—  You can upload genome assemblies to MG-RAST, and
annotate them with coverage; tutorial to follow.
—  What does MG-RAST do?
Conclusion
—  This is a pretty good metagenome assembly – > 80%
of reads map!
—  Surprised that the larger dataset (6.32 Mbp,“high recall”)
accounts for a smaller percentage of the reads – 49% vs 79%
for the 56.4 Mbp “long contigs” data set.
—  I now suspect that different parameters are recovering
different subsets of the sample…
—  Don’t trust MG-RASTADR calls.
DCO_TCO_MM5
A few notes --
You can estimate metagenome size…
Estimates of metagenome size
Calculation: # reads * (avg read len) / (diginorm coverage)
Assumes: few entirely erroneous reads (upper bound);
saturation (lower bound).
—  E.coli:384k * 86.1 / 5.0 => 6.6 Mbp est. (true: 4.5 Mbp)
—  MM5 deep carbon: 60 Mbp
—  Great Prairie soil: 12 Gbp
—  Amazon Rain Forest Microbial Observatory: 26 Gbp
Diginorm changes your coverage.
DN	
   Reinflated	
  
k	
   N	
  con'gs	
   bp	
   longest	
   N	
  con'gs	
   bp	
   longest	
  
21	
   24	
   441844	
   80662	
   31	
   439074	
   79170	
  
23	
   13	
   443330	
   86040	
   24	
   437988	
   80488	
  
25	
   12	
   443565	
   84324	
   24	
   426949	
   84286	
  
27	
   11	
   443256	
   89835	
   23	
   385473	
   89795	
  
29	
   11	
   443665	
   89748	
   11	
   285725	
   89809	
  
31	
   10	
   440919	
   102131	
   11	
   286508	
   89810	
  
33	
   12	
   432320	
   85175	
   15	
   282373	
   85210	
  
35	
   15	
   423541	
   85177	
   15	
   276158	
   85177	
  
37	
   14	
   352233	
   121539	
   14	
   278537	
   85184	
  
39	
   16	
   322968	
   121538	
   10	
   276068	
   85187	
  
41	
   20	
   393501	
   121545	
   8	
   278483	
   85211	
  
43	
   25	
   363656	
   121624	
   6	
   278380	
   121462	
  
Contigs > 1kb
http://ivory.idyll.org/blog/the-k-parameter.html
Extracting whole genomes?
So far, we have only assembled contigs, but not whole genomes.
Can entire genomes be
assembled from metagenomic
data?
Iverson et al. (2012), from
theArmbrust lab, contains a
technique for scaffolding
metagenome contigs into
~whole genomes. YES.
Concluding thoughts
—  What works?
—  What needs work?
—  What will work?
What works?
Today,
—  From deep metagenomic data, you can get the gene and
operon content (including abundance of both) from
communities.
—  You can get microarray-like expression information from
metatranscriptomics.
What needs work?
—  Assembling ultra-deep samples is going to require more
engineering, but is straightforward. (“Infinite assembly.”)
—  Building scaffolds and extracting whole genomes has been
done, but I am not yet sure how feasible it is to do
systematically with existing tools (c.f.Armbrust Lab).
What will work, someday?
—  Sensitive analysis of strain variation.
—  Both assembly and mapping approaches do a poor job detecting
many kinds of biological novelty.
—  The 1000 Genomes Project has developed some good tools that
need to be evaluated on community samples.
—  Ecological/evolutionary dynamics in vivo.
—  Most work done on 16s, not on genomes or functional content.
—  Here, sensitivity is really important!
The interpretation challenge
—  For soil, we have generated approximately 1200 bacterial genomes
worth of assembled genomic DNA from two soil samples.
—  The vast majority of this genomic DNA contains unknown genes
with largely unknown function.
—  Most annotations of gene function & interaction are from a few
phylogenetically limited model organisms
—  Est 98% of annotations are computationally inferred: transferred
from model organisms to genomic sequence, using homology.
—  Can these annotations be transferred? (Probably not.)
This will be the biggest sequence analysis challenge of the
next 50 years.
What are future needs?
—  High-quality, medium+ throughput annotation of genomes?
—  Extrapolating from model organisms is both immensely
important and yet lacking.
—  Strong phylogenetic sampling bias in existing annotations.
—  Synthetic biology for investigating non-model organisms?
(Cleverness in experimental biology doesn’t scale L)
—  Integration of microbiology, community ecology/evolution
modeling, and data analysis.
Papers on our work.
—  2012 PNAS, Pell et al., pmid 22847406 (partitioning).
—  Submitted, Brown et al., arXiv:1203.4802 (digital normalization).
—  Submitted, Howe et al, arXiv: 1212.0159 (artifact removal from
Illumina metagenomes).
—  Submitted, Howe et al., arXiv: 1212.2832 –Assembling large,
complex environmental metagenomes.
—  In preparation, Zhang et al. – efficient k-mer counting.
Recommended reading
—  “Comparative metagenomic and rRNA microbial diversity
characterization using archaeal and bacterial synthetic
communities.” Shakya et al., pmid 23387867.
—  Good benchmark data set!
“The results … indicate that a single gene marker such as rRNA is a poor
determinant of the community structure in metagenomic sequence data
from complex communities.”

Más contenido relacionado

La actualidad más candente

Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Keith Bradnam
 
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsCross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics Christopher Mason
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenomec.titus.brown
 
The art of good science writing
The art of good science writingThe art of good science writing
The art of good science writingKeith Bradnam
 
Thoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contestThoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contestKeith Bradnam
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocolsc.titus.brown
 
What's in a name? Better vocabularies = better bioinformatics?
What's in a name? Better vocabularies = better bioinformatics?What's in a name? Better vocabularies = better bioinformatics?
What's in a name? Better vocabularies = better bioinformatics?Keith Bradnam
 
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsJonathan Eisen
 
Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1Keith Bradnam
 
Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1Keith Bradnam
 
Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2Keith Bradnam
 
Aug2015 analysis team spiral genetics
Aug2015 analysis team spiral geneticsAug2015 analysis team spiral genetics
Aug2015 analysis team spiral geneticsGenomeInABottle
 
Jan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollJan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollGenomeInABottle
 
Jan2016 bio nano han cao
Jan2016 bio nano han caoJan2016 bio nano han cao
Jan2016 bio nano han caoGenomeInABottle
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesScott Edmunds
 

La actualidad más candente (20)

Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsCross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
The art of good science writing
The art of good science writingThe art of good science writing
The art of good science writing
 
Thoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contestThoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contest
 
2014 sage-talk
2014 sage-talk2014 sage-talk
2014 sage-talk
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocols
 
What's in a name? Better vocabularies = better bioinformatics?
What's in a name? Better vocabularies = better bioinformatics?What's in a name? Better vocabularies = better bioinformatics?
What's in a name? Better vocabularies = better bioinformatics?
 
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1
 
Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1
 
Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2
 
Aug2015 analysis team spiral genetics
Aug2015 analysis team spiral geneticsAug2015 analysis team spiral genetics
Aug2015 analysis team spiral genetics
 
Jan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollJan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carroll
 
Jan2016 bio nano han cao
Jan2016 bio nano han caoJan2016 bio nano han cao
Jan2016 bio nano han cao
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challenges
 

Destacado

Using Web Analytics Insights To Sell More
Using Web Analytics Insights To Sell MoreUsing Web Analytics Insights To Sell More
Using Web Analytics Insights To Sell MoreDennis Mortensen
 
Steve Rice Los Gatos: Financial Tips For Recent College Graduates
Steve Rice Los Gatos: Financial Tips For Recent College GraduatesSteve Rice Los Gatos: Financial Tips For Recent College Graduates
Steve Rice Los Gatos: Financial Tips For Recent College GraduatesSteve Rice Los Gatos
 
Making Connections : A talk on mobile engagement for UNICEF Japan
Making Connections : A talk on mobile engagement for UNICEF JapanMaking Connections : A talk on mobile engagement for UNICEF Japan
Making Connections : A talk on mobile engagement for UNICEF JapanYounghee Jung
 
9845312548-Dashboard-Guide
9845312548-Dashboard-Guide9845312548-Dashboard-Guide
9845312548-Dashboard-GuideAdam Corum
 
Market research analysis 2
Market research analysis 2Market research analysis 2
Market research analysis 2mcgowanjhhs1
 
Magnetismo2 3
Magnetismo2 3Magnetismo2 3
Magnetismo2 3clausgon
 
Laura's_Cardona_ Portfolio2015
Laura's_Cardona_ Portfolio2015Laura's_Cardona_ Portfolio2015
Laura's_Cardona_ Portfolio2015Laura Cardona
 
Tandoori chicken
Tandoori chickenTandoori chicken
Tandoori chickenDLSU-CEFIW
 
La cuestión del presupuesto nacional para educación 2013
La cuestión del presupuesto nacional para educación 2013La cuestión del presupuesto nacional para educación 2013
La cuestión del presupuesto nacional para educación 2013Laura Marrone
 
Social Media for Atlantic Canada Marketing
Social Media for Atlantic Canada MarketingSocial Media for Atlantic Canada Marketing
Social Media for Atlantic Canada MarketingMediaBadger
 
Global agriculture market drivers and outlook 2014
Global agriculture market drivers and outlook 2014   Global agriculture market drivers and outlook 2014
Global agriculture market drivers and outlook 2014 ipga
 
Ubuntu migration at Zaragoza City Council v3
Ubuntu migration at Zaragoza City Council v3Ubuntu migration at Zaragoza City Council v3
Ubuntu migration at Zaragoza City Council v3Eduardo Romero Moreno
 
How to succeed at hiring without really trying
How to succeed at hiring without really tryingHow to succeed at hiring without really trying
How to succeed at hiring without really tryingMelinda Seckington
 
Kuntien haaste: Enemmän vähemmällä
Kuntien haaste: Enemmän vähemmälläKuntien haaste: Enemmän vähemmällä
Kuntien haaste: Enemmän vähemmälläJyrki Kasvi
 

Destacado (20)

Using Web Analytics Insights To Sell More
Using Web Analytics Insights To Sell MoreUsing Web Analytics Insights To Sell More
Using Web Analytics Insights To Sell More
 
Cross references
Cross referencesCross references
Cross references
 
Steve Rice Los Gatos: Financial Tips For Recent College Graduates
Steve Rice Los Gatos: Financial Tips For Recent College GraduatesSteve Rice Los Gatos: Financial Tips For Recent College Graduates
Steve Rice Los Gatos: Financial Tips For Recent College Graduates
 
Making Connections : A talk on mobile engagement for UNICEF Japan
Making Connections : A talk on mobile engagement for UNICEF JapanMaking Connections : A talk on mobile engagement for UNICEF Japan
Making Connections : A talk on mobile engagement for UNICEF Japan
 
9845312548-Dashboard-Guide
9845312548-Dashboard-Guide9845312548-Dashboard-Guide
9845312548-Dashboard-Guide
 
Календари портфолио
Календари портфолиоКалендари портфолио
Календари портфолио
 
Market research analysis 2
Market research analysis 2Market research analysis 2
Market research analysis 2
 
CV 2015
CV 2015CV 2015
CV 2015
 
Magnetismo2 3
Magnetismo2 3Magnetismo2 3
Magnetismo2 3
 
Laura's_Cardona_ Portfolio2015
Laura's_Cardona_ Portfolio2015Laura's_Cardona_ Portfolio2015
Laura's_Cardona_ Portfolio2015
 
Tandoori chicken
Tandoori chickenTandoori chicken
Tandoori chicken
 
Задачі
ЗадачіЗадачі
Задачі
 
La cuestión del presupuesto nacional para educación 2013
La cuestión del presupuesto nacional para educación 2013La cuestión del presupuesto nacional para educación 2013
La cuestión del presupuesto nacional para educación 2013
 
Social Media for Atlantic Canada Marketing
Social Media for Atlantic Canada MarketingSocial Media for Atlantic Canada Marketing
Social Media for Atlantic Canada Marketing
 
Sustainia
SustainiaSustainia
Sustainia
 
Global agriculture market drivers and outlook 2014
Global agriculture market drivers and outlook 2014   Global agriculture market drivers and outlook 2014
Global agriculture market drivers and outlook 2014
 
Ubuntu migration at Zaragoza City Council v3
Ubuntu migration at Zaragoza City Council v3Ubuntu migration at Zaragoza City Council v3
Ubuntu migration at Zaragoza City Council v3
 
Organic Farming in Laos
Organic Farming in LaosOrganic Farming in Laos
Organic Farming in Laos
 
How to succeed at hiring without really trying
How to succeed at hiring without really tryingHow to succeed at hiring without really trying
How to succeed at hiring without really trying
 
Kuntien haaste: Enemmän vähemmällä
Kuntien haaste: Enemmän vähemmälläKuntien haaste: Enemmän vähemmällä
Kuntien haaste: Enemmän vähemmällä
 

Similar a 2013 stamps-assembly-methods.pptx

2013 hmp-assembly-webinar
2013 hmp-assembly-webinar2013 hmp-assembly-webinar
2013 hmp-assembly-webinarc.titus.brown
 
2013 talk at TGAC, November 4
2013 talk at TGAC, November 42013 talk at TGAC, November 4
2013 talk at TGAC, November 4c.titus.brown
 
2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizonac.titus.brown
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talkc.titus.brown
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotesc.titus.brown
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grcc.titus.brown
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streamingc.titus.brown
 
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...c.titus.brown
 
2012 erin-crc-nih-seattle
2012 erin-crc-nih-seattle2012 erin-crc-nih-seattle
2012 erin-crc-nih-seattlec.titus.brown
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorialc.titus.brown
 
ASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesAdina Chuang Howe
 
U Florida / Gainesville talk, apr 13 2011
U Florida / Gainesville  talk, apr 13 2011U Florida / Gainesville  talk, apr 13 2011
U Florida / Gainesville talk, apr 13 2011c.titus.brown
 
2013 bms-retreat-talk
2013 bms-retreat-talk2013 bms-retreat-talk
2013 bms-retreat-talkc.titus.brown
 
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-datac.titus.brown
 
Scaling metagenome assembly
Scaling metagenome assemblyScaling metagenome assembly
Scaling metagenome assemblyc.titus.brown
 

Similar a 2013 stamps-assembly-methods.pptx (20)

2012 oslo-talk
2012 oslo-talk2012 oslo-talk
2012 oslo-talk
 
2013 hmp-assembly-webinar
2013 hmp-assembly-webinar2013 hmp-assembly-webinar
2013 hmp-assembly-webinar
 
2013 talk at TGAC, November 4
2013 talk at TGAC, November 42013 talk at TGAC, November 4
2013 talk at TGAC, November 4
 
2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talk
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grc
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streaming
 
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
 
2012 erin-crc-nih-seattle
2012 erin-crc-nih-seattle2012 erin-crc-nih-seattle
2012 erin-crc-nih-seattle
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
2012 XLDB talk
2012 XLDB talk2012 XLDB talk
2012 XLDB talk
 
ASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop Slides
 
U Florida / Gainesville talk, apr 13 2011
U Florida / Gainesville  talk, apr 13 2011U Florida / Gainesville  talk, apr 13 2011
U Florida / Gainesville talk, apr 13 2011
 
2013 bms-retreat-talk
2013 bms-retreat-talk2013 bms-retreat-talk
2013 bms-retreat-talk
 
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-data
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Scaling metagenome assembly
Scaling metagenome assemblyScaling metagenome assembly
Scaling metagenome assembly
 

Más de c.titus.brown

Más de c.titus.brown (20)

2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
2015 aem-grs-keynote
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynote
 
2015 msu-code-review
2015 msu-code-review2015 msu-code-review
2015 msu-code-review
 
2015 mcgill-talk
2015 mcgill-talk2015 mcgill-talk
2015 mcgill-talk
 
2015 pycon-talk
2015 pycon-talk2015 pycon-talk
2015 pycon-talk
 
2015 opencon-webcast
2015 opencon-webcast2015 opencon-webcast
2015 opencon-webcast
 
2015 vancouver-vanbug
2015 vancouver-vanbug2015 vancouver-vanbug
2015 vancouver-vanbug
 
2015 osu-metagenome
2015 osu-metagenome2015 osu-metagenome
2015 osu-metagenome
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
2015 pag-chicken
2015 pag-chicken2015 pag-chicken
2015 pag-chicken
 
2015 pag-metagenome
2015 pag-metagenome2015 pag-metagenome
2015 pag-metagenome
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
2014 abic-talk
2014 abic-talk2014 abic-talk
2014 abic-talk
 
2014 mmg-talk
2014 mmg-talk2014 mmg-talk
2014 mmg-talk
 
2014 wcgalp
2014 wcgalp2014 wcgalp
2014 wcgalp
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
 

Último

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Último (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

2013 stamps-assembly-methods.pptx

  • 1. C.Titus Brown Assistant Professor CSE, MMG, BEACON Michigan State University August 2013 ctb@msu.edu Assembling metagenomes: a not so practical guide
  • 2. Acknowledgements Lab members involved Collaborators —  Adina Howe (w/Tiedje) —  Jason Pell —  Arend Hintze —  Rosangela Canino-Koning —  Qingpeng Zhang —  Elijah Lowe —  Likit Preeyanon —  Jiarong Guo —  Tim Brom —  Kanchan Pavangadkar —  Eric McDonald —  JimTiedje, MSU; Janet Jansson, JGI; Susannah Tringe, JGI. Funding USDA NIFA; NSF IOS; BEACON, NIH.
  • 3. Acknowledgements Lab members involved Collaborators —  Adina Howe (w/Tiedje) —  Jason Pell —  Arend Hintze —  Rosangela Canino-Koning —  Qingpeng Zhang —  Elijah Lowe —  Likit Preeyanon —  Jiarong Guo —  Tim Brom —  Kanchan Pavangadkar —  Eric McDonald —  JimTiedje, MSU; Janet Jansson, JGI; Susannah Tringe, JGI. Funding USDA NIFA; NSF IOS; BEACON, NIH.
  • 4.
  • 5. Shotgun sequencing and coverage Genome (unknown) X X X X X X X X X X X X X X Reads (randomly chosen; have errors) X XX “Coverage” is simply the average number of reads that overlap each true base in genome. Here, the coverage is ~10 – just draw a line straight down from the top through all of the reads.
  • 6. Random sampling => deep sampling needed Typically 10-100x needed for robust recovery (300 Gbp for human)
  • 8. Assembly depends on high coverage
  • 9. Shared low-level transcripts may not reach the threshold for assembly.
  • 10. K-mer based assemblers scale poorly Why do big data sets require big machines?? Memory usage ~ “real” variation + number of errors Number of errors ~ size of data set GCGTCAGGTAGGAGACCGCCGCCATGGCGACGATG GCGTCAGGTAGGAGACCACCGTCATGGCGACGATG GCGTCAGGTAGCAGACCACCGCCATGGCGACGATG GCGTTAGGTAGGAGACCACCGCCATGGCGACGATG
  • 11. Conway T C , Bromage A J Bioinformatics 2011;27:479-486 © The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com De Bruijn graphs scale poorly with data size
  • 12. Practical memory measurements Velvet measurements (Adina Howe)
  • 13. How much data do we need? (I) (“More” is rather vague…)
  • 14. Putting it in perspective: Total equivalent of ~1200 bacterial genomes Human genome ~3 billion bp Assembly results for Iowa corn and prairie (2x ~300 Gbp soil metagenomes) Total Assembly Total Contigs (> 300 bp) % Reads Assembled Predicted protein coding 2.5 bill 4.5 mill 19% 5.3 mill 3.5 bill 5.9 mill 22% 6.8 mill Adina Howe
  • 15. Resulting contigs are low coverage. Figure 11: Coverage (median basepair) distribution of assembled contigs from soil metagenomes.
  • 16. How much? (II) —  Suppose we need 10x coverage to assemble a microbial genome, and microbial genomes average 5e6 bp of DNA. —  Further suppose that we want to be able to assemble a microbial species that is “1 in a 100000”, i.e. 1 in 1e5. —  Shotgun sequencing samples randomly, so must sample deeply to be sensitive. 10x coverage x 5e6 bp x 1e5 =~ 50e11, or 5 Tbp of sequence. Currently this would cost approximately $100k, for 10 full Illumina runs, but in a year we will be able to do it for much less.
  • 17. We can estimate sequencing req’d: http://ivory.idyll.org/blog/how-much-sequencing-is-needed.html
  • 18. “Whoa, that’s a lot of data…” http://ivory.idyll.org/blog/how-much-sequencing-is-needed.html
  • 19. Some approximate metagenome sizes —  Deep carbon mine data set: 60 Mbp (x 10x => 600 Mbp) —  Great Prairie soil: 12 Gbp (4x human genome) —  Amazon Rain Forest Microbial Observatory soil: 26 Gbp
  • 20. How can we scale assembly!? —  We have developed two prefiltering approaches. —  Essentially, we preprocess your reads and (a) normalize their coverage and (b) subdivide them by graph partition. “We take your reads and make them better! Satisfaction guaranteed or your money back!*” *Terms may not apply to NSF, NIH, and USDA funding bodies.
  • 21. Approach I: Digital normalization (a computational version of library normalization) Species A Species B Ratio 10:1 Unnecessary data 81% Suppose you have a dilution factor ofA (10) to B(1). To get 10x of B you need to get 100x ofA! Overkill!! This 100x will consume disk space and, because of errors, memory. We can discard it for you…
  • 22. We only need ~5x at each point. Genome (unknown) X X X X X X X X X X X X X X Reads (randomly chosen; have errors) X XX “Coverage” is simply the average number of reads that overlap each true base in genome. Here, the coverage is ~10 – just draw a line straight down from the top through all of the reads.
  • 23. True sequence (unknown) Reads (randomly sequenced) Digital normalization
  • 24. True sequence (unknown) Reads (randomly sequenced) X Digital normalization
  • 25. True sequence (unknown) Reads (randomly sequenced) X X X X X X X X X X X Digital normalization
  • 26. True sequence (unknown) Reads (randomly sequenced) X X X X X X X X X X X Digital normalization
  • 27. True sequence (unknown) Reads (randomly sequenced) X X X X X X X X X If next read is from a high coverage region - discard X X Digital normalization
  • 28. Digital normalization True sequence (unknown) Reads (randomly sequenced) X X X X X X X X X X X X X X X X X X X X X X X X Redundant reads (not needed for assembly)
  • 29. A read’s median k-mer count is a good estimator of “coverage”. This gives us a reference-free measure of coverage.
  • 30. Digital normalization approach A digital analog to cDNA library normalization, diginorm: —  Is single pass: looks at each read only once; —  Does not “collect” the majority of errors; —  Keeps all low-coverage reads; —  Smooths out coverage of regions.
  • 31. Coverage before digital normalization: (MD amplified)
  • 32. Coverage after digital normalization: Normalizes coverage Discards redundancy Eliminates majority of errors Scales assembly dramatically. Assembly is 98% identical.
  • 33. Digital normalization approach A digital analog to cDNA library normalization, diginorm is a read prefiltering approach that: —  Is single pass: looks at each read only once; —  Does not “collect” the majority of errors; —  Keeps all low-coverage reads; —  Smooths out coverage of regions.
  • 34. Contig assembly is significantly more efficient and now scales with underlying genome size —  Transcriptomes, microbial genomes incl MDA, and most metagenomes can be assembled in under 50 GB of RAM, with identical or improved results.
  • 35. Digital normalization retains information, while discarding data and errors
  • 41. Raw data (~10-100 GB) Analysis "Information" ~1 GB "Information" "Information" "Information" "Information" Database & integration We can use lossy compression approaches to make downstream analysis faster and better. ~2 GB – 2TB of single-chassis RAM
  • 42. Metagenomes: Data partitioning (a computational version of cell sorting) Split reads into “bins” belonging to different source species. Can do this based almost entirely on connectivity of sequences. “Divide and conquer” Memory-efficient implementation helps to scale assembly. Pell et al., 2012, PNAS
  • 43. Partitioning separates reads by genome. When computationally spiking HMP mock data with one E.coli genome (left) or multiple E.coli strains (right), majority of partitions contain reads from only a single genome (blue) vs multi-genome partitions (green). Partitions containing spiked data indicated with a * Adina Howe **
  • 44. Partitioning: Technical challenges met (and defeated) —  Novel data structure properties elucidated via percolation theory analysis (Pell et al., 2012, PNAS). —  Exhaustive in-memory traversal of graphs containing 5-15 billion nodes. —  Sequencing technology introduces false sequences in graph (Howe et al., submitted.) —  Only 20x improvement in assembly scaling L.
  • 45. Is your assembly good? —  Truly reference-free assembly is hard to evaluate. —  Traditional “genome” measures like N50 and NG50 simply do not apply to metagenomes, because very often you don’t know what the genome “size” is.
  • 46. Evaluating assembly Predicted genome. X X X X X X X X XX Reads - noisy observations of some genome. Assembler (a Big Black Box) Evaluating correctness of metagenomes is still undiscovered country.
  • 48. What’s the best assembler? -25 -20 -15 -10 -5 0 5 10 BCM* BCM ALLP NEWB SOAP** MERAC CBCB SOAP* SOAP SGA PHUS RAY MLK ABL CumulativeZ-scorefromrankingmetrics Bird assembly Bradnam et al.,Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
  • 49. What’s the best assembler? -8 -6 -4 -2 0 2 4 6 8 BCM CSHL CSHL* SYM ALLP SGA SOAP* MERAC ABYSS RAY IOB CTD IOB* CSHL** CTD** CTD* CumulativeZ-scorefromrankingmetrics Fish assembly Bradnam et al.,Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
  • 50. What’s the best assembler? Bradnam et al.,Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf -14 -10 -6 -2 2 6 10 SGA PHUS MERAC SYMB ABYSS CRACS BCM RAY SOAP GAM CURT CumulativeZ-scorefromrankingmetrics Snake assembly
  • 51. Answer: for eukaryotic genomes, it depends —  Different assemblers perform differently, depending on —  Repeat content —  Heterozygosity —  Generally the results are very good (est completeness, etc.) but different between different assemblers (!) —  There Is No OneAnswer.
  • 52. Estimated completeness: CEGMA Each assembler lost different ~5% CEGs Bradnam et al.,Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
  • 53. Tradeoffs in N 50 and % incl. 0 10 20 30 40 50 60 70 80 90 100 110 0 2,000,000 4,000,000 6,000,000 8,000,000 10,000,000 12,000,000 14,000,000 16,000,000 18,000,000 BCM ALLP SOAP NEWB MERAC SGA CBCB PHUS RAY MLK ABL %ofestimatedgenomesizepresentinscaffolds>=25Kbp NG50scaffoldlength(bp) Bird assembly Bradnam et al.,Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
  • 54. Evaluating assemblies —  Every assembly returns different results for eukaryotic genomes. —  For metagenomes, it’s even worse. —  No systematic exploration of precision, recall, etc. —  Very little in the way of cross-comparable data sets —  Often sequencing technology being evaluated is out of date —  etc. etc.
  • 55. Our experience —  Our metagenome assemblies compare well with others, but we have little in the way of ground truth with which to evaluate. —  Scaffold assembly is tricky; we believe in contig assembly for metagenomes, but not scaffolding. —  See arXiv paper,“Assembling large, complex metagenomes”, for our suggested pipeline and statistics & references.
  • 56. Metagenomic assemblies are highly variable Adina Howe et al., arXiv 1212.0159
  • 57. How to choose a metagenome assembler —  Try a few. —  Use what seems to perform best (most bp > some minimum) —  I’ve heard/read good things about —  MetaVelvet —  Ray Meta —  IDBA-UD —  Our pipeline doesn’t specify an assembler.
  • 58. Adapter trim & quality filter Diginorm to C=10 Trim high- coverage reads at low-abundance k-mers Diginorm to C=5 Partition graph Split into "groups" Reinflate groups (optional Assemble!!! Map reads to assembly Too big to assemble? Small enough to assemble? Annotate contigs with abundances MG-RAST, etc. The Battle Creek Metagenomics Pipeline
  • 59. Adapter trim & quality filter Diginorm to C=10 Trim high- coverage reads at low-abundance k-mers Diginorm to C=5 Partition graph Split into "groups" Reinflate groups (optional Assemble!!! Map reads to assembly Too big to assemble? Small enough to assemble? Annotate contigs with abundances MG-RAST, etc. Diginorm
  • 60. Adapter trim & quality filter Diginorm to C=10 Trim high- coverage reads at low-abundance k-mers Diginorm to C=5 Partition graph Split into "groups" Reinflate groups (optional Assemble!!! Map reads to assembly Too big to assemble? Small enough to assemble? Annotate contigs with abundances MG-RAST, etc. Diginorm Partitioning
  • 61. Thoughts on our pipeline —  Should work with any metagenome; very generic approach. —  Diginorm can be decoupled from partitioning; —  People report that diginorm “just works”; —  Partitioning is trickier and only needed for REALLY BIG data sets. —  Diginorm does interact with some assemblers in a funny way, so suggest starting withVelvet and/or reinflating your partitions. —  This pipeline, esp diginorm part, is faster and lower memory than any other assembler out there (well, except maybe Minia).
  • 62. Deep Carbon data set —  Name: DCO_TCO_MM5 —  Masimong Gold Mine; microbial cells filtered from fracture water from within a 1.9km borehole. (32,000 year old water?!) —  M.C.Y. Lau, C. Magnabosco, S. Grim, G. Lacrampe Couloume, K.Wilkie, B. Sherwood Lollar, D.N. Simkus, G.F. Slater, S. Hendrickson, M. Pullin,T.L. Kieft, O. Kuloyo, B. Linage, G. Borgonie, E. van Heerden, J.Ackerman, C. van Jaarsveld, andT.C. Onstott
  • 63. DCO_TCO_MM520m reads / 2.1 Gbp 5.6m reads / 601.3 Mbp Adapter trim & quality filter Diginorm to C=10 Trim high- coverage reads at low-abundance k-mers “Could you take a look at this? MG- RAST is telling us we have a lot of artificially duplicated reads, i.e. the data is bad.” Entire process took ~4 hours of computation, or so. (Minimum genome size est: 60.1 Mbp)
  • 64. Assembly stats: All  con'gs   Con'gs  >  1kb   k   N  con'gs   Sum  BP   N  con'gs   Sum  BP   Max   con'g   21   343263   63217837   6271   10537601   9987   23   302613   63025311   7183   13867885   21348   25   276261   62874727   7375   15303646   34272   27   258073   62500739   7424   16078145   48742   29   242552   62001315   7349   16426147   48746   31   228043   61445912   7307   16864293   48750   33   214559   60744478   7241   17133827   48768   35   203292   60039871   7129   17249351   45446   37   189948   58899828   7088   17527450   59437   39   180754   58146806   7027   17610071   54112   41   172209   57126650   6914   17551789   65207   43   165563   56440648   6925   17654067   73231   DCO_TCO_MM5 (Minimum genome size est: 60.1 Mbp)
  • 65. Chose two: —  A: k=43 (“long contigs”) —  165563 contigs —  56.4 Mbp —  longest contig: 73231 bp —  B: k=43 (“high recall”) —  343263 contigs —  63.2 Mbp —  longest contig is 9987 bp DCO_TCO_MM5 How to evaluate??
  • 66. How many reads map back? Mapped 3.8m paired-end reads (one subsample): —  high-recall: 41% of pairs map —  longer-contigs: 70% of pairs map + 150k single-end reads: —  high-recall: 49% of sequences map —  longer-contigs: 79% of sequences map
  • 67. Annotation/exploration with MG-RAST —  You can upload genome assemblies to MG-RAST, and annotate them with coverage; tutorial to follow. —  What does MG-RAST do?
  • 68. Conclusion —  This is a pretty good metagenome assembly – > 80% of reads map! —  Surprised that the larger dataset (6.32 Mbp,“high recall”) accounts for a smaller percentage of the reads – 49% vs 79% for the 56.4 Mbp “long contigs” data set. —  I now suspect that different parameters are recovering different subsets of the sample… —  Don’t trust MG-RASTADR calls. DCO_TCO_MM5
  • 70. You can estimate metagenome size…
  • 71. Estimates of metagenome size Calculation: # reads * (avg read len) / (diginorm coverage) Assumes: few entirely erroneous reads (upper bound); saturation (lower bound). —  E.coli:384k * 86.1 / 5.0 => 6.6 Mbp est. (true: 4.5 Mbp) —  MM5 deep carbon: 60 Mbp —  Great Prairie soil: 12 Gbp —  Amazon Rain Forest Microbial Observatory: 26 Gbp
  • 72. Diginorm changes your coverage. DN   Reinflated   k   N  con'gs   bp   longest   N  con'gs   bp   longest   21   24   441844   80662   31   439074   79170   23   13   443330   86040   24   437988   80488   25   12   443565   84324   24   426949   84286   27   11   443256   89835   23   385473   89795   29   11   443665   89748   11   285725   89809   31   10   440919   102131   11   286508   89810   33   12   432320   85175   15   282373   85210   35   15   423541   85177   15   276158   85177   37   14   352233   121539   14   278537   85184   39   16   322968   121538   10   276068   85187   41   20   393501   121545   8   278483   85211   43   25   363656   121624   6   278380   121462   Contigs > 1kb http://ivory.idyll.org/blog/the-k-parameter.html
  • 73. Extracting whole genomes? So far, we have only assembled contigs, but not whole genomes. Can entire genomes be assembled from metagenomic data? Iverson et al. (2012), from theArmbrust lab, contains a technique for scaffolding metagenome contigs into ~whole genomes. YES.
  • 74. Concluding thoughts —  What works? —  What needs work? —  What will work?
  • 75. What works? Today, —  From deep metagenomic data, you can get the gene and operon content (including abundance of both) from communities. —  You can get microarray-like expression information from metatranscriptomics.
  • 76. What needs work? —  Assembling ultra-deep samples is going to require more engineering, but is straightforward. (“Infinite assembly.”) —  Building scaffolds and extracting whole genomes has been done, but I am not yet sure how feasible it is to do systematically with existing tools (c.f.Armbrust Lab).
  • 77. What will work, someday? —  Sensitive analysis of strain variation. —  Both assembly and mapping approaches do a poor job detecting many kinds of biological novelty. —  The 1000 Genomes Project has developed some good tools that need to be evaluated on community samples. —  Ecological/evolutionary dynamics in vivo. —  Most work done on 16s, not on genomes or functional content. —  Here, sensitivity is really important!
  • 78. The interpretation challenge —  For soil, we have generated approximately 1200 bacterial genomes worth of assembled genomic DNA from two soil samples. —  The vast majority of this genomic DNA contains unknown genes with largely unknown function. —  Most annotations of gene function & interaction are from a few phylogenetically limited model organisms —  Est 98% of annotations are computationally inferred: transferred from model organisms to genomic sequence, using homology. —  Can these annotations be transferred? (Probably not.) This will be the biggest sequence analysis challenge of the next 50 years.
  • 79. What are future needs? —  High-quality, medium+ throughput annotation of genomes? —  Extrapolating from model organisms is both immensely important and yet lacking. —  Strong phylogenetic sampling bias in existing annotations. —  Synthetic biology for investigating non-model organisms? (Cleverness in experimental biology doesn’t scale L) —  Integration of microbiology, community ecology/evolution modeling, and data analysis.
  • 80. Papers on our work. —  2012 PNAS, Pell et al., pmid 22847406 (partitioning). —  Submitted, Brown et al., arXiv:1203.4802 (digital normalization). —  Submitted, Howe et al, arXiv: 1212.0159 (artifact removal from Illumina metagenomes). —  Submitted, Howe et al., arXiv: 1212.2832 –Assembling large, complex environmental metagenomes. —  In preparation, Zhang et al. – efficient k-mer counting.
  • 81. Recommended reading —  “Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities.” Shakya et al., pmid 23387867. —  Good benchmark data set! “The results … indicate that a single gene marker such as rRNA is a poor determinant of the community structure in metagenomic sequence data from complex communities.”