1. An Introduction to NGS
(Next Generation Sequencing)
François Paillier - 22/02/2011
2. Plan
[ Reminder about Sanger Sequencing ]
• NGS Definition
• Overview of NGS technologies
• NGS Applications & examples
• Conclusion
NOT discussed here: sequence accuracy, assembly and sampling; NGS data analysis & bioinformatics tools
3. A word about Sanger Sequencing
(Video: first-generation sequencing machine, ABI 3730xl)
Principle (only the G tube, containing dideoxy-G, is shown)
From gel to capillary
Still a gold standard, but capillary sequencing has reached its technical limits (costs and performance will not improve further)
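To make the chain-termination principle concrete, here is a minimal Python sketch of the G tube only: the polymerase copies the template and, wherever it has to add a G, it may incorporate a dideoxy-G and stop. The function names, the example template and the assumption that termination can happen at every G are illustrative simplifications, not a model of the 3730xl chemistry.

```python
# Toy model of the Sanger "G tube": lengths of the chain-terminated
# fragments produced when dideoxy-G (ddG) is present in the reaction.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Return the strand built by the polymerase, in synthesis order.

    Assumes the template is written 3'->5', so complementing it base by
    base gives the new strand 5'->3'.
    """
    return "".join(COMPLEMENT[base] for base in template)


def g_tube_fragments(template: str) -> list[int]:
    """Fragment lengths ending in ddG, one population per G in the new strand.

    Simplification: ddG is assumed to be incorporated at some frequency at
    every G position, so each G gives one fragment length.
    """
    new_strand = synthesized_strand(template)
    return [i + 1 for i, base in enumerate(new_strand) if base == "G"]


if __name__ == "__main__":
    template = "TACCGTAC"  # hypothetical template, written 3'->5'
    print(synthesized_strand(template))
    print(g_tube_fragments(template))
```

Running the same reasoning for the A, C and T tubes and sorting all fragments by size, which is what the gel or capillary does, spells out the new strand one base at a time.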
4. Short Reminder about « Classical » Assembly Projects
(Figure: target genome → cloning into sample libraries of sub-targets (BACs, cosmids, ...) → n sequencing sub-projects (clone selection & sequencing, assembly) → finishing: draft (Q40) → assembly → annotation → annotated genome)
Other strategy: WGS
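The "Q40" finishing target refers to the Phred quality scale, where Q = -10 × log10(P) and P is the probability that a base call is wrong; Q40 therefore means about one error per 10,000 bases. A small conversion sketch (the function names are ours):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong for Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```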
5. Sequencing, what for?
Assembly projects, for example
In bioinformatics, sequence assembly refers to aligning and merging fragments of
a much longer DNA sequence in order to reconstruct the original sequence. This
is needed because DNA sequencing technology cannot read whole genomes in one go,
but only small pieces of between 20 and 1000 bases, depending on the technology
used. Typically, these short fragments, called reads, result from shotgun sequencing
of genomic DNA or of gene transcripts (ESTs).
(Figure: target genome → sequencing → reads → assembly → assembled reads forming a consensus with 4X local coverage, separated by gaps → scaffold)
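The "aligning and merging" step can be illustrated with a toy greedy assembler: repeatedly merge the two sequences with the longest exact suffix/prefix overlap until no overlap above a threshold remains; whatever cannot be merged stays as separate contigs, i.e. the gaps of the figure. This is a sketch under strong assumptions (error-free reads, one strand, no repeats); real assemblers, especially for short NGS reads, use more elaborate graph-based methods. The `min_overlap` parameter and all function names are illustrative.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge reads into contigs and return the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read: they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlap left: remaining gaps stay unresolved
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["GATTAC", "TACAGG", "AGGCT"]))
```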
6. Vocabulary that should be kept in mind
in the sequencing field
• Assembly: the result of clustering sequences based on their local
similarity
• Contig: a set of overlapping DNA segments
• Coverage (in sequencing): the mean number of times each nucleotide of a
genome is sequenced (e.g. 10X coverage)
• Scaffolds: a series of contigs that are in the right order but not necessarily
connected in one contiguous stretch
• Mate pairs: two sequences read from the 5′ and 3′ ends of a single clone, so their
relative orientation and approximate distance are known (useful to link contigs)
• WGS = Whole Genome Shotgun sequencing strategy
• ESS = Environmental Shotgun Sequencing
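Coverage as defined above can be estimated before sequencing with the Lander–Waterman relation C = N × L / G (N reads of length L over a genome of size G). Under its idealised assumption of uniformly random reads, the expected fraction of the genome never sequenced is about e^(−C). The genome size and read numbers below are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage C = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases never sequenced: e^(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb bacterial genome
    c = mean_coverage(n_reads=500_000, read_length=100, genome_size=genome)
    missed = fraction_uncovered(c) * genome
    print(f"coverage = {c:.0f}X, expected uncovered bases = {missed:.1f}")
```

With these numbers C = 10X and roughly 227 bases are expected to remain unsequenced; real coverage is not uniform (e.g. in AT-rich regions), so actual gaps are usually larger.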
7. NGS = Next-Generation Sequencing
After PCR, THE new revolution in biology?
8. NGS synonym: High-Throughput Sequencing (HTS)
• First generation: Sanger sequencing
• Second generation: NGS = massively parallel sequencing
• Third generation: NGS = HTS, single-molecule sequencing
9. Overview of current NGS technologies
(Second-generation sequencing machines)
• 2005*: Roche 454 GS-FLX (Titanium protocol a must)
• 2006: Illumina GA1, then GA2
• 2007: Applied Biosystems SOLiD v3
Each machine has a different:
- throughput
- sequence accuracy
- data formats (and programs)
*The NGS "proof of principle" was achieved in 2000 by Lynx Therapeutics, which published and marketed "MPSS", a parallelized,
adapter/ligation-mediated, bead-based sequencing technology, launching "next-generation" sequencing.
12. NGS Principle
Building sequencing devices at nanoscale
Polony: discrete clonal amplifications of a single DNA molecule,
grown in a gel matrix. The clusters can then be individually
sequenced, producing short reads. Polony-based sequencing is
the basis of most second-generation sequencers.
A typical NGS workflow is:
1) Library construction
2) Template CLONAL amplification
3) Massively PARALLEL sequencing
17. 454 Process: Emulsion PCR & Pyrosequencing
Titanium:
read lengths approx. 400 nt
1 million reads/run
400 Mb/day
Videos: About pyrosequencing (1′53″); Summary of the GS FLX (4′34″)
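In pyrosequencing, nucleotides are flowed one at a time in a fixed cyclic order, and the light emitted is roughly proportional to the number of identical bases incorporated. A homopolymer therefore shows up as one taller peak rather than several, and misjudging its height produces the indel errors typical of the 454. The toy flowgram below assumes a noise-free signal and a TACG flow order; both the order and the function names are assumptions for illustration.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed nucleotide flow order, for illustration only


def flowgram(sequence: str, n_flows: int) -> list[tuple[str, float]]:
    """Ideal, noise-free signal: (nucleotide flowed, number of bases incorporated)."""
    signals = []
    pos = 0
    flows = cycle(FLOW_ORDER)
    for _ in range(n_flows):
        base = next(flows)
        count = 0
        # A homopolymer run is incorporated in a single flow, giving one taller peak.
        while pos < len(sequence) and sequence[pos] == base:
            count += 1
            pos += 1
        signals.append((base, float(count)))
    return signals


def call_bases(signals: list[tuple[str, float]]) -> str:
    """Rebuild the read by repeating each flowed base by its rounded signal."""
    return "".join(base * round(signal) for base, signal in signals)


if __name__ == "__main__":
    print(flowgram("TTAGGGC", 8))
    # A noisy G signal of 3.6 is rounded to 4: a homopolymer insertion error.
    print(call_bases([("T", 2.0), ("A", 1.0), ("C", 0.0), ("G", 3.6)]))
```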
19. 454 GS FLX Titanium
No more cloning step: from purified DNA to sequencing
Fits on a laboratory bench top; the small GS Junior system is not so expensive
LONG sequences (400 nt)
Capabilities: multiplexing & paired-ends
Well fitted to:
- prokaryotic genome sequencing
- RNA-seq
Limitations:
- Sequence accuracy not so high (especially for homopolymers); the main error type is indels
- Cost: approx. 20K€/Gb; the cost per base is cheaper than Sanger but still high compared with other next-generation machines
20. Illumina* : Bridge PCR
GA2x version:
read lengths approx. 100 nt
240 million reads
1500 Mb/day
30,000 Mb/run
21. Generation of the Polony Array: Bridge PCR (Solexa)
DNA fragments are attached to the array and
used as PCR templates
(Video: Genome Analyzer workflow)
22. Illumina Chemistry: 4-color DNA sequencing-by-synthesis using reversible
terminators with removable fluorescent dyes
(Figure: a flow cell with 8 lanes)
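Because each cycle adds exactly one reversible-terminator nucleotide, base calling for one cluster reduces, in the simplest view, to picking the brightest of the four color channels at each cycle. The sketch below makes that simplification explicit; the channel order and intensity values are made up, and real base callers also correct for channel cross-talk and phasing.

```python
BASES = "ACGT"  # one fluorescent channel per base (channel order assumed)


def call_cluster(intensities: list[tuple[float, float, float, float]]) -> str:
    """Call one base per sequencing cycle by taking the brightest channel.

    `intensities[k]` holds the (A, C, G, T) signals imaged for one cluster
    at cycle k, before the dye and terminator are removed.
    """
    read = []
    for cycle_signal in intensities:
        brightest = max(range(4), key=lambda ch: cycle_signal[ch])
        read.append(BASES[brightest])
    return "".join(read)


if __name__ == "__main__":
    cycles = [
        (0.9, 0.1, 0.0, 0.1),  # cycle 1: A channel brightest
        (0.1, 0.2, 0.8, 0.1),  # cycle 2: G channel brightest
        (0.0, 0.1, 0.1, 0.7),  # cycle 3: T channel brightest
    ]
    print(call_cluster(cycles))
```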
25. Illumina
No more cloning step: from purified DNA to sequencing
Fits on a laboratory bench top (small footprint)
Good sequence accuracy
Capabilities: multiplexing & paired-ends
Cost: approx. 2K€/Gb; the cost per base is cheaper than 454
Most widely used NGS platform
Requires the least DNA
Well fitted to:
- prokaryotic genome sequencing
- RNA-seq, ChIP-Seq, Methyl-Seq
Limitations:
- Machine is very expensive
- Main error type is mismatch
- Read lengths are still too short
- Not fitted to big genomes (repeats)
- Poor coverage of AT-rich regions
26. SOLiD system: 4-color DNA Sequencing by Ligation
SOLiD v3:
read lengths approx. 50 nt
400 million reads
1500 Mb/day
20,000 Mb/run
1500€/Gb
(Video, 4′46″)
27. Sequencing by ligation reaction: fluorescently labeled oligonucleotide probes (ABI SOLiD)
Complementary strand elongation: DNA ligase
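SOLiD reads are recorded in "color space": each color encodes a pair of adjacent bases (the dibase probes of the comparison table at the end). With the 2-bit encoding A=0, C=1, G=2, T=3, the standard SOLiD color of a dinucleotide equals the XOR of the two codes, so a read can only be decoded from a known first base. A minimal encode/decode sketch (function names are ours):

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def to_colors(seq: str) -> list[int]:
    """Encode bases as SOLiD-style colors, one per pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def from_colors(first_base: str, colors: list[int]) -> str:
    """Decode colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    colors = to_colors("TAGGCA")
    print(colors)
    print(from_colors("T", colors))
```

Because decoding is cumulative, a single wrong color corrupts every following base, whereas a true SNP changes two adjacent colors; this is why dibase encoding helps separate measurement errors from real variants.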
31. SOLiD
No more cloning step: from purified DNA or RNA to sequencing
Fits on a laboratory bench top (small footprint)
Good sequence accuracy
Capabilities: multiplexing & paired-ends
Cost: approx. 1.5K€/Gb; the cost per base is cheaper than Illumina
Well fitted to:
- Resequencing
- RNA-seq, ChIP-Seq, Methyl-Seq
Limitations:
- This technology is NOT intuitive
- Machine is VERY expensive
- HUGE amount of data produced (1500 Gb!!)
- Long run times
- It has been demonstrated that certain reads don’t match the reference!
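The cost-per-Gb figures quoted on the 454, Illumina and SOLiD slides translate directly into a project estimate: bases to produce = genome size × target coverage. The sketch below applies them to a hypothetical 5 Mb prokaryotic genome at 50X; it deliberately ignores machine price, library preparation and run granularity (in practice one pays for whole runs or lanes).

```python
# Approximate cost per Gb of sequence, as quoted in the slides above (2011 figures).
COST_PER_GB_EUR = {
    "454 GS FLX Titanium": 20_000,
    "Illumina GA2x": 2_000,
    "SOLiD v3": 1_500,
}


def sequencing_cost(genome_size_bp: int, coverage: float, cost_per_gb: float) -> float:
    """Reagent-level estimate: (genome size x coverage) in Gb, times cost per Gb."""
    gigabases = genome_size_bp * coverage / 1e9
    return gigabases * cost_per_gb


if __name__ == "__main__":
    for platform, price in COST_PER_GB_EUR.items():
        cost = sequencing_cost(5_000_000, 50, price)  # hypothetical 5 Mb genome, 50X
        print(f"{platform}: ~{cost:,.0f} EUR")
```

Here 5 Mb × 50X is 0.25 Gb, so the same project costs roughly ten times more on the 454 than on the Illumina or SOLiD, which is one reason to combine long 454 reads with cheaper short reads.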
35. Prokaryotic Genome Sequencing
Project as a mix of NGS technologies
Conclusion:
- High-quality drafts can be produced for small genomes without any Sanger data input.
- We found that 454 GS FLX and Solexa/Illumina show great complementarity in producing
large contigs and supercontigs with a low error rate.
36. NGS Applications
DEEPER insight into biological processes
BROADER sampling of populations (cells, viruses,
ecosystems…)
• In different fields…
– Metagenomics
– Genomics
– Transcriptomics
– Proteomics
37. Genome
* De novo sequencing
* Targeted resequencing (SNP, indel, CNV)
* Whole genome resequencing
* Metagenome analyses
Transcriptome
* Gene expression profiling
* Small RNA analysis
* Whole transcriptome analysis
Epigenome
* Chromatin immunoprecipitation sequencing (ChIP-Seq)
* Methylation analysis
…for different purposes…
- Towards personalized medicine
- Biodiversity assessment
- De novo sequencing of prokaryotic or eukaryotic genomes (or re-sequencing)
- RNA-Seq annotation of eukaryotic genomes
- SNP calling: identification of mutations
- ChIP-Seq: identification of DNA/protein interactions
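For gene expression profiling by RNA-Seq, read counts must be normalised for gene length and sequencing depth before genes or samples can be compared. One common measure is RPKM (reads per kilobase of transcript per million mapped reads); the sketch below computes it for made-up counts and is one normalisation among several, not a method taken from this presentation.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    total = 20_000_000  # hypothetical number of mapped reads in the run
    genes = {"geneA": (400, 2_000), "geneB": (400, 1_000)}  # name: (reads, length in bp)
    for name, (count, length) in genes.items():
        print(name, rpkm(count, length, total))
```

The same 400 reads indicate twice the expression level for a gene half as long, which raw counts alone would hide.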
39. What is the current impact of
NGS on biology?
• Both transcriptomics and genomics can now be
addressed with a single technology, with higher
accuracy and robustness (instead of Sanger
sequencing + microarrays, for example); RNA-Seq is a good example
• SNP calling can rely on ultra-deep assemblies
• Genome-wide overview of transcription factor
binding sites
• Biodiversity assessment (metagenomics projects)
• And so much more…
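SNP calling on ultra-deep assemblies means that, at each position, many independent reads "vote" for the base present in the sample. The toy caller below counts the bases in a pileup at one reference position and reports a variant when a non-reference allele is supported by enough reads. The depth and allele-fraction thresholds are hypothetical defaults, and real callers also weigh base qualities and model sequencing errors statistically.

```python
from collections import Counter
from typing import Optional


def call_variant(
    ref_base: str, pileup: str, min_depth: int = 10, min_fraction: float = 0.2
) -> Optional[tuple[str, float]]:
    """Return (alt_base, allele_fraction) if a non-reference allele passes, else None.

    `pileup` is the string of bases observed at one reference position
    across all overlapping reads; thresholds are illustrative only.
    """
    depth = len(pileup)
    if depth < min_depth:
        return None  # too few reads to call anything
    counts = Counter(pileup.upper())
    counts.pop(ref_base.upper(), None)  # keep only non-reference bases
    if not counts:
        return None
    alt, alt_count = counts.most_common(1)[0]
    fraction = alt_count / depth
    return (alt, fraction) if fraction >= min_fraction else None


if __name__ == "__main__":
    print(call_variant("A", "A" * 10 + "G" * 10))  # heterozygous-looking site
    print(call_variant("A", "A" * 19 + "G"))       # single discordant read
```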
40. About whole-exome sequencing:
« For the First Time, DNA Sequencing Technology
Saves A Child's Life »
« Proponents of genetic medicine say DNA sequencing is the future of
medicine and that soon every truly sick person will have his or her genome
sequenced. Critics cite privacy concerns and note that genetic mutations and
variations don’t necessarily lead to medical outcomes. Whatever the
position, it’s hard to argue that this isn’t good news: the first child – plagued
by undiagnosable illness – has been saved by DNA sequencing.
That may be a bit of a strong statement – six-year-old Nicholas Volker is
doing well, though complications could soon arise. But it’s highly likely that
the sequencing of young Nicholas’s genome saved his life. »
Mayer et al., Genetics in Medicine, Volume xx, Number xx, 01/2011
41. What’s Next?
Second generation: NGS = massively parallel sequencing (polony sequencing)
- Roche 454 GS-FLX Titanium
- Illumina GA2
- Applied Biosystems SOLiD v3
Third generation (Ion Torrent, PacBio):
- Single-molecule sequencing (no amplification bias)
- Faster
- Cheaper (or not)
- A 1000€ human genome?
42. Conclusion: impact of NGS
Global shift to sequencing-based technologies
Great improvements ongoing: higher throughput, longer reads
Is it the end of microarrays? Will they become a sub-part of NGS workflows, restricted to target
enrichment?
Is it the end of forward genetics? Reverse genetics only?
Biologists' education should integrate NGS knowledge
Is it the end of « big sequencing centers »? A change in their mission?
Next bottleneck: bioinformatics
- Storing data is a problem (SRA soon down?), AND IT network speeds are
FAR too low: NGS data are very difficult to share. Fridges instead of
disks!?
- Analyzing data is a problem: great improvements, but a lot of work
remains to be done
46. NGS Technology Comparison
Platform | ABI SOLiD | Illumina GA | 454 Roche FLX
Cost | SOLiD 4: $495k; SOLiD PI: $240k | IIe: $470k; IIx: $250k; HiSeq: $690k | Titanium: $500k
Quantity of data per run | SOLiD 4: 100 Gb; SOLiD PI: 50 Gb | IIe: 20–38 Gb; IIx: 50–95 Gb; HiSeq: 200 Gb+ | 450 Mb
Run time | 7 days | 4 days | 9 hours
Pros | Low error rate due to dibase probes | Most widely used NGS platform; requires the least DNA | Short run time; long reads better for de novo sequencing
Cons | Long run times; it has been demonstrated that certain reads don’t match the reference | Least multiplexing capability of the three; poor coverage of AT-rich regions | Expensive reagent cost; difficulty reading homopolymer regions
Source: The University of Western Ontario