High-throughput sequencing technologies in
genome assembly
Hans Jansen
Dutch SME at Bioscience Park in Leiden, the Netherlands
• High throughput drug screens, and toxicity assays in zebrafish larvae
• Fish fertility (eel, pike perch, sole) to aid sustainable aquaculture
• Sequencing (genomes, transcriptomes)
• Bioinformatics
ZF-screens B.V.
Common carp (Cyprinus carpio)
High throughput screening model
Genome and transcriptomes
European and Japanese eel (Anguilla anguilla and Anguilla japonica)
Completing the life cycle in aquaculture
Genome and transcriptomes
King cobra (Ophiophagus hannah)
Evolution and toxins
Genome and transcriptomes
Some examples of genome projects
Chemical cleavage (Maxam and Gilbert)
Chain termination (Sanger, Nicklen, and Coulson)
Throughput: 5 samples, 1 Kb/day, micrograms
of ssDNA needed
1977 2000 2011
Massively parallel
signature sequencing
(Brenner)
SMRT (Pacific
Biosciences)
Throughput: 3 × 10^9 samples, 55 Gb/day,
single molecule of DNA needed
A brief history of DNA sequencing
February 1977: Maxam and Gilbert
Chemical cleavage: Modify nucleotides and cut at the modified position.
December 1977: Sanger, Nicklen, and Coulson
Chain termination: Use modified nucleotides to stop the
extension of a newly synthesized DNA strand.
A brief history of DNA sequencing
Maxam and Gilbert sequencing was abandoned relatively soon. It was technically
complex and relied on hazardous chemicals and radioactivity.
The Sanger sequencing method was improved over the years and became the method
of choice to sequence the first draft of a human genome.
• Thermostable polymerases alleviated the need for ssDNA template
• Fluorescent dye terminators to combine all four reactions in one.
• Automation of the separation of the DNA fragments.
Shotgun sequencing was already used by Sanger to sequence lambda DNA and proved
to be a powerful tool to sequence and assemble larger DNA molecules and even whole
genomes.
A brief history of DNA sequencing
To make assembly easier partially overlapping BAC clones from the genome were first
selected and then sequenced and assembled by the shotgun method.
gDNA
BAC
This was a laborious method and later a whole genome shotgun approach was used.
A brief history of DNA sequencing
Genomic DNA
Break the DNA in < 1 Kb fragments
Polish the ends of the DNA and adenylate them
Ligate adapters to the ends of the DNA
Amplify paired end library
Bind ss-library to flowcell
Making a paired end library
Attach and cluster the library on a carrier
Sequence the library
2 x 50 bp
Generate large fragments by shearing,
and label the ends with biotin (green dash).
Self ligate fragments in large volume,
and shear the circular fragments (black dash).
Isolate the biotinylated fragments, convert them to a
paired end library and sequence them (red arrows).
Problem: part of these fragments have unconvertible ends.
Problems: larger fragments will self ligate inefficiently.
Nicks in the DNA will enable digestion of circularized molecules
These problems limit the library to ~10 kb insert size, and such libraries tend to contain a low number of
unique fragments.
Obtaining scaffolding information: mate pairs
Generate large fragments by shearing, isolate
~39 kb fragments and clone them in an adapted fosmid
vector which contains insert flanking EcoP15I
sites (purple dash).
Cut with EcoP15I, which leaves about 26 bp of insert
at each end, end repair the fragments and self ligate.
PCR the diTag library from these fragments, and
sequence the 52 bp inserts.
Problem: These large fragments will ligate inefficiently in the
fosmid vector leading to low complexity libraries.
Obtaining scaffolding information: Fosmid diTags
Library   Insert        Reads            Gbp    Coverage   Span
PE200     <155 bp       2 × 76 nt        21.9   14.6×      -
PE280     230–305 bp    2 × 151 nt       11.0   7.3×       -
PE500     370–485 bp    2 × 50–151 nt    19.3   12.9×      1.2×
MP2K      1.6–2.4 Kbp   2 × 36 nt        5.4    -          4.5×
MP7K      4–6 Kbp       2 × 51 nt        2.3    -          0.6×
MP10K     6.5–10 Kbp    2 × 51 nt        5.3    -          7.7×
MP15K     9–13 Kbp      2 × 51 nt        3.8    -          8.8×
Total                                    69     34.8×      22.9×
King cobra sequence data
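The coverage column for the paired end libraries is simple arithmetic: sequenced bases divided by genome size. A minimal Python sketch of that calculation follows. The 1.5 Gbp genome size is an assumption inferred from the table itself (21.9 Gbp at 14.6×), and the function name is illustrative only.

```python
def sequence_coverage(yield_gbp, genome_gbp):
    """Sequence coverage = sequenced bases / genome size (both in Gbp)."""
    return yield_gbp / genome_gbp

# Assumed genome size, inferred from the PE200 row (21.9 Gbp / 14.6x).
GENOME_GBP = 1.5

# Yields (Gbp) of the three paired end libraries from the table.
paired_end_yield = {"PE200": 21.9, "PE280": 11.0, "PE500": 19.3}

for library, gbp in paired_end_yield.items():
    print(library, round(sequence_coverage(gbp, GENOME_GBP), 1))
```

The span values are not recomputed here, since they depend on how many unique pairs each mate pair library actually delivered.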
Read merging
If the two reads of a paired end fragment overlap they can be merged into a single
longer read
• We used our own script since nothing was available at the time
• Now there are a number of tools: FLASH, SHERA, SeqPrep
• Paired end libraries need to be prepared with the read length in mind, and
size-selected as narrowly as possible.
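The idea behind read merging can be sketched in a few lines of Python. This is not the script used for the cobra data, nor the algorithm of FLASH, SHERA or SeqPrep. It is a simplified illustration that ignores base qualities, and the parameter names and defaults (min_overlap, max_mismatch_frac) are assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one longer read, or return None.

    read2 is reverse complemented so both reads are on the same strand;
    the end of read1 is then compared with the start of read2, trying
    the longest possible overlap first.
    """
    r2 = reverse_complement(read2)
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail = read1[-olap:]
        head = r2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            # Keep read1 in full and append the non-overlapping part of read2.
            return read1 + r2[olap:]
    return None

# Toy example: a 40 bp fragment sequenced with 2 x 25 bp reads (10 bp overlap).
fragment = "ACGTTGCATCGGATCCTAGACTTGAGCATGCCATTGACGA"
forward = fragment[:25]
reverse = reverse_complement(fragment[15:])
print(merge_pair(forward, reverse))
```

A narrow size selection matters here because fragments longer than twice the read length have no overlap and cannot be merged at all.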
~600 bp
~270 bp
[Figure: % of the assembly versus fragment size (bp, 10^2 to 10^6) as the 500 bp, 2 Kbp, 7 Kbp, 10 Kbp and 15 Kbp libraries are added]
Assembly (cobra)
Contigs
N50 3982 bp
largest 70 Kbp
number 1186408
Total length 1.45 Gbp
Scaffolds
N50 226 Kbp
largest 2.84 Mbp
number 716551
Total length 1.66 Gbp
number of genes 22183
King cobra sequence assembly
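For readers unfamiliar with the N50 statistic quoted above: it is the length of the contig (or scaffold) at which the running total, counted from the longest sequence down, first reaches half of the total assembly length. A minimal Python sketch, with toy lengths rather than the cobra data:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings us to half the total.
        if 2 * running >= total:
            return length
    return 0  # empty input

# Toy example, not real assembly data.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```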
Genome Res. 2007 17: 240-248
This is a method to sequence (a small) part of a genome, and do this for
multiple siblings.
From the sequence data SNPs can be identified and used as markers to build a
genetic map of this genome.
Analysis of the spotted gar genome cut with SbfI in the parents and 94
individuals from their progeny produced 8406 markers in 29 linkage groups.
Generating a RAD-tag genetic map
From Baird, PLoS ONE 2008
This can be done with multiple samples
when using barcodes
After adding the barcodes all samples can
be pooled to reduce workload
Pools of short fragments from different
individuals.
Generating a RAD-tag genetic map
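Once barcoded samples have been pooled and sequenced, the reads have to be assigned back to the individuals. The Python sketch below shows one simple way to do this. The barcodes and sample names are hypothetical, exact barcode matching is a simplification, and the assumption that each read continues with TGCAGG (the part of the SbfI site left after cutting) should be checked against the actual library design.

```python
from collections import defaultdict

def demultiplex(reads, barcodes, site_remnant="TGCAGG"):
    """Group reads by sample using an exact barcode + cut site match.

    barcodes maps barcode sequence -> sample name. The barcode is
    trimmed off; the cut site remnant is kept at the start of the read.
    Reads without a recognised barcode are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode + site_remnant):
                by_sample[sample].append(read[len(barcode):])
                break
    return dict(by_sample)

# Hypothetical barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "parent_1", "TGCA": "offspring_01"}
example_reads = [
    "ACGTTGCAGGAATT",
    "TGCATGCAGGCCGG",
    "GGGGTGCAGGAAAA",  # unknown barcode, dropped
]
print(demultiplex(example_reads, example_barcodes))
```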
Amores A et al. Genetics 2011;188:799-808
Generating a RAD-tag genetic map
Long DNA molecules, fluorescently labeled at specific sites, are linearized in
nanochannels and imaged. The fluorescent fingerprints of the molecules can
be assembled and linked to contigs and scaffolds.
Optical mapping: BioNano Genomics
Gabino Sanchez-Perez lecture at 15.00 hrs. will explain this in much
more detail and show some great examples how to use this technology.
Just a genome is usually not the goal of a de novo sequencing project.
Based on the general structure of a gene, gene predictions can be made.
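[Figure: general structure of a gene, from ATG to STOP and the polyadenylation signal: exons separated by introns, with the splice donor site (AGGT[A/G]AGT), the branch site (CT[A/G]A[C/T]), 20-50 bases, a pyrimidine-rich stretch and the splice acceptor site (CAGG)]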
RNAseq reads can help validate predictions
Annotation of the genome
Different flavors of RNAseq
• Stranded dUTP RNAseq: simple modification of standard prep gives
information of the strandedness of the transcript.
• RNAseq with minimal quantities of RNA : a great tool to look at small
numbers of (FACS sorted) cells
• CAGE: ideal to find transcription start sites
• Small RNA: to explore the miRNA content of a sample
Transcriptome sequencing
Disadvantages of next generation sequencing:
• Complex sample preparation including PCR amplification.
• High run costs.
• Long run times.
• Short reads
Changes needed:
• Single molecule analysis
• Reading sequences at a high speed
• Highly parallel
• Long reads >10kb
• No errors
Long reads: what do we want?
Pacific Biosciences PacBio RS II
Available since 2010
Oxford Nanopore Technologies MinION
Available since 2014
Generating long reads
Pacific Biosciences PacBio RS II
It uses a zero mode waveguide
to measure fluorescence in a
very small volume.
Ligate hairpin adapters
Fragment gDNA and polish ends, and add adenosine.
Attach polymerase, load on SMRT cell and sequence
DNA polymerase
Transparent bottom of
zero mode waveguide
Pacific Biosciences
Pacific Biosciences P6-C4
• Yield 0.5-1 Gbp/SMRT cell.
• Since no amplification is done you
sequence the DNA as it comes out of your
sample (nicks, base modifications).
• There is very little sequence bias and there are no
systematic errors
Christoph Konig’s lecture at 14.15 hrs will delve much deeper into this technology.
• Started to work on nanopore sensing in 2005
• Investments to date 180 million GBP (227 M€)
• ~200 employees
• Broad IP portfolio
• Announced products: MinION and PromethION systems
• Access program for MinION (MAP)
Oxford Nanopore Technologies
But MAP is much more. It is about being a community and a playground to test new
applications.
The last part of the development of this technology is done “in field” in a fairly open
program.
Hundreds of MinIONs were sent around the globe to see how they would behave in real life.
MAP is visible as a web portal with information from ONT and a social-media-like system
with blogs, comments, likes, and a forum to ask for advice.
MinION access program
[Figure: adapter and hairpin ligated to A-tailed DNA. Labels: tethering oligo, motor protein, brake protein, hairpin, abasic nucleotides]
Shear (optional)
DNA repair (optional), AmpureXP purification
end repair, AmpureXP purification
A tailing, AmpureXP purification
Ligation, His-tag purification,
Dilution in run buffer and ATP
A MuA transposase protocol is under development. This should further
simplify sample preparation (10 minutes).
Library preparation
Tethering oligo
Motor protein E5
Brake protein E3
hairpin
abasic nucleotides
Tether keeps DNA fragment on the membrane leading to a ~20K fold higher DNA
concentration close to the pore.
Motor protein unwinds DNA and ratchets it through the pore.
Abasic nucleotides in the hairpin are a recognition point.
Brake protein prevents the motor protein from zipping through the complement strand.
Sequencing
Stills taken from: https://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing
Strand sequencing
ATP
GGCTCACTCCCATAAGC
GGCTC
GCTCA
CTCAC
TCACT
CACTC
ACTCC
CTCCC
Raw Data (ionic current, pA)
Events (with time domain)
Squiggle (events with time domain removed)
Sensing the DNA
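The 5-mers listed on this slide are the successive windows the pore sees as the strand moves through one base at a time. A minimal Python sketch of that sliding window, using the example sequence from the slide (k = 5 is taken from the slide; the function names are illustrative):

```python
def kmers(seq, k=5):
    """Return all overlapping k-mers of seq, shifting one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmers_covering(seq, pos, k=5):
    """Count how many k-mers of seq contain the base at index pos."""
    return sum(1 for start in range(len(seq) - k + 1) if start <= pos < start + k)

strand = "GGCTCACTCCCATAAGC"
for kmer in kmers(strand)[:7]:
    print(kmer)

# A base away from the ends of the strand is part of k consecutive k-mers.
print(kmers_covering(strand, 8))
```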
Squiggle plot for a complete read
First the template part in blue, then the abasic nucleotides in the hairpin in red, and
finally the complement part in turquoise.
Alignment of template and complement squiggles gives a 2D read.
Squiggle plot
MinKNOW controls the run and shows channel states…..
Interactive interface
….. and amount of events vs read length.
Metrichor agent runs in the background to send sequence files to and from
the (cloud based) base caller.
MinKNOW can interact with other software.
minoTour analyses reads in a streaming mode and can control MinKNOW.
Interactive interface
template mean 8734 bp complement mean 8126 bp 2D mean 9930 bp
Read length is limited by the non-nicked fragment length rather than by the system.
My longest 2D read until now: 93.5 Kbp, template 120 Kb.
Read length distribution
There are actually 4 wells per detection
channel. QC at the beginning of the
run determines the quality of the
4 wells. Sequencing starts on the best
set of wells. Every 24 hrs the next best
set of wells is chosen.
Yield over time
Errors
ref TGATGTATATGCTCTCTTTTCTGACGTTAGTCTCCGACGGCAGGCTTCAA-TGACCC-A-GGCTGAGAAATTCCCGGACCCTTTTTGCTCAAGAGCGATG
|||||||||||||| |||||||||||| ||||||||||||||||||| |||||| | ||||||||||||||||||||| |||||| |||| | |
MinION TGATGTATATGCTC----TTCTGACGTTAGCCTCCGACGGCAGGCTTCAATTGACCCGATGGCTGAGAAATTCCCGGACCC--TTTGCTACAGAGTG-T-
ref TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGATGTTGCGGGTTGTTGTTCTGCGGGTTCTGTTCTTCGTTGACATGAG---GTTGCCCCGTATTCAGT
|||||||||||||||||||||||||||||||||| ||| |||||| | |||| ||||||| ||| |||||| | || | || | | |
MinION TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGA---TGC-GGTTGT--TCCTGC-GGTTCTG----TCG-TGACATCCGTTATTTGCGCTGT-TACGC
ref GTCGC-TGATTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTGATGCAGATCAATTAATACGATACCT--GCGTCATAATTGATTATTTGACGT--GGT
| || || |||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||| |||||||||||||||||||||||| |||
MinION ATGGCATGTTTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTAATGCAGATCAATTAATACGATACCTCGGCGTCATAATTGATTATTTGACGTGGGGT
The error rate lies around 15% for the current chemistry (R7.3). A typical passing 2D R7.3 read now has
2.8% deletions, 2.7% insertions and 1.7% substitutions.
R8 and R9 nanopores are in the pipeline (improving on G/C rich reads and better S/N).
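Error percentages like these are obtained by counting the columns of a pairwise alignment. The Python sketch below does this for the first row of the alignment shown on this slide. It is an illustration of the bookkeeping only, not the method behind the figures quoted above, and one row is far too little data for a reliable estimate.

```python
def alignment_stats(ref_aln, read_aln):
    """Count matches, substitutions, insertions and deletions.

    Both strings must be rows of the same gapped alignment ('-' is a gap).
    An insertion is a base in the read opposite a gap in the reference;
    a deletion is a reference base opposite a gap in the read.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned rows must have the same length")
    stats = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-":
            stats["insertion"] += 1
        elif read_base == "-":
            stats["deletion"] += 1
        elif ref_base == read_base:
            stats["match"] += 1
        else:
            stats["substitution"] += 1
    return stats

# First row of the alignment on the slide.
REF_ROW = "TGATGTATATGCTCTCTTTTCTGACGTTAGTCTCCGACGGCAGGCTTCAA-TGACCC-A-GGCTGAGAAATTCCCGGACCCTTTTTGCTCAAGAGCGATG"
READ_ROW = "TGATGTATATGCTC----TTCTGACGTTAGCCTCCGACGGCAGGCTTCAATTGACCCGATGGCTGAGAAATTCCCGGACCC--TTTGCTACAGAGTG-T-"

row_stats = alignment_stats(REF_ROW, READ_ROW)
errors = sum(row_stats.values()) - row_stats["match"]
print(row_stats, errors / len(REF_ROW))
```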
Errors
Errors result from different parts of the system.
On the ASIC:
Events are missed by the translation from raw data to event data.
Solution: Sharpen up the raw data by playing with voltage and by new
nanopores with lower noise. Sequence faster.
In the base caller:
Bases outside the observed k-mer influence the current.
Solution: Higher k-mer models
Modified bases are currently not included in the k-mer model.
Solution: add modified k-mers to the model. Modified k-mers are
different from unmodified k-mers.
Errors
Throughput is defined by:
Number of channels. 512 on the MinION
Speed of translocation. 30 bases/sec
Occupancy of the pore. 90%
The time a Flow Cell can run. ~60 hrs.
Currently well over 1 Gb events.
On R7.3 this translates to ~400 Mb 2D data.
Throughput
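Multiplying the four factors above gives a theoretical ceiling for a run. The Python sketch below does that arithmetic with the numbers from this slide. It assumes every channel stays at the stated occupancy for the whole run, which real flow cells do not, so the result is an upper bound rather than an expected yield.

```python
def max_yield_bases(channels, bases_per_sec, occupancy, run_hours):
    """Upper bound on bases read in one run.

    Assumes all channels translocate DNA at a constant speed and stay
    occupied for the given fraction of the run time.
    """
    seconds = run_hours * 3600
    return channels * bases_per_sec * occupancy * seconds

# Numbers from the slide: 512 channels, 30 bases/sec, 90% occupancy, ~60 hrs.
ceiling = max_yield_bases(512, 30, 0.90, 60)
print(f"{ceiling / 1e9:.2f} Gb")
```

The gap between this ceiling and the yields actually reported is explained by pores that are lost or idle during the run.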
In “fast mode” the MinION will read 500 bases/sec. Currently three MAP groups are
testing this. Throughput will increase to ~20 Gb in events.
Longest 2D read: 93.5 Kbp
Longest template read: 120 Kbp (231 Kbp)
Highest yield: 1.32 Gevents
[Figure: template and 2D yield over the past year: base pairs sequenced (Mbp, 0 to 350) per run for runs 1 to 30, with the R6, R7 and R7.3 chemistries marked]
[Figure: a repeat in the assembly graph, with unique sequence in and unique sequence out]
Long reads can help to resolve repeat areas in the assembly graph
And the resulting contigs will now look like this:
Untangle
Assembly and scaffolding strategy
   Task                                          Software
1. Short read correction                         Quake (not for small genomes)
2. Short read assembly                           Velvet
3. MinION read alignment to Velvet contigs       LAST
4. Link filtering and contig tiling              Untangle script
5. Path detachment around repeats                Untangle script
6. Bubble popping                                Untangle script
7. Delete unconfirmed connections                Untangle script
8. Contig extraction                             Untangle script
Agrobacterium NCPPB 1771 assembly graph
25× transposon →
(1160 bp)
8× transposon →
(873 bp)
4× rRNA →
(6.4 Kb)
271 nodes, 311 connections
154 contigs
N50 = 198 Kb
Sum = 5.87 Mb
• Alignment: LAST with optimized settings
• Links: alignment filtering and contig tiling
• 7328 reads aligned to contigs
• 438 reads aligned to multiple contigs
• 585 links between contigs
• 13158 reads on R6 and R7 chemistry
• 73.8 Mb total yield (template and 2D)
• 5–85970 nt length, typical ~12 Kb
MinION sequencing and scaffolding
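The step from "reads aligned to multiple contigs" to "links between contigs" can be illustrated with a short Python sketch. This is not the Untangle script: the input format (read, contig, start position on the read) is a hypothetical simplification of LAST output, and orientation, alignment filtering and contig tiling are left out.

```python
from collections import Counter, defaultdict

def contig_links(alignments):
    """Count long reads that connect pairs of contigs.

    alignments is an iterable of (read_id, contig_id, read_start) tuples.
    Contigs hit by the same read are ordered by position on the read,
    and each pair of neighbouring contigs counts as one link.
    """
    hits = defaultdict(list)
    for read_id, contig_id, read_start in alignments:
        hits[read_id].append((read_start, contig_id))

    links = Counter()
    for read_hits in hits.values():
        ordered = [contig for _, contig in sorted(read_hits)]
        for left, right in zip(ordered, ordered[1:]):
            if left != right:
                links[tuple(sorted((left, right)))] += 1
    return links

# Hypothetical alignments: two reads bridge c1 and c2, one read hits only c3.
example_alignments = [
    ("read1", "c1", 0), ("read1", "c2", 5000),
    ("read2", "c2", 100), ("read2", "c1", 6000),
    ("read3", "c3", 0),
]
print(contig_links(example_alignments))
```

Requiring more than one supporting read per link is a natural filter on such counts.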
Links between nodes are specific
Means link is confirmed by PCR
Final assembly graph after scaffolding
• 271 nodes + 312 connections → 49 nodes + 5 connections
• 154 contigs → ~8 contigs
• Complete chromosome 2 (1.2 Mb), pTi (190 Kb), cryptic megaplasmid (746 Kb)
• Slight residual fragmentation of chromosome 1
Reads are in HDF5 format and contain all data from the event data onwards.
A cloud based basecaller is provided by Oxford Nanopore Technologies.
The MAP community is actively developing software to use this type of data.
Some examples:
Jared Simpson’s pipeline to correct and assemble using only nanopore reads.
Live monitoring, alignments and feedback to the MinION: Matt Loose’s minoTour.
Squiggle space aligners
Each base is measured 5 times in consecutive k-mers, so it makes sense to avoid
basecalling and work directly with the events (squiggle space).
Software
London Calling 2015
Highlights from Clive Brown’s talk
• Improvements to the basecaller.
• Read until (and barcoding).
• Fast mode on the MinION MkI (500 bp/sec instead of 30).
• New 3000 channel ASIC with “crumpet” chip design to separate ASIC and fluidics part.
• MinION MkII and PromethION will have this new ASIC.
• Library prep on beads to reduce amounts of DNA needed (lower ng to pg).
• Direct RNA sequencing.
• Simplified sample preparation and VolTRAX.
• Pricing will be “pay as you go”. The initial payment for hardware includes some hours of sequencing.
• MkI $270 and 3 hrs sequencing (~3 Gbp in fast mode).
London Calling 2015
Much emphasis on getting the library prep
simpler and faster to be able to leave the lab.
If the system leaves the lab many more
applications become possible.
VolTRAX
The technology underlying the MinION system is scalable, so
larger throughput can be made available relatively easily.
The PromethION will use the new ASIC design and will have 144000 channels.
Projected throughput: 6.4 Tbp/day.
This is too much data for cloud basecalling, so basecalling will be done locally.
The Access Program will start later this year.
London Calling 2015
PromethION
Freek Vonk
Harald Kerkkamp
Asad Hyder
Michael Richardson
Christiaan Henkel
Paul Hooykaas
Ron Dirks
Guido van den Thillart
Herman Spaink
Pim Arntzen
Erwin Fakkert
Marten Boetzer
Walter Pirovano
Diana Uffink
R. Manjunatha Kini
Ken Kraaijeveld
Yavuz Ariyurek
Arnoud Schmitz
Yahya Anvar
Acknowledgments
Dan Turner
Oliver Hartwell
20150601 bio sb_assembly_course

GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfrohankumarsinghrore1
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curveAreesha Ahmad
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 

Recently uploaded (20)

Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mapping
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 

20150601 bio sb_assembly_course

  • 1. High-throughput sequencing technologies in genome assembly Hans Jansen
  • 2. Dutch SME at Bioscience Park in Leiden, the Netherlands • High throughput drug screens, and toxicity assays in zebrafish larvae • Fish fertility (eel, pike perch, sole) to aid sustainable aquaculture • Sequencing (genomes, transcriptomes) • Bioinformatics ZF-screens B.V.
  • 3. Some examples of genome projects. Common carp (Cyprinus carpio): high-throughput screening model; genome and transcriptomes. European and Japanese eel (Anguilla anguilla and Anguilla japonica): completing the life cycle in aquaculture; genome and transcriptomes. King cobra (Ophiophagus hannah): evolution and toxins; genome and transcriptomes.
  • 4. A brief history of DNA sequencing. Timeline: 1977, 2000, 2011. Early methods: chemical cleavage (Maxam and Gilbert) and chain termination (Sanger, Nicklen, and Coulson); throughput: 5 samples, 1 Kb/day, micrograms of ssDNA needed. Later methods: massively parallel signature sequencing (Brenner) and SMRT (Pacific Biosciences); throughput: 3×10^9 samples, 55 Gb/day, single molecule of DNA needed.
  • 5. A brief history of DNA sequencing February 1977: Maxam and Gilbert Chemical cleavage: Modify nucleotides and cut at the modified position. December 1977: Sanger, Nicklen, and Coulson Chain termination: Use modified nucleotides to stop the extension of a newly synthesized DNA strand.
  • 6. A brief history of DNA sequencing Maxam and Gilbert sequencing was abandoned relatively soon. It was technically complex and used some nasty chemicals and radioactivity. The Sanger sequencing method was improved over the years and was the method of choice to sequence the first draft of a human genome. • Thermostable polymerases alleviated the need for ssDNA template. • Fluorescent dye terminators combined all four reactions in one. • Automation of the separation of the DNA fragments. Shotgun sequencing was already used by Sanger to sequence lambda DNA and proved to be a powerful tool to sequence and assemble larger DNA molecules and even whole genomes.
  • 7. A brief history of DNA sequencing To make assembly easier, partially overlapping BAC clones from the genome were first selected and then sequenced and assembled by the shotgun method (gDNA → BAC). This was a laborious method, and later a whole genome shotgun approach was used.
  • 8. A brief history of DNA sequencing
  • 9. Making a paired end library: Genomic DNA. Break the DNA into < 1 Kb fragments. Polish the ends of the DNA and adenylate them. Ligate adapters to the ends of the DNA. Amplify the paired end library. Bind the ss-library to the flowcell.
  • 10. Attach and cluster the library on a carrier
  • 12. Obtaining scaffolding information: mate pairs (2 x 50 bp). Generate large fragments by shearing, and label the ends with biotin (green dash). Self-ligate the fragments in a large volume, and shear the circular fragments (black dash). Isolate the biotinylated fragments, convert them to a paired end library and sequence them (red arrows). Problem: part of these fragments have unconvertible ends. Problems: larger fragments self-ligate inefficiently, and nicks in the DNA enable digestion of circularized molecules. These problems limit the library to ~10 kb insert size, and such libraries tend to have a low number of unique fragments.
  • 13. Obtaining scaffolding information: fosmid diTags. Generate large fragments by shearing, isolate ~39 kb fragments and clone them in an adapted fosmid vector that contains insert-flanking EcoP15I sites (purple dash). Cut with EcoP15I, which leaves a 26 bp overhang, end-repair the fragments and self-ligate. PCR the diTag library from these fragments, and sequence the 52 bp inserts. Problem: these large fragments ligate inefficiently into the fosmid vector, leading to low complexity libraries.
  • 14. King cobra sequence data (library: insert size, reads, yield, coverage or span). PE200: <155 bp, 2 × 76 nt, 21.9 Gbp, coverage 14.6×. PE280: 230–305 bp, 2 × 151 nt, 11.0 Gbp, coverage 7.3×. PE500: 370–485 bp, 2 × 50–151 nt, 19.3 Gbp, coverage 12.9×, span 1.2×. MP2K: 1.6–2.4 Kbp, 2 × 36 nt, 5.4 Gbp, span 4.5×. MP7K: 4–6 Kbp, 2 × 51 nt, 2.3 Gbp, span 0.6×. MP10K: 6.5–10 Kbp, 2 × 51 nt, 5.3 Gbp, span 7.7×. MP15K: 9–13 Kbp, 2 × 51 nt, 3.8 Gbp, span 8.8×. Total: 69 Gbp, coverage 34.8×, span 22.9×.
  • 15. Read merging If the two reads of a paired end fragment overlap, they can be merged into a single longer read. • We use our own script since nothing was available at the time. • Now there are a number of tools: FLASH, SHERA, SeqPrep. • Paired end libraries need to be prepared with the read length in mind, and size-selected as narrowly as possible. (Figure labels: ~600 bp, ~270 bp.)
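The merging idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the script mentioned on the slide and not how FLASH, SHERA or SeqPrep work internally: the function names `reverse_complement` and `merge_pair`, the minimum overlap of 10 bases and the 10% mismatch tolerance are all assumptions made for the example, and base qualities are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read and its mate when their 3' ends overlap.

    read2 is taken as sequenced (from the opposite strand), so it is
    reverse complemented first. Returns the merged sequence, or None
    when no acceptable overlap is found. Thresholds are hypothetical.
    """
    mate = reverse_complement(read2)
    # Try the longest candidate overlap first and accept the first one
    # whose mismatch fraction is within the tolerance.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail, head = read1[-olen:], mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + mate[olen:]
    return None


if __name__ == "__main__":
    # Toy 40 bp fragment sequenced with 2 x 25 nt reads (10 bp overlap).
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCATCGTAGC"
    r1 = fragment[:25]
    r2 = reverse_complement(fragment[-25:])
    print(merge_pair(r1, r2) == fragment)
```

A real tool would also use the base qualities to pick the consensus base in the overlap; this sketch simply keeps the bases of the first read there.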
  • 16. Assembly (cobra). [Figure: % of the assembly versus fragment size (bp, log scale from 10^2 to 10^6) as libraries are added: + 500 bp, + 2 Kbp, + 7 Kbp, + 10 Kbp, + 15 Kbp.]
  • 17. King cobra sequence assembly. Contigs: N50 3982 bp, largest 70 Kbp, number 1186408, total length 1.45 Gbp. Scaffolds: N50 226 Kbp, largest 2.84 Mbp, number 716551, total length 1.66 Gbp. Number of genes: 22183.
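The N50 values quoted for contigs and scaffolds follow the usual definition: the length of the sequence at which, going from the longest sequence downwards, at least half of the total assembly length has been accumulated. A minimal Python sketch of that definition, with a hypothetical function name `n50` and a toy list of lengths rather than the cobra data:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # total length 54
    print(n50(toy_contigs))  # 10 + 9 + 8 = 27 reaches half of 54, so 8
```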
  • 18. Generating a RAD-tag genetic map (Genome Res. 2007 17: 240-248). This is a method to sequence a (small) part of a genome, and to do this for multiple siblings. From the sequence data, SNPs can be identified and used as markers to build a genetic map of this genome. Analysis of the spotted gar genome cut with SbfI in the parents and 94 individuals from their progeny produced 8406 markers in 29 linkage groups.
  • 19. Generating a RAD-tag genetic map (from Baird, PLoS ONE 2008). This can be done with multiple samples when using barcodes. After adding the barcodes, all samples can be pooled to reduce workload. Pools of short fragments from different individuals.
  • 20. Amores A et al. Genetics 2011;188:799-808 Generating a RAD-tag genetic map
  • 21. Optical mapping: BioNano Genomics. Long DNA molecules, fluorescently labeled at specific sites, are linearized in nanochannels and imaged. The fluorescent fingerprints of each molecule can be assembled and linked to contigs and scaffolds. Gabino Sanchez-Perez's lecture at 15.00 hrs will explain this in much more detail and show some great examples of how to use this technology.
  • 22. Annotation of the genome. Just a genome is usually not the goal of a de novo sequencing project. Based on the general structure of a gene, gene predictions can be made. [Figure: general gene structure with exons and an intron; labels include ATG, STOP, polyadenylation signal, splice donor site, branch site, "20-50 bases", pyrimidine-rich tract and splice acceptor site.] RNAseq reads can help validate predictions.
  • 23. Transcriptome sequencing: different flavors of RNAseq. • Stranded dUTP RNAseq: a simple modification of the standard prep gives information on the strandedness of the transcript. • RNAseq with minimal quantities of RNA: a great tool to look at small numbers of (FACS sorted) cells. • CAGE: ideal to find transcription start sites. • smallRNA: to explore the miRNA content of a sample.
  • 24. Disadvantages of next generation sequencing: • Complex sample preparation including PCR amplification. • High run costs. • Long run times. • Short reads Changes needed: • Single molecule analysis • Reading sequences at a high speed • Highly parallel • Long reads >10kb • No errors Long reads: what do we want?
  • 25. Generating long reads. Pacific Biosciences PacBio RS II: available since 2010. Oxford Nanopore Technologies MinION: available since 2014.
  • 26. Pacific Biosciences PacBio RSII It uses a zero mode waveguide to measure fluorescence in a very small volume.
  • 27. Pacific Biosciences. Fragment gDNA, polish the ends, and add adenosine. Ligate hairpin adapters. Attach polymerase, load on SMRT cell and sequence. (Figure labels: DNA polymerase; transparent bottom of zero mode waveguide.)
  • 28. Pacific Biosciences P6-C4 • Yield 0.5-1 Gbp/SMRT cell. • Since no amplification is done, you sequence the DNA as it comes out of your sample (nicks, base modifications). • There is very little sequence bias and there are no systematic errors. Christoph Konig’s lecture at 14.15 hrs will delve much deeper into this technology.
  • 29. • Started to work on nanopore sensing in 2005 • Investments to date 180 million GBP (227 M€) • ~200 employees • Broad IP portfolio • Announced products: MinION and PromethION systems • Access program for MinION (MAP) Oxford Nanopore Technologies
  • 30. MinION access program. But MAP is much more. It is about being a community and a playground to test new applications. The last part of the development of this technology is done “in the field” in a fairly open program. Hundreds of MinIONs were sent around the globe to see how they would behave in real life. MAP is visible as a web portal with information from ONT and a social-media-like system with blog possibilities, comments, likes, and a forum to ask for advice.
  • 31. Library preparation. Steps: shear (optional); DNA repair (optional), AmpureXP purification; end repair, AmpureXP purification; A-tailing, AmpureXP purification; ligation, His-tag purification; dilution in run buffer and ATP. (Figure labels: tethering oligo, motor protein, brake protein, hairpin, abasic nucleotides.) A MuA transposase protocol is under development. This should further simplify sample preparation (10 minutes).
  • 32. Sequencing. (Figure labels: tethering oligo, motor protein E5, brake protein E3, hairpin, abasic nucleotides.) The tether keeps the DNA fragment on the membrane, leading to a ~20K-fold higher DNA concentration close to the pore. The motor protein unwinds the DNA and ratchets it through the pore. Abasic nucleotides in the hairpin are a recognition point. The brake protein prevents the motor protein from zipping through the complement strand.
  • 33. Stills taken from: https://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing Strand sequencing ATP
  • 34. Sensing the DNA. The example sequence GGCTCACTCCCATAAGC is read as consecutive overlapping 5-mers: GGCTC, GCTCA, CTCAC, TCACT, CACTC, ACTCC, CTCCC. Raw data (ionic current, pA) → events (with time domain) → squiggle (events with time domain removed).
  • 35. Squiggle plot for a complete read. First the template part in blue, then the abasic nucleotides in the hairpin in red, and finally the complement part in turquoise. Alignment of template and complement squiggles gives a 2D read.
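The way the sequence on this slide breaks down into overlapping 5-mers can be reproduced with a short Python sketch. The function names `kmers` and `times_observed` are made up for this illustration; the sketch only covers the decomposition of a known sequence, not the signal processing that turns current levels into k-mers.

```python
def kmers(seq, k=5):
    """Return all overlapping k-mers of seq, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def times_observed(seq, k=5):
    """Count, for every base position, how many k-mers contain it."""
    counts = [0] * len(seq)
    for start in range(len(seq) - k + 1):
        for pos in range(start, start + k):
            counts[pos] += 1
    return counts


if __name__ == "__main__":
    seq = "GGCTCACTCCCATAAGC"
    print(kmers(seq)[:7])
    # Bases away from the read ends are each part of k consecutive k-mers.
    print(times_observed(seq))
```

Every base away from the ends of the read contributes to five consecutive 5-mers, so each base influences five successive events.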
  • 36. MinKNOW controls the run and shows channel states….. Interactive interface
  • 37. ….. and the number of events vs read length. The Metrichor agent runs in the background to send sequence files to and from the (cloud based) base caller. MinKNOW can interact with other software. minoTour analyses reads in a streaming mode and can control MinKNOW. Interactive interface
  • 38. Read length distribution. Template mean 8734 bp, complement mean 8126 bp, 2D mean 9930 bp. Read length is limited by the non-nicked fragment length rather than by the system. My longest 2D read until now: 93.5 Kbp, template 120 Kb.
  • 39. Yield over time. There are actually 4 wells per detection channel. QC at the beginning of the run determines the quality of the 4 wells. Sequencing starts on the best set of wells. Every 24 hrs the next best set of wells is chosen.
  • 41. Errors. Example alignment of a MinION read to the reference (gaps shown as -):
    ref    TGATGTATATGCTCTCTTTTCTGACGTTAGTCTCCGACGGCAGGCTTCAA-TGACCC-A-GGCTGAGAAATTCCCGGACCCTTTTTGCTCAAGAGCGATG
    MinION TGATGTATATGCTC----TTCTGACGTTAGCCTCCGACGGCAGGCTTCAATTGACCCGATGGCTGAGAAATTCCCGGACCC--TTTGCTACAGAGTG-T-
    ref    TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGATGTTGCGGGTTGTTGTTCTGCGGGTTCTGTTCTTCGTTGACATGAG---GTTGCCCCGTATTCAGT
    MinION TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGA---TGC-GGTTGT--TCCTGC-GGTTCTG----TCG-TGACATCCGTTATTTGCGCTGT-TACGC
    ref    GTCGC-TGATTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTGATGCAGATCAATTAATACGATACCT--GCGTCATAATTGATTATTTGACGT--GGT
    MinION ATGGCATGTTTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTAATGCAGATCAATTAATACGATACCTCGGCGTCATAATTGATTATTTGACGTGGGGT
    Error rate lies around 15% for the current chemistry (R7.3). A typical passing 2D R7.3 read now has 2.8% deletions, 2.7% insertions and 1.7% substitutions. R8 and R9 nanopores are in the pipeline (improving on G/C rich reads and with better S/N).
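Deletion, insertion and substitution percentages like the ones quoted on this slide can be derived from a gapped pairwise alignment by classifying each column. The Python sketch below shows one way to do that on a toy alignment. The function name `alignment_error_rates` is hypothetical, and dividing by the number of alignment columns is an assumption: the slide does not say which denominator was used.

```python
def alignment_error_rates(ref_aln, read_aln):
    """Classify the columns of a gapped pairwise alignment.

    Both strings must have equal length and use '-' for gaps. Returns
    the fraction of columns that are matches, substitutions, insertions
    (gap in the reference) and deletions (gap in the read).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have the same length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-" and read_base == "-":
            continue  # a column with two gaps carries no information
        if ref_base == "-":
            counts["insertion"] += 1
        elif read_base == "-":
            counts["deletion"] += 1
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    columns = sum(counts.values())
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return {kind: n / columns for kind, n in counts.items()}


if __name__ == "__main__":
    # Toy alignment: 7 matches, 1 substitution, 1 insertion, 1 deletion.
    rates = alignment_error_rates("ACGT-ACGTA", "ACCTTAC-TA")
    for kind, rate in rates.items():
        print(f"{kind}: {rate:.1%}")
```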
  • 42. Errors result from different parts of the system. On the ASIC: events are missed in the translation from raw data to event data. Solution: sharpen up the raw data by playing with voltage and by new nanopores with lower noise; sequence faster. In the base caller: bases outside the observed k-mer influence the current. Solution: models with longer k-mers. Modified bases are currently not included in the k-mer model, and modified k-mers differ from unmodified k-mers. Solution: add modified k-mers to the model.
  • 43. Throughput is defined by: the number of channels (512 on the MinION); the speed of translocation (30 bp/sec); the occupancy of the pore (90%); the time a Flow Cell can run (~60 hrs). Currently well over 1 Gb events. On R7.3 this translates to ~400 Mb 2D data. In “fast mode” the MinION will read 500 bp/sec. Currently three MAP groups are testing this. Throughput will increase to ~20 Gb in events.
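Multiplying the four factors on this slide gives an upper bound on the number of events per run. The Python sketch below does that arithmetic; the function name `max_events` is made up for the example, and the calculation assumes that every channel stays active at the stated occupancy for the whole run. That assumption is optimistic, which would be consistent with the yields quoted on the slide being lower than the bound.

```python
def max_events(channels, bases_per_sec, occupancy, run_hours):
    """Upper bound on events per run if all channels stay active."""
    seconds = run_hours * 3600
    return channels * bases_per_sec * occupancy * seconds


if __name__ == "__main__":
    normal = max_events(channels=512, bases_per_sec=30, occupancy=0.9, run_hours=60)
    fast = max_events(channels=512, bases_per_sec=500, occupancy=0.9, run_hours=60)
    print(f"normal mode: {normal / 1e9:.1f} G events at most")
    print(f"fast mode:   {fast / 1e9:.1f} G events at most")
```

With the slide's numbers the bound is about 3 G events in normal mode and about 50 G events in fast mode, against the "well over 1 Gb" and "~20 Gb" quoted above.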
  • 44. Longest 2D read: 93.5 Kbp. Longest template read: 120 Kbp (231 Kbp). Highest yield: 1.32 Gevents. [Figure: template and 2D yield over the past year; base pairs sequenced (Mbp, 0 to 350) per run for runs 1 to 30, across the R6, R7 and R7.3 chemistries.]
  • 45. Untangle. Long reads can help to resolve repeat areas in the assembly graph (figure labels: unique sequence in, repeat, unique sequence out). And the resulting contigs will now look like this:
  • 46. Assembly and scaffolding strategy (task: software). 1. Short read correction: Quake (not for small genomes). 2. Short read assembly: Velvet. 3. MinION read alignment to Velvet contigs: LAST. 4. Link filtering and contig tiling: Untangle script. 5. Path detachment around repeats: Untangle script. 6. Bubble popping: Untangle script. 7. Delete unconfirmed connections: Untangle script. 8. Contig extraction: Untangle script.
  • 47. Agrobacterium NCPPB 1771 assembly graph: 271 nodes, 311 connections; 154 contigs, N50 = 198 Kb, sum = 5.87 Mb. Repeats marked in the graph: 25× transposon (1160 bp), 8× transposon (873 bp), 4× rRNA (6.4 Kb).
  • 48. • Alignment: LAST with optimized settings • Links: alignment filtering and contig tiling • 7328 reads aligned to contigs • 438 reads aligned to multiple contigs • 585 links between contigs • 13158 reads on R6 and R7 chemistry • 73.8 Mb total yield (template and 2D) • 5–85970 nt length, typical ~12 Kb MinION sequencing and scaffolding
  • 49. Links between nodes are specific Means link is confirmed by PCR
  • 50. Final assembly graph after scaffolding • 271 nodes + 312 connections → 49 nodes + 5 connections • 154 contigs → ~8 contigs • Complete chromosome 2 (1.2 Mb), pTi (190 Kb), cryptic megaplasmid (746 Kb) • Slight residual fragmentation of chromosome 1
  • 51. Software. Reads are in HDF5 format and contain all data from the event data onwards. A cloud based basecaller is provided by Oxford Nanopore Technologies. The MAP community is actively developing software to use this type of data. Some examples: Jared Simpson’s pipeline to correct and assemble using only nanopore reads. Matt Loose’s minoTour: live monitoring, alignments and feedback to the MinION. Squiggle space aligners: each base is measured 5 times in consecutive k-mers, so it makes sense to avoid basecalling and work directly with the events (squiggle space).
  • 52. London Calling 2015 Highlights from Clive Brown’s talk • Improvements to the basecaller. • Read until (and barcoding). • Fast mode on the MinION MkI (500 bp/sec instead of 30). • New 3000 channel ASIC with “crumpet” chip design to separate the ASIC and fluidics parts. • MinION MkII and PromethION will have this new ASIC. • Library prep on beads to reduce the amount of DNA needed (lower ng to pg). • Direct RNA sequencing. • Simplified sample preparation and VolTRAX. • Pricing will be “pay as you go”. The initial payment for hardware includes some hrs of sequencing. • MkI $270 and 3 hrs sequencing (~3 Gbp in fast mode).
  • 53. London Calling 2015 Much emphasis on getting the library prep simpler and faster to be able to leave the lab. If the system leaves the lab many more applications become possible. VolTRAX
  • 54. London Calling 2015: PromethION. The technology underlying the MinION system is scalable, so larger throughput can be made available relatively easily. It will use the new ASIC design and will have 144000 channels. Projected throughput: 6.4 Tbp/day. This is too much data for cloud basecalling, so basecalling will be done locally. The Access Program will start later this year.
  • 55. Acknowledgments: Freek Vonk, Harald Kerkkamp, Asad Hyder, Michael Richardson, Christiaan Henkel, Paul Hooykaas, Ron Dirks, Guido van den Thillart, Herman Spaink, Pim Arntzen, Erwin Fakkert, Marten Boetzer, Walter Pirovano, Diana Uffink, R. Manjunatha Kini, Ken Kraaijeveld, Yavuz Ariyurek, Arnoud Schmitz, Yahya Anvar, Dan Turner, Oliver Hartwell.