iMate Protocol (version 2.0) by GRAS – April 11, 2016
NGS and Phyloinfo in Kobe
http://www.clst.riken.jp/phylo/
iMate Protocol: Improved and Inexpensive Nextera™
Mate Pair Library Preparation
Authorized by Kaori Tatsumi, Osamu Nishimura, Kazu Itomi, Chiharu Tanegashima & Shigehiro Kuraku
Genome Resource & Analysis Station (GRAS)
operated by Phyloinformatics Unit
RIKEN Center for Life Science Technologies (CLST)
Notice: A benchmark paper introducing this protocol has been published in the journal BioTechniques. When
you present or publish data based on the technical guidance in this protocol, please consider citing this
protocol at our web site and the benchmark paper (Tatsumi et al., 2015) published in BioTechniques.
This protocol outlines our modifications to the ‘Gel-plus’ version of the standard protocol for
Nextera Mate Pair Library Preparation and the rationale behind them. Focusing on how best to
improve scaffolding performance, we optimized the protocol under the arguably
conservative policy that only read pairs with junction adaptors (bona fide ‘mates’) should be
passed on to scaffolding. The keys were optimizing 1) the tagmentation condition, 2) the Covaris
shearing condition, and 3) the sequence read length, in order to enhance the library yield and
the ability to detect the junction adaptor in reads.
We understand that 4 μg of starting genomic DNA, as specified in the standard
protocol, is enough to prepare mate-pair libraries with mate distances of >10 kb. Ideally, the
tagmentation condition should be optimized so that as much DNA as possible falls into the targeted
size range. For this purpose, perform the tagmentation reaction under multiple conditions; for example, in
three tubes with 4, 8 and 12 μl of the kit-supplied tagment enzyme, respectively. The tagment buffer
can be self-made [1], which saves cost provided that the other limiting reagents are also saved.
The size distribution of tagmented DNA molecules should be analyzed with a reliable method,
such as pulsed-field gel electrophoresis (e.g., PippinPulse) or the Agilent TapeStation; the Agilent
Bioanalyzer does not perform well for this purpose. By comparing the results of multiple
tagmentation reactions, you can determine which condition yields the
largest amount of DNA in the targeted size range.
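The comparison described above can be sketched in a few lines of Python. This is only an illustration: the profile format (pairs of fragment size and relative mass read off an electropherogram), the function names, and all numbers are hypothetical and not part of the protocol. It also assumes that every tube started from the same amount of DNA, so that the fraction of mass in the target range ranks the conditions in the same order as the absolute amount.

```python
def mass_fraction_in_range(profile, low_bp, high_bp):
    """Return the fraction of total DNA mass with size in [low_bp, high_bp].

    profile is a list of (fragment_size_bp, relative_mass) points.
    """
    total = sum(mass for _, mass in profile)
    if total == 0:
        return 0.0
    in_range = sum(mass for size, mass in profile if low_bp <= size <= high_bp)
    return in_range / total


def best_condition(profiles, low_bp, high_bp):
    """Return (condition_name, fraction) for the condition with the most mass in range."""
    fractions = {
        name: mass_fraction_in_range(profile, low_bp, high_bp)
        for name, profile in profiles.items()
    }
    best = max(fractions, key=fractions.get)
    return best, fractions[best]


# Made-up numbers for illustration only (not measured data).
profiles = {
    "4 ul enzyme": [(3000, 1.0), (8000, 3.0), (15000, 6.0)],
    "8 ul enzyme": [(3000, 2.0), (8000, 5.0), (15000, 3.0)],
    "12 ul enzyme": [(3000, 6.0), (8000, 3.0), (15000, 1.0)],
}

condition, fraction = best_condition(profiles, 6000, 10000)
print(condition, fraction)
```

With real data, the profile points would come from the exported trace of the instrument used for sizing, and the range limits from the size window planned for size selection.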
As in the previous tagmentation step, the amounts of the kit-supplied reagents used in the strand
displacement step limit how many libraries can be prepared with one purchased kit. Thus,
it would be preferable to find a way to decrease the amount of kit-supplied reagents required to
perform this step. We previously suggested (in the iMate protocol versions 1.X) performing
strand displacement with 1/4 volume of all reaction components after size selection with
BluePippin. However, we have recently found that this can result in contamination by read pairs
with untargeted mate distances. Therefore, we currently do not recommend reversing the order
of strand displacement and size selection. We are now looking into alternative ways to save
kit-supplied reagents in the strand displacement step.
Do as instructed in the standard protocol.
We use a BluePippin in this step and usually set a size range 4 kb wide (for example, from 6
kb to 10 kb), although this remains open to further consideration. After strand displacement and
size selection, it is ideal to retain at least 100 ng of DNA. Although the standard protocol
mentions ‘150-400 ng’ (on page 27), 100-200 ng is realistic and still promising in our experience.
Do as instructed in the standard protocol.
Shearing determines the length of library inserts, which should ideally be coordinated with the
selected sequencing read length. If you regard only reads with an adaptor junction as true mate
pairs, we propose a shearing condition that ultimately produces a library with an
insert size distribution of 300-700 bp, with the peak at 450-500 bp. Note that this is markedly
different from the size distribution illustrated in the standard protocol (300-1200 bp; on page 49).
To achieve our proposed size distribution, we recommend successive shearing, that is,
repeated runs of the Covaris condition specified in the standard protocol. In our
experience, shearing the genomes of different species with the same condition can result in
markedly different fragment size distributions. Thus, you need to optimize the condition
specifically for your species of interest. For one of the species we worked on, we performed as
many as 7 runs of Covaris shearing with the condition specified in the standard protocol.
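To illustrate why insert length and read length should be coordinated, the following toy model estimates how often a junction adaptor falls entirely inside one of the two reads. It is a rough sketch under simplifying assumptions that are ours, not the protocol's: the junction adaptor is taken to be 38 bp long, its position is assumed to be uniformly distributed along the insert, and a read is counted only if it contains the whole adaptor. Real read-processing tools also detect partial adaptors, so the absolute values should not be taken literally.

```python
def junction_in_read_probability(insert_bp, read_nt, junction_bp=38):
    """Probability that the whole junction adaptor lies inside read 1 or read 2.

    Toy model: the adaptor start is uniform over all positions where the
    adaptor fits in the insert; read 1 covers the first read_nt bases and
    read 2 covers the last read_nt bases of the insert.
    """
    if insert_bp < junction_bp or read_nt < junction_bp:
        return 0.0
    read_nt = min(read_nt, insert_bp)
    positions = insert_bp - junction_bp + 1   # all possible adaptor starts
    per_read = read_nt - junction_bp + 1      # starts fully inside one read
    # If the two reads overlap, every position is covered; otherwise the two
    # sets of starts are disjoint and simply add up.
    covered = min(positions, 2 * per_read)
    return covered / positions


for insert_bp in (300, 450, 700, 1200):
    for read_nt in (100, 127, 171):
        p = junction_in_read_probability(insert_bp, read_nt)
        print(insert_bp, read_nt, round(p, 3))
```

Under this model, shorter inserts and longer reads both raise the proportion of read pairs in which the junction adaptor can be found, which is the trend the proposed 300-700 bp distribution aims to exploit.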
You may feel an urge to perform QC with a Bioanalyzer immediately after the Covaris shearing,
but it will not give you a fair assessment of the shearing results unless you use a large quantity of
your sheared DNA for QC, which is undesirable. Thus, we recommend saving as much DNA as
possible at this stage and instead measuring the size distribution later, in the final QC step
before sequencing.
Do as instructed in the standard protocol.
To get as many unique mate-pair reads as possible, we strongly recommend reducing the number of PCR
cycles and avoiding excessive amplification. We suggest performing no more than 10 cycles of
PCR. This recommendation is supported by our experience of obtaining a sufficient amount of product with
10 PCR cycles, even for samples that are supposed to require 15 cycles according to the
standard protocol (for example, 100 ng for libraries with mate distance ranges of 6-10 kb; see [2] for
details of cycle number estimates). In fact, we normally perform 8 PCR cycles, and only when
we find the yield too low after AMPure clean-up do we perform additional PCR cycling (still, no
more than 10 cycles in total). If you do not get enough product within 10 cycles, you should
first optimize the tagmentation condition to increase the yield for the targeted size range.
With the Illumina system, it seems that the inserts of many of the reads actually sequenced are
shorter than the most frequent insert length of the library. Thus, be sure to perform stringent size
selection with AMPure to get rid of molecules with short inserts, as instructed in the standard
protocol (0.67x AMPure to get rid of <300 bp molecules), no matter what the size distribution of
library inserts is. Lenient size selection can result in a high proportion of read pairs with inserts
that are too short, and such read pairs may not suffice for effective scaffolding.
Use a Bioanalyzer or equivalent for this final QC before sequencing. Keep in mind that the size
distribution is determined mostly by the shearing condition and AMPure clean-up, rather than by the
choice of mate distance range.
We use the KAPA Library Quantification Kit (KK4835) in this step. Quantification should not be tricky
if the library has an ordinary unimodal size distribution. The standard protocol says that
you need 1.5-20 nM of the synthesized library, but we think that 2 nM is enough unless the
sequencing facility you are working with requests much more than is required in an actual
sequencing run.
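For orientation, a molar concentration such as 2 nM can be related to a mass concentration with the widely used approximation of about 660 g/mol per base pair of double-stranded DNA. The sketch below is a convenience calculation only; the function name is hypothetical, and it does not replace qPCR-based quantification, which measures amplifiable library molecules directly.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, da_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM).

    Uses the common approximation of ~660 g/mol per base pair.
    mean_fragment_bp is the mean library fragment length, adaptors included.
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (da_per_bp * mean_fragment_bp)


# Example: a library at 0.66 ng/ul with a mean fragment length of 500 bp.
print(round(ng_per_ul_to_nM(0.66, 500), 3))
```

The mean fragment length would typically be taken from the final Bioanalyzer trace of the library.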
In your first trial, we advise running a MiSeq for small-scale pilot sequencing to get 300 nt-long
paired-end reads from the prepared libraries; sequencing as many as 10 libraries per MiSeq run
should provide a fair validation of the libraries. The 300 nt-long paired-end reads obtained
can also be used to simulate which read length yields the highest proportion of reads with the
junction adaptor; for example, by chopping them at 100 nt, 127 nt and 171 nt (if sequencing with
HiSeq is planned next).
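The chopping simulation described above could be sketched as follows. This is a minimal illustration, not the procedure used by the authors: it assumes the Nextera mate pair junction adaptor consists of the 19 nt sequence CTGTCTCTTATACACATCT followed by its reverse complement (please verify against Illumina's documentation), it searches only for exact matches of either half, and it treats reads individually rather than as pairs. The function names and the tiny in-memory FASTQ are hypothetical.

```python
import io

# Assumed junction adaptor halves (verify against Illumina's documentation).
JUNCTION_HALF = "CTGTCTCTTATACACATCT"
JUNCTION_HALF_RC = "AGATGTGTATAAGAGACAG"


def read_sequences(fastq_handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:
            yield line.strip()


def junction_fraction(sequences, read_length):
    """Fraction of reads that still contain a junction half after chopping."""
    total = 0
    hits = 0
    for seq in sequences:
        chopped = seq[:read_length]
        total += 1
        if JUNCTION_HALF in chopped or JUNCTION_HALF_RC in chopped:
            hits += 1
    return hits / total if total else 0.0


def make_record(name, seq):
    """Build one FASTQ record with a dummy quality string."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


# Three artificial 300 nt reads: junction at position 105, at 140, and absent.
junction = JUNCTION_HALF + JUNCTION_HALF_RC
reads = [
    "A" * 105 + junction + "C" * 157,
    "A" * 140 + junction + "C" * 122,
    "G" * 300,
]
fastq_text = "".join(make_record(f"read{i}", s) for i, s in enumerate(reads))

sequences = list(read_sequences(io.StringIO(fastq_text)))
for n in (100, 127, 171):
    print(n, junction_fraction(sequences, n))
```

For real data, the handle would be an opened (possibly gzip-compressed) FASTQ file from the MiSeq pilot run, and a dedicated read-processing program would give a more tolerant adaptor search than exact matching.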
The lengths of 127 nt and 171 nt may sound unusual, but with Rapid Run on HiSeq one can
obtain reads of these lengths by making the best use of the extra cycles inherently assigned for
Nextera dual indexing, which we do not need in mate-pair sequencing. This trick allows you to
get 127 nt and 171 nt reads by using three and four TruSeq Rapid SBS Kits (50 cycles),
respectively (see page 6 of the official manual for the TruSeq Rapid SBS Kit). Please consult
the sequencing facility that you plan to work with about the possibility of this extra-cycle
sequencing. The intention of getting 127 nt or 171 nt reads is to increase the proportion of reads with
the junction adaptor inside, but if you plan to use all obtained reads regardless of junction
adaptor inclusion, it may be wiser to prioritize cost-saving and go for 100 nt or shorter reads.
In our experience, Rapid Run mode with v1 chemistry on older HiSeq Control Software (HCS) is
vulnerable to suboptimal library pooling, such as the ‘low plex pooling’ issue (described in a document
by Illumina). In the course of your mate pair sequencing, you may encounter a situation in which
you have only 4 or fewer libraries to be sequenced in a Rapid Run. In this case there is a high
chance that the base composition of index reads will be too homogeneous, and you will get
lower QV in index reads, resulting in a larger proportion of reads that fail demultiplexing. To
reduce this unfavorable effect, you could introduce multiple indices per library in the step above.
As long as demultiplexing between libraries works out without any
overlap of indices, this strategy is expected to produce as many valid reads as possible, with
the only drawback being the handling of more data files in post-sequencing informatics steps.
The latest versions of HCS (version 2.2.38 or higher) seem to be robust against low-diversity
samples, so we suggest contacting the sequencing facility you are working with in
advance to check whether you need to be concerned about the low plex pooling issue.
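Whether a planned set of indices is too homogeneous can be checked before pooling. The sketch below flags index-read cycles in which all indices fall into the same detection channel. It relies on our understanding of Illumina's low-plex pooling guidance for HiSeq-type instruments (A and C read in one channel, G and T in the other, with each cycle needing signal in both); treat that rule as an assumption to confirm with the sequencing facility. The function name is hypothetical and the index strings are examples only.

```python
# Assumed channel assignment on HiSeq-type instruments (please verify).
RED_BASES = set("AC")
GREEN_BASES = set("GT")


def unbalanced_cycles(indices):
    """Return 1-based index-read cycles lacking signal in one of the two channels."""
    if not indices:
        return []
    length = len(indices[0])
    if any(len(index) != length for index in indices):
        raise ValueError("all indices must have the same length")
    flagged = []
    for cycle in range(length):
        bases = {index[cycle] for index in indices}
        has_red = bool(bases & RED_BASES)
        has_green = bool(bases & GREEN_BASES)
        if not (has_red and has_green):
            flagged.append(cycle + 1)
    return flagged


# Example index sets (illustrative strings, not a recommendation).
two_plex = ["ATCACG", "CGATGT"]
four_plex = ["ATCACG", "CGATGT", "TTAGGC", "TGACCA"]

print(unbalanced_cycles(two_plex))
print(unbalanced_cycles(four_plex))
```

Assigning more than one index to each library, as suggested above, enlarges the set passed to such a check and makes it easier to find a combination with no flagged cycles.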
We recommend first running a recent version of FastQC (v0.11 or higher) on the raw fastq files to
monitor some standard metrics, including the frequency of junction adaptor appearance along
base positions (in the ‘Adapter Content’ view newly added in FastQC v0.11).
After the primary QC, run a read processing program, such as NextClip [3], and assess the PCR
duplicate rate and the proportion of reads containing the junction adaptor. After read processing, be
sure to rerun FastQC on the processed fastq files, in order to confirm that junction/external
adaptors and low-quality bases were properly trimmed.
1. Wang Q, Gu L, Adey A, Radlwimmer B, Wang W, Hovestadt V, Bahr M, Wolf S, Shendure J, Eils R et al:
Tagmentation-based whole-genome bisulfite sequencing. Nature Protocols 2013, 8(10):2022-2032.
2. Heavens D, Garcia Accinelli G, Clavijo B, Clark MD: A method to simultaneously construct up to
12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost.
BioTechniques 2015, 59(1):42-45.
3. Leggett RM, Clavijo BJ, Clissold L, Clark MD, Caccamo M: NextClip: an analysis and read preparation tool
for Nextera Long Mate Pair libraries. Bioinformatics 2014, 30(4):566-568.

More Related Content

What's hot

Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
 
Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target CaptureBioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target CaptureBioo Scientific
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012Torsten Seemann
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...QIAGEN
 
High efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDT
High efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDTHigh efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDT
High efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDTIntegrated DNA Technologies
 
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...Baptiste Mayjonade
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Maté Ongenaert
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Thomas Keane
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issuesDongyan Zhao
 

What's hot (11)

Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
 
Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target CaptureBioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
 
High efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDT
High efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDTHigh efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDT
High efficiency qPCR with PrimeTime® Gene Expression Master Mix from IDT
 
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
 
PrimeTime® qPCR products for gene expression
PrimeTime® qPCR products for gene expressionPrimeTime® qPCR products for gene expression
PrimeTime® qPCR products for gene expression
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 

Viewers also liked

Implementing the Inverted Classroom in the Basic Video Course
Implementing the Inverted Classroom in the Basic Video Course Implementing the Inverted Classroom in the Basic Video Course
Implementing the Inverted Classroom in the Basic Video Course Chris Blair
 
The "-1 "strategy to save tree
The  "-1 "strategy to save treeThe  "-1 "strategy to save tree
The "-1 "strategy to save treeG K
 
Презентация Медиалогии с конференции "Дни PR на Юге"
Презентация Медиалогии с конференции "Дни PR на Юге"Презентация Медиалогии с конференции "Дни PR на Юге"
Презентация Медиалогии с конференции "Дни PR на Юге"Медиалогия
 
OSSLT presentation (2012)
OSSLT presentation (2012)OSSLT presentation (2012)
OSSLT presentation (2012)gpwoodburn
 
인터넷 비즈니스의 활용
인터넷 비즈니스의 활용인터넷 비즈니스의 활용
인터넷 비즈니스의 활용teeing055
 
인터랙티브디자인 김혜진
인터랙티브디자인 김혜진인터랙티브디자인 김혜진
인터랙티브디자인 김혜진Hyejin Kim
 
oxojob tutorial på svenska
oxojob tutorial på svenskaoxojob tutorial på svenska
oxojob tutorial på svenskaAndrei Postoaca
 
Sk134 vi bpha_2009
Sk134 vi bpha_2009Sk134 vi bpha_2009
Sk134 vi bpha_2009Arif Budiman
 
LMHS PowerPoint March 2015
LMHS PowerPoint March 2015LMHS PowerPoint March 2015
LMHS PowerPoint March 2015gpwoodburn
 
About Micro Leaks
About Micro LeaksAbout Micro Leaks
About Micro LeaksNitkap
 
4.15 인터랙티브
4.15 인터랙티브4.15 인터랙티브
4.15 인터랙티브Hyejin Kim
 
圖書館書香之旅(100.09)
圖書館書香之旅(100.09)圖書館書香之旅(100.09)
圖書館書香之旅(100.09)樟煥 劉
 
Alliance Rebel Rally Presentation
Alliance Rebel Rally PresentationAlliance Rebel Rally Presentation
Alliance Rebel Rally PresentationWorking Solutions
 
2012 2013 Grade Assemblies
2012 2013 Grade Assemblies2012 2013 Grade Assemblies
2012 2013 Grade Assembliesgpwoodburn
 
New england colonists
New england colonistsNew england colonists
New england colonistsmissjess41
 

Viewers also liked (20)

ИНФОПОВОД 2013: Samsung
ИНФОПОВОД 2013: SamsungИНФОПОВОД 2013: Samsung
ИНФОПОВОД 2013: Samsung
 
Principales ressources cartographiques et statistiques Centre GéoStat (2016)
Principales ressources cartographiques et statistiques Centre GéoStat (2016)Principales ressources cartographiques et statistiques Centre GéoStat (2016)
Principales ressources cartographiques et statistiques Centre GéoStat (2016)
 
Implementing the Inverted Classroom in the Basic Video Course
Implementing the Inverted Classroom in the Basic Video Course Implementing the Inverted Classroom in the Basic Video Course
Implementing the Inverted Classroom in the Basic Video Course
 
The "-1 "strategy to save tree
The  "-1 "strategy to save treeThe  "-1 "strategy to save tree
The "-1 "strategy to save tree
 
Презентация Медиалогии с конференции "Дни PR на Юге"
Презентация Медиалогии с конференции "Дни PR на Юге"Презентация Медиалогии с конференции "Дни PR на Юге"
Презентация Медиалогии с конференции "Дни PR на Юге"
 
OSSLT presentation (2012)
OSSLT presentation (2012)OSSLT presentation (2012)
OSSLT presentation (2012)
 
인터넷 비즈니스의 활용
인터넷 비즈니스의 활용인터넷 비즈니스의 활용
인터넷 비즈니스의 활용
 
인터랙티브디자인 김혜진
인터랙티브디자인 김혜진인터랙티브디자인 김혜진
인터랙티브디자인 김혜진
 
Arc 2016 - Architecture - Université Laval
Arc 2016 - Architecture - Université LavalArc 2016 - Architecture - Université Laval
Arc 2016 - Architecture - Université Laval
 
Overselling and why it is bad
Overselling and why it is badOverselling and why it is bad
Overselling and why it is bad
 
oxojob tutorial på svenska
oxojob tutorial på svenskaoxojob tutorial på svenska
oxojob tutorial på svenska
 
Sk134 vi bpha_2009
Sk134 vi bpha_2009Sk134 vi bpha_2009
Sk134 vi bpha_2009
 
LMHS PowerPoint March 2015
LMHS PowerPoint March 2015LMHS PowerPoint March 2015
LMHS PowerPoint March 2015
 
About Micro Leaks
About Micro LeaksAbout Micro Leaks
About Micro Leaks
 
4.15 인터랙티브
4.15 인터랙티브4.15 인터랙티브
4.15 인터랙티브
 
圖書館書香之旅(100.09)
圖書館書香之旅(100.09)圖書館書香之旅(100.09)
圖書館書香之旅(100.09)
 
Alliance Rebel Rally Presentation
Alliance Rebel Rally PresentationAlliance Rebel Rally Presentation
Alliance Rebel Rally Presentation
 
2012 2013 Grade Assemblies
2012 2013 Grade Assemblies2012 2013 Grade Assemblies
2012 2013 Grade Assemblies
 
New england colonists
New england colonistsNew england colonists
New england colonists
 
Services et ressources du Centre GéoStat
Services et ressources du Centre GéoStatServices et ressources du Centre GéoStat
Services et ressources du Centre GéoStat
 

Similar to iMate Protocol Guide version 2.0

Benchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark StreamingBenchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark StreamingAKUDA Labs
 
RSEM and DE packages
RSEM and DE packagesRSEM and DE packages
RSEM and DE packagesRavi Gandham
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyQIAGEN
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshopc.titus.brown
 
Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...ijesajournal
 
Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...ijesajournal
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesScott Edmunds
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingmikaelhuss
 
rnaseq2015-02-18-170327193409.pdf
rnaseq2015-02-18-170327193409.pdfrnaseq2015-02-18-170327193409.pdf
rnaseq2015-02-18-170327193409.pdfPushpendra83
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012Dan Gaston
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_coursehansjansen9999
 
Pipeline Scripting for the Parallel Alignment of Genomic Short Sequence Reads
Pipeline Scripting for the Parallel Alignment of Genomic Short Sequence ReadsPipeline Scripting for the Parallel Alignment of Genomic Short Sequence Reads
Pipeline Scripting for the Parallel Alignment of Genomic Short Sequence ReadsAdam Bradley
 
20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slideshare20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slidesharehansjansen9999
 
about message coalescing
about message coalescingabout message coalescing
about message coalescingyyooooon
 
20100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture0820100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture08Computer Science Club
 

Similar to iMate Protocol Guide version 2.0 (20)

AGBT2017 Reference Workshop: Fulton
AGBT2017 Reference Workshop: FultonAGBT2017 Reference Workshop: Fulton
AGBT2017 Reference Workshop: Fulton
 
Benchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark StreamingBenchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark Streaming
 
20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop
 
RSEM and DE packages
RSEM and DE packagesRSEM and DE packages
RSEM and DE packages
 
Final doc of dna
Final  doc of dnaFinal  doc of dna
Final doc of dna
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) Technology
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshop
 
Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...
 
Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...Dominant block guided optimal cache size estimation to maximize ipc of embedd...
Dominant block guided optimal cache size estimation to maximize ipc of embedd...
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challenges
 
RNA-Seq
RNA-SeqRNA-Seq
RNA-Seq
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processing
 
rnaseq2015-02-18-170327193409.pdf
rnaseq2015-02-18-170327193409.pdfrnaseq2015-02-18-170327193409.pdf
rnaseq2015-02-18-170327193409.pdf
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
 
Pipeline Scripting for the Parallel Alignment of Genomic Short Sequence Reads
Pipeline Scripting for the Parallel Alignment of Genomic Short Sequence ReadsPipeline Scripting for the Parallel Alignment of Genomic Short Sequence Reads
Pipeline Scripting for the Parallel Alignment of Genomic Short Sequence Reads
 
20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slideshare20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slideshare
 
about message coalescing
about message coalescingabout message coalescing
about message coalescing
 
Biomicrofluidics-9-044103-2015
Biomicrofluidics-9-044103-2015Biomicrofluidics-9-044103-2015
Biomicrofluidics-9-044103-2015
 
20100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture0820100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture08
 

Recently uploaded

Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxzeus70441
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsMarkus Roggen
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsDanielBaumann11
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...Chayanika Das
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGSoniaBajaj10
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
dll general biology week 1 - Copy.docx
dll general biology   week 1 - Copy.docxdll general biology   week 1 - Copy.docx
dll general biology week 1 - Copy.docxkarenmillo
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and AnnovaMansi Rastogi
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterHanHyoKim
 
The Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionThe Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionJadeNovelo1
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxPayal Shrivastava
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
iMate Protocol Guide version 2.0

  • 1. iMate Protocol (version 2.0) by GRAS – April 11, 2016 NGS and Phyloinfo in Kobe http://www.clst.riken.jp/phylo/ 1 iMate Protocol: Improved and Inexpensive NexteraTM Mate Pair Library Preparation Authorized by Kaori Tatsumi, Osamu Nishimura, Kazu Itomi, Chiharu Tanegashima & Shigehiro Kuraku Genome Resource & Analysis Station (GRAS) operated by Phyloinformatics Unit RIKEN Center for Life Science Technologies (CLST) Notice: A benchmark paper introducing this protocol has been published in the journal Biotechniques. When you present or publish data based on technical guidance in this protocol, you could think about citing this protocol at our web site and the benchmark paper (Tatsumi et al., 2015) published in Biotechniques. This protocol outlines the modifications to the ‘Gel-plus’ version of the standard protocol for Nextera Mate Pair Library Preparation and the logical background for them. Focusing on how to best improve scaffolding performance, we optimized the protocol under the possibly conservative policy that only read pairs with junction adaptors (bona fide ‘mates’) should be passed on to scaffolding. The keys were optimizing the 1) tagmentation condition, 2) Covaris shearing condition, and 3) sequence read length, in order to enhance the yield of libraries and the capability of detecting the junction adaptor in reads. Basically, we understand that 4μg of starting genomic DNA, as formulated in the standard protocol, is enough for preparation of mate-pair libraries with mate distance of >10kb. Ideally, we could optimize the tagmentation condition so that as much DNA as possible fall into the targeted size range. For this purpose, perform tagment reaction with multiple conditions; for example, in three tubes with 4, 8 and 12 μl of kit supplied tagment enzyme respectively. The tagment buffer can be self-made [1], which leads to cost-saving, if other limiting reagents are also saved. 
Size distribution of tagmented DNA molecules should be analyzed with a trustworthy method, such as pulse field electrophoresis (e.g., PippinPulse) or the Agilent TapeStation―the Agilent Bioanalyzer does not perform well for this purpose. With comparable results from multiple tagment reactions, you could figure out which tagment condition allows you to retrieve the largest amount of DNA for the targeted size range. Like the previous tagmentation step, the amounts of the supplied reagents used in this step are the limiting factor in terms of how many libraries can be prepared with one purchased kit. Thus, it would be preferable to find a way to decrease the amount of kit-supplied reagents required to perform this step. We previously suggested (in the iMate protocol versions 1.X) to perform strand displacement with 1/4 volume of all reaction components after size selection with BluePippin. However, we have recently found that this can result in contamination by read pairs with untargeted mate distances. Therefore, we currently do not recommend reversing the order of strand displacement and size selection. We are now looking into alternative ways to save kit- supplied reagents in the strand displacement step. Do as instructed in the standard protocol. We use a BluePippin in this step and usually set a width size range of 4kb (for example, from 6 kb to 10 kb), although this is a matter of further consideration. After strand displacement and size selection, it is ideal to retain at least 100 ng of DNA. Although the standard protocol mentions ‘150-400 ng’ (on page 27), 100-200ng is realistic and still promising in our experience.
  • 2. iMate Protocol (version 2.0) by GRAS – April 11, 2016 NGS and Phyloinfo in Kobe http://www.clst.riken.jp/phylo/ 2 Do as instructed in the standard protocol. Shearing determines the length of library inserts, which should ideally be coordinated with the selected sequencing read length. If you regard only reads with an adaptor junction as true mate pairs, we propose a shearing condition which will ultimately produce a library possessing an insert size distribution of 300 – 700bp, with the peak at 450-500bp. Note that this is markedly different from the size distribution illustrated in the standard protocol (300-1200bp; on page 49). To achieve our proposed size distribution, we recommend performing successive shearing via multiple executions of the Covaris condition instructed in the standard protocol. In our experience, shearing the genomes of different species with the same condition can result in markedly different fragment size distributions. Thus, you need to optimize the condition specifically for your species of interest. For one of the species we worked on, we performed as many as 7 runs of Covaris shearing with the condition specified in the standard protocol. You may feel an urge to perform QC with Bioanalyzer immediately after the Covaris shearing, but it will not give you a fair assessment of shearing results unless you use a large quantity of your sheared DNA for QC, which is undesirable. Thus, we recommend to save as much DNA as possible at this stage, and to instead measure the size distribution later in the ‘ ’ step. Do as instructed in the standard protocol. To get as many unique mate-pair reads as possible, it is strongly recommended to reduce PCR cycles and avoid excessive amplification. We suggest performing no more than 10 cycles of PCR. 
This warning is supported by our experience of getting a sufficient amount of products with 10 PCR cycles, even for samples that are supposed to require 15 cycles according to the standard protocol (for example, 100ng for libraries with mate distant ranges of 6-10kb; see [2] for details of cycle number estimates). In fact, we normally perform 8 PCR cycles, and only when we find the yield too low after AMPure clean-up do we perform additional PCR cycling (still, no more than 10 cycles in total). If you do not get enough products within 10 cycles, you had better first optimize the tagment condition to increase the yield for the targeted size range. With the illumina system, it seems that the insert lengths of many reads actually sequenced are shorter than the most frequent insert length of a library. Thus, be sure to perform greedy size selection with AMPure to get rid of molecules with short inserts, as instructed in the standard protocol (x0.67 AMPure to get rid of <300bp molecules), no matter what the size distribution of library inserts is. Modest size selection can result in high proportion of read pairs with too small lengths, and they may not suffice for effective scaffolding. Use Bioanalyzer or equivalent in this final QC before sequencing. Keep in mind that the size distribution is determined mostly by shearing condition and AMPure clean-up, rather than the choice of size range of mate distance.
  • 3. iMate Protocol (version 2.0) by GRAS – April 11, 2016 NGS and Phyloinfo in Kobe http://www.clst.riken.jp/phylo/ 3 We use KAPA Library Quantification Kit (KK4835) in this step. Quantification should not be tricky if the library should have an ordinary unimodal size distribution. The standard protocol says that you need 1.5nM-20nM of the synthesized library, but we think that 2nM is enough unless the sequencing facility you are working with requests much more than required in an actual sequencing run. In your first trial, it is advised to run a MiSeq for small-scale pilot sequencing to get 300nt-long paired-end reads from prepared libraries―sequencing as many as 10 libraries per MiSeq run should provide you with fair validation of the libraries. Obtained 300nt-long paired-end reads could also be used for simulating which read length yields the highest proportion of reads with junction adaptor; for example, by chopping them at 100nt, 127nt and 171nt (if sequencing with HiSeq is planned next). The lengths of 127nt and 171nt may sound unusual, but with Rapid Run on HiSeq one can obtain reads of these lengths by making the best use of the extra cycles inherently assigned for Nextera dual indexing, which we do not need in mate-pair sequencing. This trick allows you to get 127nt and 171nt by using three and four of the TruSeq Rapid SBS Kit for 50 cycles, respectively (see page 6 of the official manual for TruSeq Rapid SBS Kit). Please consult with the sequencing facility that you plan to work with about the possibility of this extra-cycle sequencing. The intention of getting 127nt or 171nt is to increase the proportion of reads with the junction adaptor inside, but if one plans to use all obtained reads regardless of junction adaptor inclusion, it may be wiser to respect cost-saving and go for 100nt or shorter reads. 
In our experience, Rapid Run mode with v1 chemistry on older HiSeq Control Software (HCS) is vulnerable to suboptimal library pooling, such as the ‘low plex pooling’ issue (see this document by illumina). In the course of your mate pair sequencing, you may encounter a situation in which you have only 4 or fewer libraries to be sequenced in a Rapid Run. In this case there is a high chance that the base composition of index reads will be too homogeneous, and you will get lower QV in index reads, resulting in a larger proportion of reads that failed in demultiplexing. To reduce this unfavorable effect, you could introduce multiple indices per library in the step above . As long as demultiplexing between libraries works out without any overlap of indices, this strategy is supposed to produce as many valid reads as possible, with the only drawback being the handling of more data files in post-sequencing informatics steps. The latest versions of HCS (version 2.2.38 or higher) seems to be robust against low diversity samples, so you are suggested to contact the sequencing facility you are working with in advance to check if you need to be concerned with the low plex pooling issue. We recommend to first run a recent version of FastQC (v0.11 or higher) on raw fastq files to monitor some standard metrics, including the frequency of junction adaptor appearance along base positions (in the ‘Adapter Content’ view newly added in FastQC v0.11). After the primary QC, run a read processing program, such as NextClip [2], and assess PCR duplicate rate and what proportion of reads has the junction adaptors. After read processing, be sure to rerun FastQC on the processed fastq files, in order to confirm that junction/external adaptors and low-quality bases were properly trimmed. 1. Wang Q, Gu L, Adey A, Radlwimmer B, Wang W, Hovestadt V, Bahr M, Wolf S, Shendure J, Eils R et al: Tagmentation-based whole-genome bisulfite sequencing. Nature protocols 2013, 8(10):2022-2032. 2. 
Heavens D, Garcia Accinelli G, Clavijo B, and Derek Clark M: A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost. BioTechniques 2015, 59(1):42-45. 3. Leggett RM, Clavijo BJ, Clissold L, Clark MD, Caccamo M: NextClip: an analysis and read preparation tool for Nextera Long Mate Pair libraries. Bioinformatics 2014, 30(4):566-568.