ABSTRACT
Ion AmpliSeq™ sequencing is one of the most promising applications
of the Ion Torrent NGS platform. It uses multiplex PCR for target
enrichment. Thermo Fisher offers the online Ion AmpliSeq Designer to
help customers design assays. As more and more people adopt
Ion AmpliSeq technologies, challenges for assay design have
started to emerge. Here we present bioinformatics approaches to
improve the following areas of assay design: 1) assay specificity; 2)
primer quality control; 3) SNPs under primers; and 4) flexibility to adapt
to different applications of Ion AmpliSeq sequencing, including variant
calling, copy number variation detection, RNA expression, gene fusion
detection, and metagenomics. The design algorithms were developed to
ensure high coverage while controlling the risks of poor amplification
efficiency, off-target reads, and SNP effects. With the optimized design
algorithm, numerous custom and community research panels have been
created, including the Ion AmpliSeq Exome Panel, TP53 Panel, and
CFTR Panel.
Thermo Fisher Scientific • 5791 Van Allen Way • Carlsbad, CA 92008 • www.lifetechnologies.com
Guoying Liu, Manimozhi Manivannan, Heinz Breu, Adam Broomer, Alexander Atkins, Kate Rhodes, Cristina Van Loy, Fiona Hyland,
Mark Andersen, Thermo Fisher Scientific, 200 Oyster Point Blvd, South San Francisco, CA, USA, 94080
Improved Algorithm for Amplicon Sequencing Assay Designs
Design input (one of):
1) Gene symbol, SNP rsID, or COSMIC mutation ID from the human or mouse genome
2) Region/SNP coordinates from any pre-loaded genome
3) Region/SNP coordinates from customer-uploaded DNA sequences, which can be from any genome or even artificially built

Translates to:
1. A piece of DNA sequence
2. The part of the sequence (targets) to amplify

Primer Specificity Search
1) Against a reference genome pre-loaded in AmpliSeq Designer
2) Against the set of sequence contigs submitted for AmpliSeq design

Check for Known Polymorphic Sites at Primer Binding Sites
1) SNPs in dbSNP for reference genomes pre-loaded in AmpliSeq Designer
2) SNPs in customer-submitted sequence contigs, specified by the customer

Result: one set of primer pairs that specifically amplify the targets
I. Ion AmpliSeq DNA Design Overview
Table 1. Scenarios of Design Input

Scenario_1
  Genome: human, mouse
  Type of design: 1) DNA; 2) DNA Hotspot; 3) RNA
  Submission of design targets: 1) chromosomal coordinates; 2) gene names; 3) rsID or COSMIC ID
  Primer specificity check: human or mouse reference genome
  SNPs to avoid: common human or mouse SNPs from dbSNP

Scenario_2
  Genome: cow, chicken, pig, sheep, rice, maize, soybean, and tomato (case I)
  Type of design: 1) DNA; 2) DNA Hotspot
  Submission of design targets: chromosomal coordinates
  Primer specificity check: respective reference genome
  SNPs to avoid: SNPs (if publicly available) for the respective genome

Scenario_3
  Genome: human, mouse, cow, chicken, pig, sheep, rice, maize, soybean, and tomato (case II)
  Type of design: 1) DNA; 2) DNA Hotspot
  Submission of design targets: custom proprietary sequence contigs plus targets listed as contig coordinates
  Primer specificity check: respective reference genome
  SNPs to avoid: 1) SNPs or variation regions on custom contigs; 2) none

Scenario_4
  Genome: custom reference contigs from other genomes
  Type of design: 1) DNA; 2) DNA Hotspot
  Submission of design targets: custom reference sequence contigs plus targets listed as contig coordinates
  Primer specificity check: 1) one of the 10 supported reference genomes as proxy; 2) none (specificity check against custom contigs only)
  SNPs to avoid: 1) SNPs or variation regions on custom contigs; 2) none
Figure 2. How the presence of a SNP at the primer binding site affects
read count.
Category 0: the SNP is homozygous in NA12878 and the primer sequence
matches the genomic DNA;
Category 1: the SNP is heterozygous in NA12878 and the primer sequence
matches half of the genomic DNA;
Category 2: the SNP is homozygous in NA12878 and the primer sequence
does not match the genomic DNA.
Normalized SNP Position: SNP position in the primer sequence, counting
from the 3’ end, normalized to a theoretical 33 bp primer and binned by 3.
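The category and position definitions in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the poster does not give the exact normalization formula, so the linear rescaling and zero-based bin index below are assumptions.

```python
def snp_category(genotype: tuple[str, str], primer_allele: str) -> int:
    """Classify a SNP under a primer into the three categories of Figure 2.

    genotype      -- the two alleles observed in the sample, e.g. ("A", "G")
    primer_allele -- the base the primer carries at the SNP position,
                     expressed on the same strand as the genotype
    """
    first, second = genotype
    if first != second:
        # Heterozygous: the primer matches half of the genomic DNA.
        # ASSUMPTION: the primer allele is one of the two sample alleles.
        return 1
    # Homozygous: either a full match (0) or a full mismatch (2).
    return 0 if first == primer_allele else 2


def normalized_snp_bin(pos_from_3prime: int, primer_len: int,
                       ref_len: int = 33, bin_width: int = 3) -> int:
    """Rescale a SNP position to a theoretical 33 bp primer and bin it.

    pos_from_3prime is 1-based and counted from the 3' end of the primer.
    ASSUMPTION: linear rescaling (rounded up) followed by integer binning;
    the returned bin index is zero-based, so bin 0 is closest to the 3' end.
    """
    if not 1 <= pos_from_3prime <= primer_len:
        raise ValueError("SNP position must lie inside the primer")
    # Integer ceiling of pos * ref_len / primer_len, giving 1..ref_len.
    scaled = (pos_from_3prime * ref_len + primer_len - 1) // primer_len
    return (scaled - 1) // bin_width
```

With these assumptions, a SNP at the 3’ terminal base always falls in bin 0 and a SNP at the 5’ terminal base falls in the last bin, whatever the actual primer length.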
II. Avoid SNPs for Primer Design
[Figure 3 axis label: Cutoff of Similarity-Hits (as defined below)]
Figure 3. Effect of primer specificity on off-target reads.
Primer specificity has two components: 1) the number of locations in
the background DNA that the primer binds to, even if imperfectly,
termed Similarity-Hits; 2) how well the primer binds to non-target
DNA. The figure illustrates how off-target reads can be
controlled by limiting primer Similarity-Hits.
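A minimal sketch of a Similarity-Hits cutoff follows. The poster does not specify how imperfect binding is scored, so this example assumes a simple mismatch count (Hamming distance) on the forward strand only; the function names, the `max_mismatches` parameter, and the brute-force scan are all illustrative assumptions, not the Designer's actual search.

```python
def count_similarity_hits(primer: str, background: str,
                          max_mismatches: int = 2) -> int:
    """Count background windows that differ from the primer by at most
    `max_mismatches` bases (ASSUMPTION: Hamming distance, forward strand).

    The intended binding site, if present in `background`, counts as a hit.
    """
    size = len(primer)
    hits = 0
    for i in range(len(background) - size + 1):
        window = background[i:i + size]
        mismatches = sum(1 for p, w in zip(primer, window) if p != w)
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def filter_primers(primers: list[str], background: str, cutoff: int,
                   max_mismatches: int = 2) -> list[str]:
    """Keep only primers whose Similarity-Hits do not exceed `cutoff`."""
    return [
        p for p in primers
        if count_similarity_hits(p, background, max_mismatches) <= cutoff
    ]
```

A real genome-scale search would need an indexed aligner rather than a linear scan; the sketch only shows how a cutoff on the hit count removes less specific primers.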
III. Control Primer Specificity to Avoid Off-target
Reads
[Tiling/pooling diagram labels, panels A to C, each drawn over a Target:]
Tiling: identify one set of amplicons with:
1. Maximum coverage of target;
2. Minimum overall amplicon cost
(the lower the cost, the better the amplicon quality)
Pooling: Pool 1, Pool 2, Unpooled
Retain input amplicons meant to be “must-have”.
Tiling and Pooling in one step: Pool 1, Pool 2
IV. Ion AmpliSeq Designer Tiling and Pooling
Figure 4. Ion AmpliSeq Designer tiling/pooling scheme for regular
DNA region and gene designs. A) a diagram illustrating the process;
B) an example of selected amplicons covering a region target.
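The tiling goal (maximum coverage at minimum overall amplicon cost) and the subsequent pooling step can be sketched as follows. This is a simplified illustration, not the Designer's algorithm: it assumes half-open coordinates, non-negative costs, a target that can be fully covered, and a greedy two-pool assignment in which overlapping amplicons must go to different pools. The `Amplicon` type and both function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Amplicon(NamedTuple):
    start: int   # inclusive
    end: int     # exclusive
    cost: float  # lower cost = better amplicon quality


def tile(target_start: int, target_end: int,
         candidates: list[Amplicon]) -> Optional[list[Amplicon]]:
    """Return a minimum-total-cost set of amplicons covering the target,
    or None if the candidates cannot cover it completely."""
    # best[x] = (total cost, chosen amplicons) covering [target_start, x)
    best = {target_start: (0.0, [])}
    for amp in sorted(candidates, key=lambda a: a.start):
        # amp can extend any covered frontier x it starts at or before
        # and ends beyond.
        reachable = [v for x, v in best.items() if amp.start <= x < amp.end]
        if not reachable:
            continue
        cost, picks = min(reachable, key=lambda v: v[0])
        frontier = min(amp.end, target_end)
        new_cost = cost + amp.cost
        if frontier not in best or new_cost < best[frontier][0]:
            best[frontier] = (new_cost, picks + [amp])
    result = best.get(target_end)
    return None if result is None else result[1]


def pool(amplicons: list[Amplicon], n_pools: int = 2):
    """Greedily place amplicons so that no two in the same pool overlap.
    Amplicons that fit in no pool are returned as unpooled."""
    pools = [[] for _ in range(n_pools)]
    unpooled = []
    for amp in sorted(amplicons, key=lambda a: a.start):
        for members in pools:
            # Input is sorted by start, so only the last member can overlap.
            if not members or members[-1].end <= amp.start:
                members.append(amp)
                break
        else:
            unpooled.append(amp)
    return pools, unpooled
```

Calling `tile` on a target and then `pool` on its result yields overlapping neighbours in alternating pools, which mirrors the Pool 1 / Pool 2 / Unpooled labels in the diagram. Handling targets that cannot be fully covered (maximizing partial coverage) is left out of this sketch.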
Figure 5. Ion AmpliSeq designer tiling/pooling scheme for one-
pool DNA Hotspot designs. A) and B) show how an amplicon
would be selected by the tiling/pooling scheme shown in Figure 4;
C) shows the amplicon selected by the tiling scheme specified for
one-pool DNA Hotspot design.
Figure 6. Ion AmpliSeq Designer tiling/pooling scheme when a
set of pre-selected amplicons (shown in red in the graph) is
specified to be included in a new panel.
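One way to picture the retention of pre-selected amplicons is to keep them as given and work out which parts of the target still need new amplicons. The sketch below shows only that bookkeeping step, under the assumption of half-open `(start, end)` coordinates; the function name is hypothetical and the poster does not describe how the Designer performs this step internally.

```python
def uncovered_gaps(target_start: int, target_end: int,
                   must_have: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the parts of [target_start, target_end) that the retained
    "must-have" amplicons leave uncovered, as half-open (start, end) pairs."""
    gaps = []
    cursor = target_start  # everything before cursor is already handled
    for start, end in sorted(must_have):
        if start > cursor:
            # The region between the cursor and this amplicon is uncovered.
            gaps.append((cursor, min(start, target_end)))
        cursor = max(cursor, end)
        if cursor >= target_end:
            break
    if cursor < target_end:
        gaps.append((cursor, target_end))
    return gaps
```

The returned gaps could then be passed to any tiling routine as new targets, so that the final panel contains the must-have amplicons plus whatever is needed to cover the remainder.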
Conclusions
The design algorithms of the Ion AmpliSeq Designer are continuously
improved to ensure that amplicon sequencing designs lead to
successful next-generation sequencing applications such as variant
calling and copy number analysis. More information can be found at
AmpliSeq.com.
Acknowledgements
The authors would like to thank Niranjan Vissa, Dong Kim, Annie
Titus, Chris Lasher, Ryan Kumsher, Winston Cheng, Poorva Soni,
Antonio Martinez Alcantara, Denise Topacio, Nisha Mulakken, Nriti
Garg, Pius Brzoska, Fangqi Hu, Francisco Hernandez-Guzman,
David Kopp, Arvind Kothandaraman and Anup Parikh for their
contributions and support.
[Figure 2 plot: three panels (Category 0, Category 1, Category 2); axes: Read Count and Normalized SNP Position.]
IV-1. Tiling/Pooling for DNA region design
IV-2. Tiling/Pooling for single-pool DNA hotspot design
IV-3. Tiling/Pooling for DNA designs with subsetting
Figure 1. Diagram for an overview of the DNA design workflow.
© 2015 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo
Fisher Scientific and its subsidiaries unless otherwise specified.
For Research Use Only. Not for use in diagnostic procedures.