Knowing Your NGS
Upstream: Alignment
and Variants
March 27, 2013
Gabe Rudy, Vice President of
Product Development
Questions during the presentation:
Use the Questions pane in your GoToWebinar window
Goals
 What I Assume About You
- Some experience with NGS technology
- Not a command line bioinformatician by day;
not afraid of technical terms
 What You Will Learn
- A healthy skepticism when looking at NGS
data
- What to expect/not expect from core labs or
upstream sequencing service providers
- Reading pile-ups in a genome browser and
spotting high quality vs sketchy variants
 What You Won’t Learn
- Interpreting biological significance of variants
- One true way to do secondary analysis
I won’t do this… hopefully!
My Background
 Golden Helix
- Founded in 1998
- Genetic association software
- Analytic services
- Hundreds of users worldwide
- Over 700 customer citations in scientific
journals
 Products I Build with My Team
- SNP & Variation Suite (SVS)
- SNP, CNV, NGS tertiary analysis
- Import and deal with all flavors of upstream data
- GenomeBrowse
- Visualization of everything with genomic coordinates.
All standardized file formats.
- RNA-Seq Pipeline
- Expression profiling bioinformatics
Agenda
1 Background and Definitions
2 Why You Should Care About Your Upstream?
3 A Drink from the Bioinformatics Firehose
4 Service Provider Deliverables: CEPH Trio Example
5 Applications That Require Special Upstream Analysis
NGS Analysis
Primary Analysis
 Analysis of hardware generated data, on-machine real-time stats.
 Production of sequence reads and quality scores
Secondary Analysis
 QA and clipping/filtering reads
 Alignment/Assembly of reads
 Recalibrating, de-duplication, variant calling on aligned reads
Tertiary Analysis – “Sense Making”
 QA and filtering of variant calls
 Annotation and filtering of variants
 Multi-sample integration
 Visualization of variants in genomic context
 Experiment-specific inheritance/population analysis
Primary Analysis: I’d like some AGCT’s please
 Standardized on producing FASTQ
- AGCT or N
- Quality scores
- Pair of files for paired end
 Happens on machine for desktop
sequencers
- Ion Torrent processing microwell detectors
- MiSeq doing optic processing of flowcell
- PacBio processing optics of ZMW
 HiSeq 2000/2500
- Requires off-machine base-calling
- Can “call bases” with Illumina software on raw
data collected tile by tile
Assembly vs Alignment
 De Novo Genome Assembly
- Very difficult for large genomes to get
to “finished” genome quality
(traditionally done with Sanger).
- Short reads will get you to contig
sizes in the ~10-100 kb range.
- Need long reads (PacBio) or
restriction-map optical mapping
(OpGen) to make chromosome-sized
contigs
 Alignment
- Aligning to a finished (or draft) genome
that is considered the “reference”
- Allows for some differences, but not
too many between your reads and the
reference
The Human Reference Sequence
 Genome Reference Consortium (GRCh37)
- Feb 2009, previous was NCBI36 March 2006
- 9 alt loci and 187 patches (11 patch releases)
 Supercontigs: Large unplaced contigs
- Some localized to chr level and some unknown
 Does not include a Mitochondrial reference
- UCSC hg19 includes older NCBI 36 MT
- 1000 genomes project using revised Cambridge
Reference Sequence (rCRS)
- Provide “g1k” reference: includes rCRS, Human
herpesvirus 4 type 1, supercontigs and “decoy”
sequence
 v38 genome coming this summer:
- Incorporate all patches into the reference
- Some allele fixes to have reference match major allele
Example Patch
 The tiling path in GRCh37 switched in the middle of ABO gene resulting in a
reference protein not present in humans.
 Patch adjusts tile path and fixes the problem.
 All patches will be incorporated into GRCh38, due this summer. Until then,
all alignment is done against unpatched reference.
Single Nucleotide Variants (i.e. SNVs or SNPs)
 Single base substitution from reference
 Note that “reference” is not always the “major” allele
 “Multi-allelic” sites have more than 2 cataloged
alleles
Gholson Lyon, 2012
Small Insertions/Deletions
 Generally defined as being < 150bp (often much shorter)
 Frameshift insertions/deletions are an important “loss of function” class of
variants
- Although InDels divisible by three are “in-frame” when in coding region
 Hard to call consistently. Poor concordance between algorithms.
 Where to call an InDel in a homopolymer?
- GTTTAC
- GTTTTAC
- 01234567
- How do you describe the insertion? Ins of T at 5? Or ins of T at 1?
- CGI in their v1 pipeline preferred calling insertion at end, others at beginning, now
always at beginning
 MNP – Can also be called differently
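The homopolymer question above can be made concrete with a small sketch. It assumes 0-based insertion points (the inserted bases go in front of `ref[pos]`), which may not match the numbering convention used on the slide, and the function names are hypothetical rather than taken from any caller. It shows that the same inserted T can be reported at either end of the T run and still produce the identical sequence.

```python
from typing import Tuple


def apply_insertion(ref: str, pos: int, inserted: str) -> str:
    """Insert `inserted` in front of ref[pos] (0-based insertion point)."""
    return ref[:pos] + inserted + ref[pos:]


def left_align_insertion(ref: str, pos: int, inserted: str) -> Tuple[int, str]:
    """Shift an insertion to its leftmost equivalent position."""
    # A shift left by one base is legal when the base just before the
    # insertion point equals the last inserted base; the inserted
    # sequence is rotated by one to compensate.
    while pos > 0 and ref[pos - 1] == inserted[-1]:
        inserted = inserted[-1] + inserted[:-1]
        pos -= 1
    return pos, inserted


def right_align_insertion(ref: str, pos: int, inserted: str) -> Tuple[int, str]:
    """Shift an insertion to its rightmost equivalent position."""
    while pos < len(ref) and ref[pos] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        pos += 1
    return pos, inserted


if __name__ == "__main__":
    ref = "GTTTAC"
    left = left_align_insertion(ref, 4, "T")
    right = right_align_insertion(ref, 1, "T")
    print("left-aligned :", left, apply_insertion(ref, *left))
    print("right-aligned:", right, apply_insertion(ref, *right))
```

Comparing call sets from two pipelines is only meaningful after both have been normalized to the same convention.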
Copy Number Variants
 Requires WGS
- CNVs > 10kb pretty accurate.
- 1kb to 10kb problematic.
 Detecting Deletions
- Can see coverage drop to near zero
- Harder to pinpoint breakpoint
- Possible false positives in low-
mappability regions
 Amplifications
- Can see coverage jump
- False positives due to sample prep or
sequence artifacts
 Need “baseline,” look at Log Ratio
- Somatic detection uses normal
tissues
- Can have control population
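The "baseline" and log ratio idea can be sketched as follows. This is a minimal illustration, not any particular CNV caller: it assumes per-bin read depths are already available for a sample and a baseline (matched normal or control average), and the function name and pseudocount are hypothetical choices.

```python
import math
from typing import List, Sequence


def log2_ratios(sample_depth: Sequence[float],
                baseline_depth: Sequence[float],
                pseudocount: float = 0.5) -> List[float]:
    """Per-bin log2(sample / baseline) after scaling each track by its total.

    Assumes both tracks cover the same bins and have non-zero totals.
    """
    sample_total = sum(sample_depth)
    baseline_total = sum(baseline_depth)
    ratios = []
    for s, b in zip(sample_depth, baseline_depth):
        # The pseudocount keeps the ratio finite when a bin has zero depth.
        s_norm = (s + pseudocount) / sample_total
        b_norm = (b + pseudocount) / baseline_total
        ratios.append(math.log2(s_norm / b_norm))
    return ratios


if __name__ == "__main__":
    baseline = [100, 100, 100, 100, 100]
    sample = [100, 100, 50, 100, 150]   # bin 2 halved, bin 4 at 1.5x
    for i, r in enumerate(log2_ratios(sample, baseline)):
        print(f"bin {i}: log2 ratio = {r:+.2f}")
```

In a diploid genome a heterozygous deletion sits near -1 and a single-copy gain near +0.58, which is why the log scale is convenient for spotting both.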
Venter vs Watson WGS CNV-seq
64kbp gain in DDAH1 Gene of NA12878
Structural Variants
 Looking for:
- Balanced rearrangements
- Inversions
- Translocations
- Complex
 Signals to detect SV:
- Paired-end mappings too big (deletion)
- Depth of coverage
- Split-read mapping
 Translocations can result in “fusion”
genes.
- For example, the BCR-ABL fusion gene is central to the
pathogenesis of certain leukemias.
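The first signal listed above, paired-end mappings that are "too big", can be illustrated with a short sketch. It assumes the insert sizes of properly oriented pairs have already been extracted from the alignments; the cutoff rule (median plus a multiple of the median absolute deviation) and the function name are assumptions for illustration, not the method of any specific SV caller.

```python
from statistics import median
from typing import List, Sequence


def flag_large_inserts(insert_sizes: Sequence[int],
                       n_mads: float = 5.0) -> List[int]:
    """Return indices of pairs whose insert size is far above the library norm."""
    med = median(insert_sizes)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median([abs(x - med) for x in insert_sizes])
    threshold = med + n_mads * max(mad, 1.0)
    return [i for i, size in enumerate(insert_sizes) if size > threshold]


if __name__ == "__main__":
    sizes = [300, 310, 295, 305, 290, 1500, 302]
    print(flag_large_inserts(sizes))  # pair 5 spans far more than expected
```

A cluster of such pairs pointing at the same interval is the evidence for a deletion; a single outlier is more likely a chimeric fragment or a mis-mapped read.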
Example 1kb Inversion (intron of APP)
Tertiary Analysis – “Sense Making”
 Detecting Known Clinically Relevant Variants
- Use targeted gene panels. Amplicons or custom capture.
- Look for carrier status or presence of pathogenic or PGx variants
 Rare, Functional Variant Search and Interpretation
- Rare Mendelian Diseases
- Clinical Diagnostic: Ending Diagnostic Odyssey
- Looking for rare variants of functional consequence to a known phenotype
- Exome sequencing common, but whole genome has proponents
- Trios often used for looking at inheritance of putative variants (compound hets)
 Population Studies
- Like NHLBI or others studying complex disease
- Often looking at “variant burden” over genes between cases/controls
 Driver Somatic Variant Identification
- Looking for variants in tumor samples but not matched normal
- Not just SNPs and InDels, but CNVs and SVs
Not just DNA… but still DNA sequencing
 RNA-Seq
- Align to “transcriptome”, but often do analysis with reference genome coordinates and
reads “gapped” over introns they span in their spliced form
- Using read counts to approximate relative abundance of RNA in sample
- Compare relative abundance between groups
- Discover new transcribed genes or alternative splicing
 ChIP-Seq
- Measure sites and intensities of various proteins binding to DNA
- ENCODE project used to catalog TFBS and other functional elements
 Methyl-Seq
- Get sequences only with epigenetic methylation mark
- Run peak identification and intensity to look at relative levels of methylation
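For the RNA-Seq bullets above, a minimal sketch of turning read counts into relative abundance and comparing two samples. Counts-per-million and a pseudocount-stabilized log2 fold change are used here as simple stand-ins; real pipelines apply more careful normalization, and the gene names and counts are invented for the example.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(cpm_a: Dict[str, float], cpm_b: Dict[str, float],
                     gene: str, pseudocount: float = 1.0) -> float:
    """log2(B / A) for one gene; the pseudocount avoids division by zero."""
    return math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))


if __name__ == "__main__":
    control = {"GENE_A": 500, "GENE_B": 1500}   # hypothetical counts
    treated = {"GENE_A": 1000, "GENE_B": 1000}
    cpm_c, cpm_t = counts_per_million(control), counts_per_million(treated)
    for gene in control:
        print(gene, round(log2_fold_change(cpm_c, cpm_t, gene), 2))
```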
The Promise
 Both in research and clinical care, NGS is
powering discoveries and making impactful
diagnoses
 Desktop sequencers and gene panels
much more economical than gene-by-gene
hunts
 Exomes have led to many rare disease
diagnoses and affordably assay rare
functional variants
 Whole genomes have led to clinical
success stories and promise to be
instrumental to our understanding of complex
disease genetics
 Barrier to entry is lower than ever
Things That Can Confound Your Experiment
Library preparation errors
 PCR amplification point mutations (e.g. TruSeq protocol, amplicons)
 Emulsion PCR amplification point mutations (454, Ion Torrent and SOLiD)
 Bridge amplification errors (Illumina)
 Chimera generation (particularly during amplicon protocols)
 Sample contamination
 Amplification errors associated with high or low GC content
 PCR duplicates
Sequencing errors
 Base miscalls due to low signal
 InDel errors (particularly PacBio)
 Short homopolymer associated InDels (Ion Torrent PGM)
 Post-homopolymeric tract SNPs (Illumina) and/or read-through problems
 Associated with inverted repeats (Illumina)
 Specific motifs particularly with older Illumina chemistry
Analysis errors
 Calling variants without sufficient reads mapping
 Bad mapping (incorrectly placed read)
 Correctly placed read but InDels misaligned
 Multi-mapping to repeat/paralogous regions
 Sequence contamination e.g. adaptors
 Error in reference sequence
 Alignment to ends of contigs in draft assemblies
 Incorrect trimming of reads, aligning adaptors
 Inclusion of PCR duplicates
Nick Loman: Sequencing data: I want the truth! (You can’t handle the truth!)
Quail et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent,
Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul
Your Choice of Technologies, Sometimes…
Platform | Illumina MiSeq | Ion Torrent PGM | Ion Torrent Proton | PacBio RS | Illumina HiSeq 2000
Instrument Cost* | $125 K | $50 K | $150 K | $695 K | $654 K
Sequence yield per run | 1.5-2 Gb | 20-50 Mb on 314 chip, 100-200 Mb on 316, 1 Gb on 318 | 10 Gb on PI, 30 Gb on PII (Mid 2013) | 100 Mb | 600 Gb
Sequencing cost per Gb* | $502 | $500 (318 chip) | $70 (PI chip) | $2000 | $41
Run Time | 27 hours*** | 2 hours | 3 hours | 2 hours | 11 days
Reported Accuracy | Mostly > Q30 | Mostly Q20 | Claimed > Q30 | < Q10 | Mostly > Q30
Observed Raw Error Rate | 0.80 % | 1.71 % | Probably ~1% | 12.86 % | 0.26 %
Read length | up to 150 bases | ~200 bp | 100 bp (200 bp PII) | Average 1500 bases | up to 150 bases
Paired reads | Yes | Yes | Yes | No | Yes
Insert size | up to 700 bases | up to 250 bases | up to 250 bases | up to 10 kb | up to 700 bases
Typical DNA requirements | 50-1000 ng | 100-1000 ng | 100-1000 ng | ~1 μg | 50-1000 ng
Applications | Targeted | Targeted | Exomes, RNA-Seq | Assembly, Validation | Exomes, Genomes, RNA
Quail et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent,
Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul 29
[Show SNPs/Indels GenomeBrowse]
A Drink from the Bioinformatics Firehose
 File formats
 Popular tools
 QA Filtering
 Visualization
FASTQ
 Contains 3 things per read:
- Sequence identifier (unique)
- Sequence bases [len N]
- Base quality scores [len N]
 Often “gzip” compressed (fq.gz)
 If not demultiplexed, first 4 or 6bp
is the “barcode” index. Used to
split lanes out by sample.
 Filtering may include:
- Removing adapters & primers
- Clip poor quality bases at ends
- Remove flagged low-quality reads
@HWI-ST845:4:1101:16436:2254#0/1
CAAACAGGCATGCGAGGTGCCTTTGGAAAGCCCCAGGGCACTGTGGCCAG
+
Y[SQORPMPYRSNP_][_babBBBBBBBBBBBBBBBBBBBBBBBBBB
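A short sketch of reading FASTQ records and turning the quality string into numbers may help here. It assumes the common four-lines-per-read layout; the quality offset is a parameter because it differs between encodings (33 for Sanger/current Illumina, 64 for older Illumina pipelines, which the characters in the record above appear to use). The names `FastqRecord`, `read_fastq` and `phred_scores` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of lines (four lines per read)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode one quality character per base into a Phred score."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    demo = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
    for rec in read_fastq(demo):
        scores = phred_scores(rec.quality)
        print(rec.name, scores, [error_probability(q) for q in scores])
```

Under this definition Q20 corresponds to a 1% chance that the base is wrong and Q30 to 0.1%, which is the scale used in the platform comparison table.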
SAM/BAM
 Spec defined by samtools author
Heng Li, aka Li H, aka lh3.
 SAM is text version (easy for any
program to output)
 BAM is binary/compressed version
with indexing support
 Alignment described by a code (CIGAR) of
matches, insertions, deletions,
gaps and clipping
 Can have any custom flags set by
analysis program (and many do)
Key Fields
 Chr, position
 Mapping quality
 CIGAR
 Name/position of mate
 Total template length
 Sequence
 Quality
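The CIGAR field is easier to read with a small decoder. The sketch below parses a CIGAR string and computes how many reference bases and read bases it covers, using the operation semantics from the SAM specification (M, =, X consume both; I and S consume the read only; D and N consume the reference only; H and P consume neither). Function names are hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")
QUERY_CONSUMING = set("MIS=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split e.g. '5S40M2I8M' into [(5, 'S'), (40, 'M'), (2, 'I'), (8, 'M')]."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Rebuilding the string catches characters the pattern silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_length(cigar: str) -> int:
    """Number of read bases described (should equal the SEQ length)."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


if __name__ == "__main__":
    cigar = "5S40M2I8M1D3M"
    print(parse_cigar(cigar))
    print("reference span:", reference_span(cigar))
    print("read length   :", query_length(cigar))
```

For an alignment starting at position `pos`, the last covered reference base is `pos + reference_span(cigar) - 1`, which is how a browser decides where to draw the read.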
VCF
 Specification defined by the 1000 genomes
group (now v4.1)
 Commonly compressed and indexed with
bgzip/tabix (allows for reading directly by a
Genome Browser)
 Contains arbitrary data per “site” (INFO
fields) and per sample
 Single-Sample VCF:
- Contains only the variants for the sample.
 Multi-Sample VCF:
- Whenever one sample has a variant, all samples get
a “genotype” (often “ref”)
 Caveat:
- VCF requires that a reference base be specified, leaving
insertions “encoded” 1bp differently than they
are annotated
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Sa
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth"
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequen
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral All
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membershi
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 members
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have d
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Q
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype
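The insertion caveat noted for VCF can be shown with a tiny sketch. VCF writes an indel together with the reference base in front of it, so POS points at that anchor base rather than at the inserted or deleted bases. The helper below (a hypothetical name, not part of any VCF library) strips the shared leading bases to recover the coordinates an annotation source would typically use.

```python
from typing import Tuple


def strip_anchor(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Remove leading bases shared by REF and ALT.

    `pos` is the 1-based VCF POS. After stripping, an insertion has an
    empty REF and a deletion has an empty ALT.
    """
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    # Insertion of T written VCF-style with its anchor base G at POS 100.
    print(strip_anchor(100, "G", "GT"))   # the inserted base starts at 101
    # Deletion of T written with the same anchor.
    print(strip_anchor(100, "GT", "G"))
    # A plain SNV has no shared prefix and is returned unchanged.
    print(strip_anchor(100, "A", "G"))
```

Matching variants against an annotation track without accounting for this one-base shift is a common source of "missing" indel annotations.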
Aligners
 BWA (also by Li H)
- Most prolifically used for genome alignment
- BWA-SW version geared for long reads (>100bp)
- Supports aligning with insertions/deletions to
reference
 Bowtie (Johns Hopkins, part of “Tuxedo” suite)
- Very fast, used commonly in RNA-Seq workflows
- Version 1 did not support “gapped” alignments
- Bowtie2 supports local gapped, longer reads
 Novoalign, Eland, SOAP, MAQ
- Seed and expand strategy
 TopHat, SHRiMP, STAR, Gmap
- Specifically designed for ESTs
 Most are improved by paired-end (mate-pair) data
InDel Re-Alignment
 Place, then realign “de Novo”
- Each read aligned independently by
global aligner.
- May have different preference of how to
handle “gaps” to reference.
 Local Re-Aligners for InDels
- Pindel
- GATK
 Important areas still problematic:
- CpG islands
- Promoter and 5′-UTR regions of the
genome
reference  CAATC    realignment    CAATC
read1      CA-TC       ---->       CA-TC
read2      C-ATC                   CA-TC
Weixin Wang. Next generation sequencing has lower sequence coverage and poorer SNP-detection
capability in the regulatory regions. Scientific Reports 1, Article number: 55
AUC (area under the curve) comparison for different
genetic regions.
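The realignment diagram for this slide (reference CAATC with two reads gapped differently) can be reproduced with a brute-force sketch. It lists every position where deleting the same number of bases gives the same read sequence, which is exactly the ambiguity a local realigner has to resolve by picking one placement for all reads. This is an illustration of the idea only, not how Pindel or GATK implement it, and the function names are hypothetical.

```python
from typing import List


def equivalent_deletions(ref: str, start: int, length: int) -> List[int]:
    """All 0-based starts where deleting `length` bases yields the same
    sequence as deleting ref[start:start + length]."""
    target = ref[:start] + ref[start + length:]
    return [i for i in range(len(ref) - length + 1)
            if ref[:i] + ref[i + length:] == target]


def gapped(ref: str, start: int, length: int) -> str:
    """Render the read against the reference with '-' for deleted bases."""
    return ref[:start] + "-" * length + ref[start + length:]


if __name__ == "__main__":
    ref = "CAATC"
    placements = equivalent_deletions(ref, 2, 1)
    for p in placements:
        print(p, gapped(ref, p, 1))
    # Any single convention works, as long as every read uses it.
    chosen = min(placements)
    print("all reads realigned to:", gapped(ref, chosen, 1))
```

Until the reads agree on one placement, a caller can see two weak deletions (or mismatches) instead of one well-supported deletion.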
Variant Callers
 Samtools
- “mpileup” command computes BAQ, performs local realignment
- Many filters can be applied to get high-quality variants
 GATK
- More than just a variant caller, but UnifiedGenotyper is widely used
- Also provides pre-calling tools like local InDel realignment and quality
score recalibration
 Custom tools specific to platform:
- CASAVA includes a variant caller for Illumina whole-genome data
- Ion Torrent has a caller that handles InDels better for their tech
Quality Score Recalibration
DePristo (2011) A framework for variation discovery …. Nature Genetics 43(5) 491
BWA+GATK Best Practices Pipeline
Getting to High Confidence Variants
 Hard filters versus heuristic-based
statistics
 <10 reads considered the threshold
for “low coverage”
 Quality score recalibration
Gabe Trio
Metric | Unfiltered | Provided | RD>10 & GQ>20 | Exonic
SNPs | 98621 | 89132 | 65009 | 19365
InDels | 8141 | 7800 | 6503 | 428
Ts/Tv | 2.36 | 2.45 | 2.54 | 3.26
Mendel Errors | 234 | 202 | 46 | 3
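The hard filter used in the trio table (read depth above 10 and genotype quality above 20) is simple enough to sketch directly. The `Call` record and its field names are hypothetical; in practice the values would come from the DP and GQ fields of each sample's genotype in a VCF.

```python
from typing import Iterable, List, NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int
    read_depth: int
    genotype_quality: int


def passes_hard_filter(call: Call, min_depth: int = 10, min_gq: int = 20) -> bool:
    """Keep a call only if it exceeds both thresholds (RD>10 and GQ>20)."""
    return call.read_depth > min_depth and call.genotype_quality > min_gq


def hard_filter(calls: Iterable[Call]) -> List[Call]:
    return [c for c in calls if passes_hard_filter(c)]


if __name__ == "__main__":
    calls = [
        Call("chr1", 1000, 35, 99),   # well supported
        Call("chr1", 2000, 6, 45),    # too few reads
        Call("chr2", 3000, 40, 12),   # ambiguous genotype
    ]
    for c in hard_filter(calls):
        print(c)
```

Fixed thresholds like these are blunt but transparent; the drop in Mendel errors in the table is the kind of check that shows whether they are removing mostly bad calls.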
Ti/Tv
 Ts/Tv ratio can measure true
biological ratio of mutation types
versus sequence error:
- Random seq errors: 2/4 or 0.5
- Genome-wide: ~2.0-2.1
- Exome capture: ~2.5-2.8
- Coding: ~3.0-3.3
 Diverging too far from these values
indicates random sequence
errors biasing the number.
DePristo (2011) A framework for variation discovery and genotyping using
next-generation DNA sequencing data. Nature Genetics 43(5) 491
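A minimal sketch of the Ts/Tv calculation, to make the 0.5 figure for random errors concrete. A transition swaps purine for purine (A/G) or pyrimidine for pyrimidine (C/T); everything else is a transversion. The input is assumed to be a list of (ref, alt) base pairs for SNVs only, and the function names are hypothetical.

```python
import math
from itertools import permutations
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        if is_transition(ref.upper(), alt.upper()):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else math.inf


if __name__ == "__main__":
    # Every possible substitution, equally likely: 4 transitions and
    # 8 transversions, i.e. the ratio expected from random errors.
    uniform = list(permutations("ACGT", 2))
    print(ts_tv_ratio(uniform))
```

Because real variation is enriched for transitions, a call set contaminated with random errors is pulled down toward that baseline, which is what makes the ratio useful as a quick sanity check.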
Visualization
 Genome browsers:
- Validate variant calls
- Look at gene annotations,
problematic regions, population
catalogs
- Compare samples where no
variant called
 Free Genome Browsers:
- IGV
- Popular desktop by Broad
- UCSC
- Web-based, most extensive
annotations
- GenomeBrowse
- Designed to be publication ready
- Smooth zoom and navigation
Updates in Software Can Introduce Bugs
 Found 8K phantom variants in my
“final” 23andMe exome
[My 23andMe “Buggy” Variant and Interpretation Example]
Recent Blog Post
http://blog.goldenhelix.com/?p=1725
The State of NGS Variant Calling: Don’t Panic!!
POLL:
From which sequencing provider(s) do
you receive your upstream data?
Complete Genomics
 Different:
- Sequencing technology
- Alignment/variant calling algorithms
- File formats
 But high quality:
- MNPs, Indels
- CNV/SV calls
 Whole genome only
 Also provide tumor/normal pair
analysis
 Being acquired by BGI, some
question their sustainability
Complete Genomics Deliverables
 Summary statistics
 “var” and “masterVar” files
- Can be converted to VCF
- Some tools (like SVS) can import them
directly
 Evidence files
- Can be converted to BAM
 CNV, SV calls in text files
- CNV: Chr1:85980000-86006000 2.06 4x
gain, covers DDAH1
- SV: Chr21:27374158-27374699 common
inversion
Illumina Genome Network
 Standardized sequencing and
analysis, but multiple labs may be
the contracted service provider.
 30x whole genomes
- SNPs, InDels, CNVs, SVs
- Concordance with SNP array (provided)
- Summary report
 Illumina provided tools used
- CASAVA toolkit with ELAND aligner
 Also provide Tumor/Normal pair
- Somatic SNVs and InDels identified by
looking at the tumor/normal together
Your Local Core Lab – Or 23andMe Exome Pilot!
 Research core labs often use a
BWA+GATK pipeline
- Especially for exomes
 Deliverables:
- VCF with SNVs, InDels
- BAM
 Tools for CNV/SV calling less
standardized
- Not commonly attempted with exomes
CEPH Trio
 The “benchmark” trio.
- Child, female NA12878 may be the most sequenced cell line
- Father NA12891 and Mother NA12892
 Whole Genome Data for Trio
- CGI with v2 pipeline
- IGN WGS at 30x, 100bp PE
- “Core Lab” BWA + GATK Best Practices on 100bp PE
 Concordance and Comparisons
- Let's interactively review examples where these three service providers differ and how.
Total Imported Variants
Genotype Quality
Genotype Quality
Read Depth
SNV Concordance Rate
InDel Concordance Rate
De Novo Mutations in Trio
GQ and DP of Shared de Novo Mutations
[deNovo and SV/CNV of NA12878 trio]
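The trio comparisons above rest on two simple genotype checks, sketched here. Genotypes are assumed to be unphased pairs of allele codes as in a VCF GT field ("0" for reference, "1" for the first alternate); the function names are hypothetical. A Mendel error is any child genotype that cannot be built from one allele of each parent, while a de novo candidate is the narrower case of a non-reference allele seen in neither parent.

```python
from typing import Tuple

Genotype = Tuple[str, str]


def is_mendelian_consistent(child: Genotype, father: Genotype,
                            mother: Genotype) -> bool:
    """True if one child allele can come from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def is_de_novo_candidate(child: Genotype, father: Genotype,
                         mother: Genotype, ref: str = "0") -> bool:
    """True if the child carries a non-reference allele absent from both parents."""
    return any(allele != ref and allele not in father and allele not in mother
               for allele in child)


if __name__ == "__main__":
    hom_ref, het, hom_alt = ("0", "0"), ("0", "1"), ("1", "1")
    print(is_mendelian_consistent(het, het, hom_ref))      # inherited variant
    print(is_de_novo_candidate(het, hom_ref, hom_ref))     # new allele in child
    print(is_mendelian_consistent(hom_alt, het, hom_ref))  # Mendel error
    print(is_de_novo_candidate(hom_alt, het, hom_ref))     # but not de novo
```

Most sites flagged this way are genotyping errors in one of the three samples rather than real mutations, which is why the GQ and DP of each candidate are worth checking before believing it.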
Applications That Require Special Upstream Analysis
 MHC Region
 Somatic Variant Calling
 RNA-Seq
 Alu and other repeats
 Phased variants and complex MNP
 Moving to a new reference genome
Somatic Sniper and Friends
 Complete Genomics and IGN provide secondary alignment
specific to tumor/normal pairs.
 Do variant calling on the BAMs of the pair in conjunction
 SomaticSniper approach:
 Covered by at least 3 reads
 Consensus quality of at least 20
 Called a SNP in the tumor sample with SNP quality of at least 20
 Maximum mapping quality of at least 40
 No high-quality predicted indel within 10 bp
 No more than 2 other SNVs called within 10 bp
 Not in dbSNP (non-cancer dbSNP)
 LOH filter (germline is het and tumor is homozygous)
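The filter list above can be written as one predicate over a candidate site. This is a sketch of the listed criteria only, not SomaticSniper's code: the `SomaticCandidate` record and its field names are hypothetical, and the LOH item is treated as an exclusion (sites where the germline is het and the tumor is homozygous are dropped), which is an assumption about how the slide intends it.

```python
from dataclasses import dataclass


@dataclass
class SomaticCandidate:
    read_depth: int
    consensus_quality: int
    tumor_snp_quality: int
    max_mapping_quality: int
    indel_within_10bp: bool        # high-quality predicted indel nearby
    other_snvs_within_10bp: int
    in_dbsnp: bool                 # present in non-cancer dbSNP
    germline_het_tumor_hom: bool   # loss-of-heterozygosity pattern


def passes_somatic_filters(c: SomaticCandidate) -> bool:
    """Apply the slide's criteria in order; all must hold."""
    return (
        c.read_depth >= 3
        and c.consensus_quality >= 20
        and c.tumor_snp_quality >= 20
        and c.max_mapping_quality >= 40
        and not c.indel_within_10bp
        and c.other_snvs_within_10bp <= 2
        and not c.in_dbsnp
        and not c.germline_het_tumor_hom
    )


if __name__ == "__main__":
    good = SomaticCandidate(30, 60, 55, 60, False, 0, False, False)
    near_indel = SomaticCandidate(30, 60, 55, 60, True, 0, False, False)
    print(passes_somatic_filters(good), passes_somatic_filters(near_indel))
```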
What’s Next?
Download
www.goldenhelix.com
The Analysis and
Interpretation of My DTC
23andMe Exome
blog.goldenhelix.com
Killer App
Assembly and Alignment
Short Courses
Questions?
Use the Questions pane in
your GoToWebinar window

Más contenido relacionado

La actualidad más candente

BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Productsbiochain
 
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...eventi-ITBbari
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.mkim8
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)LOGESWARAN KA
 
A decade into Next Generation Sequencing on marine non-model organisms: curre...
A decade into Next Generation Sequencing on marine non-model organisms: curre...A decade into Next Generation Sequencing on marine non-model organisms: curre...
A decade into Next Generation Sequencing on marine non-model organisms: curre...Alexander Jueterbock
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101Ino de Bruijn
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposalGenomeInABottle
 
DNA Sequencing from Single Cell
DNA Sequencing from Single CellDNA Sequencing from Single Cell
DNA Sequencing from Single CellQIAGEN
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...VHIR Vall d’Hebron Institut de Recerca
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyQIAGEN
 
QIAseq Targeted DNA, RNA and Fusion Gene Panels
QIAseq Targeted DNA, RNA and Fusion Gene PanelsQIAseq Targeted DNA, RNA and Fusion Gene Panels
QIAseq Targeted DNA, RNA and Fusion Gene PanelsQIAGEN
 
Molecular QC: Interpreting your Bioinformatics Pipeline
Molecular QC: Interpreting your Bioinformatics PipelineMolecular QC: Interpreting your Bioinformatics Pipeline
Molecular QC: Interpreting your Bioinformatics PipelineCandy Smellie
 
Expanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSExpanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSIntegrated DNA Technologies
 
High Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowHigh Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowBrian Krueger
 
Exploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencingExploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencingQIAGEN
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiomejukais
 

La actualidad más candente (20)

BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Products
 
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
 
Ngs introduction
Ngs introductionNgs introduction
Ngs introduction
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 
A decade into Next Generation Sequencing on marine non-model organisms: curre...
A decade into Next Generation Sequencing on marine non-model organisms: curre...A decade into Next Generation Sequencing on marine non-model organisms: curre...
A decade into Next Generation Sequencing on marine non-model organisms: curre...
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal
 
DNA Sequencing from Single Cell
DNA Sequencing from Single CellDNA Sequencing from Single Cell
DNA Sequencing from Single Cell
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) Technology
 
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGSCurso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
 
QIAseq Targeted DNA, RNA and Fusion Gene Panels
QIAseq Targeted DNA, RNA and Fusion Gene PanelsQIAseq Targeted DNA, RNA and Fusion Gene Panels
QIAseq Targeted DNA, RNA and Fusion Gene Panels
 
Molecular QC: Interpreting your Bioinformatics Pipeline
Molecular QC: Interpreting your Bioinformatics PipelineMolecular QC: Interpreting your Bioinformatics Pipeline
Molecular QC: Interpreting your Bioinformatics Pipeline
 
Expanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSExpanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGS
 
Ngs part i 2013
Ngs part i 2013Ngs part i 2013
Ngs part i 2013
 
High Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowHigh Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can Know
 
Exploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencingExploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencing
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
 

Destacado

NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...VHIR Vall d’Hebron Institut de Recerca
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platformsAllSeq
 
Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Christian Frech
 
NGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsNGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsAnnelies Haegeman
 
Practical Guide to the $1000 Genome (2014)
Practical Guide to the $1000 Genome (2014)Practical Guide to the $1000 Genome (2014)
Practical Guide to the $1000 Genome (2014)AllSeq
 
Illumina GAIIx for high throughput sequencing
Illumina GAIIx for high throughput sequencingIllumina GAIIx for high throughput sequencing
Illumina GAIIx for high throughput sequencingCristian Cosentino, PhD
 
The Application of Next Generation Sequencing (NGS) in cancer treatment
The Application of Next Generation Sequencing (NGS) in cancer treatmentThe Application of Next Generation Sequencing (NGS) in cancer treatment
The Application of Next Generation Sequencing (NGS) in cancer treatmentPremadarshini Sai
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
 
Global Next Generation Sequencing (NGS) Industry By Market Size & Forecast to...
Global Next Generation Sequencing (NGS) Industry By Market Size & Forecast to...Global Next Generation Sequencing (NGS) Industry By Market Size & Forecast to...
Global Next Generation Sequencing (NGS) Industry By Market Size & Forecast to...DavidClark206
 
Operating Systems - File Management
Operating Systems -  File ManagementOperating Systems -  File Management
Operating Systems - File ManagementDamian T. Gordon
 
The Ultimate Guide to Creating Visually Appealing Content
The Ultimate Guide to Creating Visually Appealing ContentThe Ultimate Guide to Creating Visually Appealing Content
The Ultimate Guide to Creating Visually Appealing ContentNeil Patel
 
Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShareKapost
 
Halal logistics in malaysia and 5 continents
Halal logistics  in malaysia and 5 continentsHalal logistics  in malaysia and 5 continents
Halal logistics in malaysia and 5 continentsMohd Farid Awang
 
Les 5 forces de porter - comprendre l'outil
Les 5 forces de porter - comprendre l'outilLes 5 forces de porter - comprendre l'outil
Les 5 forces de porter - comprendre l'outilMarie Delphine SANCHIS
 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation OptimizationOneupweb
 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingContent Marketing Institute
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...SlideShare
 

Knowing Your NGS Upstream: Alignment and Variants

  • 1. Knowing Your NGS Upstream: Alignment and Variants. March 27, 2013. Gabe Rudy, Vice President of Product Development
  • 2. Questions during the presentation: use the Questions pane in your GoToWebinar window
  • 3. Goals
      - What I Assume About You
          - Some experience with NGS technology
          - Not a command line bioinformatician by day; not afraid of technical terms
      - What You Will Learn
          - A healthy skepticism when looking at NGS data
          - What to expect/not expect from core labs or upstream sequencing service providers
          - Reading pile-ups in a genome browser and spotting high quality vs sketchy variants
      - What You Won’t Learn
          - Interpreting biological significance of variants
          - One true way to do secondary analysis
      - I won’t do this… hopefully!
  • 4. My Background
      - Golden Helix
          - Founded in 1998
          - Genetic association software
          - Analytic services
          - Hundreds of users worldwide
          - Over 700 customer citations in scientific journals
      - Products I Build with My Team
          - SNP & Variation Suite (SVS)
              - SNP, CNV, NGS tertiary analysis
              - Import and deal with all flavors of upstream data
          - GenomeBrowse
              - Visualization of everything with genomic coordinates. All standardized file formats.
          - RNA-Seq Pipeline
              - Expression profiling bioinformatics
  • 5. Agenda
      - 1: Background and Definitions
      - 2-4: Why You Should Care About Your Upstream?; Service Provider Deliverables: CEPH Trio Example; A Drink from the Bioinformatics Firehose
      - 5: Applications That Require Special Upstream Analysis
  • 7. NGS Analysis
      - Primary Analysis
          - Analysis of hardware generated data, on-machine real-time stats
          - Production of sequence reads and quality scores
      - Secondary Analysis
          - QA and clipping/filtering reads
          - Alignment/Assembly of reads
          - Recalibrating, de-duplication, variant calling on aligned reads
      - Tertiary Analysis (“Sense Making”)
          - QA and filtering of variant calls
          - Annotation and filtering of variants
          - Multi-sample integration
          - Visualization of variants in genomic context
          - Experiment-specific inheritance/population analysis
  • 8. Primary Analysis: I’d like some AGCT’s please
      - Standardized on producing FASTQ
          - AGCT or N
          - Quality scores
          - Pair of files for paired end
      - Happens on machine for desktop sequencers
          - Ion Torrent processing microwell detectors
          - MiSeq doing optic processing of flowcell
          - PacBio processing optics of ZMW
      - HiSeq 2000/2500
          - Requires off-machine base-calling
          - Can “call bases” with Illumina software on raw data collected tile by tile
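To make the FASTQ deliverable concrete, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality string into Phred scores. It assumes the Phred+33 encoding (Sanger style, also used by recent Illumina pipelines); older Illumina files used a different offset. The function names `parse_fastq` and `error_probability` and the example read are hypothetical, made up for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, phred_scores) from a stream of 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Assumption: Phred+33 encoding, so subtract 33 from each ASCII code.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def error_probability(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Made-up record: four confident calls followed by an uncalled base (N).
example = "@read1\nAGCTN\n+\nIIII!\n"
for name, seq, quals in parse_fastq(io.StringIO(example)):
    print(name, seq, quals)  # 'I' decodes to Q40, '!' decodes to Q0
```

For paired-end data, the same parser would be run over both files in lockstep, since read N in the first file is expected to pair with read N in the second.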
  • 9. Assembly vs Alignment
      - De Novo Genome Assembly
          - Very difficult for large genomes to get to “finished” genome quality (traditionally done with Sanger).
          - Short reads will get you to contig sizes in the ~10-100 Kb range.
          - Need long reads (PacBio) or restriction maps from optical mapping (OpGen) to make chromosome-sized contigs
      - Alignment
          - Aligning to a finished (or draft) genome that is considered the “reference”
          - Allows for some differences, but not too many, between your reads and the reference
  • 10. The Human Reference Sequence  Genome Reference Consortium (GRCh37) - Feb 2009, previous was NCBI36 March 2006 - 9 alt loci and 187 patches (11 patch releases)  Supercontigs: Large unplaced contigs - Some localized to chr level and some unknown  Does not include a Mitochondrial reference - UCSC hg19 includes older NCBI 36 MT - 1000 Genomes Project uses revised Cambridge Reference Sequence (rCRS) - Provides “g1k” reference: includes rCRS, Human herpesvirus 4 type 1, supercontigs and “decoy” sequence  v38 genome coming this summer: - Incorporate all patches into the reference - Some allele fixes to have reference match major allele
  • 11. Example Patch  The tiling path in GRCh37 switched in the middle of ABO gene resulting in a reference protein not present in humans.  Patch adjusts tile path and fixes the problem.  All patches will be incorporated into GRCh38, due this summer. Until then, all alignment is done against unpatched reference.
  • 12. Single Nucleotide Variants (i.e. SNVs or SNPs)  Single base substitution from reference  Note that “reference” is not always the “major” allele  “Multi-allelic” sites have more than 2 cataloged alleles Gholson Lyon, 2012
  • 13. Small Insertions/Deletions  Generally defined as being < 150bp (often much shorter)  Frameshift insertions/deletions are an important “loss of function” class of variants - Although InDels with a length divisible by three are “in-frame” when in a coding region  Hard to call consistently. Poor concordance between algorithms.  Where to call an InDel in a homopolymer? - Reference: GTTTAC - Sample: GTTTTAC - Positions: 01234567 - How do you describe the insertion? Ins of T at 5? Or ins of T at 1? - CGI in their v1 pipeline preferred calling insertion at end, others at beginning, now always at beginning  MNP – Can also be called differently
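The homopolymer question on this slide is usually settled by a convention: shift the InDel as far left as it will go. Here is a minimal Python sketch of that left-alignment for a pure insertion. It assumes 0-based positions, where `pos` is the index of the reference base the insertion goes in front of. The function names are hypothetical and not taken from any particular variant caller.

```python
from typing import Tuple


def apply_insertion(ref: str, pos: int, ins: str) -> str:
    """Return the sequence obtained by inserting `ins` in front of ref[pos]."""
    return ref[:pos] + ins + ref[pos:]


def left_align_insertion(ref: str, pos: int, ins: str) -> Tuple[int, str]:
    """Shift an insertion left while the resulting sequence stays identical.

    Moving one base left is only valid when the reference base to the left
    equals the last inserted base; the inserted string is rotated to match.
    """
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]  # rotate the inserted bases by one
        pos -= 1
    return pos, ins


if __name__ == "__main__":
    ref = "GTTTAC"
    # "Insertion at the end" of the T run: in front of the A at index 4.
    pos, ins = left_align_insertion(ref, 4, "T")
    print(pos, ins, apply_insertion(ref, pos, ins))
```

Both placements produce the same sample sequence, which is why two callers can report the same event at different coordinates.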
  • 14. Copy Number Variants  Require WGS - CNVs > 10kb pretty accurate. - 1kb to 10kb problematic.  Detecting Deletions - Can see coverage drop to near zero - Harder to pinpoint breakpoint - Possible false positives in low-mappability regions  Amplifications - Can see coverage jump - False positives due to sample prep or sequence artifacts  Need “baseline,” look at Log Ratio - Somatic detection uses normal tissues - Can have control population Venter vs Watson WGS CNV-seq 64kbp gain in DDAH1 Gene of NA12878
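The “baseline” and log ratio idea can be sketched in a few lines of Python. This is a rough illustration, not the method of any specific CNV caller. It assumes per-window read depths that have already been normalized for library size, and the `pseudocount` parameter is an added assumption to avoid taking the log of zero.

```python
import math
from typing import List


def log2_ratios(sample_depth: List[float],
                baseline_depth: List[float],
                pseudocount: float = 0.5) -> List[float]:
    """Per-window log2(sample / baseline) depth ratio.

    Values near 0 suggest no copy number change, clearly positive values
    suggest a gain, and strongly negative values suggest a deletion.
    """
    return [
        math.log2((s + pseudocount) / (b + pseudocount))
        for s, b in zip(sample_depth, baseline_depth)
    ]


if __name__ == "__main__":
    baseline = [30.0, 30.0, 30.0, 30.0]   # e.g. matched normal or control population
    sample = [30.0, 60.0, 15.0, 0.0]      # unchanged, gain, one-copy loss, deletion
    for ratio in log2_ratios(sample, baseline):
        print(round(ratio, 2))
```

Windows in low-mappability regions tend to have a noisy baseline, which is one reason they produce false positives.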
  • 15. Structural Variants  Looking for: - Balanced rearrangements - Inversions - Translocations - Complex  Signals to detect SV: - Paired-end mappings too big (deletion) - Depth of coverage - Split-read mapping  Translocations can result in “fusion” genes. - For example, the BCR-ABL fusion gene is central to the pathogenesis of certain leukemias.
  • 16. Example 1kb Inversion (intron of APP)
  • 17. Tertiary Analysis – “Sense Making”  Detecting Known Clinically Relevant Variants - Use targeted gene panels. Amplicons or custom capture. - Look for carrier status or presence of pathogenic or PGx variants  Rare, Functional Variant Search and Interpretation - Rare Mendelian Diseases - Clinical Diagnostic: Ending Diagnostic Odyssey - Looking for rare variants of functional consequence to a known phenotype - Exome sequencing common, but whole genome has proponents - Trios often used for looking at inheritance of putative variants (compound hets)  Population Studies - Like NHLBI or others studying complex disease - Often looking at “variant burden” over genes between cases/controls  Driver Somatic Variant Identification - Looking for variants in tumor samples but not matched normal - Not just SNPs and InDels, but CNVs and SVs
  • 18. Not just DNA… but still DNA sequencing  RNA-Seq - Align to “transcriptome”, but often do analysis with reference genome coordinates and reads “gapped” over introns they span in their spliced form - Using read counts to approximate relative abundance of RNA in sample - Compare relative abundance between groups - Discover new transcribed genes or alternative splicing  ChIP-Seq - Measure sites and intensities of various proteins binding to DNA - ENCODE project used to catalog TFBS and other functional elements  Methyl-Seq - Get sequences only with epigenetic methylation mark - Run peak identification and intensity to look at relative levels of methylation
  • 20. The Promise  Both in research and clinical care, NGS is powering discoveries and making impactful diagnoses  Desktop sequencers and gene panels much more economical than gene-by-gene hunts  Exomes have led to many rare disease diagnoses and affordably assay rare functional variants  Whole genomes have led to clinical success stories and promise to be instrumental to our understanding of complex disease genetics  Barrier to entry is lower than ever
  • 21. Things That Can Confound Your Experiment  Library preparation errors: - PCR amplification point mutations (e.g. TruSeq protocol, amplicons) - Emulsion PCR amplification point mutations (454, Ion Torrent and SOLiD) - Bridge amplification errors (Illumina) - Chimera generation (particularly during amplicon protocols) - Sample contamination - Amplification errors associated with high or low GC content - PCR duplicates  Sequencing errors: - Base miscalls due to low signal - InDel errors (particularly PacBio) - Short homopolymer associated InDels (Ion Torrent PGM) - Post-homopolymeric tract SNPs (Illumina) and/or read-through problems - Associated with inverted repeats (Illumina) - Specific motifs particularly with older Illumina chemistry  Analysis errors: - Calling variants without sufficient reads mapping - Bad mapping (incorrectly placed read) - Correctly placed read but InDels misaligned - Multi-mapping to repeat/paralogous regions - Sequence contamination e.g. adaptors - Error in reference sequence - Alignment to ends of contigs in draft assemblies - Incorrect trimming of reads, aligning adaptors - Inclusion of PCR duplicates Nick Loman: Sequencing data: I want the truth! (You can’t handle the truth!) Quail et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul
  • 22. Your Choice of Technologies, Sometimes…
    | Platform | Illumina MiSeq | Ion Torrent PGM | Ion Torrent Proton | PacBio RS | Illumina HiSeq 2000 |
    | --- | --- | --- | --- | --- | --- |
    | Instrument Cost* | $125 K | $50 K | $150 K | $695 K | $654 K |
    | Sequence yield per run | 1.5-2 Gb | 20-50 Mb on 314 chip, 100-200 Mb on 316, 1 Gb on 318 | 10 Gb on PI, 30 Gb on PII (Mid 2013) | 100 Mb | 600 Gb |
    | Sequencing cost per Gb* | $502 | $500 (318 chip) | $70 (PI chip) | $2000 | $41 |
    | Run Time | 27 hours*** | 2 hours | 3 hours | 2 hours | 11 days |
    | Reported Accuracy | Mostly > Q30 | Mostly Q20 | Claimed > Q30 | < Q10 | Mostly > Q30 |
    | Observed Raw Error Rate | 0.80 % | 1.71 % | Probably ~1 % | 12.86 % | 0.26 % |
    | Read length | up to 150 bases | ~200 bp | 100 bp (200 bp PII) | Average 1500 bases | up to 150 bases |
    | Paired reads | Yes | Yes | Yes | No | Yes |
    | Insert size | up to 700 bases | up to 250 bases | up to 250 bases | up to 10 kb | up to 700 bases |
    | Typical DNA requirements | 50-1000 ng | 100-1000 ng | 100-1000 ng | ~1 μg | 50-1000 ng |
    | Applications | Targeted | Targeted | Exomes, RNA-Seq | Assembly, Validation | Exomes, Genomes, RNA |
    Quail et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul 29
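The “Reported Accuracy” and “Observed Raw Error Rate” figures use different scales, so a short conversion helps when comparing them. The sketch below uses the standard Phred definition, Q = -10 · log10(p), where p is the per-base error probability. The function names are illustrative, and the rates are the observed values quoted on this slide.

```python
import math


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10.0 * math.log10(p)


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    observed = {
        "Illumina MiSeq": 0.0080,
        "Ion Torrent PGM": 0.0171,
        "PacBio RS": 0.1286,
        "Illumina HiSeq 2000": 0.0026,
    }
    for platform, rate in observed.items():
        print(f"{platform}: {rate:.2%} error is about Q{error_rate_to_phred(rate):.1f}")
```

Under this definition Q30 means one error in 1,000 bases, so an observed raw error rate of 0.80% sits closer to Q21 than to Q30.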
  • 25. Agenda: A Drink from the Bioinformatics Firehose  File formats  Popular tools  QA Filtering  Visualization
  • 26. FASTQ  Contains 3 things per read: - Sequence identifier (unique) - Sequence bases [len N] - Base quality scores [len N]  Often “gzip” compressed (fq.gz)  If not demultiplexed, first 4 or 6bp is the “barcode” index. Used to split lanes out by sample.  Filtering may include: - Removing adapters & primers - Clip poor quality bases at ends - Remove flagged low-quality reads @HWI-ST845:4:1101:16436:2254#0/1 CAAACAGGCATGCGAGGTGCCTTTGGAAAGCCCCAGGGCACTGTGGCCAG + Y[SQORPMPYRSNP_][_babBBBBBBBBBBBBBBBBBBBBBBBBBB
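The three things per read can be pulled out with a very small reader. The sketch below is a minimal illustration and not a replacement for an established parser. It assumes the common four-lines-per-record layout and that the caller knows the quality offset (33 for most current data, 64 for some older Illumina pipelines). The record used in the demo is made up.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, bases, quality string) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        bases = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        quals = handle.readline().rstrip("\n")
        yield header[1:], bases, quals  # drop the leading '@'


def phred_scores(quals: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quals]


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for name, bases, quals in read_fastq(demo):
        print(name, bases, phred_scores(quals))
```

For gzip-compressed files, opening the file with `gzip.open(path, "rt")` from the standard library gives a text handle that can be passed to the same function.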
  • 27. SAM/BAM  Spec defined by samtools author Heng Li, aka Li H, aka lh3.  SAM is text version (easy for any program to output)  BAM is binary/compressed version with indexing support  Alignment described by a CIGAR code of matches, insertions, deletions, gaps and clipping  Can have any custom tags (optional fields) set by analysis program (and many do) Key Fields  Chr, position  Mapping quality  CIGAR  Name/position of mate  Total template length  Sequence  Quality
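The CIGAR field is the part of a SAM record that most often needs decoding by hand. The sketch below parses a CIGAR string with the standard operation letters from the SAM specification and works out how many reference and read bases the alignment covers. The function names and the example string are illustrative.

```python
import re
from typing import List, Tuple

# Operations defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_READ = set("MIS=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def read_length(cigar: str) -> int:
    """Number of read bases described by the CIGAR (including soft clips)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_READ)


if __name__ == "__main__":
    cigar = "3S10M2I5M1D4M"  # soft clip, matches, insertion, matches, deletion, matches
    print(parse_cigar(cigar))
    print(reference_span(cigar), read_length(cigar))
```

An alignment that starts at position P therefore ends at P + reference_span - 1, which is how a browser decides where to draw the read.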
  • 28. VCF  Specification defined by the 1000 Genomes group (now v4.1)  Commonly compressed and indexed with bgzip/tabix (allows for reading directly by a Genome Browser)  Contains arbitrary data per “site” (INFO fields) and per sample  Single-Sample VCF: - Contains only the variants for the sample.  Multi-Sample VCF: - Whenever one sample has a variant, all samples get a “genotype” (often “ref”)  Caveat: - VCF requires that a reference base be specified, so insertions are “encoded” 1bp differently than they are annotated ##fileformat=VCFv4.0 ##fileDate=20090805 ##source=myImputationProgramV3.1 ##reference=1000GenomesPilot-NCBI36 ##phasing=partial ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Sa ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth" ##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequen ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral All ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membershi ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 members ##FILTER=<ID=q10,Description="Quality below 10"> ##FILTER=<ID=s50,Description="Less than 50% of samples have d ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Q ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype
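The per-site and per-sample layout is easier to see on a data line than in the header. The sketch below splits one tab-delimited VCF record into its fixed columns, INFO fields and per-sample FORMAT values. It is a simplified reader for illustration only: it ignores the header, multiple ALT alleles and type conversion, and the two records in the demo are made-up examples.

```python
from typing import Any, Dict


def parse_info(info: str) -> Dict[str, Any]:
    """Turn 'NS=3;DP=14;DB' into {'NS': '3', 'DP': '14', 'DB': True}."""
    fields: Dict[str, Any] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type INFO field
    return fields


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse one VCF data line into a dictionary (values kept as strings)."""
    cols = line.rstrip("\n").split("\t")
    record: Dict[str, Any] = {
        "chrom": cols[0], "pos": int(cols[1]), "id": cols[2],
        "ref": cols[3], "alt": cols[4], "qual": cols[5],
        "filter": cols[6], "info": parse_info(cols[7]), "samples": [],
    }
    if len(cols) > 9:
        keys = cols[8].split(":")  # FORMAT column names the per-sample fields
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return record


if __name__ == "__main__":
    snv = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\tGT:GQ:DP\t0|0:48:1\t1|0:48:8"
    insertion = "20\t1234567\t.\tG\tGT\t50\tPASS\tDP=9"
    print(parse_vcf_line(snv))
    print(parse_vcf_line(insertion))
```

The second record shows the caveat from the slide: an inserted T is written with the preceding reference base (REF G, ALT GT) at that base's position, one base before where an annotation source may place the insertion.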
  • 29. Aligners  BWA (also by Li H) - Most prolifically used for genome alignment - BWA-SW version geared for long reads (>100bp) - Supports aligning with insertions/deletions to reference  Bowtie (Johns Hopkins, part of “Tuxedo” suite) - Very fast, used commonly in RNA-Seq workflows - Version 1 did not support “gapped” alignments - Bowtie2 supports local gapped, longer reads  Novoalign, Eland, SOAP, MAQ - Seed and expand strategy  TopHat, SHRiMP, STAR, GMAP - Specifically designed for ESTs  Most are improved by paired-end reads (mate-pairs)
  • 30. InDel Re-Alignment  Place, then realign “de Novo” - Each read aligned independently by global aligner. - May have different preference of how to handle “gaps” to reference.  Local Re-Aligners for InDels - Pindel - GATK  Example (reference CAATC): before realignment read1 = CA-TC and read2 = C-ATC; after realignment read1 = CA-TC and read2 = CA-TC  Important areas still problematic: - CpG islands - Promoter and 5′-UTR regions of the genome  Figure: AUC (area under the curve) comparison for different genetic regions. Weixin Wang. Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions. Scientific Reports 1, Article number: 55
  • 31. Variant Callers  Samtools - “mpileup” command computes BAQ, performs local realignment - Many filters can be applied to get high-quality variants  GATK - More than just a variant caller, but UnifiedGenotyper is widely used - Also provides pre-calling tools like local InDel realignment and quality score recalibration  Custom tools specific to platform: - CASAVA includes a variant caller for Illumina whole-genome data - Ion Torrent has a caller that handles InDels better for their tech
  • 32. Quality Score Recalibration DePristo (2011) A framework for variation discovery …. Nature Genetics 43(5) 491
  • 34. Getting to High Confidence Variants  Hard filters versus heuristic based statistics  Read depth <10 considered threshold for “low coverage”  Quality score recalibration
    Gabe Trio
    | | Unfiltered | Provided | RD>10 & GQ>20 | Exonic |
    | --- | --- | --- | --- | --- |
    | SNPs | 98621 | 89132 | 65009 | 19365 |
    | InDels | 8141 | 7800 | 6503 | 428 |
    | Ts/Tv | 2.36 | 2.45 | 2.54 | 3.26 |
    | Mendel Errors | 234 | 202 | 46 | 3 |
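The “RD>10 & GQ>20” column is an example of a hard filter: a fixed threshold applied to each call. A minimal Python sketch follows, assuming each call is a dictionary with hypothetical `DP` (read depth) and `GQ` (genotype quality) keys. Real pipelines would read these from the VCF FORMAT fields.

```python
from typing import Dict, List


def passes_hard_filter(call: Dict[str, int],
                       min_depth: int = 10,
                       min_gq: int = 20) -> bool:
    """Keep a genotype call only if depth and genotype quality exceed the thresholds."""
    return call["DP"] > min_depth and call["GQ"] > min_gq


def hard_filter(calls: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Return the subset of calls that pass the RD>10 & GQ>20 filter."""
    return [call for call in calls if passes_hard_filter(call)]


if __name__ == "__main__":
    calls = [
        {"DP": 35, "GQ": 99},  # well supported
        {"DP": 8, "GQ": 60},   # low coverage
        {"DP": 40, "GQ": 12},  # ambiguous genotype
    ]
    print(hard_filter(calls))
```

In the table on this slide, this kind of filter removes most Mendel errors, at the cost of some true variants in low-coverage regions.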
  • 35. Ti/Tv  The Ts/Tv (transition/transversion, also written Ti/Tv) ratio can separate the true biological ratio of mutation types from sequence error: - Random seq errors: 2/4 or 0.5 - Genome-wide: ~2.0-2.1 - Exome capture: ~2.5-2.8 - Coding: ~3.0-3.3  Diverging too far from these values indicates random sequence errors biasing the number. DePristo (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43(5) 491
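The ratio is simple to compute from a list of SNVs, which makes it a handy sanity check on any call set. The sketch below classifies each substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion. The function names are illustrative, and the input is assumed to be (ref, alt) pairs of single bases.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio for an iterable of (ref, alt) SNVs."""
    ts = tv = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a substitution
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


if __name__ == "__main__":
    bases = "ACGT"
    all_substitutions = [(r, a) for r in bases for a in bases if r != a]
    print(ts_tv_ratio(all_substitutions))
```

Feeding in all 12 possible substitutions once each gives 4 transitions and 8 transversions, which is the 0.5 expected from purely random errors.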
  • 36. Visualization  Genome browsers: - Validate variant calls - Look at gene annotations, problematic regions, population catalogs - Compare samples where no variant called  Free Genome Browsers: - IGV - Popular desktop by Broad - UCSC - Web-based, most extensive annotations - GenomeBrowse - Designed to be publication ready - Smooth zoom and navigation
  • 37. Updates in Software Can Introduce Bugs  Found 8K phantom variants in my “final” 23andMe exome
  • 38. [My 23andMe “Buggy” Variant and Interpretation Example]
  • 40. Recent Blog Post http://blog.goldenhelix.com/?p=1725 The State of NGS Variant Calling: Don’t Panic!!
  • 41. POLL: From which sequencing provider(s) do you receive your upstream data?
  • 42. Complete Genomics  Different: - Sequencing technology - Alignment/variant calling algorithms - File formats  But high quality: - MNPs, Indels - CNV/SV calls  Whole genome only  Also provide tumor/normal pair analysis  Being acquired by BGI, some question their sustainability
  • 43. Complete Genomics Deliverables  Summary statistics  “var” and “masterVar” files - Can be converted to VCF - Some tools (like SVS) can import them directly  Evidence files - Can be converted to BAM  CNV, SV calls in text files - CNV: Chr1:85980000-86006000 2.06 4x gain, covers DDAH1 - SV: Chr21:27374158-27374699 common inversion
  • 44. Illumina Genome Network  Standardized sequencing and analysis, but any of multiple labs may be the contracted service provider.  30x whole genomes - SNPs, InDels, CNVs, SVs - Concordance with SNP array (provided) - Summary report  Illumina provided tools used - CASAVA toolkit with ELAND aligner  Also provide Tumor/Normal pair - Somatic SNVs and InDels identified by looking at the tumor/normal together
  • 45. Your Local Core Lab – Or 23andMe Exome Pilot!  Research core labs often use a BWA+GATK pipeline - Especially for exomes  Deliverables: - VCF with SNVs, InDels - BAM  Tools for CNV/SV calling less standardized - Not commonly attempted with exomes
  • 46. CEPH Trio  The “benchmark” trio. - Child, female NA12878 may be the most sequenced cell line - Father NA12891 and Mother NA12892  Whole Genome Data for Trio - CGI with v2 pipeline - IGN WGS at 30x, 100bp PE - “Core Lab” BWA + GATK Best Practices on 100bp PE  Concordance and Comparisons - Let's interactively review examples where these three service providers differ and how.
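A trio makes one comparison possible without any truth set: checking each child genotype against the parents. The sketch below flags calls that cannot be explained by inheriting one allele from each parent. Such calls are either candidate de novo mutations or, far more often, genotyping errors. It assumes autosomal, diploid genotypes written as pairs of allele indices (0 = reference), and the function name is hypothetical.

```python
from typing import Tuple

Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype,
                            father: Genotype,
                            mother: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


if __name__ == "__main__":
    # Child is heterozygous, both parents homozygous reference:
    # a Mendel error or a candidate de novo mutation.
    print(is_mendelian_consistent((0, 1), (0, 0), (0, 0)))
    # Child inherits the alternate allele from a heterozygous father.
    print(is_mendelian_consistent((0, 1), (0, 1), (0, 0)))
```

Counting the inconsistent sites per provider gives a “Mendel Errors” style metric for comparing the three call sets.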
  • 53. De Novo Mutations in Trio
  • 54. GQ and DP of Shared de Novo Mutations
  • 55. [deNovo and SV/CNV of NA12878 trio]
  • 57. Applications That Require Special Upstream Analysis  MHC Region  Somatic Variant Calling  RNA-Seq  Alu and other repeats  Phased variants and complex MNP  Moving to a new reference genome
  • 58. Somatic Sniper and Friends  Complete Genomics and IGN provide secondary alignment specific to tumor/normal pairs.  Do variant calling on the BAMs of the pair in conjunction  SomaticSniper approach:  Covered by at least 3 reads  Consensus quality of at least 20  Called a SNP in the tumor sample with SNP quality of at least 20  Maximum mapping quality of at least 40  No high-quality predicted indel within 10 bp  No more than 2 other SNVs called within 10 bp  Not in dbSNP (non-cancer dbSNP)  LOH filter (germline is het and tumor is homozygous)
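The filter list on this slide translates almost directly into a predicate. The sketch below is only an illustration of those thresholds and is not SomaticSniper's own code or output format. The `SomaticCandidate` class and all of its field names are hypothetical stand-ins for values a real pipeline would compute from the tumor and normal BAMs.

```python
from dataclasses import dataclass


@dataclass
class SomaticCandidate:
    """Hypothetical per-site summary for a tumor/normal pair."""
    depth: int                    # reads covering the site
    consensus_quality: int
    tumor_snp_quality: int
    max_mapping_quality: int
    hq_indel_within_10bp: bool    # high-quality predicted indel nearby
    other_snvs_within_10bp: int
    in_dbsnp: bool                # present in non-cancer dbSNP
    germline_is_het: bool
    tumor_is_hom: bool


def passes_somatic_filters(c: SomaticCandidate) -> bool:
    """Apply the thresholds listed on the slide to one candidate site."""
    is_loh = c.germline_is_het and c.tumor_is_hom  # loss of heterozygosity, not a new SNV
    return (
        c.depth >= 3
        and c.consensus_quality >= 20
        and c.tumor_snp_quality >= 20
        and c.max_mapping_quality >= 40
        and not c.hq_indel_within_10bp
        and c.other_snvs_within_10bp <= 2
        and not c.in_dbsnp
        and not is_loh
    )


if __name__ == "__main__":
    site = SomaticCandidate(25, 45, 50, 60, False, 0, False, False, False)
    print(passes_somatic_filters(site))
```

Keeping each criterion as a separate field makes it easy to report which filter removed a site when reviewing calls in a genome browser.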
  • 59. What’s Next? Download www.goldenhelix.com The Analysis and Interpretation of My DTC 23andMe Exome blog.goldenhelix.com Killer App Assembly and Alignment Short Courses
  • 60. Questions? Use the Questions pane in your GoToWebinar window