SlideShare una empresa de Scribd logo
1 de 33
Descargar para leer sin conexión
SMaSH
a Benchmarking Toolkit
for Variant Calling
SMaSH and GIAB:
A Good Match
High overlap with features in the Performance Metrics Specifications
doc.
Many of the features not currently supported are ones we'd like to
integrate.
About me
Worked at the UC Berkeley AMPlab for about a year
Currently the primary SMaSH developer
Starting a CS PhD at Berkeley in Programming Languages this fall
About this talk
SMaSH as it is now
SMaSH in the future
SMaSH and GIAB
SMaSH as it is
now
SMaSH
Project out of the AMP-X group at UC Berkeley
Talwalkar et al., 2014, Bioinformatics
smash.cs.berkeley.edu
Initial goal
Create a unified way of benchmarking germline variant calling pipelines.
SMaSH components
Codebase for comparing VCF callsets
Reads and ground truth datasets
Metrics for accuracy and computational performance
Codebase
For benchmarking purposes, we compare a predicted callset against a
ground truth callset
Comparing two predicted callsets works exactly the same.
Variant Classification
SNPs
Indel (less than 50 base pairs)
Structural variants
Evaluation
SNPs and indels are strictly evaluated.
Structural variants are evaluated on:
Same type (insertion/deletion/other)
Length same as true variant within specified tolerance
Position same as true variant within specified tolerance
Accuracy Metrics
Evaluate variants as true positive, false positive, false negative
Evaluate accuracy of genotyping
Error bars
Calculated on confidence in ground truth calls
Choose some upper bound on ground truth call error rate based on
validation methology
E.g., 2 out of every 1000 SNPs is wrong.
Use this error rate to calculate upper/lower bounds on precision and
recall.
The VCF format is
ambiguous!
SMaSH addresses this problem with two strategies:
Normalization
Rescue
Guiding principle: metrics should never be worse after
normalization/rescue than they were without them.
Normalization
A single variant may be plausibly placed in many different positions but
describe the same change.
For example, we normalize this variant:
First, we remove the longest proper suffix from the ref and alt alleles.
Then, we "slide" the variants by adding a base from the reference to the
head and removing a base from the tail, until the last bases on both
alleles are no longer the same.
Rescue
The same underlying haplotype can be represented by different sets of
variants.
True callset
Predicted callset
Rescue Algorithm
For every false negative, we attempt rescue:
Build up a window around the variant positive for the true and
predicted callsets
For all sets of non-overlapping variants, expand the underlying
haplotypes for the variants within those windows.
If the haplotypes match, mark all false negatives/false positives as
true positives.
Rescue Example
Outputs
Statistics, including counts for all categories, in plain text, TSV and
JSON formats
Calculations for precision and recall, including error bars
VCF containing variants from both callsets, annotated with the callset
they came from and their categorization (TP/FP/FN/rescued)
Where is SMaSH headed?
Global Alliance for Genomics &
Health
ga4gh.org
The benchmarking task force includes:
Illumina, Amazon, Google
UC Berkeley, UC Santa Cruz, NIST
Development continues by
GA4GH
Chief maintainers will be Kelly Westbrooks and Cassie Doll (Google).
Feature Roadmap
New variant types: complex variants, compound heterozygous variants,
etc.
Phasing evaluation
Better handling of known false positives
SMaSH and GIAB
Try it and let us know what you
think!
git clone https://github.com/amplab/smash.git
Complete documentation available at smash.cs.berkeley.edu
Post feedback at the Google Group smash-benchmarking
Code contributions
Open source and BSD-licensed; pull requests and issues very welcome
github.com/amplab/smash
Datasets
The SMaSH paper proposed eight datasets, including synthetic, sampled
human, and mouse.
Other data to use as ground truth?
NIST pedigree calls for NA12878
the Illumina Platinum Genome
Others?
Interpretation
of results
Tools for Downstream Analysis?
Visualizations?
Compatibility with genome browsers?
Other?
Thanks!
jnewcomb@eecs.berkeley.edu
smash.cs.berkeley.edu
github.com/amplab/smash

Más contenido relacionado

Similar a Aug2014 smash performance metrics tool

RNA-Seq_Presentation
RNA-Seq_PresentationRNA-Seq_Presentation
RNA-Seq_Presentation
Toyin23
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
butest
 
assignment2.doc
assignment2.docassignment2.doc
assignment2.doc
butest
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
Kushaal Singla
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformatics
atmapandey
 
The Java Learning Kit Chapter 6 – Arrays Copyrigh.docx
The Java Learning Kit Chapter 6 – Arrays Copyrigh.docxThe Java Learning Kit Chapter 6 – Arrays Copyrigh.docx
The Java Learning Kit Chapter 6 – Arrays Copyrigh.docx
arnoldmeredith47041
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
pathsproject
 

Similar a Aug2014 smash performance metrics tool (20)

RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
RNA-Seq_Presentation
RNA-Seq_PresentationRNA-Seq_Presentation
RNA-Seq_Presentation
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
Blast
BlastBlast
Blast
 
Dependent Types for Cryptography Implementations
Dependent Types for Cryptography ImplementationsDependent Types for Cryptography Implementations
Dependent Types for Cryptography Implementations
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
 
assignment2.doc
assignment2.docassignment2.doc
assignment2.doc
 
Optimizing Near-Synonym System
Optimizing Near-Synonym SystemOptimizing Near-Synonym System
Optimizing Near-Synonym System
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformatics
 
The Java Learning Kit Chapter 6 – Arrays Copyrigh.docx
The Java Learning Kit Chapter 6 – Arrays Copyrigh.docxThe Java Learning Kit Chapter 6 – Arrays Copyrigh.docx
The Java Learning Kit Chapter 6 – Arrays Copyrigh.docx
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
Data Analysis project "TITANIC SURVIVAL"
Data Analysis project "TITANIC SURVIVAL"Data Analysis project "TITANIC SURVIVAL"
Data Analysis project "TITANIC SURVIVAL"
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
ders 7.1 VAR.pptx
ders 7.1 VAR.pptxders 7.1 VAR.pptx
ders 7.1 VAR.pptx
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.ppt
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 

Más de GenomeInABottle

Más de GenomeInABottle (20)

2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023
 
Stratomod ASHG 2023
Stratomod ASHG 2023Stratomod ASHG 2023
Stratomod ASHG 2023
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdf
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussion
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGa4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
 
GIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant poster
 
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
 
Jason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyJason Chin MHC diploid assembly
Jason Chin MHC diploid assembly
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 

Último

Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
mahaiklolahd
 

Último (20)

Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
 
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service AvailableTrichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
 
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
 
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
 
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
 
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
 
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
 
Call Girls Hosur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Hosur Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Hosur Just Call 9630942363 Top Class Call Girl Service Available
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service AvailableCall Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
 
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
 
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
 
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
 

Aug2014 smash performance metrics tool