Genome in a Bottle Consortium
Progress Update
January 27, 2014
Justin Zook, Marc Salit, and the Genome in a
Bottle Consortium
Whole Genome RMs vs.
Current Validation Methods
• Sanger confirmation
– Limited by number of sites (and sometimes it’s wrong)

• High depth NGS confirmation
– May have same systematic errors

• Genotyping microarrays
– Limited to known (easier) variants
– Problems with neighboring “complex” variants, duplications

• Mendelian inheritance
– Can’t account for some systematic errors

• Simulated data
– Generally not very representative of errors in real data

• Ti/Tv
– Varies by region of genome, and only gives overall statistic
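As an illustration of the Ti/Tv statistic, here is a minimal Python sketch. It assumes SNPs are supplied as (ref, alt) base pairs; the function names and the toy call list are hypothetical and not taken from any GIAB pipeline.

```python
# Purines and pyrimidines; a transition stays within one chemical class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """Return True for A<->G or C<->T substitutions."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES

def ti_tv_ratio(snps):
    """Compute transitions/transversions for an iterable of (ref, alt) SNPs."""
    ti = tv = 0
    for ref, alt in snps:
        if ref == alt:
            continue  # not a variant; skip
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")

# Hypothetical toy call set (not real data): 4 transitions, 2 transversions.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ti_tv_ratio(calls))  # 2.0
```

A single ratio like this summarizes a whole call set, which is why it cannot say which individual calls are wrong.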
Goals for Data to Accompany RM
• ~0 false positive AND false negative calls in
confident regions
• Include as much of the genome as possible in
the confident regions (i.e., don’t just take the
intersection)
• Avoid bias towards any particular platform
– take advantage of strengths of each platform

• Avoid bias towards any particular
bioinformatics algorithms
Integrate 14 Datasets from 5
platforms

Integration of Data to
Form Highly Confident Genotype Calls
Candidate variants

Find all possible variant sites

Concordant variants

Find concordant sites across multiple datasets

Find characteristics
of bias

Identify sites with atypical characteristics signifying
sequencing, mapping, or alignment bias

Arbitrate using
evidence of bias

For each site, remove datasets with decreasingly atypical
characteristics until all datasets agree

Confidence Level

Even if all datasets agree, identify them as uncertain if
few have typical characteristics, or if they fall in known
segmental duplications, SVs, or long repeats
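The arbitration and confidence steps above can be sketched roughly as follows. This is a toy illustration, not the NIST implementation: it assumes each dataset's evidence of bias has been reduced to one hypothetical `atypicality` score (higher means more atypical), and `DatasetCall`, `arbitrate`, and `min_remaining` are made-up names.

```python
from dataclasses import dataclass

@dataclass
class DatasetCall:
    name: str           # dataset label (hypothetical)
    genotype: str       # e.g. "0/1"
    atypicality: float  # assumed score; higher = stronger evidence of bias

def arbitrate(calls, in_difficult_region=False, min_remaining=2):
    """Drop the most atypical datasets until the remaining ones agree."""
    remaining = sorted(calls, key=lambda c: c.atypicality)  # most typical first
    while len({c.genotype for c in remaining}) > 1:
        remaining.pop()  # remove the most atypical dataset still present
    genotype = remaining[0].genotype if remaining else None
    # Agreement alone is not enough: require several typical datasets and
    # a site outside known difficult regions (seg dups, SVs, long repeats).
    confident = (
        genotype is not None
        and len(remaining) >= min_remaining
        and not in_difficult_region
    )
    return genotype, confident, [c.name for c in remaining]

# Hypothetical site where one dataset disagrees and looks atypical.
site = [
    DatasetCall("hiseq", "0/1", 0.1),
    DatasetCall("solid", "0/1", 0.3),
    DatasetCall("proton", "1/1", 0.9),
]
print(arbitrate(site))
```

Passing `in_difficult_region=True` for the same site keeps the genotype but marks it as uncertain.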
Verification of “Highly Confident”
Genotype accuracy
• Sanger sequencing
– 100% accuracy but only 100s of sites

• X Prize Fosmid sequencing
– Sometimes call only part of a complex variant

• Microarrays
– Differences appear to be FP or FN in arrays

• Broad 250bp HaplotypeCaller
– Very highly concordant

• Platinum genomes pedigree SNPs
– Some systematic errors are inherited; different representations
of complex variants

• Real Time Genomics SNPs and indels
– Some interesting sites called by RTG complex caller
GCAT – Interactive Performance
Metrics
• NIST is working with
GCAT to use our highly
confident variant calls
• Assess performance of
many combinations of
mappers and variant
callers
• www.bioplanet.com/gcat

Improvement of FreeBayes over 1 year with indels

Why do calls differ from our highly
confident genotypes?
Apparent False Positives
• Platform-specific systematic
sequencing errors for SNPs
• Analysis-specific
• Difficult to map regions
• Indels in long
homopolymers

Apparent False Negatives
• Different complex variant
representation
• Near indels
• Inside repeats

Complex variants have multiple correct
unphased representations
[Figure: alignments of one complex variant from BWA, ssaha2, Novoalign, and CGTools against the reference (Ref); labels: T insertion, TCTCT insertion]

                               FP SNPs        FP MNPs       FP indels
Traditional comparison         0.38% (610)    100% (915)    6.5% (733)
Comparison with realignment    0.15% (249)    4.2% (38)     2.6% (298)

• ~225,000 highly confident
variants are within 10bp of
another variant
• FPs and FNs are significantly
enriched for complex variants
• RTG vcfeval can fix this issue!
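To see why a record-by-record comparison reports false positives for complex variants, one can replay each set of calls onto the reference and compare the resulting sequences. The sketch below is a toy illustration of that idea only. It is not how RTG vcfeval or any other tool is implemented, and the sequences, 0-based coordinates, and function names are hypothetical.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (pos, ref, alt) variants (0-based) to a sequence."""
    seq = reference
    # Apply right-to-left so that earlier coordinates stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq

def same_haplotype(reference, calls_a, calls_b):
    """True if both call sets produce the same sequence."""
    return apply_variants(reference, calls_a) == apply_variants(reference, calls_b)

# The same 2 bp insertion in a CT repeat, placed at two different positions.
ref_seq = "ACTCTG"
left_placed = [(0, "A", "ACT")]
right_placed = [(1, "C", "CTC")]
print(left_placed == right_placed)                        # False
print(same_haplotype(ref_seq, left_placed, right_placed))  # True

# One MNP versus two adjacent SNPs.
print(same_haplotype("ACGT", [(1, "CG", "TA")], [(1, "C", "T"), (2, "G", "A")]))  # True
```

Both pairs differ as records but yield identical sequences, so comparing records alone would count each as an apparent FP plus an apparent FN.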
Reasons we exclude regions from high-confidence set
Structural variant analytical approach
• Depth of coverage (DOC)
– Control-FREEC
– CnD
• Paired-end mapping (PEM)
– Breakdancer
• Split read (SR)
– Pindel
• Assembly based (AS)
– Velvet
– ABySS
• Combination
– Genome-STRiP

SVMerge → List of structural variant calls
Validation parameters for each SV
• Coverage (mean and standard deviation)
• Paired-end distance/insert size (mean and
standard deviation)
• # of discordant paired-ends
• Soft clipping of the reads (mean and standard
deviation)
• Mapping quality (mean and standard deviation)
• # of heterozygous and homozygous SNP
genotype calls
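A rough sketch of how several of these per-SV parameters might be summarized, assuming the per-base coverage, insert sizes, and mapping qualities for a candidate region have already been extracted into plain lists. The function names, the `k`-standard-deviation rule for calling a pair discordant, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev

def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)

def sv_evidence(coverage, insert_sizes, mapqs, library_mean, library_sd, k=3.0):
    """Summarize evidence for one candidate SV region."""
    # Assumed rule: a pair is discordant if its insert size deviates from
    # the library mean by more than k library standard deviations.
    n_discordant = sum(
        1 for size in insert_sizes if abs(size - library_mean) > k * library_sd
    )
    return {
        "coverage": summarize(coverage),
        "insert_size": summarize(insert_sizes),
        "mapq": summarize(mapqs),
        "n_discordant": n_discordant,
    }

# Hypothetical values for one region (not real data).
evidence = sv_evidence(
    coverage=[30, 32, 28],
    insert_sizes=[300, 310, 900],
    mapqs=[60, 60, 20],
    library_mean=300,
    library_sd=20,
)
print(evidence)
```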
Challenges with assessing
performance
• All variant types are not
equal
• All regions of the genome
are not equal
– Homopolymers, STRs, duplications
– Can be similar or different
in different genomes

• Labeling difficult variants
as uncertain leads to
higher apparent accuracy
when assessing
performance
• Genotypes fall in 3+
categories (not
positive/negative)
– standard diagnostic
accuracy measures not
well posed
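Because genotypes fall into three or more categories, one option is to tabulate every (truth, query) category pair rather than force calls into positive/negative. The sketch below assumes genotypes are already reduced to category labels keyed by site; the labels, function names, and toy data are hypothetical.

```python
from collections import Counter

def genotype_confusion(truth, query):
    """Count (truth, query) genotype category pairs over the truth sites."""
    counts = Counter()
    for site, truth_gt in truth.items():
        # Sites missing from the query are counted as "no_call".
        counts[(truth_gt, query.get(site, "no_call"))] += 1
    return counts

def concordance(counts):
    """Fraction of sites where truth and query categories match."""
    total = sum(counts.values())
    agree = sum(n for (t, q), n in counts.items() if t == q)
    return agree / total if total else float("nan")

# Hypothetical toy data keyed by site position.
truth = {1: "het", 2: "hom_var", 3: "hom_ref", 4: "het"}
query = {1: "het", 2: "het", 3: "hom_ref"}
table = genotype_confusion(truth, query)
print(table)
print(concordance(table))  # 0.5
```

Here the hom_var site called as het and the missed het site are different kinds of error, a distinction that a single sensitivity or specificity figure would hide.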
Pedigree calls
• RTG and Illumina Platinum
Genomes working on this
• Sequence
NA12878, husband, and 11
children to identify high
confidence variants
– Identify cross-over events
– Determine if genotypes are
consistent with inheritance

• Should we integrate these
with the NIST high-confidence
genotypes?
• Should we find larger families
for future genomes?
• See afternoon presentations!

Source: Mike Eberle, Illumina
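The inheritance-consistency check can be sketched for a single site as below, assuming unphased diploid genotypes given as pairs of allele indices. The function name and the toy family are hypothetical. As noted earlier in the deck, a consistent site can still carry an inherited systematic error, so this check is necessary but not sufficient.

```python
from itertools import product

def mendelian_consistent(mother, father, child):
    """True if the child's unphased genotype takes one allele from each parent."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )

# Hypothetical site: 0 = reference allele, 1 = alternate allele.
mother, father = (0, 0), (0, 1)
children = [(0, 1), (0, 0), (1, 1)]
print([mendelian_consistent(mother, father, c) for c in children])
# [True, True, False]
```

With many children, a site can be flagged when any child fails the check, for example with `all(mendelian_consistent(mother, father, c) for c in children)`.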

Pedigree Calls in Uncertain Regions
GIAB Characterization of pilot RM
• NIST – 300x 150x150bp HiSeq (from 6 vials)
• NIST – 100x 75bp ECC SOLiD 5500W
• Illumina – 50x 100x100bp HiSeq
• Complete Genomics – Normal and LFR (non-RM)
• Garvan Institute – Illumina exome
• NCI – Ion Proton whole genome
• INOVA – Infinium SNP/CNV array
Homogeneity and Stability
Homogeneity
• Multiplex First and last vial
– 3 libraries x 33x HiSeq each

• Multiplex 4 Random vials
– 2 libraries x 12.5x HiSeq each

• Compare variability due to:
– vial
– library
– day
– flow cell
– lane
– sampling

• Run PFGE on each vial for size

Stability
• Run PFGE to detect DNA
degradation
• Freeze-thaw 2 and 5 times
• Vortex for 10s
• 4°C for 2 and 8 weeks
• 37°C for 2 and 8 weeks
FTP site and Amazon S3
• NCBI is hosting fastq, bam, and vcf files on the
giab ftp site
• These data are mirrored to Amazon S3, so we
encourage you to take advantage of this!
Pilot Reference Material
• High-confidence calls are available on the ftp
site and are already being used
• NIST plans to release this as a NIST Reference
Material in the next couple months
Future Directions
• Characterize more
“difficult” regions/variants
• Structural variants
• Compare to pedigree calls
• Examine potentially
clinically relevant
regions/variants in RMs
• Use long-read technologies
– Moleculo
– CG LFR
– PacBio
– BioNano Genomics
– future technologies??

• Use glia/platypus to realign
reads to candidate variants

• Analyze interlaboratory
study data
• Characterize PGP genomes
– Ashkenazim trio
– son in Asian trio
– DNA at NIST in Jan-Feb 2014
– Volunteers to sequence?

• Select future genomes
• Tumor-normal?
Topic #1: Moving beyond the easy
regions/variants
Presentations
• Emerging Technologies
– PacBio
– Complete Genomics LFR
– Moleculo
– BioNano Genomics

• Structural Variants
– Bina Technologies

Topics
• Structural Variants
• Phasing
• Validation
• Where should we set the
threshold(s) for confidence?
Topic #2: Cancer and Future Genomes
Cancer
• Spike-ins
• Mixtures of normal cell lines
• Tumor-normal cell line pair
• Transcriptome controls

Priorities for Future Genomes
• Diverse ancestry groups
• Larger families
• Recruitment with consent
for commercialization
• How many genomes?
• Should the parents be NIST
Reference Materials, or only
the child?
Working Group Questions
RM Selection & Design
• Spike-in controls
• FFPE
• Commercial RMs
• ABRF interlaboratory study
• Should we prioritize one or
two genomes?

RM Characterization
• Production mode for new
trios
– Pilot was characterized by
Illumina, SOLiD, Ion
Proton, and Complete
Genomics
– What resources should we
invest in measurements for
each new family?
Working Group Questions
Bioinformatics
• Storing data/pipelines
– Suggestions for ftp structure
– Data submission/accessioning
process
– Data model for genomic data
– Archiving pipelines and
reproducible research

• GRCh38
• How to use pedigree calls for pilot
genome?
• Clones for targeted regions (hard
regions if not whole genome)
• In which difficult regions should
we focus our characterization?

Performance Metrics
• Target audience
• Requirements for user
interface
– Establishing truth set(s)
– Inputs/Outputs
– Visualization

• Integration with GeT-RM

More Related Content

What's hot

RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1BITS
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variantsDenis C. Bauer
 
Nida ws neale_seq_data_gen
Nida ws neale_seq_data_genNida ws neale_seq_data_gen
Nida ws neale_seq_data_genFonareerat
 
Giab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarkingGiab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarkingGenomeInABottle
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectGenome Reference Consortium
 
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review PaperApplication of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review PaperJournal of Agriculture and Crops
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqTimothy Tickle
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GenomeInABottle
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...StampedeCon
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2Dan Gaston
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim D. Pruitt
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewSean Davis
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizeAnn Loraine
 

What's hot (20)

171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variants
 
Nida ws neale_seq_data_gen
Nida ws neale_seq_data_genNida ws neale_seq_data_gen
Nida ws neale_seq_data_gen
 
Mane v2 final
Mane v2 finalMane v2 final
Mane v2 final
 
Giab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarkingGiab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarking
 
Rna seq
Rna seq Rna seq
Rna seq
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
 
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review PaperApplication of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seq
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualize
 
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGSCurso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
 

Similar to 140127 GIAB update and NIST high-confidence calls

Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878GenomeInABottle
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GenomeInABottle
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016GenomeInABottle
 
Aug2014 giab status update and wg charge
Aug2014 giab status update and wg chargeAug2014 giab status update and wg charge
Aug2014 giab status update and wg chargeGenomeInABottle
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsGenomeInABottle
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GenomeInABottle
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slidesGenomeInABottle
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224GenomeInABottle
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studiesFOODCROPS
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshopGenomeInABottle
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_posterGenomeInABottle
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenomeInABottle
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
140127 rm selection wg summary
140127 rm selection wg summary140127 rm selection wg summary
140127 rm selection wg summaryGenomeInABottle
 
Molecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptxMolecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptxTrilokMandal2
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.QIAGEN
 

Similar to 140127 GIAB update and NIST high-confidence calls (20)

Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
Aug2014 giab status update and wg charge
Aug2014 giab status update and wg chargeAug2014 giab status update and wg charge
Aug2014 giab status update and wg charge
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference Materials
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
Genotyping in Breeding programs
Genotyping in Breeding programsGenotyping in Breeding programs
Genotyping in Breeding programs
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slides
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshop
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_poster
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
140127 rm selection wg summary
140127 rm selection wg summary140127 rm selection wg summary
140127 rm selection wg summary
 
Molecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptxMolecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptx
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
 

More from GenomeInABottle

GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GenomeInABottle
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGenomeInABottle
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907GenomeInABottle
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...GenomeInABottle
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGenomeInABottle
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020GenomeInABottle
 
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGa4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGenomeInABottle
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGenomeInABottle
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGenomeInABottle
 
GIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGenomeInABottle
 
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGenomeInABottle
 
Jason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyJason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyGenomeInABottle
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GenomeInABottle
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphsGenomeInABottle
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normalGenomeInABottle
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccsGenomeInABottle
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seqGenomeInABottle
 

More from GenomeInABottle (20)

2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023
 
Stratomod ASHG 2023
Stratomod ASHG 2023Stratomod ASHG 2023
Stratomod ASHG 2023
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdf
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussion
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020
 
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGa4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
 
GIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant poster
 
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
 
Jason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyJason Chin MHC diploid assembly
Jason Chin MHC diploid assembly
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphs
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normal
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccs
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seq
 


140127 GIAB update and NIST high-confidence calls

  • 1. Genome in a Bottle Consortium Progress Update January 27, 2014 Justin Zook, Marc Salit, and the Genome in a Bottle Consortium
  • 2. Whole Genome RMs vs. Current Validation Methods • Sanger confirmation – Limited by number of sites (and sometimes it’s wrong) • High depth NGS confirmation – May have same systematic errors • Genotyping microarrays – Limited to known (easier) variants – Problems with neighboring “complex” variants, duplications • Mendelian inheritance – Can’t account for some systematic errors • Simulated data – Generally not very representative of errors in real data • Ti/Tv – Varies by region of genome, and only gives overall statistic
  • 3. Goals for Data to Accompany RM • ~0 false positive AND false negative calls in confident regions • Include as much of the genome as possible in the confident regions (i.e., don’t just take the intersection) • Avoid bias towards any particular platform – take advantage of strengths of each platform • Avoid bias towards any particular bioinformatics algorithms
  • 4. Integrate 12 14 Datasets from 5 platforms
  • 5. Integration of Data to Form Highly Confident Genotype Calls Candidate variants Find all possible variant sites Concordant variants Find concordant sites across multiple datasets Find characteristics of bias Identify sites with atypical characteristics signifying sequencing, mapping, or alignment bias Arbitrate using evidence of bias For each site, remove datasets with decreasingly atypical characteristics until all datasets agree Confidence Level Even if all datasets agree, identify them as uncertain if few have typical characteristics, or if they fall in known segmental duplications, SVs, or long repeats
  • 6. Verification of “Highly Confident” Genotype accuracy • Sanger sequencing – 100% accuracy but only 100s of sites • X Prize Fosmid sequencing – Sometimes call only part of a complex variant • Microarrays – Differences appear to be FP or FN in arrays • Broad 250bp HaplotypeCaller – Very highly concordant • Platinum genomes pedigree SNPs – Some systematic errors are inherited; different representations of complex variants • Real Time Genomics SNPs and indels – Some interesting sites called by RTG complex caller
  • 7. GCAT – Interactive Performance Metrics • NIST is working with GCAT to use our highly confident variant calls • Assess performance of many combinations of mappers and variant callers • www.bioplanet.com/gcat • Improvement of FreeBayes over 1 year with indels
  • 8. Why do calls differ from our highly confident genotypes? Apparent False Positives • Platform-specific systematic sequencing errors for SNPs • Analysis-specific • Difficult to map regions • Indels in long homopolymers Apparent False Negatives • Different complex variant representation • Near indels • Inside repeats
  • 9. Complex variants have multiple correct unphased representations • Figure labels: BWA, CGTools, ssaha2, Novoalign; Ref; T insertion; TCTCT insertion • Traditional comparison: FP SNPs 0.38% (610), FP MNPs 100% (915), FP indels 6.5% (733) • Comparison with realignment: FP SNPs 0.15% (249), FP MNPs 4.2% (38), FP indels 2.6% (298) • ~225,000 highly confident variants are within 10bp of another variant • FPs and FNs are significantly enriched for complex variants • RTG vcfeval can fix this issue!
  • 10. Reasons we exclude regions from high-confidence set
  • 11. Reasons we exclude regions from high-confidence set
  • 12. Structural variant analytical approach • Depth of coverage (DOC): Control-FREEC, CnD • Paired-end mapping (PEM): Breakdancer • Split read (SR): Pindel • Assembly based (AS): Velvet, ABySS • Combination: Genome-STRiP, SVMerge • List of structural variant calls
  • 13.
  • 14. Validation parameters for each SV • Coverage (mean and standard deviation) • Paired-end distance/insert size (mean and standard deviation) • # of discordant paired-ends • Soft clipping of the reads (mean and standard deviation) • Mapping quality (mean and standard deviation) • # of heterozygous and homozygous SNP genotype calls
  • 15. Challenges with assessing performance • All variant types are not equal • All regions of the genome are not equal – Homopolymers, STRs, duplications – Can be similar or different in different genomes • Labeling difficult variants as uncertain leads to higher apparent accuracy when assessing performance • Genotypes fall in 3+ categories (not positive/negative) – standard diagnostic accuracy measures not well posed
  • 16. Pedigree calls • RTG and Illumina Platinum Genomes working on this • Sequence NA12878, husband, and 11 children to identify high confidence variants – Identify cross-over events – Determine if genotypes are consistent with inheritance • Should we integrate these with the NIST high-confidence genotypes? • Should we find larger families for future genomes? • See afternoon presentations! Source: Mike Eberle, Illumina
  • 17. Pedigree Calls in Uncertain Regions
  • 18. GIAB Characterization of pilot RM • NIST – 300x 150x150bp HiSeq (from 6 vials) • NIST – 100x 75bp ECC SOLiD 5500W • Illumina – 50x 100x100bp HiSeq • Complete Genomics – Normal and LFR (non-RM) • Garvan Institute – Illumina exome • NCI – Ion Proton whole genome • INOVA – Infinium SNP/CNV array
  • 19. Homogeneity and Stability Homogeneity • Multiplex First and last vial – 3 libraries x 33x HiSeq each • Multiplex 4 Random vials – 2 libraries x 12.5x HiSeq each • Compare variability due to: – vial – library – day – flow cell – lane – sampling • Run PFGE on each vial for size Stability • Run PFGE to detect DNA degradation • Freeze-thaw 2 and 5 times • Vortex for 10s • 4°C for 2 and 8 weeks • 37°C for 2 and 8 weeks
  • 20. FTP site and Amazon S3 • NCBI is hosting fastq, bam, and vcf files on the giab ftp site • These data are mirrored to Amazon S3, so we encourage you to take advantage of this!
  • 21. Pilot Reference Material • High-confidence calls are available on the ftp site and are already being used • NIST plans to release this as a NIST Reference Material in the next couple months
  • 22. Future Directions • Characterize more “difficult” regions/variants • Structural variants • Compare to pedigree calls • Examine potentially clinically relevant regions/variants in RMs • Use long-read technologies – Moleculo – CG LFR – PacBio – BioNano Genomics – future technologies?? • Use glia/platypus to realign reads to candidate variants • Analyze interlaboratory study data • Characterize PGP genomes – Ashkenazim trio – son in Asian trio – DNA at NIST in Jan-Feb 2014 – Volunteers to sequence? • Select future genomes • Tumor-normal?
  • 23. Topic #1: Moving beyond the easy regions/variants Presentations • Emerging Technologies – PacBio – Complete Genomics LFR – Moleculo – BioNano Genomics • Structural Variants – Bina Technologies Topics • Structural Variants • Phasing • Validation • Where should we set the threshold(s) for confidence?
  • 24. Topic #2: Cancer and Future Genomes Cancer • Spike-ins • Mixtures of normal cell lines • Tumor-normal cell line pair • Transcriptome controls Priorities for Future Genomes • Diverse ancestry groups • Larger families • Recruitment with consent for commercialization • How many genomes? • Should the parents be NIST Reference Materials, or only the child?
  • 25. Working Group Questions RM Selection & Design • Spike-in controls • FFPE • Commercial RMs • ABRF interlaboratory study • Should we prioritize one or two genomes? RM Characterization • Production mode for new trios – Pilot was characterized by Illumina, SOLiD, Ion Proton, and Complete Genomics – What resources should we invest in measurements for each new family?
  • 26. Working Group Questions Bioinformatics • Storing data/pipelines – Suggestions for ftp structure – Data submission/accessioning process – Data model for genomic data – Archiving pipelines and reproducible research • GRCh38 • How to use pedigree calls for pilot genome? • Clones for targeted regions (hard regions if not whole genome) • In which difficult regions should we focus our characterization? Performance Metrics • Target audience • Requirements for user interface – Establishing truth set(s) – Inputs/Outputs – Visualization • Integration with GeT-RM
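
Slide 9 notes that a complex variant can have several correct unphased representations, so a position-by-position comparison can report false positives and false negatives that are not real. The following is a minimal Python sketch of that idea, not the method used by RTG vcfeval or by the NIST integration pipeline. The function names, the toy reference sequence, and the (position, ref allele, alt allele) tuple format are all assumptions made for illustration. It replays each call set onto the reference and compares the resulting sequences.

```python
def apply_variants(ref, variants):
    """Replay non-overlapping variants onto a reference string.

    Each variant is a hypothetical (pos, ref_allele, alt_allele) tuple
    with a 0-based position; this is not a real VCF parser.
    """
    pieces = []
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(ref[cursor:pos])   # unchanged reference before the variant
        pieces.append(alt_allele)        # substituted allele
        cursor = pos + len(ref_allele)
    pieces.append(ref[cursor:])          # unchanged reference tail
    return "".join(pieces)


def same_haplotype(ref, calls_a, calls_b):
    """True if both call sets produce the same sequence."""
    return apply_variants(ref, calls_a) == apply_variants(ref, calls_b)


REF = "GATCTCTA"  # toy reference containing a short TC repeat

# The same 2 bp insertion, placed at the left or right end of the repeat.
left_aligned = [(1, "A", "ATC")]
right_aligned = [(6, "T", "TCT")]

# Two adjacent SNPs versus a single MNP.
two_snps = [(2, "T", "A"), (3, "C", "G")]
one_mnp = [(2, "TC", "AG")]

if __name__ == "__main__":
    # No record is shared, so a record-by-record comparison sees a mismatch.
    print(set(left_aligned) & set(right_aligned))
    print(same_haplotype(REF, left_aligned, right_aligned))
    print(same_haplotype(REF, two_snps, one_mnp))
```

Comparing replayed sequences rather than individual records is the reason a haplotype-aware comparison can reconcile calls that differ only in representation. A real tool would also have to handle phasing, genotypes, and overlapping records, which this sketch ignores.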

Editor's Notes

  1. ----- Meeting Notes (5/28/13 17:05) -----ask heng for decoy