NIST Program to Develop Genomic Reference Materials

Justin Zook and Marc Salit
  
Scope of NIST work

•  Human Whole Genome RMs
•  Synthetic DNA constructs
•  Microbial Whole Genome RMs
  
RM Development Process

1.  Select and procure materials
2.  Characterize materials
3.  Process and integrate data from multiple platforms
4.  Confirm selected genotypes
5.  Write Report of Analysis
6.  Develop methods for end users to obtain performance metrics from the materials
  
Proposed Timeline for Human RMs

Proposed Timeline for Synthetic Structures
  

Title                                                            Effort   (Gantt chart columns: 2011, 2012, 2013, 2014, 2015, 2…)
  1) Human RMs                                                     535w
         1.1) Select/Procure human DNA for RM                       32w
         1.2) **NIST receives packaged DNA for RM/SRM
         1.3) Develop bioinformatics pipeline for data              97w
              integration
         1.4) Human Primary Sequencing                             147w
         1.5) Human Homogeneity assessment                           8w
         1.6) Analyze homogeneity data and produce preliminary      10w
              SNP calls for RM
         1.7) Write human RM Report of Analysis                     10w
         1.8) Process Human RM for release                          24w
         1.9) **Human RM officially released
        1.10) Human Sequencing data integration                     25w
        1.11) Human Validation                                      20w
        1.12) Human other characterization methods                  48w
        1.13) Analyze validation data and refine sequencing calls    12w
        1.14) Develop pipeline for SVs and test                     40w
        1.15) Write Human SRM Report of Analysis                      8w
        1.16) Process Human SRM for release                         24w
        1.17) **Human SRM officially released
        1.18) Procure local data storage                            10w
        1.19) Procure Bioinformatics data analysis tools            10w
        1.20) Procure Automated sample prep instrumentation         10w
  2) Microbial RMs                                                 279w
         2.1) Select/Procure microbial DNA for RMs                  31w
         2.2) Microbial Primary Sequencing                         124w
         2.3) Microbial Homogeneity assessment                       6w
         2.4) Microbial Sequencing data integration                 40w
          2.4.1) Mapping/Alignment                                  10w
          2.4.2) Variant calling                                    12w
          2.4.3) Form consensus variant calls                       12w
Proposed Characterization Methods for Whole Genomes

Whole Genome Sequencing
•  ABI 5500 (1kb, 6kb, and 10kb mate-pair libraries)
•  Illumina
•  Complete Genomics
•  Upcoming technologies?
     –  Ion Proton?
     –  Oxford Nanopore?
•  3x replication of sequencing (3 library preps)

Other
•  Genotyping microarrays
•  Array CGH
•  Targeted sequencing
•  Fosmid sequencing?
•  Optical Mapping?

[Figure: pedigree of NA12878. Labels: Father, Mother; Husband, NA12878; Son, Daughter]
  
Integration of Existing Data to Form Consensus Genotype Calls

1.  Find all possible variant sites
2.  Find sites where all datasets agree
3.  Identify sites with atypical characteristics signifying sequencing, mapping, or alignment bias
4.  For each site, remove datasets with decreasingly atypical characteristics until all datasets agree
5.  Even if all datasets agree, identify them as uncertain if few have typical characteristics
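The integration steps above can be sketched for a single candidate site as follows. This is a minimal illustration and not the NIST pipeline: the `Call` record, the numeric `atypicality` score, and the `typical_cutoff` and `min_typical` parameters are all hypothetical names introduced here, and the slides do not say how "atypical characteristics" are scored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Call:
    dataset: str        # hypothetical dataset label, e.g. "A"
    genotype: str       # genotype call at this site, e.g. "0/1"
    atypicality: float  # hypothetical bias score; larger means more atypical


def consensus_for_site(calls: List[Call],
                       typical_cutoff: float = 1.0,
                       min_typical: int = 2) -> Tuple[str, str]:
    """Return (genotype, status) for one site; status is 'confident' or 'uncertain'."""
    if not calls:
        raise ValueError("need at least one call for the site")
    # Order the datasets from most to least atypical at this site.
    remaining = sorted(calls, key=lambda c: c.atypicality, reverse=True)
    # Drop the most atypical remaining dataset until the rest agree.
    while len({c.genotype for c in remaining}) > 1:
        remaining = remaining[1:]
    genotype = remaining[0].genotype
    # Even when the surviving datasets agree, flag the site if few look typical.
    n_typical = sum(c.atypicality < typical_cutoff for c in remaining)
    status = "confident" if n_typical >= min_typical else "uncertain"
    return genotype, status


# Usage: dataset C disagrees and is the most atypical, so it is removed first.
site = [Call("A", "0/1", 0.1), Call("B", "0/1", 0.3), Call("C", "0/0", 2.5)]
print(consensus_for_site(site))
```

Removing one dataset at a time, most atypical first, mirrors step 4; the final check mirrors step 5, where agreement alone is not enough to call a site confident.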
  
Consensus has lower FN rate than individual datasets

HiSeq – GATK (rows) vs. Illumina Omni SNP Array (columns)

                                Homozygous                      Homozygous
                                Reference      Heterozygous     Variant          Uncertain
Homozygous Reference/No Call    1.45M          7.24k (1.34%)    5.28k (0.65%)    N/A
Heterozygous                    196 (0.03%)    411k (60.7%)     133 (0.02%)      N/A
Homozygous Variant              154 (0.02%)    150 (0.02%)      249k (37.0%)     N/A

Integrated Consensus Genotypes (rows) vs. Illumina Omni SNP Array (columns)

                                Homozygous                      Homozygous
                                Reference      Heterozygous     Variant          Uncertain
Homozygous Reference/No Call    1.45M          613 (0.09%)      977 (0.15%)      N/A
Heterozygous                    241 (0.04%)    414k (61.5%)     173 (0.03%)      N/A
Homozygous Variant              152 (0.02%)    61 (0.01%)       249k (36.9%)     N/A
Uncertain                       5458 (0.81%)   3421 (0.51%)     4808 (0.71%)     N/A

In both tables, “FNs” marks the Heterozygous and Homozygous Variant cells of the Homozygous Reference/No Call row, and “FPs*” marks the Homozygous Reference cells of the Heterozygous and Homozygous Variant rows.

* Note that most or all of the putative FPs seem to actually be FNs on the microarray
  
SNP arrays overestimate performance

HiSeq – GATK (rows) vs. Illumina Omni SNP Array (columns)

                                Homozygous                      Homozygous
                                Reference      Heterozygous     Variant          Uncertain
Homozygous Reference/No Call    1.45M          7.24k (1.34%)    5.28k (0.65%)    N/A
Heterozygous                    196 (0.03%)    411k (60.7%)     133 (0.02%)      N/A
Homozygous Variant              154 (0.02%)    150 (0.02%)      249k (37.0%)     N/A

HiSeq – GATK (rows) vs. Integrated Consensus Genotypes (columns)

                                Homozygous                      Homozygous
                                Reference      Heterozygous     Variant          Uncertain
Homozygous Reference/No Call    1.52M          157k (4.68%)     30.3k (0.90%)    4.17M
Heterozygous                    47 (0.00%)     1.90M (56.4%)    34 (0.00%)       16.9k (0.50%)
Homozygous Variant              1 (0.00%)      298 (0.01%)      1.19M (35.3%)    73.3k (2.18%)

In both tables, “FNs” marks the Heterozygous and Homozygous Variant cells of the Homozygous Reference/No Call row, and “FPs” (“FPs*” in the array table) marks the Homozygous Reference cells of the Heterozygous and Homozygous Variant rows.
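One way to read the slide title is to compute a missed-variant fraction for the same HiSeq – GATK calls against each benchmark. The sketch below uses the rounded counts from the two tables; the definition of `fn_fraction` (the share of benchmark Heterozygous or Homozygous Variant sites that the caller reported as Homozygous Reference/No Call, ignoring the Uncertain column) is an assumption made for illustration, not a metric defined in the slides.

```python
# Rows: HiSeq-GATK call. Columns: benchmark genotype in the order
# (Homozygous Reference, Heterozygous, Homozygous Variant).
# Counts are the rounded values shown on the slide; Uncertain is left out.
VS_ARRAY = {
    "homref_or_nocall": (1_450_000, 7_240, 5_280),
    "het": (196, 411_000, 133),
    "homvar": (154, 150, 249_000),
}
VS_CONSENSUS = {
    "homref_or_nocall": (1_520_000, 157_000, 30_300),
    "het": (47, 1_900_000, 34),
    "homvar": (1, 298, 1_190_000),
}


def fn_fraction(table):
    """Assumed metric: missed benchmark variant sites / all benchmark variant sites."""
    # Variant columns (index 1 and 2) of the Hom. Ref./No Call row are the misses.
    missed = sum(table["homref_or_nocall"][1:])
    # All sites the benchmark calls Het. or Hom. Var., whatever the caller said.
    all_variant_sites = sum(sum(row[1:]) for row in table.values())
    return missed / all_variant_sites


print(f"vs. SNP array:  {fn_fraction(VS_ARRAY):.2%}")
print(f"vs. consensus:  {fn_fraction(VS_CONSENSUS):.2%}")
```

With these rounded counts the fraction comes out at roughly 1.9% against the array and roughly 5.7% against the consensus genotypes, which is consistent with the claim that array-based comparison makes the same calls look better.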
  
Samtools has higher FP and lower FN than GATK

HiSeq – samtools (rows) vs. Integrated Consensus Genotypes (columns)

                                Homozygous                      Homozygous
                                Reference      Heterozygous     Variant          Uncertain
Homozygous Reference/No Call    1.51M          49.6k (1.47%)    6.74k (0.20%)    3.93M
Heterozygous                    3141 (0.09%)   2.00M (59.6%)    74 (0.00%)       175k (5.19%)
Homozygous Variant              21 (0.00%)     777 (0.02%)      1.21M (36.0%)    192k (5.71%)

HiSeq – GATK (rows) vs. Integrated Consensus Genotypes (columns)

                                Homozygous                      Homozygous
                                Reference      Heterozygous     Variant          Uncertain
Homozygous Reference/No Call    1.52M          157k (4.68%)     30.3k (0.90%)    4.17M
Heterozygous                    47 (0.00%)     1.90M (56.4%)    34 (0.00%)       16.9k (0.50%)
Homozygous Variant              1 (0.00%)      298 (0.01%)      1.19M (35.3%)    73.3k (2.18%)

In both tables, “FNs” marks the Heterozygous and Homozygous Variant cells of the Homozygous Reference/No Call row, and “FPs” marks the Homozygous Reference cells of the Heterozygous and Homozygous Variant rows.
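The comparison in the slide title can be checked by tallying the cells labeled “FPs” and “FNs” in each table. The sketch below assumes those labels mean what the layout suggests (FP: the caller reports a variant where the consensus is Homozygous Reference; FN: the caller reports Homozygous Reference/No Call where the consensus is a variant); the function name and matrix layout are illustrative, and the counts are the rounded values from the slide.

```python
def fp_fn_counts(table):
    """table[row][col]: row = caller genotype, col = consensus genotype,
    both in the order (Hom. Ref./No Call, Heterozygous, Hom. Variant)."""
    # Caller says Het. or Hom. Var., consensus says Hom. Ref.
    false_positives = table[1][0] + table[2][0]
    # Caller says Hom. Ref./No Call, consensus says Het. or Hom. Var.
    false_negatives = table[0][1] + table[0][2]
    return false_positives, false_negatives


# Rounded counts from the slide; the Uncertain column is omitted.
SAMTOOLS = [
    [1_510_000, 49_600, 6_740],
    [3_141, 2_000_000, 74],
    [21, 777, 1_210_000],
]
GATK = [
    [1_520_000, 157_000, 30_300],
    [47, 1_900_000, 34],
    [1, 298, 1_190_000],
]

for name, table in (("samtools", SAMTOOLS), ("GATK", GATK)):
    fp, fn = fp_fn_counts(table)
    print(f"{name}: FP={fp}, FN={fn}")
```

Under these assumptions samtools shows more putative FPs and fewer FNs than GATK, matching the slide title. The tally leaves out the Uncertain column, so it says nothing about how either caller behaves at sites the consensus could not resolve.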
  
Performance Metrics: Characteristics of Mis-calls

[Figure: grid of panels. Columns: Consensus Genotypes (Hom. Ref., Heterozygous, Hom. Variant, Uncertain). Rows: HiSeq/GATK (Hom. Ref./No call, Heterozygous, Hom. Variant). Characteristics shown: QUAL/Depth of Coverage, Strand Bias, ...]
  
Challenges with assessing performance

•  All variant types are not equal
•  Nearby variants are often difficult to align
•  All regions of the genome are not equal
    –  Homopolymers, STRs, duplications
    –  Can be similar or different in different genomes
•  Labeling difficult variants as “uncertain” in the Reference Material leads to higher apparent accuracy when assessing performance
•  Genotypes fall in 3+ categories (not positive/negative)
•  It’s important to consider data from multiple platforms and library preparations when characterizing a Reference Material
  

Más contenido relacionado

Destacado

A National Network of Biomedical Research Expertise
A National Network of Biomedical Research ExpertiseA National Network of Biomedical Research Expertise
A National Network of Biomedical Research ExpertiseManinder Kahlon
 
George Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGeorge Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGenomeInABottle
 
Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16GenomeInABottle
 
Biomedical research
Biomedical researchBiomedical research
Biomedical researchAnjo Yumol
 
Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...Matt Stine
 
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...Matt Stine
 
I V I F2 F July 2005 Talk
I V I  F2 F  July 2005  TalkI V I  F2 F  July 2005  Talk
I V I F2 F July 2005 Talkbattagline
 
decentralization: a trend in biomedical research
decentralization: a trend in biomedical researchdecentralization: a trend in biomedical research
decentralization: a trend in biomedical researchBrian Bot
 
Making Biomedical Research More Like Airbnb
Making Biomedical Research More Like AirbnbMaking Biomedical Research More Like Airbnb
Making Biomedical Research More Like AirbnbPhilip Bourne
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Larry Smarr
 
Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterprisePhilip Bourne
 
Core Facility 2.0 - leveraging social media to enhance visibility
Core Facility 2.0 - leveraging social media to enhance visibilityCore Facility 2.0 - leveraging social media to enhance visibility
Core Facility 2.0 - leveraging social media to enhance visibilityRyan Duggan
 
Future of biomedical instrumentation
Future of biomedical instrumentationFuture of biomedical instrumentation
Future of biomedical instrumentationFida Fidai
 
Supporting the Scientists: Working as a research technician in a Core Service...
Supporting the Scientists: Working as a research technician in a Core Service...Supporting the Scientists: Working as a research technician in a Core Service...
Supporting the Scientists: Working as a research technician in a Core Service...Chris Willmott
 
Biomedical instrumentation PPT
Biomedical instrumentation PPTBiomedical instrumentation PPT
Biomedical instrumentation PPTabhi1802verma
 

Destacado (20)

A National Network of Biomedical Research Expertise
A National Network of Biomedical Research ExpertiseA National Network of Biomedical Research Expertise
A National Network of Biomedical Research Expertise
 
George Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGeorge Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait Data
 
Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16
 
Biomedical research
Biomedical researchBiomedical research
Biomedical research
 
Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...
 
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
 
I V I F2 F July 2005 Talk
I V I  F2 F  July 2005  TalkI V I  F2 F  July 2005  Talk
I V I F2 F July 2005 Talk
 
Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...
Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...
Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...
 
Clean Labs Training
Clean Labs TrainingClean Labs Training
Clean Labs Training
 
decentralization: a trend in biomedical research
decentralization: a trend in biomedical researchdecentralization: a trend in biomedical research
decentralization: a trend in biomedical research
 
170326 giab abrf
170326 giab abrf170326 giab abrf
170326 giab abrf
 
Making Biomedical Research More Like Airbnb
Making Biomedical Research More Like AirbnbMaking Biomedical Research More Like Airbnb
Making Biomedical Research More Like Airbnb
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2
 
Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital Enterprise
 
Core Facility 2.0 - leveraging social media to enhance visibility
Core Facility 2.0 - leveraging social media to enhance visibilityCore Facility 2.0 - leveraging social media to enhance visibility
Core Facility 2.0 - leveraging social media to enhance visibility
 
HIE technical infrastructure
HIE technical infrastructureHIE technical infrastructure
HIE technical infrastructure
 
Future of biomedical instrumentation
Future of biomedical instrumentationFuture of biomedical instrumentation
Future of biomedical instrumentation
 
Supporting the Scientists: Working as a research technician in a Core Service...
Supporting the Scientists: Working as a research technician in a Core Service...Supporting the Scientists: Working as a research technician in a Core Service...
Supporting the Scientists: Working as a research technician in a Core Service...
 
Biomedical instrumentation PPT
Biomedical instrumentation PPTBiomedical instrumentation PPT
Biomedical instrumentation PPT
 

Similar a NIST program to develop genomic reference materials

Automated Solutions for working with DNA/RNA
Automated Solutions for working with DNA/RNAAutomated Solutions for working with DNA/RNA
Automated Solutions for working with DNA/RNALuc Van Laer
 
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...Eastern Pennsylvania Branch ASM
 
Experimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome ProjectExperimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome ProjectFundación Ramón Areces
 
Microarrays;application
Microarrays;applicationMicroarrays;application
Microarrays;applicationFyzah Bashir
 
Molecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingMolecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingFOODCROPS
 
Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGScursoNGS
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slidesGenomeInABottle
 
20110524zurichngs 1st pub
20110524zurichngs 1st pub20110524zurichngs 1st pub
20110524zurichngs 1st pubsesejun
 
The wheat genome sequence: a foundation for accelerating improvment of bread ...
The wheat genome sequence: a foundation for accelerating improvment of bread ...The wheat genome sequence: a foundation for accelerating improvment of bread ...
The wheat genome sequence: a foundation for accelerating improvment of bread ...Borlaug Global Rust Initiative
 
Human genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsHuman genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsgroovescience
 
Unison: Enabling easy, rapid, and comprehensive proteomic mining
Unison: Enabling easy, rapid, and comprehensive proteomic miningUnison: Enabling easy, rapid, and comprehensive proteomic mining
Unison: Enabling easy, rapid, and comprehensive proteomic miningReece Hart
 
15 molecular markers techniques
15 molecular markers techniques15 molecular markers techniques
15 molecular markers techniquesAVINASH KUSHWAHA
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingThomas Keane
 
Fundamentals of Fluorescence in situ Hybridization
Fundamentals of Fluorescence in situ Hybridization Fundamentals of Fluorescence in situ Hybridization
Fundamentals of Fluorescence in situ Hybridization Amartya Pradhan
 

Similar a NIST program to develop genomic reference materials (20)

Automated Solutions for working with DNA/RNA
Automated Solutions for working with DNA/RNAAutomated Solutions for working with DNA/RNA
Automated Solutions for working with DNA/RNA
 
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
 
Experimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome ProjectExperimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome Project
 
Church gmod2012 pt2
Church gmod2012 pt2Church gmod2012 pt2
Church gmod2012 pt2
 
Microarrays;application
Microarrays;applicationMicroarrays;application
Microarrays;application
 
Biohackathon2016
Biohackathon2016Biohackathon2016
Biohackathon2016
 
Molecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingMolecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breeding
 
Whole Genome Analysis
Whole Genome AnalysisWhole Genome Analysis
Whole Genome Analysis
 
Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGS
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slides
 
20110524zurichngs 1st pub
20110524zurichngs 1st pub20110524zurichngs 1st pub
20110524zurichngs 1st pub
 
The wheat genome sequence: a foundation for accelerating improvment of bread ...
The wheat genome sequence: a foundation for accelerating improvment of bread ...The wheat genome sequence: a foundation for accelerating improvment of bread ...
The wheat genome sequence: a foundation for accelerating improvment of bread ...
 
Human genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsHuman genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traits
 
RNA-seq Analysis
RNA-seq AnalysisRNA-seq Analysis
RNA-seq Analysis
 
Unison: Enabling easy, rapid, and comprehensive proteomic mining
Unison: Enabling easy, rapid, and comprehensive proteomic miningUnison: Enabling easy, rapid, and comprehensive proteomic mining
Unison: Enabling easy, rapid, and comprehensive proteomic mining
 
15 molecular markers techniques
15 molecular markers techniques15 molecular markers techniques
15 molecular markers techniques
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-Editing
 
Natasha de Vere - Plants Plenary
Natasha de Vere - Plants PlenaryNatasha de Vere - Plants Plenary
Natasha de Vere - Plants Plenary
 
Fundamentals of Fluorescence in situ Hybridization
Fundamentals of Fluorescence in situ Hybridization Fundamentals of Fluorescence in situ Hybridization
Fundamentals of Fluorescence in situ Hybridization
 
Mushroom breeding
Mushroom breedingMushroom breeding
Mushroom breeding
 

Más de GenomeInABottle

GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GenomeInABottle
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGenomeInABottle
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907GenomeInABottle
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...GenomeInABottle
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGenomeInABottle
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GenomeInABottle
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020GenomeInABottle
 
NIST program to develop genomic reference materials

  • 1. NIST Program to Develop Genomic Reference Materials
      Justin Zook and Marc Salit
  • 2. Scope of NIST work
      •  Human Whole Genome RMs
      •  Synthetic DNA constructs
      •  Microbial Whole Genome RMs
  • 3. RM Development Process
      1.  Select and procure materials
      2.  Characterize materials
      3.  Process and integrate data from multiple platforms
      4.  Confirm selected genotypes
      5.  Write Report of Analysis
      6.  Develop methods for end users to obtain performance metrics from the materials
  • 4. Proposed Timeline for Human RMs
  • 5. Proposed Timeline for Synthetic Structures
      (Gantt chart; year axis begins at 2011)

      Title                                                                     Effort
      1) Human RMs                                                              535w
        1.1) Select/Procure human DNA for RM                                    32w
        1.2) **NIST receives packaged DNA for RM/SRM
        1.3) Develop bioinformatics pipeline for data integration               97w
        1.4) Human Primary Sequencing                                           147w
        1.5) Human Homogeneity assessment                                       8w
        1.6) Analyze homogeneity data and produce preliminary SNP calls for RM  10w
        1.7) Write human RM Report of Analysis                                  10w
        1.8) Process Human RM for release                                       24w
        1.9) **Human RM officially released
        1.10) Human Sequencing data integration                                 25w
        1.11) Human Validation                                                  20w
        1.12) Human other characterization methods                              48w
        1.13) Analyze validation data and refine sequencing calls               12w
        1.14) Develop pipeline for SVs and test                                 40w
        1.15) Write Human SRM Report of Analysis                                8w
        1.16) Process Human SRM for release                                     24w
        1.17) **Human SRM officially released
        1.18) Procure local data storage                                        10w
        1.19) Procure Bioinformatics data analysis tools                        10w
        1.20) Procure Automated sample prep instrumentation                     10w
      2) Microbial RMs                                                          279w
        2.1) Select/Procure microbial DNA for RMs                               31w
        2.2) Microbial Primary Sequencing                                       124w
        2.3) Microbial Homogeneity assessment                                   6w
        2.4) Microbial Sequencing data integration                              40w
          2.4.1) Mapping/Alignment                                              10w
          2.4.2) Variant calling                                                12w
          2.4.3) Form consensus variant calls                                   12w
  • 6. Proposed Characterization Methods for Whole Genomes
      Whole Genome Sequencing
      •  ABI 5500 (1kb, 6kb, and 10kb mate-pair libraries)
      •  Illumina
      •  Complete Genomics
      •  Upcoming technologies?
          –  Ion Proton?
          –  Oxford Nanopore?
      •  3x replication of sequencing (3 library preps)
      Other
      •  Genotyping microarrays
      •  Array CGH
      •  Targeted sequencing
      •  Fosmid sequencing?
      •  Optical Mapping?
      (Pedigree figure: Father, Mother; Husband, NA12878; Son, Daughter)
  • 7. Integration of Existing Data to Form Consensus Genotype Calls
      1.  Find all possible variant sites
      2.  Find sites where all datasets agree
      3.  Identify sites with atypical characteristics signifying sequencing, mapping, or alignment bias
      4.  For each site, remove datasets with decreasingly atypical characteristics until all datasets agree
      5.  Even if all datasets agree, identify them as uncertain if few have typical characteristics
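
The integration steps on slide 7 can be sketched roughly as follows. This is a minimal illustration, not the NIST pipeline: the `Call` record, the single numeric `atypicality` score, and the `typical_cutoff` and `min_typical` parameters are all hypothetical stand-ins for whatever bias annotations and thresholds the actual integration uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    dataset: str        # name of the sequencing dataset (hypothetical)
    genotype: str       # e.g. "0/0", "0/1", "1/1"
    atypicality: float  # hypothetical score; higher means more evidence of bias


def consensus_genotype(calls, typical_cutoff=1.0, min_typical=2):
    """Return a consensus genotype for one site, or "uncertain".

    Datasets are dropped from most to least atypical until the
    remaining ones agree. The site is still labeled uncertain when
    too few of the agreeing datasets look typical.
    """
    remaining = sorted(calls, key=lambda c: c.atypicality, reverse=True)
    # Drop the most atypical dataset while the genotypes disagree.
    while len({c.genotype for c in remaining}) > 1:
        remaining = remaining[1:]
    if not remaining:
        return "uncertain"
    # Agreement alone is not enough: require enough typical datasets.
    typical = [c for c in remaining if c.atypicality < typical_cutoff]
    if len(typical) < min_typical:
        return "uncertain"
    return remaining[0].genotype


site = [
    Call("platform_a", "0/1", 0.2),
    Call("platform_b", "0/1", 0.3),
    Call("platform_c", "0/0", 2.5),  # disagrees and looks biased
]
print(consensus_genotype(site))
```

In this toy site the disagreeing, highly atypical dataset is removed first, and the two remaining typical datasets determine the call. A site where every agreeing dataset is atypical comes back as "uncertain".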
  • 8. Consensus has lower FN rate than individual datasets

      HiSeq – GATK (rows) vs. Illumina Omni SNP Array (columns)
                                     Homozygous Reference   Heterozygous    Homozygous Variant   Uncertain
      Homozygous Reference/No Call   1.45M                  7.24k (1.34%)   5.28k (0.65%)        N/A
      Heterozygous                   196 (0.03%)            411k (60.7%)    133 (0.02%)          N/A
      Homozygous Variant             154 (0.02%)            150 (0.02%)     249k (37.0%)         N/A

      Integrated Consensus Genotypes (rows) vs. Illumina Omni SNP Array (columns)
                                     Homozygous Reference   Heterozygous    Homozygous Variant   Uncertain
      Homozygous Reference/No Call   1.45M                  613 (0.09%)     977 (0.15%)          N/A
      Heterozygous                   241 (0.04%)            414k (61.5%)    173 (0.03%)          N/A
      Homozygous Variant             152 (0.02%)            61 (0.01%)      249k (36.9%)         N/A
      Uncertain                      5458 (0.81%)           3421 (0.51%)    4808 (0.71%)         N/A

      “FNs”: variant on the array but Homozygous Reference/No Call in the row dataset.
      “FPs*”: Homozygous Reference on the array but variant in the row dataset.
      * Note that most or all of the putative FPs seem to actually be FNs on the microarray
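
As a rough check on the claim in slide 8, the sketch below computes the share of array-variant sites that each call set reports as Homozygous Reference/No Call. The counts are transcribed from the slide, where "k" values are rounded, so the results are approximate. The denominator is an assumption of this sketch: the slide does not state how its own percentages were computed, and the consensus "Uncertain" row is left out here.

```python
# Counts transcribed from slide 8 ("k" values are rounded on the slide).
# Each row maps to (array Heterozygous, array Homozygous Variant) counts.
HISEQ_GATK = {
    "hom_ref_or_no_call": (7_240, 5_280),
    "heterozygous": (411_000, 133),
    "hom_variant": (150, 249_000),
}
# Assumption: the consensus "Uncertain" row is excluded from this tally.
CONSENSUS = {
    "hom_ref_or_no_call": (613, 977),
    "heterozygous": (414_000, 173),
    "hom_variant": (61, 249_000),
}


def missed_variant_fraction(table):
    """Fraction of array-variant sites called Homozygous Reference/No Call."""
    missed = sum(table["hom_ref_or_no_call"])
    total = sum(sum(row) for row in table.values())
    return missed / total


gatk_rate = missed_variant_fraction(HISEQ_GATK)
consensus_rate = missed_variant_fraction(CONSENSUS)
print(f"HiSeq-GATK:           {gatk_rate:.2%}")
print(f"Integrated consensus: {consensus_rate:.2%}")
```

Under these assumptions the consensus call set misses a markedly smaller share of array variants than HiSeq-GATK alone, which matches the slide's title.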
  • 9. SNP arrays overestimate performance

      HiSeq – GATK (rows) vs. Illumina Omni SNP Array (columns)
                                     Homozygous Reference   Heterozygous    Homozygous Variant   Uncertain
      Homozygous Reference/No Call   1.45M                  7.24k (1.34%)   5.28k (0.65%)        N/A
      Heterozygous                   196 (0.03%)            411k (60.7%)    133 (0.02%)          N/A
      Homozygous Variant             154 (0.02%)            150 (0.02%)     249k (37.0%)         N/A

      HiSeq – GATK (rows) vs. Integrated Consensus Genotypes (columns)
                                     Homozygous Reference   Heterozygous    Homozygous Variant   Uncertain
      Homozygous Reference/No Call   1.52M                  157k (4.68%)    30.3k (0.90%)        4.17M
      Heterozygous                   47 (0.00%)             1.90M (56.4%)   34 (0.00%)           16.9k (0.50%)
      Homozygous Variant             1 (0.00%)              298 (0.01%)     1.19M (35.3%)        73.3k (2.18%)
  • 10. Samtools has higher FP and lower FN than GATK

      HiSeq – samtools (rows) vs. Integrated Consensus Genotypes (columns)
                                     Homozygous Reference   Heterozygous    Homozygous Variant   Uncertain
      Homozygous Reference/No Call   1.51M                  49.6k (1.47%)   6.74k (0.20%)        3.93M
      Heterozygous                   3141 (0.09%)           2.00M (59.6%)   74 (0.00%)           175k (5.19%)
      Homozygous Variant             192k (5.71%)           21 (0.00%)      777 (0.02%)          1.21M (36.0%)

      HiSeq – GATK (rows) vs. Integrated Consensus Genotypes (columns)
                                     Homozygous Reference   Heterozygous    Homozygous Variant   Uncertain
      Homozygous Reference/No Call   1.52M                  157k (4.68%)    30.3k (0.90%)        4.17M
      Heterozygous                   47 (0.00%)             1.90M (56.4%)   34 (0.00%)           16.9k (0.50%)
      Homozygous Variant             1 (0.00%)              298 (0.01%)     1.19M (35.3%)        73.3k (2.18%)
  • 11. Performance Metrics: Characteristics of Mis-calls
      (Figure: grid of HiSeq/GATK calls (Hom. Ref./No call, Heterozygous, Hom. Variant) against Consensus Genotypes (Hom. Ref., Heterozygous, Hom. Variant, Uncertain); characteristics shown: QUAL/Depth of Coverage, Strand Bias, . . .)
  • 12. Challenges with assessing performance
      •  All variant types are not equal
      •  Nearby variants are often difficult to align
      •  All regions of the genome are not equal
          –  Homopolymers, STRs, duplications
          –  Can be similar or different in different genomes
      •  Labeling difficult variants as “uncertain” in the Reference Material leads to higher apparent accuracy when assessing performance
      •  Genotypes fall in 3+ categories (not positive/negative)
      •  It’s important to consider data from multiple platforms and library preparations when characterizing a Reference Material