Understanding sources of bias and error from a prospective Reference Material (NA12878)


Ryan Poplin, on behalf of the
Genome Sequencing and Analysis Group
Program in Medical and Population Genetics

August 16, 2012

NA12878 is a wonderful reference sample

•  Unrestricted cell lines
•  Extensive pedigree available
•  Extensively sequenced and genotyped at the Broad and elsewhere
  –  All Broad techs (both production and experimental)
  –  Fosmids
  –  Many library designs and sample prep protocols

Our framework for variation discovery

[Figure: three-phase workflow.
Phase 1: NGS data processing (typically by lane). Input: raw reads → mapping → local realignment → duplicate marking → base quality recalibration → output: analysis-ready reads.
Phase 2: Variant discovery and genotyping (typically multiple samples simultaneously, but can be a single sample alone). Input: reads from Sample 1 through Sample N → SNPs, indels, structural variation (SV) → output: raw variants.
Phase 3: Integrative analysis. Input: raw indels, raw SNPs, raw SVs, plus external data (pedigrees, population structure, known variation, known genotypes) → variant quality recalibration → genotype refinement → output: analysis-ready variants.]

DePristo, M., Banks, E., Poplin, R. et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet.

Lots of work required to turn raw sequencing reads into something that is useful

Phase 1: NGS data processing
Input: raw reads → Mapping → Local realignment → Duplicate marking → Base quality recalibration → Output: analysis-ready reads

Desired properties of analysis-ready reads:

•  Unbiased sampling of alleles
•  Calibrated mapping quality scores
•  Indels have correct and consistent alignment in reads
•  Duplicate molecules shouldn’t count as extra evidence for an event
•  Calibrated base quality scores for base substitutions, base insertions, and base deletions


Indels have correct and consistent alignment in reads through multiple sequence local realignment

Phase 1: NGS data processing (local realignment step)

[Figure: Effect of MSA on alignments. NA12878, chr1:1,510,530-1,510,589, with rs28782535, rs28783181, rs28788974 and rs34877486 labeled. Panels: 1,000 Genomes Pilot 2 data, raw MAQ alignments; 1,000 Genomes Pilot 2 data, after MSA; HiSeq data, raw BWA alignments; HiSeq data, after MSA.]

DePristo, M., Banks, E., Poplin, R. et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet.
