© 2013 Real Time Genomics, Inc.

NA12878 Trio/Pedigree Analysis
Francisco M. De La Vega, D.Sc.
VP Genome Science
Leveraging trio information
•  GiaB has selected reference materials in the form of father,
mother, offspring trios
•  The goal was to leverage Mendelian inheritance patterns to:
–  Identify variant genotype errors that are inconsistent with
Mendelian inheritance
–  Remove these errors from the reference baseline calls
•  However, if variant identification methods do not use pedigree
information directly and jointly analyze the trio alignments,
an opportunity to improve the genotype calls is missed
•  We focused on using the RTG Family caller to better leverage
the shared information in the trios and improve the call set,
while reducing Mendelian-inconsistent genotype errors
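As an illustration of what "inconsistent with Mendelian inheritance" means for a single trio, here is a minimal sketch. It assumes diploid, VCF-style genotype strings such as `0/1`; the function names are hypothetical and this is not the RTG Family caller's implementation.

```python
from itertools import product


def parse_gt(gt: str) -> tuple:
    """Split a VCF-style genotype such as '0/1' or '1|0' into allele indices."""
    return tuple(int(allele) for allele in gt.replace("|", "/").split("/"))


def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """Return True if the child's genotype can be formed from one paternal
    and one maternal allele (autosomal, diploid sites only)."""
    child_alleles = sorted(parse_gt(child))
    return any(
        sorted((f, m)) == child_alleles
        for f, m in product(parse_gt(father), parse_gt(mother))
    )


# A het child of a hom-ref and a het parent is consistent.
print(is_mendelian_consistent("0/0", "0/1", "0/1"))
# A het child of two hom-ref parents is a Mendelian inconsistency
# (a genotype error or a de novo candidate).
print(is_mendelian_consistent("0/0", "0/0", "0/1"))
```

A check like this only flags an inconsistent trio; it cannot say which of the three genotypes is wrong, which is one reason joint calling is attractive.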
Variant calling can be improved by jointly analyzing related samples
[Figure: aligned reads and called genotypes for related samples, annotated "Shared haplotypes"]
Variant calling can be improved by jointly analyzing related samples
[Figure: aligned reads and called genotypes for related samples, annotated "Mendelian variant segregation" and "Shared haplotypes"]
Mendelian inconsistency
[Figure: aligned reads and per-sample genotype calls for a trio; one call is flagged "Low QV"; pedigree genotypes shown: AA, CC, AC]
Joint trio analysis corrects Mendelian errors 
[Figure: aligned reads and genotype calls for a trio after joint analysis; one call is flagged "Good QV"; pedigree genotypes shown: AA, CC, AC]
NA12878 calls from trio calling
•  Comparing offspring variants from singleton vs. pedigree calling
–  Both show good quality metrics
•  Using family information, more good calls can be made and
dubious calls are downgraded
NA12878 variant statistics
Call set     SNVs       Indels   MNPs    SNV Het/Hom  Ti/Tv  % dbSNP (r129)
RTG single   3,329,797  558,242  31,070  1.55         2.11   90.8%
RTG trio     3,363,619  595,030  33,686  1.57         2.11   90.4%
GATK/VQSR    3,263,289  610,837  N/A     1.51         2.09   91.7%
Data: WGS 2x100bp >50X Illumina Platinum Genomes data (ENA Acc. No. ERP001960). RTG AVR score cut-off 0.15; GATK v1.7 & BWA 0.6.1.
[Figure: Venn diagram of the Family and Singleton call sets; values shown: 142,848, 68,000, and 3,849,457]
[Figure: trio pedigree, NA12878 with parents NA12891 and NA12892]
NA12878 vs reference datasets 
NA12878
Call set     1kP OMNI Poly (TP%)  1kP OMNI Mono (FP%)  Get-RM¶ (TP%)  GiaB (TP%)  GiaB-BED (TP%)
RTG single   97.5%                0.10%                97.4%          N/A         N/A
RTG trio     97.5%                0.24%                97.0%          90.5%       94.1%
GATK/VQSR    97.8%                0.17%                87.8%          88.4%       92.5%
§ Relative to dbSNP 137; statistics for SNVs only. ¶ Get-RM consistent high-quality variants; n=498
[Figure: trio pedigree, NA12878 with parents NA12891 and NA12892]
–  1000 Genomes Illumina OMNI SNP array
•  Polymorphic sites – TP proxy
•  Monomorphic sites – FP proxy
–  Get-RM high confidence call set
–  GiaB high confidence calls in BED region
ROC Trio calls vs. GiaB baseline (BED)
[Figure: ROC curve] RTG snpsimeval tool; SNV/indel/MNP; zygosity match
ROC Trio calls vs. GiaB baseline
[Figure: ROC curve] RTG snpsimeval tool; SNV/indel/MNP; zygosity match
ROC Trio calls vs. CGI baseline
[Figure: ROC curve] RTG snpsimeval tool; SNV/indel/MNP; zygosity match
Mendelian inconsistency errors
RTG family caller reduces Mendelian inheritance errors (MIE) more than 60-fold vs. RTG
singleton calling (more than 70-fold vs. GATK/VQSR)
[Figure: bar chart of MIE counts on a log scale]
Call set     MIE count
RTG single   335,625
RTG trio     4,870
GATK/VQSR    351,904
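The fold reductions quoted on this slide can be checked directly from the three MIE counts. A minimal sketch; the variable and function names are illustrative only:

```python
# MIE counts as shown on the slide's bar chart.
mie_counts = {"RTG single": 335_625, "RTG trio": 4_870, "GATK/VQSR": 351_904}


def fold_reduction(baseline: int, improved: int) -> float:
    """How many times fewer errors the improved call set has than the baseline."""
    return baseline / improved


fold_vs_single = fold_reduction(mie_counts["RTG single"], mie_counts["RTG trio"])
fold_vs_gatk = fold_reduction(mie_counts["GATK/VQSR"], mie_counts["RTG trio"])

print(f"vs. RTG single: {fold_vs_single:.1f}x")  # 68.9x, i.e. "over 60X"
print(f"vs. GATK/VQSR:  {fold_vs_gatk:.1f}x")    # 72.3x, i.e. "over 70X"
```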
  
Pattern #1: Heterozygous variant
[Figure: pedigree labeled "Trio Calling" with samples NA12891, NA12892, NA12878, NA12889, NA12890, NA12877 and offspring NA12879 to NA12888 and NA12893; genotype labels shown: 0/1; 0/1 and 0/0; offspring a mix of 0/0 and 0/1]
Segregation of heterozygous variants
[Figure: four histograms (SNV, MNP, indel, All Variants) of variant count against the number of offspring segregating (1 to 11)]
Segregation of NA12878 heterozygous variants called as family, GQ>50, homozygous reference in other parent.
Pattern #2: Homozygous-alt variant
[Figure: pedigree labeled "Trio Calling" with samples NA12891, NA12892, NA12878, NA12889, NA12890, NA12877 and offspring NA12879 to NA12888 and NA12893; genotype labels shown: 0/1; 1/1 and 0/0; all labeled offspring 0/1]
Segregation of homo-alt variants
[Figure: four histograms (SNV, MNP, indel, All Variants) of variant count against the number of offspring segregating (1 to 11)]
Segregation of NA12878 homozygous alternative variants called as family, GQ>50, homozygous reference in other parent.
False positive estimate by segregation
	
GT Type                All variants  SNV     MNP    indel
Het       TP (10-11)   123672        110262  693    12717
          FP (1-8)     1901          1000    47     854
          FP%          1.40%         0.88%   1.42%  5.67%
Homo-alt  TP (2-10)    373260        329642  2258   41360
          FP (1,11)    4457          3672    36     749
          FP%          1.18%         1.10%   1.57%  1.78%
Overall   TP           496932        439904  2951   54077
          FP           6358          4672    83     1603
          Overall FP%  1.26%         1.05%   2.74%  2.88%
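The percentages in the Homo-alt and Overall rows can be reproduced from the counts if FP% is taken as FP / (TP + FP). That formula is an assumption: the slide does not state it, and the Het-row percentages are not derived here because the sketch cannot confirm how they were computed. Names are illustrative.

```python
def fp_percent(tp: int, fp: int) -> float:
    """False positives as a percentage of all calls (assumed definition)."""
    return round(100.0 * fp / (tp + fp), 2)


# (TP, FP) pairs copied from the table above.
homo_alt = {
    "All variants": (373260, 4457),
    "SNV": (329642, 3672),
    "MNP": (2258, 36),
    "indel": (41360, 749),
}
overall = {
    "All variants": (496932, 6358),
    "SNV": (439904, 4672),
    "MNP": (2951, 83),
    "indel": (54077, 1603),
}

homo_alt_fp = {name: fp_percent(tp, fp) for name, (tp, fp) in homo_alt.items()}
overall_fp = {name: fp_percent(tp, fp) for name, (tp, fp) in overall.items()}

for name in overall:
    print(f"{name:12s} homo-alt {homo_alt_fp[name]:.2f}%  overall {overall_fp[name]:.2f}%")
```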
  
Data imputation by pedigree caller
•  For genomes with no data, use population priors
–  With care, the caller can iterate over the offspring and then
over each of the parents independently
–  This avoids an exponential explosion, so a whole extended
family can be handled in one calling step
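To give a sense of the "exponential explosion", the sketch below counts joint genotype configurations at one site if every family member were enumerated together. It assumes three possible genotypes per member (a biallelic diploid site) and says nothing about how the RTG caller is implemented; the function name is hypothetical.

```python
def joint_configurations(n_members: int, genotypes_per_member: int = 3) -> int:
    """Number of joint genotype assignments if all members are enumerated together."""
    return genotypes_per_member ** n_members


# A trio, a trio plus three more relatives, and the 17 samples named in this deck.
for n in (3, 6, 17):
    print(f"{n:2d} members: {joint_configurations(n):,} joint configurations")
```

Iterating over members one at a time, as the slide describes, keeps the work per site roughly proportional to family size rather than to this product.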
Imputation of family members with no data
[Figure: simulated data; True Positives vs. False Positives for imputation from 1 offspring, 2 offspring, 4 offspring, and 4 offspring + father]
ROC vs NA12878 imputed baseline
[Figure: ROC curve] RTG snpsimeval tool; SNV/indel/MNP; zygosity match
de novo mutation identification
de novo mutation accuracy (NA12878)
Call set         de novo candidates  de novo germline*  de novo somatic*  TP/FP
Singleton calls  16,902              49 (100%)          941 (99%)         1:17
Trio calls       2,205               49 (100%)          941 (99%)         1:2.2
*Sensitivity vs. Conrad et al. (2011) validated dataset of germline and somatic cell line de novo mutations.
–  Uses the parental genomes to identify and score de novo
mutations in offspring
–  Greater than 7X improvement in precision for finding de novo
mutations vs. naïve methods
[Figure: trio pedigree, NA12878 with parents NA12891 and NA12892]
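The "greater than 7X" figure can be checked from the candidate counts. The sketch assumes the true positives are the validated germline and somatic mutations recovered (49 + 941), which is an interpretation of the table rather than something the slide states. Under that reading, the printed 1:17 and 1:2.2 ratios match validated hits to total candidates.

```python
# Assumed true positives: validated germline + somatic de novo mutations recovered.
validated_found = 49 + 941

candidates = {"Singleton calls": 16_902, "Trio calls": 2_205}

# Precision = validated hits / all de novo candidates reported.
precision = {name: validated_found / n for name, n in candidates.items()}
# Candidates reported per validated hit (compare with 1:17 and 1:2.2).
candidates_per_hit = {name: n / validated_found for name, n in candidates.items()}

improvement = precision["Trio calls"] / precision["Singleton calls"]

for name in candidates:
    print(f"{name}: precision {precision[name]:.1%}, "
          f"1:{candidates_per_hit[name]:.1f} hits to candidates")
print(f"Precision improvement: {improvement:.1f}x")
```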
Status
•  Working through the complete trio datasets to
produce joint pedigree calls for the NA12878 trio
– Aiming for a trio call set and another that
includes full Platinum pedigree data
– There is disproportionately more data for
NA12878 than for her parents or offspring
•  Comprehensive segregation analysis that
includes all Mendelian patterns
•  Phasing analysis to identify variants that are
inconsistent with transmitted phases
Issues
•  How to integrate pedigree calls with other data?
– Variants that segregate appropriately are
candidates for inclusion in the baseline
– Variants that don’t segregate appropriately are
candidates for removal from the baseline
– Improvement of baseline genotypes using
pedigree-based genotypes
•  Use of the imputed NA12878 baseline
•  Creation of a more inclusive baseline for ROC
curves to compare new methods and select
thresholds
Acknowledgements
•  RTG team at Hamilton, New Zealand
–  Led by John Cleary, CTO
•  RTG team at San Bruno, CA
–  Sahar Malakshah
–  Minita Shah
–  Brian Hilbush
•  Michael Eberle, Illumina, Inc. – Platinum Data
•  Justin Zook, NIST
•  1000 Genomes Project
© 2013 Real Time Genomics, Inc. All rights reserved.
US Patent 7,640,256. Other patents pending.
For research use only. Not for diagnostic applications.

Aug2013 real time genomics trio pedigree analysis
