High-Resolution Views of Cancer Genomes
  
The Central Dogma

+

Your Nature Paper

Our First Experiment

Overview of BAC in the Genome

Sequencing a BAC

Sequence Coverage

Repeats

Repeats

Repeats are not created equal
  
Genomic Sequencing
    Targeting the Exome

    Long oligos synthesized on arrays (DNA)
    RNA baits synthesized from DNA oligo template
    RNA baits hybridized to DNA sequencing library
    Targets captured using beads and biotin-labeled baits
    RNA bait degraded, leaving sequencing library enriched for target regions
  
Data Flow
    FASTQ files generated by Illumina pipeline
    Aligned to reference genome (hg18, excluding _random, unmapped, and hap) using Novoalign
        SAM/BAM used extensively
    Follow Broad Institute GATK pipeline for exome capture
    Use Picard Java library for quality assessment
    Processed BAM files available via local http for browsing
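
The exclusion of `_random`, unmapped, and `hap` sequences from the reference could be sketched as a name filter. This is only an illustration under stated assumptions: the `keep_sequence` helper is hypothetical, the example names are made up to follow UCSC-style naming, and treating "unmapped" as a `chrUn` prefix is a guess rather than something the slide specifies.

```python
def keep_sequence(name: str) -> bool:
    """Return True for sequences kept in the alignment reference."""
    # Assumption: excluded sequences can be recognised from their names alone.
    if name.endswith("_random"):
        return False
    if "_hap" in name:
        return False
    if name.startswith("chrUn"):  # assumed naming for unmapped sequence
        return False
    return True


# Illustrative names only; a real list would come from the reference index.
names = ["chr1", "chr1_random", "chr6_example_hap1", "chrUn_example", "chrX"]
primary = [n for n in names if keep_sequence(n)]
print(primary)
```

Filtering the reference before alignment, rather than discarding reads afterwards, keeps reads from being placed on alternate or unplaced sequence in the first place.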
  
Data Pipeline...
    Samtools import
    Samtools sort
    Picard MarkDuplicates
    GATK Indel Realignment
    GATK Quality Recalibration
    Picard QC metrics
  
Realignment around Indels
    The problem
        Aligners align each read independently
        Potentially leads to increased error rates around indels
    A potential solution
        Locally realign reads in regions that might harbor an indel
        Goal is to align reads overlying indels more accurately, reducing errors in each read and, in turn, reducing SNV call error rates
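
The problem can be illustrated with a toy example. The sketch below is not a realigner and not GATK's method; it only compares two placements of one read that ends shortly after a deletion. The reference string, the deletion, and the `mismatches` helper are all invented for illustration.

```python
def mismatches(ref_segment: str, read_segment: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(1 for r, q in zip(ref_segment, read_segment) if r != q)


reference = "ACGTTGCAAGCTTAGGCTAACCGT"

# Hypothetical sample carrying a 3-base deletion of reference[16:19].
del_start, del_len = 16, 3
sample = reference[:del_start] + reference[del_start + del_len:]

# A read that starts at reference position 6 and runs 4 bases past the deletion.
read_start, read_len = 6, 14
read = sample[read_start:read_start + read_len]

# Ungapped placement: the read is compared base by base with the reference,
# so every base after the deletion is shifted and may look like an SNV.
ungapped = mismatches(reference[read_start:read_start + read_len], read)

# Gapped placement: bases before the deletion align as before, and bases
# after it align to the reference just past the deleted segment.
prefix_len = del_start - read_start
suffix_start = del_start + del_len
gapped = mismatches(reference[read_start:del_start], read[:prefix_len]) + mismatches(
    reference[suffix_start:suffix_start + read_len - prefix_len], read[prefix_len:]
)

print(ungapped, gapped)
```

In this toy case the ungapped placement has 4 mismatches and the gapped placement has none. An aligner looking at this read alone may prefer the mismatches over opening a gap so close to the read end, which is why local realignment looks at all reads covering the region together.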
  
Quality Recalibration
    Since most SNV callers will rely on quality scores to
     estimate error probabilities, having the best possible
     estimates for error rates is important
    Reported error rates from the Illumina sequencer
     generally reflect technical parameters of the base call
     process, but not other systematic biases
    Quality recalibration can include covariates to
     account for systematic biases
              Cycle count, dinucleotide context, original quality,
              and sample/library variables
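
A quality score Q on the Phred scale corresponds to an error probability p = 10^(-Q/10). The sketch below shows the general idea of empirical recalibration: group bases by covariate, count mismatches against the reference, and convert the observed rate back to a quality. It is a simplified stand-in, not the GATK implementation; the covariate key, the add-one smoothing, and the assumption that mismatches at known variant sites were already excluded are all choices made for illustration.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(p)


def empirical_qualities(observations):
    """Estimate a quality per covariate bin.

    observations: iterable of (covariate_key, is_mismatch) pairs, where
    covariate_key might be (reported quality, cycle) or any hashable tuple.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, total]
    for key, is_mismatch in observations:
        counts[key][0] += int(is_mismatch)
        counts[key][1] += 1
    # Add-one smoothing keeps the estimate finite when a bin has no mismatches.
    return {
        key: error_prob_to_phred((mm + 1) / (total + 2))
        for key, (mm, total) in counts.items()
    }


# Toy data: 1000 bases reported as Q30 at cycle 1, of which 9 mismatch.
toy = [((30, 1), i < 9) for i in range(1000)]
recalibrated = empirical_qualities(toy)
print(recalibrated[(30, 1)])
```

Here bases reported as Q30 (one error in 1000) actually mismatch about ten times as often, so the bin is recalibrated to roughly Q20.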
Variant Calling and Evaluation
    A developing art
  
Sequencing Tumor/Normal Pairs

Good SNP

Suspect Variant

Somatic (tumor only) Variant

Likely False Positive (normal only)

LOH
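
The categories shown on these slides could be expressed as a small decision function over paired genotype calls. The slides show examples rather than rules, so every rule below is an assumption made for illustration, and `classify_site` is a hypothetical helper. Call quality, which presumably separates a "good" from a "suspect" call, is not modelled.

```python
def classify_site(tumor: str, normal: str) -> str:
    """Label a site from tumor and normal genotype calls.

    Each genotype is "ref" (no variant), "het", or "hom" (homozygous variant).
    The rules are illustrative assumptions, not the talk's actual criteria.
    """
    allowed = {"ref", "het", "hom"}
    if tumor not in allowed or normal not in allowed:
        raise ValueError("genotypes must be 'ref', 'het', or 'hom'")
    if normal == "het" and tumor == "hom":
        return "LOH"
    if tumor != "ref" and normal == "ref":
        return "somatic (tumor only)"
    if tumor == "ref" and normal != "ref":
        return "likely false positive (normal only)"
    if tumor == "ref" and normal == "ref":
        return "no variant"
    return "germline variant (tumor and normal)"


for pair in [("het", "het"), ("het", "ref"), ("ref", "het"), ("hom", "het")]:
    print(pair, classify_site(*pair))
```

A variant seen only in the normal sample is labelled a likely false positive here because that is how the slide titles it; in practice it could also reflect loss of the variant allele in the tumor.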
  
NCI60 Exome Sequencing
    No Normals Available!
  
Variants by Genomic Location

All Coding Variants

Type 1: in dbSNP, Type 2: not in dbSNP

Coding, novel (no dbSNP)
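
The Type 1 / Type 2 split and the "coding, novel" filter amount to a set-membership test. The sketch below assumes variants are matched to dbSNP by exact chromosome, position, and alleles; the record layout and the example sites are hypothetical.

```python
def dbsnp_type(site, dbsnp_sites) -> int:
    """Return 1 if the site is in dbSNP, 2 otherwise (the slide's Type 1 / Type 2)."""
    return 1 if site in dbsnp_sites else 2


# Hypothetical data: sites are (chromosome, position, ref allele, alt allele).
dbsnp_sites = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
calls = [
    {"site": ("chr1", 1000, "A", "G"), "coding": True},
    {"site": ("chr3", 42, "G", "A"), "coding": True},
    {"site": ("chr4", 77, "T", "C"), "coding": False},
]

# Coding, novel: coding variants that are not in dbSNP (Type 2).
novel_coding = [
    c for c in calls if c["coding"] and dbsnp_type(c["site"], dbsnp_sites) == 2
]
print(novel_coding)
```

With no matched normals available, filtering against dbSNP is one way to enrich for variants that are less likely to be common germline polymorphisms.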
  
Copy Number from Exomes

Complete Genome Sequencing
    Complete Genomics Data
  
Data
    Delivery
        Via USB results
    Storage
        Sizes are LARGE
            400GB per sample as delivered with raw reads included
        Should use 2-location backed-up storage
            Not trivial to find such storage, so might resort to multiple USB drives
        Minimize:
            Data movement
            Keeping multiple copies indefinitely
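
A quick back-of-the-envelope calculation shows why storage needs planning. The 400 GB per sample figure is from the slide; the sample count in the example and the helper name are made up.

```python
GB_PER_SAMPLE = 400  # from the slide: as delivered, raw reads included


def storage_needed_gb(n_samples: int, n_locations: int = 2) -> int:
    """Total storage in GB when every sample is kept at each location."""
    return n_samples * GB_PER_SAMPLE * n_locations


# Hypothetical project: 10 tumor/normal pairs, i.e. 20 samples.
total_gb = storage_needed_gb(20)
print(total_gb, "GB, or", total_gb / 1000, "TB")
```

Each extra working copy adds another 400 GB per sample, which is the reason for the advice to minimize data movement and long-lived duplicates.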
  
Breakdown of Data Sizes
  
Data
    Delivery
    Storage
    Processing
        Data are typically tab-delimited text files, so Excel can be useful for examining individual small files
        Generally, command-line tools needed
        MacOS and Linux are the only supported operating systems, but Windows might work...
        Some analyses (snpdiff) require large memory
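
For files too large for Excel, a tab-delimited file can be processed one row at a time so that memory use stays small. The sketch below uses only the Python standard library; the column names are invented and do not reflect the vendor's actual file layout.

```python
import csv
import io


def count_rows_by_column(handle, column: str) -> dict:
    """Stream a tab-delimited file and count rows per value of one column."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:  # one row in memory at a time
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


# Hypothetical three-row file standing in for a much larger one on disk.
example = io.StringIO(
    "chromosome\tposition\tvarType\n"
    "chr1\t100\tsnp\n"
    "chr1\t250\tdel\n"
    "chr2\t75\tsnp\n"
)
counts = count_rows_by_column(example, "varType")
print(counts)
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle; the function itself does not change.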
  
Directory Structure

Workflows
    Tumor/Normal
        Copy Number
        Structural Variation
        Annotated Somatic Variants
    Germline
        List of annotated genotypes per individual, summarized into a single file that can be used for filtering
  
Germline Workflow

Germline Workflow
    Output
    Future directions
        Be “smarter” about inheritance framework
        Further refinements of comparison to other data types (exomes, SNP arrays, RNA-seq)

Tumor/Normal Workflow

Medvedev et al., Nature 2009
  
Frequent genetic alterations in three critical signalling pathways.
    The Cancer Genome Atlas Research Network. Nature 000, 1-8 (2008) doi:10.1038/nature07385
  
Chromatin
    Chromatin is the complex of protein and DNA that makes up the chromosomes. It is not a static structure.
    DNAse is an enzyme that cuts DNA at locations where DNA is accessible
    These “accessible” regions have been associated with open chromatin
    Regions of open chromatin are necessary for transcriptional and regulatory machinery to have access to gene neighborhoods and facilitate transcription
  
DNAse Hypersensitivity
    Method for finding regions of “open” chromatin
    In data published with the ENCODE consortium, DNAse hypersensitive (HS) sites were shown to be correlated with:
        Histone modification
        Transcription start sites
        Early replicating regions
        Transcription factor binding sites (experimentally determined by ChIP/chip, etc.)
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Consortium. Nature, 2007.
  
DNAse-chip Method
Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006

DNAse-Seq Method
Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006
  
DNAse Sites Relative to Genes

DNAse HS Sites and Gene Expression
    DNAse HS sites near transcription start sites are associated with actively transcribed genes.
  
Nucleosome Positioning
    Distances between sequences in non-DNAse HS regions have an oscillating pattern with a period that corresponds to a single turn of the double helix
    DNAse is known to cut preferentially in the minor groove, which is exposed every 10.4 bases when wrapped around a nucleosome
    A nucleosome is wrapped by 147 base pairs of DNA
    Implication: Nucleosomes are positioned in a highly organized, precise manner
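
The oscillating pattern can be made concrete with a distance histogram. The sketch below builds toy cut positions spaced one helical turn apart and tabulates all pairwise distances; it uses the 10.4-base and 147-base-pair figures from the slide, but the data are synthetic and the `distance_histogram` helper is a guess at the kind of analysis involved, not the method used in the talk.

```python
from collections import Counter
from itertools import combinations

HELICAL_REPEAT = 10.4  # bases per exposure of the minor groove (from the slide)
NUCLEOSOME_BP = 147    # base pairs wrapped around one nucleosome (from the slide)


def distance_histogram(positions, max_distance):
    """Count pairwise distances between positions, up to max_distance."""
    hist = Counter()
    for a, b in combinations(sorted(positions), 2):
        d = b - a
        if d <= max_distance:
            hist[d] += 1
    return hist


# About 14 helical turns fit on one nucleosome.
turns_per_nucleosome = NUCLEOSOME_BP / HELICAL_REPEAT

# Synthetic cut sites: one per helical turn across a single nucleosome.
toy_cuts = [round(i * HELICAL_REPEAT) for i in range(15)]
hist = distance_histogram(toy_cuts, max_distance=60)

for distance in sorted(hist):
    print(distance, hist[distance])
```

In the toy data, distances cluster near multiples of 10.4 (10 or 11, 20 or 21, 31 or 32, and so on) and none fall halfway between, which is the signature the slide describes. Real read-start data would be noisier, so the periodicity would show up as peaks rather than exact values.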
  
The Last Mile
  

Forsharing cshl2011 sequencing
