Race against the sequencing machine: processing of raw DNA sequence data at the Genomics Core. Luc Dehaspe, Genomics Core, UZ Leuven. WOUD – Onderzoeksgroep Associatie Universiteit Gent, 28 Sept 2011.
DNA sequencing determines the order of nucleotide bases in a genome. DNA replication machinery: human genome (2 x 3 billion bases) -> human genome (2 x 3 billion bases), in hours. Final generation sequencing machine: human genome (2 x 3 billion bases) -> human genome (2 x 800 Mb text). Computer's copy function: human genome (2 x 800 Mb text) -> human genome (2 x 800 Mb text), in minutes.
Next generation sequencing. Quality deteriorates after 100-1000 base pairs. Solution: cut genomes into readable fragments; sequence the fragments -> reads; use bioinformatics to reconstruct genomes from reads. Human genome (2 x 3 billion bases) -> next generation sequencing machine -> reads in text format -> bioinformatics -> human genome (2 x 800 Mb text).
Sequencers vs bioinformatics. Human genome: 2 x 3 billion bases. HiSeq 2000 v3: 55 billion bases per day (6 human genomes in 10 days). HiSeq 2000 v2: 18 billion bases per day. Roche GS FLX: 1 billion bases per day. Bioinformatics must turn this output into a human genome (2 x 800 Mb text): scale up bioinformatics or pile up sequencer output.
A bioinformatics pipeline. Case: human exome, raw data = 1.1 billion reads, 2 x 100 bp, HiSeq 2000 v3, ½ run. Demultiplex: sort indexed reads per sample. Alignment: align reads per sample to the reference genome. Variant calling: compare the pileup of reads at a given locus to the reference; identify SNPs, insertions and deletions. Annotation: annotate variants (gene, effect on protein sequence, conservation, frequency, predicted effect on protein function, …). Sequencing: 10 days. Above pipeline: > 60 days on 1 CPU. Scale up or pile up.
Favourable race conditions. The same task is performed on many reads or loci. FOR 1.1 billion indexed reads DO identify sample. FOR 3 billion human genome loci DO compare locus in aligned reads to reference and identify homo- and heterozygous SNPs. Results for one read/locus are independent of results for other reads/loci. This suggests a natural scale-up strategy …
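The first FOR loop above can be sketched in a few lines. This is a minimal illustration, not the Genomics Core's code: the index sequences, the `identify_sample` helper and the toy reads are all hypothetical. The point is only that each iteration looks at one read and nothing else.

```python
from collections import Counter

# Hypothetical index (barcode) sequences; real library kits define their own.
BARCODE_TO_SAMPLE = {
    "ACGT": "Sample1",
    "TGCA": "Sample2",
    "GATC": "Sample3",
}

def identify_sample(index_seq, barcode_to_sample=BARCODE_TO_SAMPLE):
    """Return the sample for one read's index, or None if the index is unknown."""
    return barcode_to_sample.get(index_seq)

# Each read is (index sequence, read sequence); toy data for illustration only.
reads = [
    ("ACGT", "TTGACCA"),
    ("GATC", "CCATGGA"),
    ("ACGT", "GGGTTAC"),
    ("NNNN", "ACACACA"),  # unreadable index: no sample assigned
]

# FOR each indexed read DO identify sample.
# No iteration reads or writes state that another iteration depends on.
assignments = [identify_sample(index) for index, _ in reads]
counts = Counter(a for a in assignments if a is not None)
```

Because the result for one read never depends on another read, the list of reads can be cut at any point and the pieces processed separately.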
Data parallelism. Reads or loci are partitioned among the nodes of a computer cluster. Each node demultiplexes, aligns, etc. on its local partition. Speed-up is (near) linear in the number of cluster nodes. Example: variant calling on 3 billion human genome loci is split into variant calling on Chr1 … variant calling on ChrY, on a cluster of 24 computers (nodes).
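A rough sketch of partitioning by chromosome follows. Everything in it is an assumption made for illustration: the `call_variants` toy caller, the pileup-as-string representation, and the use of a thread pool as a stand-in for cluster nodes (threads in CPython do not give real CPU parallelism; on a cluster each partition would go to its own node).

```python
from concurrent.futures import ThreadPoolExecutor

def call_variants(partition):
    """Toy caller: report loci where the pileup disagrees with the reference."""
    chrom, loci = partition
    calls = []
    for position, ref_base, pileup in loci:
        alt = [base for base in pileup if base != ref_base]
        if not alt:
            continue  # every read agrees with the reference
        # All reads differ: homozygous; only some differ: heterozygous.
        zygosity = "hom" if len(alt) == len(pileup) else "het"
        calls.append((chrom, position, ref_base, alt[0], zygosity))
    return calls

# One partition per chromosome: (name, [(position, reference base, pileup)]).
partitions = [
    ("chr1", [(100, "A", "AAAA"), (101, "C", "CTCT")]),
    ("chr2", [(50, "G", "TTTT")]),
    ("chrY", [(7, "T", "TTTT")]),
]

# Each worker stands in for one cluster node working on its local partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    per_partition = list(pool.map(call_variants, partitions))

# Merging is a plain concatenation because partitions do not interact.
all_calls = [call for calls in per_partition for call in calls]
```

The number of workers is just a parameter here, which mirrors the idea that adding nodes is the way to scale up.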
Data parallelism: demultiplex a HiSeq 2000 microplate. 1 node, 1.1 billion reads, 1600 reads per second. 1 microplate on 1 node: 8 days. 8 lanes on 8 nodes: 1 day. 384 tiles on 384 nodes: ½ hour.
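The figures on this slide can be checked with a short calculation. It assumes ideal linear speed-up and that every node sustains the quoted 1600 reads per second; the function name `demultiplex_days` is only for illustration.

```python
READS = 1.1e9               # reads in the case study
READS_PER_SECOND = 1600     # throughput of one node, from the slide
SECONDS_PER_DAY = 24 * 60 * 60

def demultiplex_days(nodes):
    """Days needed if the reads are split evenly over `nodes` nodes."""
    return READS / READS_PER_SECOND / nodes / SECONDS_PER_DAY

days_1_node = demultiplex_days(1)            # about 8 days
days_8_nodes = demultiplex_days(8)           # about 1 day
hours_384_nodes = demultiplex_days(384) * 24  # about half an hour
```

In practice, overheads such as data distribution keep the speed-up slightly below this ideal, which is why the deck calls it "(near) linear".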
Favourable race conditions. MapReduce: data parallelism made easy. Developed and extensively used at Google. An open source library (C++) takes care of parallelization, fault tolerance, data distribution and load balancing. No knowledge of parallel systems required. The user implements the functions Map() and Reduce().
MapReduce: demultiplex reads. 8 lanes, 8 Map tasks. Map: sort the reads of one lane by sample (Sample1, Sample2, Sample3). Wait until map has finished. Reduce, one task per sample: deduplicate the Sample1, Sample2 and Sample3 reads and write Sample1.fastq.gz, Sample2.fastq.gz and Sample3.fastq.gz.
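The map and reduce steps described here might look as follows in a single-process sketch. It is not the C++ library mentioned in the talk: `map_lane`, `reduce_sample`, the `map_reduce` driver and the index sequences are hypothetical, duplicates are taken to mean identical sequences, and writing the `.fastq.gz` files is left out.

```python
from collections import defaultdict

# Hypothetical index (barcode) sequences; real library kits define their own.
BARCODE_TO_SAMPLE = {"ACGT": "Sample1", "TGCA": "Sample2", "GATC": "Sample3"}

def map_lane(lane_reads):
    """Map: emit (sample, sequence) pairs for the reads of one lane."""
    for index, sequence in lane_reads:
        sample = BARCODE_TO_SAMPLE.get(index)
        if sample is not None:
            yield sample, sequence

def reduce_sample(sample, sequences):
    """Reduce: drop duplicate reads of one sample, keeping first-seen order."""
    return sample, list(dict.fromkeys(sequences))

def map_reduce(lanes):
    grouped = defaultdict(list)
    for lane_reads in lanes:  # one Map task per lane
        for sample, sequence in map_lane(lane_reads):
            grouped[sample].append(sequence)
    # All Map tasks have finished at this point; only now does Reduce start.
    return dict(reduce_sample(s, seqs) for s, seqs in sorted(grouped.items()))

# Two toy lanes; the same Sample1 read appears in both.
lanes = [
    [("ACGT", "TTGACCA"), ("GATC", "CCATGGA")],
    [("ACGT", "TTGACCA"), ("TGCA", "GGGTTAC")],
]
result = map_reduce(lanes)
```

Reduce has to wait for every Map task because a duplicate of a read can turn up in any lane, so a sample's reads are only complete once all lanes have been mapped.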
Favourable race conditions. GATK: MapReduce for sequencing projects. Genome Analysis Toolkit. Developed by and used extensively at the Broad Institute (Harvard and MIT). Open source, Java 1.6 framework. Provides common data access patterns: traversal by read, traversal by locus.
Favourable race conditions. Data parallelism is supported by many (open source) bioinformatics tools; the number of nodes is a parameter. Full analysis pipelines are widely available: GATK, CASAVA, …
Conclusion. Data parallelism is key. Scale up by buying extra cluster nodes (the Genomics Core recently added 400 shared nodes). Canned solutions exist for common bioinformatics tasks. Established programming frameworks exist for custom solutions: MapReduce, GATK.
Conclusion. Bioinformaticians enjoy favourable conditions for keeping pace with the sequencer … Final generation sequencing machine: human genome (2 x 3 billion bases) -> next generation sequencing machine -> reads in text format -> bioinformatics using data parallelism -> human genome (2 x 800 Mb text).
