SlideShare una empresa de Scribd logo
1 de 5
Descargar para leer sin conexión
Files, Tools, and Bioinformatics in the Cloud


   Thomas Keane

   Vertebrate Resequencing Informatics
   WTSI
   thomas.keane@sanger.ac.uk




Vertebrate Resequencing Informatics   17th November, 2009
DATA is the problem!

 NGS means large volumes of raw data
     Previously SRF (~8-10bytes per bp), now BAM (~1.6bytes per bp)
 How much data can a sequencing machine produce?
     20Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11Tbp/year
     Small sequencing center: 4 machines?
     44Tbp per year!
 Raw data in BAM: 70Tbytes                             SV Calling: SVMerge
 Processed calls much smaller
     1000G pilot VCF < 1Gbyte

        Alignment + BAM improvement




Vertebrate Resequencing Informatics   17th November, 2009
Simplistic Model: Cloud as compute resource




                                                                               Processes
                                                                  1. Align


                            SRF/Fastq/BAM
                            (2Mbps/sec)                           Variant calling (n x SNP callers, n indel
                                                                  callers, SV callers)
Sequencing Center/Institute                                   BAM + VCF
                                                              (2Mbps/sec)


                                           BAM                                      3,240 days
                                           VCF                                      to upload!


  Vertebrate Resequencing Informatics   17th November, 2009
Move the raw data generation to the compute




                                                                Variant calling (n x SNP callers, n indel
                                                                callers, SV callers)



Sequencing Center/Institute
                                            VCF
                         BAM
                         VCF

    Vertebrate Resequencing Informatics   17th November, 2009
Large Collaborative Projects: Cloud centric model




                                                            VCF

                                               Analysis groups
Vertebrate Resequencing Informatics   17th November, 2009

Más contenido relacionado

Destacado

Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computing
Haslina
 
2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study
North Bridge
 

Destacado (10)

Cloud Computing And Mobility
Cloud Computing And MobilityCloud Computing And Mobility
Cloud Computing And Mobility
 
Cloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureCloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the Future
 
Trend and Future of Cloud Computing
Trend and Future of Cloud ComputingTrend and Future of Cloud Computing
Trend and Future of Cloud Computing
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computing
 
2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computing
 
cloud computing ppt
cloud computing pptcloud computing ppt
cloud computing ppt
 

Similar a Next generation sequencing in cloud computing era

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
Thomas Keane
 
Large Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesLarge Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and Challenges
Thomas Keane
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
Thomas Keane
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
PTIHPA
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
Daniel Bimschas
 
Distributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldDistributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] world
SATOSHI TAGOMORI
 
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationCambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Advantec Distribution
 

Similar a Next generation sequencing in cloud computing era (20)

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data Preprocessing
 
Large Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesLarge Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and Challenges
 
ULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORK
 
Creating a SNP calling pipeline
Creating a SNP calling pipelineCreating a SNP calling pipeline
Creating a SNP calling pipeline
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing Tutorial
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)
 
PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!
 
IPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandIPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH Switzerland
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
Cambium networks ugps_specification
Cambium networks ugps_specificationCambium networks ugps_specification
Cambium networks ugps_specification
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
 
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time SchedulingProviding Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
 
ProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.net
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
 
Distributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldDistributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] world
 
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationCambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
 

Más de Thomas Keane

2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
Thomas Keane
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1
Thomas Keane
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-Editing
Thomas Keane
 
Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...
Thomas Keane
 
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
Thomas Keane
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010
Thomas Keane
 

Más de Thomas Keane (10)

Multiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsMultiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotations
 
Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-Editing
 
Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...
 
Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...
 
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010
 
Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Next generation sequencing in cloud computing era

  • 1. Files, Tools, and Bioinformatics in the Cloud Thomas Keane Vertebrate Resequencing Informatics WTSI thomas.keane@sanger.ac.uk Vertebrate Resequencing Informatics 17th November, 2009
  • 2. DATA is the problem! NGS means large volumes of raw data   Previously SRF (~8-10bytes per bp), now BAM (~1.6bytes per bp) How much data can a sequencing machine produce?   20Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11Tbp/year   Small sequencing center: 4 machines?   44Tbp per year! Raw data in BAM: 70Tbytes SV Calling: SVMerge Processed calls much smaller   1000G pilot VCF < 1Gbyte Alignment + BAM improvement Vertebrate Resequencing Informatics 17th November, 2009
  • 3. Simplistic Model: Cloud as compute resource Processes 1. Align SRF/Fastq/BAM (2Mbps/sec) Variant calling (n x SNP callers, n indel callers, SV callers) Sequencing Center/Institute BAM + VCF (2Mbps/sec) BAM 3,240 days VCF to upload! Vertebrate Resequencing Informatics 17th November, 2009
  • 4. Move the raw data generation to the compute Variant calling (n x SNP callers, n indel callers, SV callers) Sequencing Center/Institute VCF BAM VCF Vertebrate Resequencing Informatics 17th November, 2009
  • 5. Large Collaborative Projects: Cloud centric model VCF Analysis groups Vertebrate Resequencing Informatics 17th November, 2009