Cloud-scale genomics: examples and lessons
Department of Biostatistics
Cloud debate on 1 slide
Why?
Why not?
1.6 Gbp/day [1], 5 Gbp/day [1], 25 Gbp/day [2]
1. http://www.politigenomics.com/next-generation-sequencing-informatics
2. http://www.politigenomics.com/2010/01/hiseq-2000.html
Conclusion: let's try it but hedge our bets
Crossbow
[Figure: short reads are aligned to a reference sequence, the alignments are aggregated, and a statistics step produces a call at each site. Example output: "Call: HET A, G, p-value: 0.0023".]
Align: parallel by read
Aggregate: handled by Hadoop
Statistics: parallel by genome bin
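The three Crossbow stages above (align per read, aggregate via Hadoop, statistics per genome bin) follow a map / shuffle / reduce shape. The sketch below is a toy illustration of that shape only, not Crossbow's code: reads are assumed to be already aligned without gaps at a known offset, the bin width `BIN_SIZE` and the `min_fraction` threshold are hypothetical, and the per-site call is a naive allele-fraction rule rather than a statistical SNP caller.

```python
from collections import Counter, defaultdict
from itertools import groupby

BIN_SIZE = 10  # hypothetical bin width, chosen small so the toy data spans two bins


def map_alignment(read, offset, chrom="chr1"):
    """Map step (parallel by read): emit one (bin, position, base) record per base."""
    for i, base in enumerate(read):
        pos = offset + i
        if base != "N":  # skip uncalled bases
            yield ((chrom, pos // BIN_SIZE), pos, base)


def shuffle(records):
    """Stand-in for the Hadoop sort/shuffle: group records by genome bin."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for bin_key, group in groupby(ordered, key=lambda r: r[0]):
        yield bin_key, list(group)


def reduce_bin(reference, records, min_fraction=0.25):
    """Reduce step (parallel by genome bin): naive call at each covered position."""
    pileup = defaultdict(Counter)
    for _, pos, base in records:
        pileup[pos][base] += 1
    calls = {}
    for pos, counts in pileup.items():
        depth = sum(counts.values())
        alleles = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
        if alleles != [reference[pos]]:  # report only sites that differ from the reference
            calls[pos] = "/".join(alleles)
    return calls


def call_variants(reference, alignments):
    """Run map, shuffle, and reduce in one process."""
    records = [rec for read, offset in alignments for rec in map_alignment(read, offset)]
    calls = {}
    for _, bin_records in shuffle(records):
        calls.update(reduce_bin(reference, bin_records))
    return calls


# Toy data (hypothetical): position 5 is covered by two C and two T bases.
REFERENCE = "ACGTACGTACGTACGTACGT"
ALIGNMENTS = [
    ("GTACGT", 2), ("GTATGT", 2), ("ACGTAT", 0),
    ("TACGTA", 3), ("GTANGT", 2), ("ACGTAC", 8),
]
CALLS = call_variants(REFERENCE, ALIGNMENTS)
```

Because each bin is reduced independently, the reduce calls could run on different machines; only the shuffle needs a global view of the records.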
Myrna
[Figure: reads from Sample A and Sample B are aligned to a reference containing Gene 1, and the pipeline tests each gene for differential expression. Example output: "Gene 1 differentially expressed?: YES, p-value: 0.0012".]
Align: parallel by read
Aggregate: handled by Hadoop
Overlap: parallel by genome bin
Aggregate: handled by Hadoop
Normalize: parallel by sample
Aggregate: handled by Hadoop
Statistics: parallel by gene
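As a rough illustration of the Overlap and Normalize stages named above, the sketch below counts alignments per gene and rescales each sample's counts. It is a minimal sketch under stated assumptions, not Myrna's implementation: alignments are reduced to start positions, the gene intervals are hypothetical, normalization by total count is only one possible choice, and the Statistics stage (the actual differential-expression test) is left out.

```python
from collections import Counter


def overlap(alignment_starts, genes):
    """Overlap step: count alignments whose start falls inside each gene interval."""
    counts = Counter()
    for pos in alignment_starts:
        for name, (start, end) in genes.items():
            if start <= pos < end:  # half-open interval [start, end)
                counts[name] += 1
    return counts


def normalize(counts, scale=100):
    """Normalize step (per sample): counts per `scale` gene-assigned alignments."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * scale / total for gene, n in counts.items()}


# Hypothetical gene intervals and alignment start positions for two samples.
GENES = {"gene1": (0, 100), "gene2": (100, 300)}
SAMPLE_A = [5, 50, 99, 150]
SAMPLE_B = [10, 120, 130, 250]

NORM_A = normalize(overlap(SAMPLE_A, GENES))
NORM_B = normalize(overlap(SAMPLE_B, GENES))
```

Each stage touches one unit of work at a time (a bin, a sample), which is what lets the slide label them "parallel by genome bin" and "parallel by sample".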
Myrna
Runtime, Cost for 1.1 billion reads from Pickrell et al. study

EC2 nodes                                   1 master, 10 workers   1 master, 20 workers   1 master, 40 workers
Worker CPU cores                            80                     160                    320
Wall clock time                             4h:20m                 2h:32m                 1h:38m
Cluster setup                               4m                     4m                     3m
Align                                       2h:56m                 1h:31m                 54m
Overlap                                     52m                    31m                    16m
Normalize                                   6m                     7m                     6m
Statistics                                  9m                     6m                     6m
Summarize & Postprocess                     13m                    14m                    13m
Approximate cost (N. Virginia / Elsewhere)  $44.00 / $49.50        $50.40 / $56.70        $65.60 / $73.80

Data transfer adds about 1hr:15m, $11

Table 1. Timing and cost for a Myrna experiment with 1.1 billion 35 bp unpaired reads from the Pickrell et al. study as input. Costs are approximate and based on the pricing as of this writing, that is, $0.68 per extra-large high-CPU EC2 node per hour in the Northern Virginia zone and $0.78 in other zones, plus a $0.12 per-node-per-hour surcharge for Elastic MapReduce in all zones. Times can vary subject to, for example, congestion and Internet traffic conditions.
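The cost row of Table 1 can be reproduced from the rates in the caption. The sketch below assumes that every node (master included) is billed per whole hour with partial hours rounded up; that billing rule is an assumption made here, not stated on the slide, but it matches all six cost figures in the table. The function name `emr_cost` is hypothetical.

```python
import math


def emr_cost(workers, wall_clock_minutes, ec2_rate, emr_surcharge=0.12, masters=1):
    """Approximate cluster cost in dollars, billing each node per started hour."""
    nodes = masters + workers
    billed_hours = math.ceil(wall_clock_minutes / 60)  # assumption: round up to whole hours
    return round(nodes * billed_hours * (ec2_rate + emr_surcharge), 2)


# Wall clock times from Table 1, converted to minutes.
RUNS = {10: 4 * 60 + 20, 20: 2 * 60 + 32, 40: 1 * 60 + 38}

for workers, minutes in RUNS.items():
    n_virginia = emr_cost(workers, minutes, ec2_rate=0.68)
    elsewhere = emr_cost(workers, minutes, ec2_rate=0.78)
    print(f"{workers} workers: ${n_virginia:.2f} / ${elsewhere:.2f}")
```

For the 10-worker run this gives 11 nodes x 5 billed hours x $0.80 = $44.00 in Northern Virginia. Under this rule, doubling the cluster cuts wall clock time but raises the bill, because more node-hours are rounded up.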
Myrna 71% 55%
Bet-hedging architecture
Cloud mode: Cloud driver script; Wrapper (bowtie), Wrapper (soapsnp), Postprocess; Hadoop
Hadoop mode: Hadoop driver script; Wrapper (bowtie), Wrapper (soapsnp), Postprocess; Hadoop
Single-computer mode: Singleton driver script; Wrapper (bowtie), Wrapper (soapsnp), Postprocess; Perl, fork, sort
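The idea behind the three modes above is that the stage wrappers stay the same and only the driver changes. The sketch below illustrates that separation for the single-computer mode only. It is a hypothetical Python analogue (the slide names Perl, fork, and sort for this mode): `align_wrapper` and `call_wrapper` are stand-ins that pass tab-separated key/value lines and do not invoke bowtie or soapsnp, and the `DRIVERS` registry is an assumption made for illustration.

```python
from itertools import groupby


def align_wrapper(lines):
    """Stand-in for the alignment wrapper: emit 'bin<TAB>read' for each input line."""
    for line in lines:
        read, pos = line.split()
        yield f"{int(pos) // 100:06d}\t{read}"  # zero-padded key so text sort is numeric


def call_wrapper(lines):
    """Stand-in for the calling wrapper: emit 'bin<TAB>record count' per key."""
    keyed = (line.split("\t", 1) for line in lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(1 for _ in group)}"


def run_single_computer(lines, mapper, reducer):
    """Single-computer driver: map, sort by key, reduce, all in one process."""
    return list(reducer(sorted(mapper(lines))))


# Only the single-computer driver is sketched; other modes would register here.
DRIVERS = {"single": run_single_computer}


def run(mode, lines, mapper=align_wrapper, reducer=call_wrapper):
    """Dispatch the same wrappers to whichever driver the chosen mode names."""
    if mode not in DRIVERS:
        raise ValueError(f"no driver registered for mode {mode!r}")
    return DRIVERS[mode](lines, mapper, reducer)


INPUT_LINES = ["ACGT 5", "GGCA 250", "TTTT 40", "CCCC 199"]
RESULT = run("single", INPUT_LINES)
```

Because the wrappers only read and write lines of text, a driver that hands the same lines to a cluster scheduler could replace `run_single_computer` without touching them, which is the hedge the slide describes.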
Acknowledgements
Crossbow Data transfer adds about 1hr:15m, $28
Crossbow 43% 58%

Langmead bosc2010 cloud-genomics

  • 1.
  • 2.
  • 3. Crossbow [Figure: pipeline diagram. Align: reads are aligned to the reference (parallel by read). Aggregate: alignments are grouped into genome bins (handled by Hadoop). Statistics: SNPs are called from the piled-up alignments (parallel by genome bin). Example call at one reference position: HET A, G; p-value: 0.0023.]
  • 4. Myrna [Figure: pipeline diagram for two samples, Sample A and Sample B. Align (parallel by read), Aggregate (handled by Hadoop), Overlap (parallel by genome bin), Aggregate (handled by Hadoop), Normalize (parallel by sample), Aggregate (handled by Hadoop), Statistics (parallel by gene). Example result: Gene 1 differentially expressed?: YES; p-value: 0.0012.]
  • 5. Myrna
    Table 1. Timing and cost for a Myrna experiment with 1.1 billion 35 bp unpaired reads from the Pickrell et al. study as input. Costs are approximate and based on the pricing as of this writing, that is, $0.68 per extra-large high-CPU EC2 node per hour in the Northern Virginia zone and $0.78 in other zones, plus a $0.12 per-node-per-hour surcharge for Elastic MapReduce in all zones. Times can vary subject to, for example, congestion and Internet traffic conditions. Data transfer adds about 1hr:15m, $11.

    Myrna runtime and cost for 1.1 billion reads from the Pickrell et al. study

    EC2 nodes                    1 master, 10 workers   1 master, 20 workers   1 master, 40 workers
    Worker CPU cores             80                     160                    320
    Wall clock time              4h:20m                 2h:32m                 1h:38m
    Cluster setup                4m                     4m                     3m
    Align                        2h:56m                 1h:31m                 54m
    Overlap                      52m                    31m                    16m
    Normalize                    6m                     7m                     6m
    Statistics                   9m                     6m                     6m
    Summarize & Postprocess      13m                    14m                    13m
    Approximate cost
    (N. Virginia / Elsewhere)    $44.00 / $49.50        $50.40 / $56.70        $65.60 / $73.80
  • 7. Bet-hedging architecture [Figure: the same three stages (bowtie wrapper, soapsnp wrapper, postprocess) run in three modes. Cloud mode: cloud driver script, stages run on Hadoop. Hadoop mode: Hadoop driver script, stages run on Hadoop. Single-computer mode: singleton driver script, stages run with Perl, fork, and sort.]
  • 8.
  • 9. Crossbow Data transfer adds about 1hr:15m, $28
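
The approximate costs in Myrna Table 1 can be reproduced from the prices quoted in its caption. The sketch below assumes that every node (workers plus one master) is billed for each started hour of wall clock time. That billing rule is an assumption, not something the slides state, although it matches all six figures in the table. The function and variable names are illustrative only.

```python
import math

# Hourly prices quoted in the Table 1 caption (USD per node per hour).
EC2_RATE = {"N. Virginia": 0.68, "Elsewhere": 0.78}
EMR_SURCHARGE = 0.12  # Elastic MapReduce surcharge, all zones


def cluster_cost(workers, wall_clock_minutes, region):
    """Approximate cost of one run.

    Assumption: each node is billed for every started hour.
    """
    nodes = workers + 1  # one master plus the workers
    billed_hours = math.ceil(wall_clock_minutes / 60)
    return nodes * billed_hours * (EC2_RATE[region] + EMR_SURCHARGE)


# (workers, wall clock minutes) for the three cluster sizes in Table 1.
RUNS = [(10, 4 * 60 + 20), (20, 2 * 60 + 32), (40, 1 * 60 + 38)]

for workers, minutes in RUNS:
    costs = {r: round(cluster_cost(workers, minutes, r), 2) for r in EC2_RATE}
    print(workers, "workers:", costs)
```

Under this assumption, doubling the cluster shortens the run but raises the bill, because the Align and Overlap stages scale with node count while the fixed-time stages do not.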
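
The bet-hedging architecture on slide 7 reuses the same stage wrappers in every mode and swaps only the layer that moves data between them. The following is a minimal sketch of that idea in Python, not Crossbow's actual code (the slide names Perl, fork, and sort for single-computer mode). `align_stage`, `call_stage`, `BIN_SIZE`, and `run` are hypothetical stand-ins for the bowtie wrapper, the soapsnp wrapper, the genome bin width, and the driver script.

```python
from itertools import groupby
from operator import itemgetter

BIN_SIZE = 100  # hypothetical genome bin width, in bases


def align_stage(read):
    """Stand-in for the bowtie wrapper: emit a (genome_bin, read_name) pair."""
    name, position = read
    return (position // BIN_SIZE, name)


def call_stage(genome_bin, names):
    """Stand-in for the soapsnp wrapper: summarize one genome bin."""
    return (genome_bin, len(names))


def run_single_computer(reads):
    """Single-computer mode: a plain sort replaces Hadoop's aggregation."""
    mapped = [align_stage(r) for r in reads]      # parallel by read
    mapped.sort(key=itemgetter(0))                # bring each bin together
    return [
        call_stage(genome_bin, [name for _, name in group])
        for genome_bin, group in groupby(mapped, key=itemgetter(0))
    ]                                             # parallel by genome bin


def run(mode, reads):
    """Driver: the stages stay the same, only the execution backend changes."""
    if mode == "single":
        return run_single_computer(reads)
    # Cloud and Hadoop modes would hand the same stages to Hadoop instead.
    raise NotImplementedError(f"mode {mode!r} is not part of this sketch")


reads = [("r1", 5), ("r2", 150), ("r3", 42), ("r4", 199)]
print(run("single", reads))
```

Because the stages only consume and emit key-value records, they do not need to know whether Hadoop or a local sort grouped the records, which is what lets one code base serve all three modes.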