Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data. April 21, 2011. Robert Grossman, Institute for Genomics & Systems Biology (IGSB), Computation Institute, University of Chicago and Open Cloud Consortium
Background
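Growth of Genomic Data. [Figure: timeline from 1977 to 2008 of GenBank size (axis marks 10^5, 10^8, 10^10), annotated with sequencing technologies (Sanger sequencing, microarray technology, 454 and Solexa sequencing), projects (HGP, ENCODE), computing milestones (GFS, Hadoop, AWS), and the progression "sequence species", "sequence environment", "sequence everything".]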
Source: Lincoln Stein
The Challenge Is to Support Cubes of High-Throughput Sequence Data. Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, or other data set. Dimensions shown: different developmental stages, different pathologies, perturbations of the environment.
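One way to picture such a cube is as a sparse mapping from a coordinate (stage, pathology, perturbation) to the data sets stored in that cell. The sketch below is illustrative only: the `DataSet` and `DataCube` names, the example axis values, and the file paths are hypothetical and are not taken from Bionimbus.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A cell is addressed by (developmental stage, pathology, perturbation).
Coordinate = Tuple[str, str, str]


@dataclass(frozen=True)
class DataSet:
    assay: str  # e.g. "ChIP-chip", "ChIP-seq", "RNA-seq", "movie"
    path: str   # hypothetical storage location


class DataCube:
    """Sparse cube: each cell holds zero or more data sets."""

    def __init__(self) -> None:
        self._cells: Dict[Coordinate, List[DataSet]] = {}

    def add(self, stage: str, pathology: str, perturbation: str,
            dataset: DataSet) -> None:
        key = (stage, pathology, perturbation)
        self._cells.setdefault(key, []).append(dataset)

    def cell(self, stage: str, pathology: str,
             perturbation: str) -> List[DataSet]:
        # Return a copy so callers cannot mutate the cube by accident.
        return list(self._cells.get((stage, pathology, perturbation), []))

    def slice_by_stage(self, stage: str) -> Dict[Coordinate, List[DataSet]]:
        # All non-empty cells that share one developmental stage.
        return {k: list(v) for k, v in self._cells.items() if k[0] == stage}


# Example usage with invented axis values.
cube = DataCube()
cube.add("embryo", "normal", "control", DataSet("ChIP-seq", "/data/a.bam"))
cube.add("embryo", "normal", "control", DataSet("RNA-seq", "/data/b.bam"))
cube.add("adult", "tumor", "heat_shock", DataSet("movie", "/data/c.avi"))
```

Keeping the cube sparse reflects that most coordinate combinations will have no experiment attached to them.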
We Have a Problem … More and more of your colleagues produce so much data that they cannot easily manage, move, analyze and share it. Centers and large projects build their own infrastructure. Everyone else is on their own.
Part 1.  Using Bionimbus www.bionimbus.org
Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
Enabling a broad community to utilize genome research. [Diagram: a three-step flow (1, 2, 3) connecting the user, the sequencing partner or center, and the Bionimbus cloud.]
Step 1. Prepare a Sample
Step 2. Log in to Bionimbus and get a Bionimbus Key.
Step 3. FedEx your sample to CGI.
Step 4. Log in to Bionimbus and view your data.
Step 5. Use Bionimbus to run standard and custom pipelines. Using the ability of Bionimbus to launch multiple virtual machines reduced this analysis from 25 days to 1 day.
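A reduction of this kind is what one would expect when independent pipeline runs are spread across many machines. The sketch below is a minimal illustration under stated assumptions: a local thread pool stands in for virtual machines, `analyze_sample` is a hypothetical placeholder for one pipeline run, and the even-split formula ignores scheduling and data-transfer overhead. The slides do not say how many virtual machines were used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def ideal_wall_time(total_work_days: float, n_workers: int) -> float:
    """Wall-clock days if the work splits evenly with no overhead."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return total_work_days / n_workers


def analyze_sample(sample_id: str) -> str:
    # Hypothetical placeholder for one independent pipeline run.
    return f"{sample_id}:done"


def run_in_parallel(sample_ids: List[str], n_workers: int) -> List[str]:
    # Each worker stands in for one virtual machine; map() keeps input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(analyze_sample, sample_ids))


results = run_in_parallel(["s1", "s2", "s3"], n_workers=3)
days = ideal_wall_time(25, 25)  # 25 days of work on 25 workers
```

Under this idealized model, 25 days of serial work finishes in 1 day on 25 workers; real pipelines usually fall short of that because some steps cannot be split.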
Step 1. Get a Bionimbus ID (BID); assign project, private/community, public cloud, etc.
Step 2. Send the sample to be sequenced.
Step 3a. Return raw reads.
Step 3b. Return variant calls, CNV, annotation…
Step 4. Secure data routing to the appropriate cloud based upon the BID.
Step 5. Cloud-based analysis using IGSB and 3rd party tools and applications.
[Diagram components: BID Generator; internal sequencers; CGI; Bionimbus Private Cloud UC; Bionimbus Community Cloud; Bionimbus Private Cloud XY; Amazon; dbGaP.]
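The routing idea in Step 4 can be sketched as a lookup from BID to destination cloud. This is an illustration, not the Bionimbus implementation: the `BidRegistry` class, the `BID-000001` identifier format, and the project name are assumptions, while the destination names are the labels shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Destination labels as they appear on the slide.
DESTINATIONS = (
    "Bionimbus Private Cloud UC",
    "Bionimbus Community Cloud",
    "Bionimbus Private Cloud XY",
    "Amazon",
    "dbGaP",
)


@dataclass(frozen=True)
class BidRecord:
    bid: str
    project: str
    destination: str


class BidRegistry:
    """Issues BIDs and remembers where each BID's data must be routed."""

    def __init__(self, allowed_destinations: Iterable[str]) -> None:
        self._allowed = set(allowed_destinations)
        self._records: Dict[str, BidRecord] = {}
        self._next = 1

    def issue(self, project: str, destination: str) -> BidRecord:
        if destination not in self._allowed:
            raise ValueError(f"unknown destination: {destination}")
        bid = f"BID-{self._next:06d}"  # hypothetical identifier format
        self._next += 1
        record = BidRecord(bid, project, destination)
        self._records[bid] = record
        return record

    def route(self, bid: str) -> str:
        # Data returned by the sequencer carries only the BID.
        if bid not in self._records:
            raise KeyError(f"unknown BID: {bid}")
        return self._records[bid].destination


registry = BidRegistry(DESTINATIONS)
record = registry.issue("demo-project", "Bionimbus Community Cloud")
target = registry.route(record.bid)
```

Fixing the destination when the BID is issued means the sequencing center never has to know which cloud a sample belongs to.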
Part 2. Introduction to Clouds
Clouds provide on-demand computing and storage resources at the scale and with the reliability of a data center. Computer scientists were caught by surprise.
What Is a Cloud? Software as a Service (SaaS)
What Else Is a Cloud? Infrastructure as a Service (IaaS): users get one or more virtual machines “on demand”.
Are There Other Types of Clouds? Hadoop was developed for processing Internet-scale data for ad targeting and related applications, but is now used for processing genomics data and many other applications.
What Is New About Clouds?
Scale Is New
Elastic, On-Demand Computing with Usage-Based Pricing Is New. 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour. Data center scale computing often leverages virtualization technologies.
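The arithmetic behind this comparison is that usage-based pricing charges for machine-hours, so cost depends on the product of machines and hours, not on either one alone. A minimal sketch follows; the hourly rate is a made-up figure used only to show the equality.

```python
def machine_hours(machines: int, hours: int) -> int:
    """Total machine-hours consumed."""
    return machines * hours


def usage_cost(machines: int, hours: int, rate_per_machine_hour: float) -> float:
    """Cost under simple usage-based pricing (no discounts or minimums)."""
    return machine_hours(machines, hours) * rate_per_machine_hour


RATE = 0.10  # hypothetical price per machine-hour

one_machine_long = usage_cost(1, 120, RATE)   # 1 computer for 120 hours
many_machines_short = usage_cost(120, 1, RATE)  # 120 computers for 1 hour
same_price = one_machine_long == many_machines_short
```

Both configurations consume 120 machine-hours and so cost the same, but the second returns results 120 times sooner when the work can be split.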
Part 3. Some Bionimbus Cases
Case Study: Public Datasets in Bionimbus
Case Study: modENCODE. Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments). Bionimbus VMs were used for some of the integrative analysis. Bionimbus is used as a backup for the modENCODE DCC.
>300 ChIP datasets: CBP, PolII, Pho/silencers, HDACs, insulators, TFs. Predictions: 537 silencers; 2,307 new promoters; 12,285 enhancers; 14,145 insulators. www.modencode.org, www.cistrack.org. Negre et al., Nature 2011.
Case Study: IGSB All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
Bionimbus Virtual Machine Releases
Part 4. Data Centers for Science
[Figure: timeline pairing dates with gains: 1609, 30x; 1670, 250x; 1976, 10x-100x; 2004, 10x-100x. Labels: experimental science, simulation science, data science.]
Open Science Data Cloud: astronomical data; biological data (Bionimbus); earth science data (& disaster relief); NSF-PIRE; OSDC Data Challenge.
The goal is to build a data center in Chicago for biological, scientific, medical and health care data in 4 to 5 years.
Part 5. More About Bionimbus
[Architecture diagram. Components: GWT-based front end; elastic cloud services; database services; analysis pipelines & re-analysis services; intercloud services; large data cloud services; data ingestion services. Technology labels on the diagram: Eucalyptus, OpenStack; PostgreSQL; IDs, etc.; UDT, replication; Hadoop, Sector/Sphere.]

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Último (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Bionimbus - Northwestern CGI Workshop 4-21-2011

  • 1. Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data. April 21, 2011. Robert Grossman, Institute for Genomics & Systems Biology (IGSB) and Computation Institute, University of Chicago, and Open Cloud Consortium.
  • 3. Growth of Genomic Data. [Figure: timeline with a GenBank axis marked 10^5, 10^8 and 10^10; years shown are 1977, 1995, 2001, 2003, 2005, 2006 and 2008. Labels: Sanger sequencing; microarray technology; 454 and Solexa sequencing; HGP; ENCODE; GFS; Hadoop; AWS; sequence species; sequence environment; sequence everything.]
  • 5. The Challenge is to Support Cubes of High Throughput Sequence Data Each cell in data cube can be ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set. Different developmental stages Different pathologies Perturb the environment
  • 6. We Have a Problem … More and more of your colleagues produce so much data that they cannot easily manage, move, analyze and share it. Centers and large projects build their own infrastructure. Everyone else is on their own.
  • 7. Part 1. Using Bionimbus www.bionimbus.org
  • 8. Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
  • 9. Enabling a broad community to utilize genome research. [Diagram: three numbered steps connecting the User, the Sequencing Partner or Center, and the Bionimbus Cloud.]
  • 10. Step 1. Prepare a Sample
  • 11. Step 2. Log in to Bionimbus and get a Bionimbus Key.
  • 12. Step 3. FedEx your sample to CGI.
  • 13. Step 4. Log in to Bionimbus and view your data.
  • 14. Step 5. Use Bionimbus to perform standard and custom pipelines. Using the ability of Bionimbus to launch multiple virtual machines reduced this analysis from 25 days to 1 day.
  • 15. Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc. Step 2. Send sample to be sequenced. Step 3a. Return raw reads. Step 3b. Return variant calls, CNV, annotation… Step 4. Secure data routing to appropriate cloud based upon BID. Step 5. Cloud-based analysis using IGSB and 3rd party tools and applications. [Diagram components: BID Generator; internal sequencers; CGI; Bionimbus Private Cloud UC; Bionimbus Community Cloud; Bionimbus Private Cloud XY; Amazon; dbGaP.]
  • 16. Part 2. Introduction to Clouds
  • 17. Clouds provide on-demand computing and storage resources at the scale and with the reliability of a data center. Computer scientists were caught by surprise.
  • 18. What is a Cloud? Software as a Service (SaaS).
  • 19. What Else Is a Cloud? Infrastructure as a Service (IaaS): users get one or more virtual machines "on demand".
  • 20. Are There Other Types of Clouds? Hadoop was developed for processing Internet-scale data for ad targeting and related applications but is now used for processing genomics data and many other applications.
  • 21. What is new about clouds?
  • 22. Scale is New.
  • 23. Elastic, On-Demand Computing with Usage-Based Pricing Is New. 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour. Data center scale computing often leverages virtualization technologies.
  • 24. Part 3. Some Bionimbus Cases
  • 25. Case Study: Public Datasets in Bionimbus
  • 27. Case Study: modENCODE. Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments). Bionimbus VMs were used for some of the integrative analysis. Bionimbus is used as a backup for the modENCODE DCC.
  • 29. CBP
  • 30. PolII
  • 32. HDACs
  • 34. TFs. Predictions: 537 silencers; 2,307 new promoters; 12,285 enhancers; 14,145 insulators. www.modencode.org, www.cistrack.org. Negre et al., Nature 2011.
  • 35. Case Study: IGSB All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
  • 37. Part 4. Data Centers for Science
  • 38. [Figure: timeline of scientific paradigms. Labels: experimental science; simulation science; data science. Years and factors shown: 1609, 30x; 1670, 250x; 1976, 10x-100x; 2004, 10x-100x.]
  • 39. Open Science Data Cloud: astronomical data; biological data (Bionimbus); earth science data (& disaster relief); NSF-PIRE OSDC Data Challenge.
  • 40. The goal is to build a data center in Chicago for biological, scientific, medical and health care data in 4 to 5 years.
  • 41. Part 5. More About Bionimbus
  • 42. Components: GWT-based Front End; Elastic Cloud Services; Database Services; Analysis Pipelines & Re-analysis Services; Intercloud Services; Large Data Cloud Services; Data Ingestion Services.
  • 43. Components: GWT-based Front End; Elastic Cloud Services; Database Services; Analysis Pipelines & Re-analysis Services; Intercloud Services; Large Data Cloud Services; Data Ingestion Services. Technologies named: Eucalyptus, OpenStack; PostgreSQL; IDs, etc.; UDT, replication; Hadoop, Sector/Sphere.
  • 44. Bionimbus Deployment Options: Bionimbus Community Cloud (www.bionimbus.org); Bionimbus AMIs & Amazon hosted applications; Bionimbus Private Clouds.
  • 45. A successful cloud will… 1. Provide long term persistent storage services at the scale of a data center. 2. Provide compute services at the scale of a data center. 3. Provide high performance ingestion and transport of data.
  • 46. A successful cloud will… 4. Support the liberation of data. 5. Peer with public clouds. 6. Peer with private genomics clouds.
  • 47. Bionimbus satisfies each of these six requirements.
  • 48. Bionimbus Road Map. Over the next 3 to 4 months, we will: launch Bionimbus (we are in a pre-launch); add Galaxy-based workflow to Bionimbus; add secure routing of genomes; add more public datasets; add more pipelines.
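
The arithmetic behind slides 14 and 23 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hourly rate, the function names, and the figure of 25 virtual machines are hypothetical and do not come from the talk, and the speedup model assumes the analysis splits perfectly across machines with no overhead.

```python
def machine_hours(machines: int, hours: float) -> float:
    """Total machine-hours consumed; usage-based pricing bills on this quantity."""
    return machines * hours


def cost(machines: int, hours: float, rate_per_machine_hour: float) -> float:
    """Cost of a run when every machine-hour is billed at the same flat rate."""
    return machine_hours(machines, hours) * rate_per_machine_hour


def ideal_wall_clock_days(serial_days: float, machines: int) -> float:
    """Wall-clock time if the work divides evenly across machines (no overhead)."""
    return serial_days / machines


RATE = 0.10  # hypothetical price per machine-hour, not a figure from the talk

# Slide 23: 1 computer for 120 hours vs. 120 computers for 1 hour.
one_machine_long = cost(1, 120, RATE)
many_machines_short = cost(120, 1, RATE)

# Slide 14: a 25-day analysis spread over a hypothetical 25 virtual machines.
parallel_days = ideal_wall_clock_days(25, 25)

if __name__ == "__main__":
    print(f"1 x 120 h: {one_machine_long:.2f}, 120 x 1 h: {many_machines_short:.2f}")
    print(f"Ideal wall-clock time on 25 VMs: {parallel_days} day(s)")
```

Because both runs consume the same number of machine-hours, they cost the same under a flat rate, while the elastic run finishes far sooner. Real pipelines rarely parallelize perfectly, so the ideal figure is a lower bound on wall-clock time.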