Bionimbus Cambridge Workshop (3-28-11, v7)
1. Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data. March 29, 2011. Robert Grossman, Institute for Genomics & Systems Biology and Computation Institute, University of Chicago, and Open Cloud Consortium
2. Part 1: Biology, Big Data & Clouds. Two of the 14 high throughput sequencers at the Ontario Institute for Cancer Research (OICR).
4. The Challenge is to Support Cubes of Next Gen Sequence Data. Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set. Different developmental stages; different pathologies; perturb the environment.
8. Elastic, On-Demand Computing with Usage Based Pricing Is New. 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
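The equivalence on this slide is machine-hour arithmetic. A minimal sketch, assuming a flat price per machine-hour (the `HOURLY_RATE` value and the `usage_cost` helper are illustrative placeholders, not figures or names from the talk):

```python
# Hypothetical flat price per machine-hour; the slides do not give a rate.
HOURLY_RATE = 0.10


def usage_cost(machines: int, hours: float, rate: float = HOURLY_RATE) -> float:
    """Cost under usage-based pricing: pay only for machine-hours consumed."""
    return machines * hours * rate


# 1 computer for 120 hours versus 120 computers for 1 hour.
one_machine = usage_cost(machines=1, hours=120)
many_machines = usage_cost(machines=120, hours=1)

# Both runs consume 120 machine-hours, so the bill is the same,
# but the second finishes in 1 hour instead of 120.
print(one_machine, many_machines)
```

Under this assumption the only thing that changes between the two runs is the wall-clock time, which is the point the slide makes about elasticity.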
10. Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
11. Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc. Step 2. Send sample to be sequenced. Step 3a. Return raw reads. Step 3b. Return variant calls, CNV, annotation… Step 4. Secure data routing to appropriate cloud based upon BID. Step 5. Cloud based analysis using IGSB and 3rd party tools and applications. Diagram labels: IGSB Sequencers; BID Generator; External Sequencers; Bionimbus Private Cloud UC; Bionimbus Community Cloud; Bionimbus Private Cloud XY; Amazon; dbGaP.
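The routing idea in Step 4 can be sketched as a lookup keyed on the metadata assigned in Step 1. This is only an illustration: the `BidRecord` fields, the `Visibility` values, the BID string format, and the mapping of each visibility to a destination are all assumptions, not the actual Bionimbus implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    # Hypothetical categories based on the "private/community, public cloud" wording.
    PRIVATE = "private"
    COMMUNITY = "community"
    PUBLIC = "public"


@dataclass(frozen=True)
class BidRecord:
    """Hypothetical metadata registered when a BID is issued (Step 1)."""
    bid: str
    project: str
    visibility: Visibility


# Assumed mapping from visibility to a destination named on the slide.
DESTINATIONS = {
    Visibility.PRIVATE: "Bionimbus Private Cloud",
    Visibility.COMMUNITY: "Bionimbus Community Cloud",
    Visibility.PUBLIC: "Amazon",
}


def route(record: BidRecord) -> str:
    """Step 4: choose the destination cloud from the BID's metadata alone."""
    return DESTINATIONS[record.visibility]


# Made-up BID and project name, for illustration only.
sample = BidRecord(bid="BID-0001", project="demo", visibility=Visibility.COMMUNITY)
print(route(sample))
```

Keying the decision on the BID means the sequencer output does not need to carry any access policy itself; the policy is fixed when the ID is issued.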
12. What is a good unit to understand data intensive computing of biological data?
13. Bionimbus & OSDC Today. The NIH in the U.S. currently makes available for download approximately 2 PB of data. Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage. Bionimbus is part of the proof-of-concept (POC) Open Science Data Cloud, which consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.
14. GWT-based Front End; Elastic Cloud Services; Database Services; Analysis Pipelines & Re-analysis Services; Intercloud Services; Large Data Cloud Services; Data Ingestion Services
18. Case Study: modENCODE. Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments). Bionimbus VMs were used for some of the integrative analysis. Bionimbus is used as a backup for the modENCODE DCC.
19. Case Study: IGSB. All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
27. OCC Members. Companies: Cisco, Citrix, Yahoo!, … Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, … Federal agencies: NASA. Other: National Lambda Rail. Adding international partners in 2011.
28. Infrastructure. 2010 Proof-of-Concept Infrastructure: 450+ nodes, 3000+ cores, 3+ PB. Four data centers (two more to come in 2011). Data centers have 10G network connections (some 100G links in 2011). Plan to add approximately 1 PB of data in 2011. With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.
29. Towards a Long Term, Sustainable Model. Cap Exp: about $1M/year. Op Exp: about $1M/year. The Moore Foundation is providing $1M/year for 2011 and 2012 to support the Cap Exp.
30. [Figure: variety of analysis (wide, med, low) versus data size (small, medium to large, very large). Labels: scientist with laptop; Open Science Data Cloud; sequencing centers, LHC, LSST; no infrastructure; general infrastructure; dedicated infrastructure.]
31. [Figure: persistent data (large, med, small) versus cycles (single workstations, small to medium clusters, large & spec. clusters). Labels: data clouds; databases; HPC.]
32. Bionimbus Team*. David Hanley, Nicolas Negre, Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L Grossman, A. Jason Grundstad, Xiangjun Liu, Michal Sabala, Parantu K Shah, Kevin P White (Institute for Genomics & Systems Biology, University of Chicago); Jia Chen, Yunhong Gu and Damian Roqueiro (University of Illinois at Chicago); Lincoln Stein and Zheng Zha (Ontario Institute for Cancer Research). *In alphabetical order