The Pulse of Cloud Computing
with Bioinformatics as an example
Nuwan Goonasekera†, Enis Afgan*
†
University of Melbourne, Melbourne Bioinformatics, Australia
* Johns Hopkins University, Taylor Lab, USA
@ University of Colombo
Feb 2017
The answer to everything?
Overview
• The key characteristics of Cloud Computing
• Using Cloud Computing for bioinformatics
Source: http://dilbert.com/strips/comic/2012-05-25/
A modern data-center
Source: http://www.businessinsider.com/google-data-centers-2014-10?op=1
Data center use before cloud computing
source: http://www.rackspace.com/knowledge_center/whitepaper/revolution-not-evolution-how-cloud-computing-differs-from-traditional-it-and-why-it
Cloud Computing: A Definition
• NIST definition: “Cloud computing is a model for enabling
ubiquitous, convenient, on-demand network access to a
shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that
can be rapidly provisioned and released with minimal
management effort or service provider interaction.”
» National Institute of Standards and Technology
(http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
The Cloud Model
Deployment Models
• Private
• Community
• Public
• Hybrid
Delivery Models
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
Essential Characteristics
• On-demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
Delivery Models
source: http://www.businessinsider.com.au/10-most-important-in-cloud-computing-2013-4?op=1#a-word-about-clouds-1
Infrastructure-as-a-Service (IaaS)
• Amazon Web Services (Market leader)
• Rackspace Cloud
• NeCTAR/OpenStack Research Cloud
• Joyent Cloud
• GoGrid
• FlexiScale
Public PaaS Examples
Cloud Name | Language and Developer Tools | Programming Models Supported by Provider | Target Applications and Storage Options
Google App Engine | Python, Java, Go, PHP + JVM languages (Scala, Groovy, JRuby) | MapReduce, Web, DataStore, Storage and other APIs | Web applications and BigTable storage
Salesforce.com’s Force.com | Apex, Eclipse-based IDE, web-based wizard | Workflow, Excel-like formula, web programming | Business applications such as CRM
Microsoft Azure | .NET, Visual Studio, Azure tools | Unrestricted model | Enterprise and web apps
Amazon Elastic MapReduce | Hive, Pig, Java, Ruby etc. | MapReduce | Data processing and e-commerce
Aneka | .NET, stand-alone SDK | Threads, task, MapReduce | .NET enterprise applications, HPC
Public SaaS examples
• Gmail
• SharePoint
• Salesforce.com CRM
• OnLive
• Gaikai
• Microsoft Office 365
• Some definitions also include services that do not require payment, e.g. ad-supported sites
Things we find most interesting
• Accessibility
• Infrastructure as code
• Elasticity
• Programming models that fit the cloud
Accessibility
● Global availability via public clouds
● On-demand self-service
● A platform for democratisation of computing
● Access is enabled via point-and-click interfaces (blends with the Internet)
Infrastructure as Code
• Programmable
• Captures knowledge
• DevOps
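To illustrate what "programmable" infrastructure can look like, here is a minimal sketch. It assumes a made-up cluster description format: the `desired` and `running` dictionaries, the node type names and the `plan_changes` function are all hypothetical and not tied to any real cloud API. The desired state is plain data that can be kept in version control (capturing knowledge), and a small function derives the actions needed to reach it.

```python
def plan_changes(desired, running):
    """Return the (action, node_type, count) steps that turn `running` into `desired`.

    Both arguments map a node type to a node count. This format is
    hypothetical and used for illustration only.
    """
    steps = []
    for node_type in sorted(set(desired) | set(running)):
        diff = desired.get(node_type, 0) - running.get(node_type, 0)
        if diff > 0:
            steps.append(("launch", node_type, diff))
        elif diff < 0:
            steps.append(("terminate", node_type, -diff))
    return steps


# Desired state, kept in version control next to the analysis code.
desired = {"master": 1, "worker": 4}
# Current state, hard-coded here; a real tool would query the cloud provider.
running = {"master": 1, "worker": 1, "gpu": 2}

for step in plan_changes(desired, running):
    print(step)
```

Because the plan is computed from data rather than typed by hand, the same description can be reviewed, repeated and shared.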
Elasticity
• Rapidly expand and shrink based on demand
• “Infinite” scaling
• Cost-driven architecture
• Ties in with infrastructure-as-code
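The scaling idea above can be sketched as a small decision rule. This is an illustration under stated assumptions, not how any particular cloud manager works: the function name `workers_needed`, the default of 4 jobs per worker and the limits of 1 and 20 workers are all made up. The upper limit stands in for a budget cap, which is one way cost can drive the architecture.

```python
import math


def workers_needed(queued_jobs, jobs_per_worker=4, min_workers=1, max_workers=20):
    """Return how many worker nodes to run for the current queue length.

    All parameter defaults are illustrative assumptions.
    """
    wanted = math.ceil(queued_jobs / jobs_per_worker)
    # Never drop below the minimum, never exceed the budget cap.
    return max(min_workers, min(max_workers, wanted))


for queued in (0, 3, 17, 500):
    print(queued, "queued jobs ->", workers_needed(queued), "workers")
```

Run periodically against the job queue, a rule like this expands the cluster under load and shrinks it again when the queue drains.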
Programming models that fit the cloud
• Fault-tolerant models
• Massively scalable
• Distributed algorithms
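As a small illustration of such a model, the sketch below counts k-mers in a MapReduce style and retries a failed task. It runs in a single process, so it only mimics the structure of a distributed job. The function names, the choice of `RuntimeError` as the failure signal and the toy chunks are assumptions made for the example; each chunk is treated as an independent read, so k-mers spanning two chunks are not counted.

```python
from collections import Counter


def map_count_kmers(chunk, k=3):
    """Map step: count the k-mers in one chunk of sequence."""
    return Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))


def run_with_retry(task, arg, attempts=3):
    """Re-run a failed task, much as a scheduler would on another node."""
    for attempt in range(attempts):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


chunks = ["ACGTAC", "GTACGT", "ACGACG"]
partials = [run_with_retry(map_count_kmers, chunk) for chunk in chunks]
print(reduce_counts(partials).most_common(3))
```

Each map task depends only on its own chunk, so tasks can run on any node and be repeated safely after a failure, which is what makes the model scale.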
Cloud computing is a valuable resource -
but what do we use it for?
Bioinformatics
A multi-disciplinary science using computers for acquiring, managing and
analyzing biological data.
It is a data-driven science.
It is a tool for genomics research.
[Figure: Bioinformatics at the intersection of Biology, Medicine, Math & Physics, and Computer Science]
Genomics
Oxford dictionaries
“The branch of molecular biology concerned with the
structure, function, evolution, and mapping of genomes.”
Where are the genes and other interesting pieces?
How do sequences change over evolutionary time?
What does all the DNA do?
What are the physical shapes of the genome and its products?
Genomics: contrast with biology and genetics
 | Biology and genetics | Genomics
Scope | Targeted studies of one or a few genes | Studies considering all genes in a genome
Technology | Targeted, low-throughput experiments | Global, high-throughput experiments
Hard part | Clever experimental design, painstaking experimentation | Tons of data, uncertainty, computation
* Everything on this slide is a generalization
Where is genomics used?
Basic science
● What is the DNA sequence of the genome?
● Where are the genes?
● What does all the DNA in the genome do?
● How did history shape our ethnicities and populations?
Medicine
● What’s the difference between DNA in a tumor vs DNA in healthy tissue?
● Can genomic data help predict what drugs might be appropriate for:
○ a particular cancer patient?
○ a particular genetic disorder?
● Can genomic data help us predict what flu strains will prevail next year?
Genome
Oxford dictionaries
“The complete set of genes or genetic material
present in a cell or organism.”
“Blueprint” or “recipe” of life.
Self-copying store of read-only information about
how to develop and maintain an organism.
Where do genomes live?
All the trillions of cells in a person have the same genomic DNA in the nucleus.
Picture source:
https://publications.nigms.nih.gov/insidethecell/preface.html
Genome
How do we obtain genome data? Sequencing!
The first methods, called Sanger sequencing, were developed in the mid-1970s.
Launched in 1990, the international Human Genome Project took 13 years to sequence
the human genome.
In the 2000s, massively parallel Next Generation Sequencers (NGS) were
developed that can sequence a human genome in days at a much lower cost.
Today, nanopore sequencers are emerging, offering real-time sequencing.
There are many public data repositories with free access to data (e.g., TCGA, 1000 Genomes, GenBank).
Genome variation
Two unrelated humans have genomes that are ~99.8% similar by sequence.
There are about 3-4 million differences. Most are small, e.g. Single Nucleotide
Polymorphisms (SNPs).
Human and chimpanzee genomes are about 96% similar.
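To make "similar by sequence" concrete, here is a minimal sketch that compares two already-aligned sequences, reports the positions that differ (SNP-like single-base differences) and the percent identity. The two 20-base sequences and the function name `compare_aligned` are invented for illustration; real comparisons work on aligned reads or whole genomes with dedicated tools.

```python
def compare_aligned(seq_a, seq_b):
    """Return (mismatch positions, percent identity) for two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    positions = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    identity = 100.0 * (len(seq_a) - len(positions)) / len(seq_a)
    return positions, identity


# Toy sequences that differ by a single base (made up for this example).
person_1 = "ACGTTGCAACGTAGCTAGCT"
person_2 = "ACGTTGCAACGTAGTTAGCT"

positions, identity = compare_aligned(person_1, person_2)
print("differences at positions:", positions)
print(f"{identity:.1f}% identical")
```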
Making sense of the data through data manipulations
Apply data transformations to extract useful information
This is not always a well-defined process
This is typically done with existing tools, or by developing one’s own
Tools can be chained into workflows
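Chaining tools into a workflow can be sketched as passing the output of one step to the next. The three "tools" below are toy stand-ins written for this example, not real bioinformatics programs, and `run_workflow` is a hypothetical helper; a workflow system such as Galaxy adds tool wrappers, data management and provenance on top of this basic idea.

```python
from functools import reduce


def drop_ambiguous(reads):
    """Toy tool: discard reads containing an unknown base (N)."""
    return [read for read in reads if "N" not in read]


def reverse_complement(reads):
    """Toy tool: reverse-complement each read."""
    complement = str.maketrans("ACGT", "TGCA")
    return [read.translate(complement)[::-1] for read in reads]


def gc_content(reads):
    """Toy tool: fraction of G and C bases in each read."""
    return [round((read.count("G") + read.count("C")) / len(read), 2) for read in reads]


def run_workflow(steps, data):
    """Feed the output of each step into the next one."""
    return reduce(lambda current, step: step(current), steps, data)


workflow = [drop_ambiguous, reverse_complement, gc_content]
print(run_workflow(workflow, ["ACGT", "GGNA", "GGCA"]))
```

Because the workflow is just an ordered list of steps, it can be saved, shared and re-run on new data.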
What does all of this have to do with
Cloud Computing?
omicsmaps.com
World’s clouds
bit.ly/worldclouds
Typical genomics flow
[Figure: raw data (100-1000's GB) and external reference data go into data analysis, which produces results (a few GB)]
Real-world infrastructure requirements
Some computers + reliable persistent data storage + bioinformatics tools + reference data + workflow system
[Figure: raw data (100-1000's GB), indexed genomes (10-100's GB) and results (a few GB), shown over a timeline of Aug, Sep, Oct, Nov, ...]
Galaxy: accessible analysis system
A data analysis and integration tool
A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
Open source software that makes integrating your own tools and data and customizing for your own site simple
Three ways to use Galaxy
1. Download and run locally
2. Public website (http://usegalaxy.org)
3. Run on the Cloud
Bringing cloud resources to genomics
Cloud resources need to be provisioned and configured for use in genomics.
CloudMan is a cloud manager that orchestrates all of the steps required to provision, manage,
and share a compute platform on a cloud infrastructure, all through a web
browser.
Accessibility
Get started at https://launch.usegalaxy.org/
Elasticity
Manage it programmatically
Create a new CloudMan compute cluster
Manage an existing CloudMan instance
How is it all achieved?
Architectural stack
• CloudLaunch: beta.launch.usegalaxy.org, github.com/galaxyproject/cloudlaunch-ui, github.com/galaxyproject/cloudlaunch
• Cloud apps
• CloudMan: wiki.galaxyproject.org/CloudMan, github.com/galaxyproject/cloudman
• CloudBridge: cloudbridge.readthedocs.org, github.com/gvlproject/cloudbridge
Impact?
http://www.citeulike.org/group/16008/tag/usecloud
Acknowledgments
Everything talked about here is an effort from a large community!
Come talk to us; get involved.
enis.afgan@jhu.edu or nuwan.goonasekera@unimelb.edu.au

The pulse of cloud computing with bioinformatics as an example

  • 1. The Pulse of Cloud Computing with Bioinformatics as an example Nuwan Goonasekera† , Enis Afgan* † University of Melbourne, Melbourne Bioinformatics, Australia * Johns Hopkins University, Taylor Lab, USA @ University of Colombo Feb 2017
  • 2. The answer to everything?
  • 3. Overview • The key characteristics of Cloud Computing • Using Cloud Computing for bioinformatics Source: http://dilbert.com/strips/comic/2012-05-25/
  • 4. A modern data-center Source: http://www.businessinsider.com/google-data-centers-2014-10?op=1
  • 5. Data center use before cloud computing source: http://www.rackspace.com/knowledge_center/whitepaper/revolution-not-evolution-how-cloud-computing-differs-from-traditional-it-and-why-it
  • 6. Cloud Computing: A Definition • NIST definition: “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” » National Institute of Standards and Technology (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
  • 7. The Cloud Model Private Community Public Hybrid Deployment Models Delivery Models Essential Characteristics Software as a Service (SaaS) Platform as a Service (PaaS) Infrastructure as a Service (IaaS) • On-demand self-service • Broad network access • Resource pooling • Rapid elasticity • Measured service
  • 9. Infrastructure-as-a-Service (IaaS) • Amazon Web Services (Market leader) • Rackspace Cloud • NeCTAR/OpenStack Research Cloud • Joyent Cloud • GoGrid • FlexiScale
  • 10. Public PaaS Examples Cloud Name Language and Developer Tools Programming Models Supported by Provider Target Applications and Storage Options Google App Engine Python, Java, Go, PHP + JVM languages (scala, groovy, jruby) MapReduce, Web, DataStore, Storage and other APIs Web applications and BigTable storage Salesforce.com’s Force.com Apex, Eclipsed-based IDE, web-based wizard Workflow, excel-like formula, web programming Business applications such as CRM Microsoft Azure .NET, Visual Studio, Azure tools Unrestricted model Enterprise and web apps Amazon Elastic MapReduce Hive, Pig, Java, Ruby etc. MapReduce Data processing and e-commerce Aneka .NET, stand-alone SDK Threads, task, MapReduce .NET enterprise applications, HPC
  • 11. Public SaaS examples • Gmail • Sharepoint • Salesforce.com CRM • On-live • Gaikai • Microsoft Office 365 • Some definitions include those that do not require payment. E.g. ad-supported sites
  • 12. Things we find most interesting • Accessibility • Infrastructure as code • Elasticity • Programming models that fit the cloud
  • 13. Accessibility ● Global availability via public clouds ● On-demand self-service ● A platform for democratisation of computing ● Access is enabled via point-and-click interfaces (blends with the Internet)
  • 14. Infrastructure as Code • Programmable • Captures knowledge • DevOps
  • 15. Elasticity • Rapidly expand and shrink based on demand • “Infinite” scaling • Cost-driven architecture • Ties in with infrastructure-as-code
  • 16. Programming models that fit the cloud • Fault-tolerant models • Massively scalable • Distributed algorithms
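A toy MapReduce-style k-mer count shows how these three properties fit together: each read is mapped independently (scalable, distributable), failed tasks are simply retried (fault-tolerant), and partial results are merged in a reduce step. This is a single-process sketch; the function names and the simulated failure are assumptions made for illustration, not part of any specific framework.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k=3):
    """Map step: count the k-mers in one read, independently of other reads."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def run_with_retry(task, arg, attempts=3):
    """Re-run a failed task; safe because the map step has no side effects."""
    for attempt in range(attempts):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == attempts - 1:
                raise


def kmer_count(reads, task=map_kmers):
    """Map every read, then reduce the partial counts into one total."""
    partials = [run_with_retry(task, read) for read in reads]
    return reduce(lambda a, b: a + b, partials, Counter())


# Simulate a worker that fails the first time it sees each read.
failed_once = set()


def flaky_map(read):
    if read not in failed_once:
        failed_once.add(read)
        raise RuntimeError("simulated worker failure")
    return map_kmers(read)


counts = kmer_count(["GATTACA", "TTACAGA"], task=flaky_map)
print(counts["TTA"], counts["GAT"])  # 2 1
```

On a real cluster the list comprehension in `kmer_count` would be spread across many machines, but the structure of the computation stays the same.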
  • 17. Cloud computing is a valuable resource - but what do we use it for?
  • 18. Bioinformatics A multi-disciplinary science using computers for acquiring, managing and analyzing biological data. It is a data-driven science. It is a tool for genomics research. Contributing disciplines: Biology, Medicine, Math & Physics, Computer Science.
  • 19. Genomics (Oxford Dictionaries): “The branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.” Where are the genes and other interesting pieces? How do sequences change over evolutionary time? What does all the DNA do? What are the physical shapes of the genome and its products?
  • 20. Genomics: contrast with biology and genetics
      Aspect | Biology and genetics | Genomics
      Scope | Targeted studies of one or a few genes | Studies considering all genes in a genome
      Technology | Targeted, low-throughput experiments | Global, high-throughput experiments
      Hard part | Clever experimental design, painstaking experimentation | Tons of data, uncertainty, computation
      * Everything on this slide is a generalization
  • 21. Where is genomics used? Basic science ● What is the DNA sequence of the genome? ● Where are the genes? ● What does all the DNA in the genome do? ● How did history shape our ethnicities and populations? Medicine ● What’s the difference between DNA in a tumor vs DNA in healthy tissue? ● Can genomic data help predict what drugs might be appropriate for: ○ a particular cancer patient? ○ a particular genetic disorder? ● Can genomic data help us predict what flu strains will prevail next year?
  • 22. Genome (Oxford Dictionaries): “The complete set of genes or genetic material present in a cell or organism.” “Blueprint” or “recipe” of life. Self-copying store of read-only information about how to develop and maintain an organism.
  • 23. Where do genomes live? All the trillions of cells in a person have the same genomic DNA in the nucleus. Picture source: https://publications.nigms.nih.gov/insidethecell/preface.html
  • 24. How do we obtain genome data? Sequencing! The first methods, called Sanger sequencing, were developed in the mid-1970s. Starting in 1990, the international Human Genome Project took 13 years to sequence the human genome. In the 2000s, massively parallel Next Generation Sequencers (NGS) were developed that took days to sequence a human genome at a much lower cost. Today, nanopore sequencers are emerging, offering real-time sequencing. There are many public data repositories with free access to data (e.g., TCGA, 1000 Genomes, GenBank).
  • 25. Genome variation Two unrelated humans have genomes that are ~99.8% similar by sequence. There are about 3-4 million differences. Most are small, e.g. Single Nucleotide Polymorphisms (SNPs). Human and chimpanzee genomes are about 96% similar.
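"Similar by sequence" can be made concrete with a tiny comparison of two already-aligned sequences of equal length. The sketch below is a simplification made up for illustration: real genome comparison also has to handle insertions, deletions, and alignment, and the function name and example sequences are hypothetical.

```python
def compare(seq_a, seq_b):
    """Return mismatch positions and fraction identity for two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Positions where the two sequences carry a different base (SNP-like sites).
    diffs = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    identity = 1 - len(diffs) / len(seq_a)
    return diffs, identity


diffs, identity = compare("GATTACAGAT", "GATTGCAGAT")
print(diffs, f"{identity:.0%}")  # [4] 90%
```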
  • 26. Making sense of the data through data manipulations • Apply data transformations to extract useful information • This is not always a well-defined process • This is typically done with existing tools, or by developing one’s own • Tools can be chained into workflows
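Chaining tools into a workflow can be sketched as passing the output of one step to the next. The three "tools" below are deliberately trivial stand-ins written for this example; they are not real bioinformatics tools, and `run_workflow` is a hypothetical helper rather than the API of any workflow system.

```python
def normalise(reads):
    """Upper-case every read so later steps see a single alphabet."""
    return [read.upper() for read in reads]


def drop_short(reads, min_len=4):
    """Filter out reads shorter than `min_len` bases."""
    return [read for read in reads if len(read) >= min_len]


def gc_fraction(reads):
    """Compute the fraction of G and C bases in each (non-empty) read."""
    return {read: (read.count("G") + read.count("C")) / len(read) for read in reads}


def run_workflow(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


workflow = [normalise, drop_short, gc_fraction]
result = run_workflow(["acgt", "gg", "ttgcaa"], workflow)
print(result)
```

Keeping the workflow as an ordered list of steps makes it easy to swap a tool or insert a new one without touching the rest of the chain.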
  • 27. What does all of this have to do with Cloud Computing?
  • 33. Real-world infrastructure requirements • Some computers + reliable persistent data storage + bioinformatics tools + reference data + workflow system • Raw data: 100s-1000s of GB • Indexed genomes: 10s-100s of GB • Results: a few GB • Timeline: Aug, Sep, Oct, Nov, ...
  • 34. Galaxy: a data analysis and integration tool • A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage • Open source software that makes it simple to integrate your own tools and data and to customize for your own site
  • 36. Three ways to use Galaxy 1. Download and run locally 2. Public website (http://usegalaxy.org) 3. Run on the Cloud
  • 37. Bringing cloud resources to genomics Cloud resources need to be provisioned and configured for use in genomics. A Cloud Manager that orchestrates all of the steps required to provision, manage, and share a compute platform on a cloud infrastructure, all through a web browser.
  • 39. Accessibility Get started at https://launch.usegalaxy.org/
  • 41. Manage it programmatically • Create a new CloudMan compute cluster • Manage an existing CloudMan instance
  • 42. How is it all achieved?
  • 43. Architectural stack • CloudLaunch (CloudLaunch.usegalaxy.org): beta.launch.usegalaxy.org, github.com/galaxyproject/cloudlaunch-ui, github.com/galaxyproject/cloudlaunch • Cloud apps: CloudMan (wiki.galaxyproject.org/CloudMan, github.com/galaxyproject/cloudman) • CloudBridge: cloudbridge.readthedocs.org, github.com/gvlproject/cloudbridge
  • 46. Everything talked about here is an effort from a large community! Come talk to us; get involved. enis.afgan@jhu.edu or nuwan.goonasekera@unimelb.edu.au