The case for Docker in multi-cloud enabled bioinformatics applications
Ahmed Ali, Mohamed M. ElKalioby, Mohamed Abouelhoda
Nile University, Egypt
Presented By
Mohamed M. El-Kalioby, MSc
Introduction
● Next generation sequencing technology has changed the
traditional bioinformatics practice
● Sophisticated multi-step workflows are used to transform the raw
sequence data into knowledge.
● One NGS workflow can include tens of tasks and hundreds of
information sources integrated together to achieve the analysis
goals.
● Medical Variant Detection Workflow is an example of such
workflows.
Medical Variant Detection Workflow
(MVDW)
Medical Variant Detection Workflow (2)
● Multiple versions and instances of the workflow are needed
● Tools and parameters can be changed
● per user, where each one may require certain modules, annotation
databases, and special post-processing;
● per experiment type, e.g., whole genome, whole exome, or RNA-seq
in single or multiplexed mode;
● per sequencing platform, e.g., Illumina, Ion Torrent, or any other one.
Requirements
● Efficient Dynamic Deployment Strategy
● The deployed system should use HPC resources
● Able to consume cloud computing resources (private and public
clouds)
Virtualization Technology
● the whole system with all modules, databases and the
related dependencies are packaged in a virtual machine
(VM) image.
● These images can then be used to instantiate a virtual
machine running in a private or public cloud.
● Examples from sequence analysis
● Crossbow for NGS read alignment & SNP calling,
● RSD-Cloud for comparative genomics
● … many more
Virtualization Technology (2)
● The traditional engine for running the virtual machine
instances is based on one of
● Oracle Virtual Box,
● KVM,
● Xen Hypervisor
● VMware
Docker
● Docker provides a new level of virtualization
● the computing machine (including the operating system) is
not virtualized,
● Only the application and the related dependencies are
encapsulated in a ’virtual’ isolated process
[Figure: (a) Software stack with virtual machines: infrastructure, operating system, virtual machine hypervisor, VM1 … VMn, running APP1 … APPn. (b) Software stack with containers: infrastructure, operating system, container engine, and containers running APP1 … APPn.]
Usage of Docker
[Figure: Docker usage. The Docker client sends requests (pull image, build image, run container, terminate container) to the Docker server (daemon). The daemon downloads and uploads images from and to the Docker public registry, builds and pushes container images to a local registry, and runs containers on top of the operating system and infrastructure.]
Why Docker
● Reduced execution overhead compared to traditional whole
machine virtualization
● Provides an effective solution to the image portability
problem.
● Virtual machine images running in Amazon are not compatible
with those running in Google and vice versa, which directly leads
to duplication of work to prepare new images with each
deployment.
Challenges
● Extra layers need to be built on top of Docker to enable the use of HPC resources
(computer cluster) and multi-cloud platforms
● Deployment in different commercial clouds is not an easy task.
● Each cloud platform has different APIs and a different business model.
● Images are compatible with different providers
Contribution
● Define a use case scenario for using Docker within a computer cluster for
bioinformatics workflows.
● Evaluate its performance in comparison to the use of native hardware and usual
virtual machines, in private and public clouds.
● We also present a new version of our multicloud elasticHPC, referred to as
elasticHPC-Docker, which can
1. enable the user to deploy and run multi-step whole analysis workflows,
2. create a computer cluster with Docker-based applications and define a use case scenario
for that,
3. support the use of private clouds as well as commercial clouds like Amazon and Google.
Containers in the Cloud
Google
● Google Cloud offers a container service in the form of two products:
1. Container-optimized virtual machine images, which include programs to run standard Docker
images according to a user-defined file in YAML format.
2. Google Kubernetes Engine (GKE), which creates a cluster of virtual machines that can run Docker
images. GKE is based on pods.
● Google has established the Google Container Registry (GCR).
● Cost:
● The container-optimized images, and GKE clusters of up to 5 nodes, run at no extra cost; the user pays
the usual price of the virtual machines.
● GKE charges an extra fee of $0.15 per hour per cluster on top of the usual machine price for
clusters larger than 5 nodes.
● GKE has two limitations:
1. It does not support Docker’s private images.
2. The cluster size in GKE cannot exceed 100 nodes.
Amazon
● Amazon provides Elastic Container Service (ECS).
● ECS enables the deployment of Docker containers on Amazon EC2.
● Amazon uses docker-compose to manage docker containers.
● Docker-compose facilitates the process of setting up a multi-container application
by defining the application and all its dependencies in a single file using YAML
format.
● The instantiated machines include programs to automatically configure the
Docker environment.
● Amazon has its own images registry.
● Cost:
● The user pays the same price as for the usual instance types.
● If the load balancing service is selected, the user pays a small extra cost of $0.025 per
hour and $0.008 per GB transferred between instances.
● Limitations:
● It does not support attaching EBS volumes to the running containers.
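The cost rules quoted on the Google and Amazon slides can be turned into a small estimate. The following is only a sketch: the function and parameter names are hypothetical, the fees are the ones stated on these slides (they may not match current provider pricing), and the instance prices in the usage example are the ones listed later in the Experiment 2 setup.

```python
def gke_cluster_cost(nodes, vm_price_per_hour, hours,
                     cluster_fee_per_hour=0.15, free_cluster_size=5):
    """Estimate GKE cost in USD: usual VM price, plus the per-cluster
    fee when the cluster has more than `free_cluster_size` nodes."""
    cost = nodes * vm_price_per_hour * hours
    if nodes > free_cluster_size:
        cost += cluster_fee_per_hour * hours
    return cost


def ecs_cluster_cost(nodes, vm_price_per_hour, hours, load_balancer=False,
                     transferred_gb=0.0, lb_fee_per_hour=0.025,
                     lb_fee_per_gb=0.008):
    """Estimate ECS cost in USD: usual instance price, plus the load
    balancing fees (per hour and per GB) when that service is selected."""
    cost = nodes * vm_price_per_hour * hours
    if load_balancer:
        cost += lb_fee_per_hour * hours + lb_fee_per_gb * transferred_gb
    return cost


if __name__ == "__main__":
    # 8 nodes for 10 hours; instance prices taken from the Experiment 2 slide.
    print(f"GKE: ${gke_cluster_cost(8, 0.504, 10):.2f}")
    print(f"ECS: ${ecs_cluster_cost(8, 0.532, 10, True, transferred_gb=100):.2f}")
```

With these numbers the extra fees are small next to the instance price itself, which dominates the bill for both providers.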
ElasticHPC-Docker
Features
● Ability to port any Docker image to, and run it on, either private or commercial clouds.
● Creation and management of a cluster of containers. The cluster can use single or
multiple machines.
● The computer cluster can have nodes from different cloud providers; i.e., some
nodes can come from Amazon and some can come from Google.
● Ability to create and destroy containers at run time. This makes it possible to
run multiple containers on the same machine, one at a time.
● The package supports scaling up/down of virtual machines (worker nodes) in a
running cluster.
ElasticHPC-Docker
Features (2)
● The package allows mounting of virtual disks and establishment of a
shared file system for the containers (the default option is NFS). In AWS, we
use EBS volumes and in Google we use persistent storage disks.
● elasticHPC-Docker automatically configures a job scheduler (including
security settings among the different providers) among the containers. The
default job scheduler is PBS Torque, but SGE is also supported.
● The current package includes many Docker specification files (Dockerfiles)
for the most important tools for NGS data analysis. These include Fastx,
BWA, and GATK.
● It includes a number of structural bioinformatics tools, including AutoDock,
Frodock, AMBER, and GROMACS, among others.
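As an illustration of creating a short-lived container that sees the shared file system, the sketch below builds a `docker run` command line. It is not taken from the elasticHPC-Docker code: the helper name, the image name `example/bwa`, and the paths `/shared` and `/data` are assumptions made for the example. The flags used (`--rm`, `--name`, `-v`) are standard Docker CLI options.

```python
import shlex


def docker_run_command(image, command, shared_dir="/shared",
                       mount_point="/data", name=None):
    """Build the argument list for `docker run`.

    --rm removes the container when the command exits, and -v bind-mounts
    the host directory `shared_dir` at `mount_point` inside the container.
    """
    argv = ["docker", "run", "--rm"]
    if name:
        argv += ["--name", name]
    argv += ["-v", f"{shared_dir}:{mount_point}", image]
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Hypothetical image and reference file; only the command is printed here.
    argv = docker_run_command("example/bwa",
                              ["bwa", "index", "/data/ref.fa"],
                              name="bwa-index")
    print(shlex.join(argv))
```

The list can be passed to `subprocess.run(argv, check=True)` on a machine where Docker is installed. Because each container is removed on exit, several tools can take turns on the same machine.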
EHPC-Docker (Use Case)
[Figure: elasticHPC-Docker cluster layout. The EHPC-Client runs on the user's PC. The master node and each worker node run an EHPC-VM Manager and a container, and each has an attached data volume; the figure also shows a shared file system (block storage). Port labels: port 5000, communication with the VM Manager; port 5555, communication with the container service; ports 1:4999 and 5001:65535, communication among containerized services.]
1. User downloads the EHPC-Docker client.
2. User runs the client to create a cluster on a supported cloud.
a. The client starts the master node.
b. The master node creates the rest of the cluster in parallel.
c. The master node distributes the URL of the image registry.
d. Master and worker nodes retrieve the image and start the containers.
e. Once done, the master node sets up the ports and finalizes the configuration by
setting up the job scheduler and the shared storage.
Cluster is ready.
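The creation steps above can be sketched as a small simulation. Everything here is a stand-in: `start_node` and `pull_and_start` replace real cloud and Docker calls, and the registry URL and image name are hypothetical. The only point illustrated is the ordering, with the master started first and the workers created in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

REGISTRY_URL = "registry.example.org"  # hypothetical registry address


def start_node(name):
    """Stand-in for a cloud API call that boots one virtual machine."""
    return {"name": name, "registry": None, "container": None}


def pull_and_start(node, registry_url, image):
    """Stand-in for a node pulling the image and starting its container."""
    node["registry"] = registry_url
    node["container"] = f"{registry_url}/{image}"
    return node


def create_cluster(worker_count, image, registry_url=REGISTRY_URL):
    master = start_node("master")                          # step a
    names = [f"worker-{i}" for i in range(1, worker_count + 1)]
    with ThreadPoolExecutor(max_workers=max(1, worker_count)) as pool:
        workers = list(pool.map(start_node, names))        # step b, in parallel
        nodes = [master] + workers
        # steps c and d: every node learns the registry URL and starts a container
        nodes = list(pool.map(
            lambda n: pull_and_start(n, registry_url, image), nodes))
    # step e: record the final configuration (defaults named on the slides)
    return {"nodes": nodes, "scheduler": "PBS Torque",
            "shared_storage": "NFS", "ready": True}


if __name__ == "__main__":
    cluster = create_cluster(3, "ngs/variant-workflow")
    print([n["name"] for n in cluster["nodes"]], cluster["ready"])
```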
Experiments
● We conducted two experiments:
1. Measure the time for establishing container clusters over different cloud platforms.
2. Measure the performance of using Docker when running the variant detection workflow.
Experiment 1
1. GKE is faster than ECS
2. elasticHPC is faster than GKE
3. elasticHPC is close to ECS
Experiment 2
● For this experiment, we used an exome dataset from DePristo et al. of size ~9 GB.
● (The exome is a set of NGS reads sequenced only from the coding regions of a
genome.)
● The workflow was executed three times independently on Google, AWS, and a private
cloud based on OpenStack.
● In each cloud, the 9 GB input data is divided into blocks to be processed in parallel
over the cluster nodes.
● For a fair comparison, we used machines with specifications as similar as possible.
● Amazon: m3.2xlarge (8 cores, Intel 2.5 GHz, 30 GB RAM, SSD disks, $0.532/hour),
● Google: n1-highmem-8 (8 cores, Intel 2.5 GHz, 52 GB RAM, SSD disks, $0.504/hour),
● OpenStack: a local machine with 8 cores and 56 GB RAM.
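The slides do not say how the input is divided into blocks. One possible approach, assuming the reads are stored as FASTQ with four lines per record, is to cut the file only at record boundaries so that each node receives complete reads. The function name below is hypothetical.

```python
def split_fastq_records(lines, n_blocks):
    """Split FASTQ lines into `n_blocks` contiguous blocks of whole records.

    Assumes the common 4-line-per-record FASTQ layout. Block sizes differ
    by at most one record.
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    base, extra = divmod(len(records), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)  # first `extra` blocks get one more
        block = [line for rec in records[start:start + size] for line in rec]
        blocks.append(block)
        start += size
    return blocks


if __name__ == "__main__":
    # Five toy reads split over two nodes.
    reads = []
    for i in range(5):
        reads += [f"@read{i}", "ACGT", "+", "IIII"]
    for block in split_fastq_records(reads, 2):
        print(len(block) // 4, "records")
```

For a 9 GB file the same idea would be applied while streaming the file rather than holding all lines in memory.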
Experiment 2
Physical Servers
Docker is very close to physical
Experiment 2
Google Cloud
ElasticHPC is faster than GCE Containers
Experiment 2
Amazon Cloud
ElasticHPC is very close to Amazon ECS
Conclusion
● We introduced elasticHPC-Docker, based on container technology.
● Our package enables the creation of a computer cluster with containerized
applications and workflows in private and in different commercial clouds using
a single interface.
● It includes options to run bioinformatics applications and workflows for large
datasets.
● Through the container technology, elasticHPC-Docker provides an efficient
solution to the interoperability among commercial clouds.
● It is efficient in practice, with reduced overhead, especially on local infrastructures.
● It is available at http://www.elastichpc.org
Thank You

Más contenido relacionado

La actualidad más candente

Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaSScaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaSJelastic Multi-Cloud PaaS
 
Meteor South Bay Meetup - Kubernetes & Google Container Engine
Meteor South Bay Meetup - Kubernetes & Google Container EngineMeteor South Bay Meetup - Kubernetes & Google Container Engine
Meteor South Bay Meetup - Kubernetes & Google Container EngineKit Merker
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
DevOps in AWS with Kubernetes
DevOps in AWS with KubernetesDevOps in AWS with Kubernetes
DevOps in AWS with KubernetesOleg Chunikhin
 
ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS
ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMSARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS
ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMSArun prasath
 
Getting started with kubernetes
Getting started with kubernetesGetting started with kubernetes
Getting started with kubernetesBob Killen
 
Kubernetes Requests and Limits
Kubernetes Requests and LimitsKubernetes Requests and Limits
Kubernetes Requests and LimitsAhmed AbouZaid
 
Kubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideKubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideBytemark
 
Quantifying the Noisy Neighbor Problem in Openstack
Quantifying the Noisy Neighbor Problem in OpenstackQuantifying the Noisy Neighbor Problem in Openstack
Quantifying the Noisy Neighbor Problem in OpenstackNodir Kodirov
 
Evolution of containers to kubernetes
Evolution of containers to kubernetesEvolution of containers to kubernetes
Evolution of containers to kubernetesKrishna-Kumar
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosmictc
 
Federated Kubernetes: As a Platform for Distributed Scientific Computing
Federated Kubernetes: As a Platform for Distributed Scientific ComputingFederated Kubernetes: As a Platform for Distributed Scientific Computing
Federated Kubernetes: As a Platform for Distributed Scientific ComputingBob Killen
 
Kubernetes a comprehensive overview
Kubernetes   a comprehensive overviewKubernetes   a comprehensive overview
Kubernetes a comprehensive overviewGabriel Carro
 
Kubernetes
KubernetesKubernetes
Kuberneteserialc_w
 

La actualidad más candente (19)

Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaSScaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
 
Meteor South Bay Meetup - Kubernetes & Google Container Engine
Meteor South Bay Meetup - Kubernetes & Google Container EngineMeteor South Bay Meetup - Kubernetes & Google Container Engine
Meteor South Bay Meetup - Kubernetes & Google Container Engine
 
Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
DevOps in AWS with Kubernetes
DevOps in AWS with KubernetesDevOps in AWS with Kubernetes
DevOps in AWS with Kubernetes
 
ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS
ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMSARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS
ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS
 
kubernetes 101
kubernetes 101kubernetes 101
kubernetes 101
 
Getting started with kubernetes
Getting started with kubernetesGetting started with kubernetes
Getting started with kubernetes
 
Kubernetes Requests and Limits
Kubernetes Requests and LimitsKubernetes Requests and Limits
Kubernetes Requests and Limits
 
Containers kuberenetes
Containers kuberenetesContainers kuberenetes
Containers kuberenetes
 
Kubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideKubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory Guide
 
Quantifying the Noisy Neighbor Problem in Openstack
Quantifying the Noisy Neighbor Problem in OpenstackQuantifying the Noisy Neighbor Problem in Openstack
Quantifying the Noisy Neighbor Problem in Openstack
 
Evolution of containers to kubernetes
Evolution of containers to kubernetesEvolution of containers to kubernetes
Evolution of containers to kubernetes
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
 
Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
Federated Kubernetes: As a Platform for Distributed Scientific Computing
Federated Kubernetes: As a Platform for Distributed Scientific ComputingFederated Kubernetes: As a Platform for Distributed Scientific Computing
Federated Kubernetes: As a Platform for Distributed Scientific Computing
 
Kubernetes a comprehensive overview
Kubernetes   a comprehensive overviewKubernetes   a comprehensive overview
Kubernetes a comprehensive overview
 
Kubernetes
KubernetesKubernetes
Kubernetes
 

Destacado

Head first docker
Head first dockerHead first docker
Head first dockerHan Qin
 
An optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computingAn optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computingDIGVIJAY SHINDE
 
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Vincenzo Ferme
 
Caravane Bio [Mohammed Benbouida, AMBS, Morocco]
Caravane Bio [Mohammed Benbouida, AMBS, Morocco]Caravane Bio [Mohammed Benbouida, AMBS, Morocco]
Caravane Bio [Mohammed Benbouida, AMBS, Morocco]UNESCO Venice Office
 
Kallio Chipster Bosc2009
Kallio Chipster Bosc2009Kallio Chipster Bosc2009
Kallio Chipster Bosc2009bosc
 
استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I)
 استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I) استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I)
استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I)Prof. Tafida Ghanem
 
Lt npsti process-and_forms_april_2011
Lt npsti process-and_forms_april_2011Lt npsti process-and_forms_april_2011
Lt npsti process-and_forms_april_2011Mosab-Khayat
 
الهوية الرقمية على مواقع التواصل الاجتماعي
الهوية الرقمية على مواقع التواصل الاجتماعيالهوية الرقمية على مواقع التواصل الاجتماعي
الهوية الرقمية على مواقع التواصل الاجتماعيFatma Esa
 
Delivering Bioinformatics MapReduce Applications in the Cloud
Delivering Bioinformatics MapReduce Applications in the CloudDelivering Bioinformatics MapReduce Applications in the Cloud
Delivering Bioinformatics MapReduce Applications in the CloudLukas Forer
 
Bioinformatics lecture 1
Bioinformatics lecture 1Bioinformatics lecture 1
Bioinformatics lecture 1Hamid Ur-Rahman
 
الثقافة المعلوماتية في الجامعات مكتبة جامعة 6 أكتوبر نوفمبر 2012م
الثقافة المعلوماتية في الجامعات   مكتبة جامعة 6 أكتوبر نوفمبر 2012مالثقافة المعلوماتية في الجامعات   مكتبة جامعة 6 أكتوبر نوفمبر 2012م
الثقافة المعلوماتية في الجامعات مكتبة جامعة 6 أكتوبر نوفمبر 2012مProf. Sherif Shaheen
 
تسويق خدمات المعلومات
تسويق خدمات المعلوماتتسويق خدمات المعلومات
تسويق خدمات المعلوماتu083125
 
الثقافة التقنية والمواطنة الالكترونية
الثقافة التقنية والمواطنة الالكترونيةالثقافة التقنية والمواطنة الالكترونية
الثقافة التقنية والمواطنة الالكترونيةNazzal Th. Alenezi
 

Destacado (20)

Head first docker
Head first dockerHead first docker
Head first docker
 
An optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computingAn optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computing
 
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
 
Caravane Bio [Mohammed Benbouida, AMBS, Morocco]
Caravane Bio [Mohammed Benbouida, AMBS, Morocco]Caravane Bio [Mohammed Benbouida, AMBS, Morocco]
Caravane Bio [Mohammed Benbouida, AMBS, Morocco]
 
Kallio Chipster Bosc2009
Kallio Chipster Bosc2009Kallio Chipster Bosc2009
Kallio Chipster Bosc2009
 
استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I)
 استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I) استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I)
استراتيجيات العلوم والتكنولوجيا والتجديد العالمية المعاصرة (ST&I)
 
Lt npsti process-and_forms_april_2011
Lt npsti process-and_forms_april_2011Lt npsti process-and_forms_april_2011
Lt npsti process-and_forms_april_2011
 
Dr Justin Schonfeld - Bioinformatics Applications
Dr Justin Schonfeld - Bioinformatics ApplicationsDr Justin Schonfeld - Bioinformatics Applications
Dr Justin Schonfeld - Bioinformatics Applications
 
الهوية الرقمية على مواقع التواصل الاجتماعي
الهوية الرقمية على مواقع التواصل الاجتماعيالهوية الرقمية على مواقع التواصل الاجتماعي
الهوية الرقمية على مواقع التواصل الاجتماعي
 
Delivering Bioinformatics MapReduce Applications in the Cloud
Delivering Bioinformatics MapReduce Applications in the CloudDelivering Bioinformatics MapReduce Applications in the Cloud
Delivering Bioinformatics MapReduce Applications in the Cloud
 
مهارات+1
مهارات+1مهارات+1
مهارات+1
 
Present
PresentPresent
Present
 
Dr. Dario Lijtmaer - Data Sharing/Collaboration and Publication using BOLD
Dr. Dario Lijtmaer - Data Sharing/Collaboration and Publication using BOLDDr. Dario Lijtmaer - Data Sharing/Collaboration and Publication using BOLD
Dr. Dario Lijtmaer - Data Sharing/Collaboration and Publication using BOLD
 
e justice
e justice e justice
e justice
 
Bioinformatics lecture 1
Bioinformatics lecture 1Bioinformatics lecture 1
Bioinformatics lecture 1
 
Brin bws13 quiz mmc
Brin bws13 quiz mmcBrin bws13 quiz mmc
Brin bws13 quiz mmc
 
Visual Studio
Visual StudioVisual Studio
Visual Studio
 
الثقافة المعلوماتية في الجامعات مكتبة جامعة 6 أكتوبر نوفمبر 2012م
الثقافة المعلوماتية في الجامعات   مكتبة جامعة 6 أكتوبر نوفمبر 2012مالثقافة المعلوماتية في الجامعات   مكتبة جامعة 6 أكتوبر نوفمبر 2012م
الثقافة المعلوماتية في الجامعات مكتبة جامعة 6 أكتوبر نوفمبر 2012م
 
تسويق خدمات المعلومات
تسويق خدمات المعلوماتتسويق خدمات المعلومات
تسويق خدمات المعلومات
 
الثقافة التقنية والمواطنة الالكترونية
الثقافة التقنية والمواطنة الالكترونيةالثقافة التقنية والمواطنة الالكترونية
الثقافة التقنية والمواطنة الالكترونية
 

Similar a The Case For Docker In Multi-Cloud Enabled Bioinformatics Applications

Introduction to Docker storage, volume and image
Introduction to Docker storage, volume and imageIntroduction to Docker storage, volume and image
Introduction to Docker storage, volume and imageejlp12
 
Introduction to containers a practical session using core os and docker
Introduction to containers  a practical session using core os and dockerIntroduction to containers  a practical session using core os and docker
Introduction to containers a practical session using core os and dockerAlessandro Martellone
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetesdatamantra
 
Docker on Amazon ECS
Docker on Amazon ECSDocker on Amazon ECS
Docker on Amazon ECSDeepak Kumar
 
Introduction to containers, k8s, Microservices & Cloud Native
Introduction to containers, k8s, Microservices & Cloud NativeIntroduction to containers, k8s, Microservices & Cloud Native
Introduction to containers, k8s, Microservices & Cloud NativeTerry Wang
 
Cloud Run and Containers
Cloud Run and ContainersCloud Run and Containers
Cloud Run and ContainersOmar Fathy
 
6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production 6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production Hung Lin
 
Academy PRO: Docker. Part 1
Academy PRO: Docker. Part 1Academy PRO: Docker. Part 1
Academy PRO: Docker. Part 1Binary Studio
 
VASCAN - Docker and Security
VASCAN - Docker and SecurityVASCAN - Docker and Security
VASCAN - Docker and SecurityMichael Irwin
 
Kubernetes in Docker
Kubernetes in DockerKubernetes in Docker
Kubernetes in Dockerdocker-athens
 
Containerize! Between Docker and Jube.
Containerize! Between Docker and Jube.Containerize! Between Docker and Jube.
Containerize! Between Docker and Jube.Henryk Konsek
 
Kubernetes: training micro-dragons for a serious battle
Kubernetes: training micro-dragons for a serious battleKubernetes: training micro-dragons for a serious battle
Kubernetes: training micro-dragons for a serious battleAmir Moghimi
 
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Anant Corporation
 
Making Service Deployments to AWS a breeze with Nova
Making Service Deployments to AWS a breeze with NovaMaking Service Deployments to AWS a breeze with Nova
Making Service Deployments to AWS a breeze with NovaGregor Heine
 
Best Practices for Developing & Deploying Java Applications with Docker
Best Practices for Developing & Deploying Java Applications with DockerBest Practices for Developing & Deploying Java Applications with Docker
Best Practices for Developing & Deploying Java Applications with DockerEric Smalling
 
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka Mario Ishara Fernando
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 

Similar a The Case For Docker In Multi-Cloud Enabled Bioinformatics Applications (20)

JOSA TechTalks - Docker in Production
JOSA TechTalks - Docker in ProductionJOSA TechTalks - Docker in Production
JOSA TechTalks - Docker in Production
 
Introduction to Docker storage, volume and image
Introduction to Docker storage, volume and imageIntroduction to Docker storage, volume and image
Introduction to Docker storage, volume and image
 
Introduction to containers a practical session using core os and docker
Introduction to containers  a practical session using core os and dockerIntroduction to containers  a practical session using core os and docker
Introduction to containers a practical session using core os and docker
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Docker on Amazon ECS
Docker on Amazon ECSDocker on Amazon ECS
Docker on Amazon ECS
 
Introduction to containers, k8s, Microservices & Cloud Native
Introduction to containers, k8s, Microservices & Cloud NativeIntroduction to containers, k8s, Microservices & Cloud Native
Introduction to containers, k8s, Microservices & Cloud Native
 
Cloud Run and Containers
Cloud Run and ContainersCloud Run and Containers
Cloud Run and Containers
 
6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production 6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production
 
Academy PRO: Docker. Part 1
Academy PRO: Docker. Part 1Academy PRO: Docker. Part 1
Academy PRO: Docker. Part 1
 
Gdsc muk - innocent
Gdsc   muk - innocentGdsc   muk - innocent
Gdsc muk - innocent
 
VASCAN - Docker and Security
VASCAN - Docker and SecurityVASCAN - Docker and Security
VASCAN - Docker and Security
 
Kubernetes in Docker
Kubernetes in DockerKubernetes in Docker
Kubernetes in Docker
 
Containerize! Between Docker and Jube.
Containerize! Between Docker and Jube.Containerize! Between Docker and Jube.
Containerize! Between Docker and Jube.
 
Containers > VMs
Containers > VMsContainers > VMs
Containers > VMs
 
Kubernetes: training micro-dragons for a serious battle
Kubernetes: training micro-dragons for a serious battleKubernetes: training micro-dragons for a serious battle
Kubernetes: training micro-dragons for a serious battle
 
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
 
Making Service Deployments to AWS a breeze with Nova
Making Service Deployments to AWS a breeze with NovaMaking Service Deployments to AWS a breeze with Nova
Making Service Deployments to AWS a breeze with Nova
 
Best Practices for Developing & Deploying Java Applications with Docker
Best Practices for Developing & Deploying Java Applications with DockerBest Practices for Developing & Deploying Java Applications with Docker
Best Practices for Developing & Deploying Java Applications with Docker
 
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 

Último

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
The Case For Docker In Multi-Cloud Enabled Bioinformatics Applications

  • 1. The case for Docker in multi-cloud enabled bioinformatics applications
      Ahmed Ali, Mohamed M. ElKalioby, Mohamed Abouelhoda
      Nile University, Egypt
      Presented by Mohamed M. El-Kalioby, MSc
  • 2. Introduction
      ● Next generation sequencing (NGS) technology has changed traditional bioinformatics practice.
      ● Sophisticated multi-step workflows are used to transform the raw sequence data into knowledge.
      ● One NGS workflow can include tens of tasks and hundreds of information sources integrated together to achieve the analysis goals.
      ● The Medical Variant Detection Workflow is an example of such workflows.
  • 3. Medical Variant Detection Workflow (MVDW)
  • 4. Medical Variant Detection Workflow (2)
      ● Multiple versions and instances of the workflow are needed.
      ● Tools and parameters can be changed:
        ● per user, where each one may require certain modules, annotation databases, and special post-processing;
        ● per experiment type, e.g., whole genome, whole exome, or RNAseq in a single or multiplexed mode;
        ● per sequencing platform, e.g., Illumina, IonTorrent, or any other one.
  • 5. Requirements
      ● Efficient dynamic deployment strategy
      ● The deployed system should use HPC resources
      ● Able to consume cloud computing resources (private and public clouds)
  • 6. Virtualization Technology
      ● The whole system, with all modules, databases, and related dependencies, is packaged in a virtual machine (VM) image.
      ● These images can then be used to instantiate a virtual machine running in a private or public cloud.
      ● Examples from sequence analysis:
        ● Crossbow for NGS read alignment and SNP calling,
        ● RSD-Cloud for comparative genomics,
        ● … many more
  • 7. Virtualization Technology (2)
      ● The traditional engine for running the virtual machine instances is based on one of:
        ● Oracle VirtualBox,
        ● KVM,
        ● Xen Hypervisor,
        ● VMware
  • 8. Docker
      ● Docker provides a new level of virtualization:
        ● the computing machine (including the operating system) is not virtualized;
        ● only the application and the related dependencies are encapsulated in a 'virtual' isolated process.
      [Figure: (a) software stack with virtual machines: infrastructure, operating system, virtual machine hypervisor, VM1 … VMn, APP1 … APPn; (b) software stack with containers: infrastructure, operating system, container engine, containers, APP1 … APPn]
  • 9. Usage of Docker
      [Figure: the Docker client sends requests to the Docker server (daemon): pull image, build image, run container, terminate container. The daemon downloads/uploads images from/to the Docker public registry, builds/pushes container images to a local registry, and runs the containers on the operating system of the underlying infrastructure.]
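The lifecycle in the figure (pull, build, push to a local registry, run, terminate) can be written down as a sequence of Docker CLI calls. The following is only a minimal sketch: the helper `lifecycle_commands` and the image, container, and registry names are hypothetical and do not come from the slides, and the function only builds the command lines without executing them.

```python
from typing import List


def lifecycle_commands(image: str, container: str, local_registry: str,
                       context: str = ".") -> List[List[str]]:
    """Return the Docker CLI calls for one pull/build/push/run/terminate cycle.

    All argument values are illustrative assumptions; nothing is executed here.
    """
    local_tag = f"{local_registry}/{image}"
    return [
        ["docker", "pull", image],                       # pull image from the public registry
        ["docker", "build", "-t", local_tag, context],   # build image from a Dockerfile in `context`
        ["docker", "push", local_tag],                   # push image to the local registry
        ["docker", "run", "-d", "--name", container, local_tag],  # run container in the background
        ["docker", "stop", container],                   # terminate container
        ["docker", "rm", container],                     # remove the stopped container
    ]


if __name__ == "__main__":
    for cmd in lifecycle_commands("bwa", "bwa-1", "localhost:5000"):
        print(" ".join(cmd))
```

Each inner list could be handed to `subprocess.run` on a machine with Docker installed; keeping the commands as data makes the sequence easy to inspect first.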
  • 10. Why Docker
      ● Reduced execution overhead compared to traditional whole-machine virtualization.
      ● Provides an effective solution to the image portability problem.
      ● Virtual machine images running in Amazon are not compatible with those running in Google and vice versa, which leads directly to duplicated work preparing new images for each deployment.
  • 11. Challenges
      ● Extra layers need to be built on top of Docker to enable the use of HPC resources (computer clusters) and multi-cloud platforms.
      ● Deployment in different commercial clouds is not an easy task.
      ● Each cloud platform has different APIs and a different business model.
      ● Images need to be compatible with different providers.
  • 12. Contribution
      ● Define a use case scenario for using Docker within a computer cluster for bioinformatics workflows.
      ● Evaluate its performance in comparison to native hardware and conventional virtual machines, in private and public clouds.
      ● Present a new version of our multi-cloud elasticHPC, referred to as elasticHPC-Docker, which can:
        1. enable the user to deploy and run whole multi-step analysis workflows,
        2. create a computer cluster with Docker-based applications and define a use case scenario for that,
        3. support the use of private clouds as well as commercial clouds like Amazon and Google.
  • 13. Containers in the Cloud
  • 14. Google
      ● Google Cloud offers a container service in the form of two products:
        1. Container-optimized virtual machine images, which include programs to run standard Docker images according to a user-defined file in YAML format.
        2. Google Kubernetes Engine (GKE), which creates a cluster of virtual machines that can run Docker images. GKE is based on pods.
      ● Google has established the Google Container Registry (GCR).
      ● Cost:
        ● The container-optimized images and GKE (for clusters of up to 5 nodes) run at no extra cost; the user pays the usual price of the virtual machines.
        ● For clusters larger than 5 nodes, GKE charges an extra fee of $0.15 per hour per cluster on top of the usual machine price.
      ● GKE has two limitations:
        1. It does not support Docker's private images.
        2. The cluster size in GKE cannot exceed 100 nodes.
  • 15. Amazon
      ● Amazon provides the Elastic Container Service (ECS).
      ● ECS enables the deployment of Docker containers on Amazon EC2.
      ● Amazon uses docker-compose to manage Docker containers.
      ● docker-compose simplifies setting up a multi-container application by defining the application and all its dependencies in a single file in YAML format.
      ● The instantiated machines include programs to automatically configure the Docker environment.
      ● Amazon has its own image registry.
      ● Cost:
        ● The user pays the same price as for the usual instance types.
        ● If the load balancing service is selected, the user pays a small extra cost of $0.025 per hour and $0.008 per GB transferred between instances.
      ● Limitations:
        ● It does not support attaching EBS volumes to the running containers.
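The cost rules on the Google and Amazon slides can be turned into a small calculation. This is a rough sketch only: the fees are the ones quoted in the slides and should not be read as current pricing, and the function names `gke_cost` and `ecs_cost` and their parameters are hypothetical.

```python
# Fees as quoted in the slides (assumption: they apply exactly as stated there).
GKE_CLUSTER_FEE_PER_HOUR = 0.15   # charged only for clusters larger than 5 nodes
GKE_FREE_NODE_LIMIT = 5
ECS_LB_FEE_PER_HOUR = 0.025       # optional load balancing service
ECS_LB_FEE_PER_GB = 0.008         # per GB transferred between instances


def gke_cost(nodes: int, hours: float, vm_price_per_hour: float) -> float:
    """Machine cost plus the per-cluster GKE fee when the cluster exceeds 5 nodes."""
    cost = nodes * hours * vm_price_per_hour
    if nodes > GKE_FREE_NODE_LIMIT:
        cost += GKE_CLUSTER_FEE_PER_HOUR * hours
    return cost


def ecs_cost(nodes: int, hours: float, vm_price_per_hour: float,
             load_balancer: bool = False, gb_transferred: float = 0.0) -> float:
    """Instance cost plus the optional load balancing charges."""
    cost = nodes * hours * vm_price_per_hour
    if load_balancer:
        cost += ECS_LB_FEE_PER_HOUR * hours + ECS_LB_FEE_PER_GB * gb_transferred
    return cost


if __name__ == "__main__":
    # Example: 8 nodes for 10 hours at a hypothetical $0.50 per node-hour.
    print(f"GKE: ${gke_cost(8, 10, 0.50):.2f}")
    print(f"ECS: ${ecs_cost(8, 10, 0.50, load_balancer=True, gb_transferred=100):.2f}")
```

In both models the container service fee is small relative to the machine cost once the cluster has more than a few nodes.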
  • 16. ElasticHPC-Docker Features
      ● Ability to port any Docker image to private or commercial clouds and run it there.
      ● Creation and management of a cluster of containers. The cluster can use a single machine or multiple machines.
      ● The computer cluster can have nodes from different cloud providers; i.e., some nodes can come from Amazon and some from Google.
      ● Ability to create and destroy containers at run time. This makes it possible to run multiple containers on the same machine, one at a time.
      ● The package supports scaling the number of virtual machines (worker nodes) up or down in a running cluster.
  • 17. ElasticHPC-Docker Features (2)
      ● The package allows mounting virtual disks and establishing a shared file system for the containers (the default option is NFS). In AWS we use EBS volumes, and in Google we use persistent storage disks.
      ● elasticHPC-Docker automatically configures a job scheduler among the containers, including security settings among the different providers. The default job scheduler is PBS Torque, but SGE is also supported.
      ● The current package includes many Docker specification files (Dockerfiles) for the most important tools for NGS data analysis. These include Fastx, BWA, and GATK.
      ● It also includes a number of structural bioinformatics tools, including AutoDock, Frodock, AMBER, and GROMACS, among others.
  • 18. EHPC-Docker (Use Case)
      [Figure: the EHPC-Client runs on the user's PC and communicates with the EHPC-VM Manager on each node. The cluster consists of one master node and several worker (slave) nodes, each running a container with an attached data volume, connected through a shared file system (block storage). Port labels in the figure: 5000, 5555, and 1:4999, 5001:65535. Arrow labels: communication with VM Manager, communication with container service, communication among containerized services.]
      1. The user downloads the EHPC-Docker client.
      2. The user runs the client to create a cluster on a supported cloud:
        a. The client starts the master node.
        b. The master node creates the rest of the cluster in parallel.
        c. The master node distributes the URL of the image registry.
        d. The master and worker nodes retrieve the image and start the containers.
        e. Once done, the master node sets up the ports and finalizes the configuration by setting up the job scheduler and the shared storage.
      The cluster is ready.
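The cluster creation steps (a) to (e) above can be sketched as a small in-memory simulation. This is not the elasticHPC-Docker implementation: the `Node` class, `start_node`, `pull_and_start`, and `create_cluster` are hypothetical stand-ins for the real cloud API calls and Docker operations, and they only record what would happen.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    role: str                              # "master" or "worker"
    registry_url: Optional[str] = None
    container_running: bool = False
    log: List[str] = field(default_factory=list)


def start_node(name: str, role: str) -> Node:
    """Stand-in for a cloud API call that boots a virtual machine."""
    node = Node(name=name, role=role)
    node.log.append("started")
    return node


def pull_and_start(node: Node, image: str) -> None:
    """Stand-in for pulling the image from the registry and starting a container."""
    node.log.append(f"pull {node.registry_url}/{image}")
    node.container_running = True


def create_cluster(n_workers: int, registry_url: str, image: str) -> List[Node]:
    master = start_node("master", "master")                      # step a
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        workers = list(pool.map(                                  # step b: workers in parallel
            lambda i: start_node(f"worker-{i}", "worker"), range(n_workers)))
    nodes = [master, *workers]
    for node in nodes:                                            # step c: distribute registry URL
        node.registry_url = registry_url
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:      # step d: pull image, start containers
        list(pool.map(lambda n: pull_and_start(n, image), nodes))
    master.log.append("configured ports, job scheduler, shared storage")  # step e
    return nodes


if __name__ == "__main__":
    for n in create_cluster(3, "registry.example.org", "bwa"):
        print(n.name, n.role, n.log)
```

The thread pool mirrors the point that the master creates the remaining nodes in parallel rather than one after another, so the setup time would not grow linearly with the cluster size.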
  • 19. Experiments
      ● We conducted two experiments:
        1. Measure the time for establishing container clusters over different cloud platforms.
        2. Measure the performance of using Docker when running the variant detection workflow.
  • 20. Experiment 1
      1. GKE is faster than ECS.
      2. elasticHPC is faster than GKE.
      3. elasticHPC is close to ECS.
  • 21. Experiment 2
      ● For this experiment, we used an exome dataset from DePristo et al. of size ~9 GB.
      ● (The exome is a set of NGS reads sequenced only from the coding regions of a genome.)
      ● The workflow was executed three times independently: on Google, on AWS, and on a private cloud based on OpenStack.
      ● In each cloud, the 9 GB input data is divided into blocks to be processed in parallel over the cluster nodes.
      ● For a fair comparison, we used machines with specifications as similar as possible:
        ● Amazon: m3.2xlarge (8 cores, Intel 2.5 GHz, 30 GB RAM, SSD disks, $0.532/hour),
        ● Google: n1-highmem-8 (8 cores, Intel 2.5 GHz, 52 GB RAM, SSD disks, $0.504/hour),
        ● OpenStack: a local machine with 8 cores and 56 GB RAM.
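Dividing the input into blocks for parallel processing can be illustrated with a simple partitioning routine. The slides do not say how the blocks are formed, so this is only an assumed scheme: the input is treated as a sequence of records (for example, reads), and the hypothetical helper `split_into_blocks` assigns each node a contiguous, near-equal range.

```python
from typing import List, Tuple


def split_into_blocks(n_records: int, n_nodes: int) -> List[Tuple[int, int]]:
    """Split record indices 0..n_records-1 into n_nodes contiguous half-open ranges.

    Block sizes differ by at most one record; the first blocks take the remainder.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(n_records, n_nodes)
    blocks = []
    start = 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Example: 10 records over 3 nodes.
    print(split_into_blocks(10, 3))
```

Splitting on record boundaries rather than raw byte offsets keeps each read intact, which matters for formats where one record spans several lines.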
  • 22. Experiment 2: Physical Servers
      Docker performance is very close to that of the physical servers.
  • 23. Experiment 2: Google Cloud
      ElasticHPC is faster than GCE Containers.
  • 24. Experiment 2: Amazon Cloud
      ElasticHPC is very close to Amazon ECS.
  • 25. Conclusion
      ● We introduced elasticHPC-Docker, based on container technology.
      ● Our package enables the creation of a computer cluster with containerized applications and workflows in private and in different commercial clouds using a single interface.
      ● It includes options to run bioinformatics applications and workflows for large datasets.
      ● Through container technology, elasticHPC-Docker provides an efficient solution to interoperability among commercial clouds.
      ● It is efficient in practice, with reduced overhead, especially on local infrastructures.
      ● It is available at http://www.elastichpc.org