Execution Environment for On-Demand Computing Services Based on Shared Clusters

1/40
Execution Environment for On-Demand Computing Services Based on Shared Clusters
PhD thesis, Grenoble University
By Rodrigue Chakode
(LIG/INRIA, Equipe Mescal)
Advisors: - Jean-François Méhaut
- Maurice Tchuenté

2/40
Cloud Computing in a Nutshell
◉ Enables computing features as services
◉ Free or commercial services accessible over network
◉ On-demand and elastic access, plus utility billing
– Customers (users of the service) only pay for what they use,
aka pay-as-you-go
– Requests for more or fewer features should be satisfied quickly
◉ Services are set up transparently for customers
– They don't have to care about how the service is enabled

3/40
Context Statement on Cloud Computing
◉ Various sorts of cloud services
– Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service,
Data-as-a-Service, Translation-as-a-Service...
– Almost everything could be a service (XaaS)
◉ Requires setting up a suitable computing infrastructure
– Servers, storage, network fabrics, cooling system...
◉ May need significant investments
– Out of reach for many small or medium businesses (SMBs)
– Market currently dominated by the largest organizations
Introduction

4/40
Challenges for HPC
◉ Many applications require
intensive computing capabilities
– E.g. EDA Applications (Ciloe Project)
– Integrated circuits need to be simulated
before manufacturing
◉ Computing architectures are
increasingly parallel
– SMP, NUMA, GPU, Cluster... and
soon many-core architectures
◉ HPC applications run on clusters
of multicore nodes (SMP/NUMA)
◉ Also expensive
Example of a cluster. Credit : CEA

5/40
Bring HPC Services into Clouds
◉ Services requiring intensive computations
◉ Services enabled from a mutualized cluster
– Cluster supported by several businesses
– Each business providing its own service
– Cluster's resources shared among the services
◉ Study conducted in the context of an industrial collaboration
– The Ciloe Project [http://ciloe.minalogic.net]
– Three SBEs that publish EDA applications are involved

6/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview : Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives

7/40
Resource Management for HPC SaaS Services
◉ What is a service?
– Computes customer data with a specific application
– Input specifies an application and the data
– Output retrieved after the computation
– No further interaction necessary
Problem Statement

8/40
Related Research Issues
◉ Data Management
◉ Resilience and Fault Tolerance
◉ Security and Privacy
◉ Resource Management

9/40
Scheduling Problems
◉ Share the cluster's resources among the services
– according to the investments of the different businesses
◉ Maximize the use of resources
– Use idle resources to run pending requests
– Run miscellaneous tasks on idle resources in a best-effort
way
◉ Minimize the impact of selfish behaviors
– A business can under-invest while needing a lot of resources

10/40
Resource Allocation for On-demand Services
◉ Running requests in a dynamic way
– Resources should be allocated dynamically
– Allocated resources should be freed up automatically once
a request completes
– Handle Input/Output data in a transparent way
◉ Need to think of resource partitioning
– Modern computing nodes have several cores
– The number of cores required by certain tasks can be less
than the number of cores available on a node
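The partitioning need can be illustrated with a minimal sketch, which is not the thesis implementation. It places tasks on multicore nodes with a simple first-fit rule, so that several small tasks share one node. The function name `place_first_fit` and the 8-core node size are assumptions chosen for the example.

```python
def place_first_fit(task_cores, nodes, cores_per_node):
    """Assign each task to the first node that still has enough free cores.

    Returns one node index per task, or None when no node can host the task.
    """
    free = [cores_per_node] * nodes  # free cores remaining on each node
    placement = []
    for need in task_cores:
        chosen = None
        for idx, avail in enumerate(free):
            if need <= avail:
                chosen = idx
                free[idx] -= need
                break
        placement.append(chosen)
    return placement


if __name__ == "__main__":
    # Five tasks on two hypothetical 8-core nodes.
    print(place_first_fit([2, 4, 4, 8, 1], nodes=2, cores_per_node=8))
```

Without partitioning, each of these tasks would occupy a whole node, whatever its core count.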

11/40
Outline
◉ Introduction
◉ Background
◉ Contributions
– Prototyping
– Results

12/40
Background on Existing SaaS Clouds
◉ Target office and collaborative
applications
– E.g. Google Docs, Salesforce,
Office365...
– Need for interactivity
◉ SaaS cloud as a layer on top of a
PaaS
– PaaS can rely on an IaaS layer
– IaaS enables on-demand resource
allocation
• Virtualization plays an important role
◉ Resources belong to a single
organization
Background on SaaS Clouds

13/40
Services for Intensive Computations
◉ No need for interactivity
◉ Requires highly dynamic and transparent resource handling
• Allocation of resources when executing a task
• Release of resources once a task completes
◉ Mutualized resources
=> Need to deal with sharing the resources among the services

14/40
Scheduling services on mutualized resources
◉ Raises conflicting objectives
– Fairness towards the service suppliers
– Efficiency in the use of resources
◉ Prioritizing one objective penalizes the other
=> Requires a tradeoff
Background on resource management

15/40
Common resource scheduling strategies
◉ First-come, First-served (FCFS)
+ Fair to users
– Inefficient in terms of utilization
– May be unfair to some businesses in our context
◉ FCFS along with Backfilling (EASY/Conservative)
+ Improves utilization
– May significantly delay the biggest tasks
+ Possible optimization with conservative backfilling
– Remains unfair in our context
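The difference between the two strategies can be seen in a small simulation. This is a simplified sketch, assuming that all jobs are queued at time 0, that durations are known exactly, and that no job needs more cores than the cluster has. The function names and the job tuples `(name, cores, duration)` are illustrative, not taken from the thesis.

```python
import heapq


def fcfs(jobs, total):
    """Strict FCFS: a job never starts before the job ahead of it."""
    running, free, now, starts = [], total, 0, {}
    for name, cores, duration in jobs:
        while free < cores:  # wait for running jobs to release cores
            end, released = heapq.heappop(running)
            now, free = max(now, end), free + released
        starts[name] = now
        free -= cores
        heapq.heappush(running, (now + duration, cores))
    return starts


def easy_backfill(jobs, total):
    """EASY backfilling: later jobs may jump ahead if the head job is not delayed."""
    queue, running = list(jobs), []
    free, now, starts = total, 0, {}

    def start(job):
        nonlocal free
        name, cores, duration = job
        starts[name] = now
        free -= cores
        heapq.heappush(running, (now + duration, cores))

    while queue:
        while queue and queue[0][1] <= free:  # head of the queue fits: start it
            start(queue.pop(0))
        if not queue:
            break
        # Head is blocked: find when enough cores will be free (its reservation).
        avail, shadow = free, now
        for end, released in sorted(running):
            avail, shadow = avail + released, end
            if avail >= queue[0][1]:
                break
        extra = avail - queue[0][1]  # cores the head will not need at `shadow`
        for job in queue[1:]:  # the slice is a copy, so removing is safe
            _, cores, duration = job
            ends_in_time = now + duration <= shadow
            if cores <= free and (ends_in_time or cores <= extra):
                start(job)
                queue.remove(job)
                if not ends_in_time:
                    extra -= cores
        end, released = heapq.heappop(running)  # advance to the next completion
        now, free = end, free + released
    return starts


if __name__ == "__main__":
    jobs = [("A", 2, 10), ("B", 4, 5), ("C", 2, 8), ("D", 1, 20)]
    print("FCFS:", fcfs(jobs, total=4))
    print("EASY:", easy_backfill(jobs, total=4))
```

In this example job C is backfilled at time 0 because it ends before B's reservation, while D is not, since it would still be running when B is due to start.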

16/40
How Resources are Assigned to Tasks
◉ Simple assignment strategies
– Greedy and round-robin algorithms
◉ Assignments guided by performance requirements
– Notion of match-making (affinities between resources and tasks)
◉ Prioritization
– Higher-priority tasks get access to resources first
• Preemption can be introduced
=> Notion of best-effort when certain tasks only run on idle
resources
◉ Reservation and leasing
– Resources are allocated for a given time slot
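As an illustration of prioritization with best-effort tasks, the sketch below starts tasks in decreasing priority order and lets best-effort tasks (given the lowest priority here) use only what is left. The priority values, the `dispatch` name and the rule of skipping tasks that do not fit are assumptions made for the example.

```python
import heapq

BEST_EFFORT = 0  # assumed lowest priority level


def dispatch(tasks, free_cores):
    """Start tasks by decreasing priority while cores remain.

    tasks: iterable of (name, priority, cores). Tasks that do not fit are skipped.
    Returns the names of the started tasks, in start order.
    """
    # Negate the priority because heapq is a min-heap; the index keeps
    # submission order among tasks of equal priority.
    heap = [(-prio, i, name, cores) for i, (name, prio, cores) in enumerate(tasks)]
    heapq.heapify(heap)
    started = []
    while heap:
        _, _, name, cores = heapq.heappop(heap)
        if cores <= free_cores:
            free_cores -= cores
            started.append(name)
    return started


if __name__ == "__main__":
    tasks = [("be1", BEST_EFFORT, 2), ("t1", 5, 4), ("t2", 3, 4), ("be2", BEST_EFFORT, 1)]
    print(dispatch(tasks, free_cores=9))
```

With 9 free cores, the two prioritized tasks start first and only the one-core best-effort task fits in the remainder.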

17/40
Common resource sharing strategies
◉ Static sharing (partitioning)
+ Fair and easy to set up
– Inefficient in terms of utilization in our context
◉ Fair-sharing (no partitioning + dynamic priorities)
+ Tradeoff between fairness and utilization
– May still raise unfair situations in our context
[Figure: two diagrams of resources R1 to R7 divided among Business 1, Business 2 and Business 3]
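The two sharing strategies can be contrasted with a short sketch. Static sharing gives each business a fixed quota, whereas fair-sharing keeps one pool and ranks businesses by a dynamic priority. The priority formula used here (entitled share minus the fraction of past usage) is only one possible choice and is an assumption, as are the function names and the usage figures.

```python
from fractions import Fraction


def static_quota(shares, total):
    """Static sharing: each business owns a fixed slice of the resources."""
    return {business: total * share for business, share in shares.items()}


def fair_share_order(shares, usage):
    """Fair-sharing: serve first the business furthest below its entitled share."""
    total_usage = sum(usage.values())

    def priority(business):
        used = Fraction(usage[business], total_usage) if total_usage else Fraction(0)
        return shares[business] - used  # positive when under-served

    return sorted(shares, key=priority, reverse=True)


if __name__ == "__main__":
    shares = {"B1": Fraction(2, 7), "B2": Fraction(2, 7), "B3": Fraction(3, 7)}
    print(static_quota(shares, total=7))
    # Hypothetical past usage, e.g. in core-hours.
    print(fair_share_order(shares, {"B1": 50, "B2": 10, "B3": 40}))
```

With static quotas a business can never go beyond its slice, even when other slices are idle. With the dynamic ranking, the business that consumed least relative to its share is served first.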

18/40
Partitioning Individual Node
◉ Requires isolation among tasks
– A task must not access resources allocated to another task
◉ Isolation with containers (cgroups, cpusets, OpenVZ, LXC...)
+ Low-level partitioning inducing a low overhead
=> good performance
– Not flexible, since not easy to handle dynamically
◉ Isolation with virtual machines (VMs)
+ High-level partitioning
=> High flexibility in terms of automation
– Possible performance overhead
―Several optimizations (e.g. HVM, paravirtualization, PCI passthrough...)

19/40
Synthesis on Partitioning Resources
◉ Virtual Machines enable interesting features
– To partition each individual node with strong isolation
– To allocate and free up resources dynamically
– To suspend/restart best-effort tasks
◉ Powerful and proven VM management tools
– Handle VMs on an individual node
• Xen, KVM, ESXi, Hyper-V...
– Handle VMs in distributed environments
• OpenNebula, Eucalyptus, OpenStack...
―Target IaaS clouds

20/40
Problems to Address With VMs
◉ Deal with performance overhead
– Generic optimizations
• HVM, PCI Passthrough
– Solution-specific optimizations
• Paravirtualization (Xen, Hyper-V)
• Virtio (KVM, Xen)
◉ Allocate custom VMs dynamically on distributed
environments
– Contextualization enables interesting features (OpenNebula)

21/40
Shortcomings of Existing Solutions with Respect to Our Aims
◉ On-demand HPC services on a mutualized cluster
– Existing SaaS clouds focus on collaborative or office applications
• Resources owned by a single organization
◉ Existing resource-sharing strategies don't suit our needs
=> Necessity to design new approaches
◉ Contributions
– Scheduling strategy for sharing mutualized resources
– Architecture for on-demand HPC services
– Prototyping for evaluation

22/40
Outline
◉ Introduction
◉ Background
◉ Contributions
– Prototyping
– Results

23/40
Ideas for the resource sharing strategy
◉ Combines the advantages...
– of static sharing, where fairness is easy to enforce
– and those of fair-sharing, which improves utilization
◉ Enables elasticity in resource sharing
– A business may use more resources than its investment:
• When the task causing this situation has a duration less than
an acceptable duration threshold, denoted D
• Or when the task is of best-effort type
=> Limits the impact of selfish behaviors from certain
businesses
Contributions : Overview
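The elasticity rule above can be sketched as an admission test. This is a hedged reading of the slide rather than the actual SVMSched code: the `Task` fields, the `may_start` name, the time unit and the use of `<=` for the threshold comparison (as in the examples given later in the deck) are assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Task:
    owner: str                 # business submitting the task
    resources: int             # number of resources requested
    duration: float            # estimated duration (hypothetical unit: minutes)
    best_effort: bool = False


def may_start(task, used, invested, total, free, threshold_d):
    """Return True if the task may start now under the elastic sharing rule.

    used: resources currently held by each business
    invested: fraction of the cluster each business paid for
    """
    if task.resources > free:
        return False  # not enough idle resources at all
    new_ratio = Fraction(used[task.owner] + task.resources, total)
    if new_ratio <= invested[task.owner]:
        return True  # the business stays within its investment
    # Beyond its investment: only short or best-effort tasks are accepted.
    return task.best_effort or task.duration <= threshold_d


if __name__ == "__main__":
    invested = {"B1": Fraction(2, 7)}
    used = {"B1": 2}  # B1 already uses its full share of a 7-resource cluster
    short = Task("B1", resources=1, duration=30)
    long_task = Task("B1", resources=1, duration=120)
    for task in (short, long_task):
        print(task.duration, may_start(task, used, invested, 7, free=3, threshold_d=60))
```

A long task from a business that already uses its full share has to wait, which is what limits the benefit of under-investing.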

24/40
Handling Requests Dynamically
◉ Encapsulate each task within a virtual machine (VM)
– Eases the partitioning of nodes and enables dynamicity
◉ Enable a Specific SaaS Manager
– Implements the scheduling strategy to address the resource
sharing issues
– Handles the allocation and the destruction of VMs
◉ Exploit the Contextualization of VMs
– VM created, customized and started dynamically
• VM suitably set to launch the task once started
– VM automatically destroyed once the task is completed
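The request lifecycle described above (create a contextualized VM, let it run the task, destroy it at completion) can be sketched as follows. The three callables stand in for a virtual infrastructure manager and are purely hypothetical; they are not OpenNebula API calls, and `FakeBackend` exists only to make the example runnable.

```python
def run_request(command, create_vm, wait_until_done, destroy_vm):
    """Run one request in its own VM and always destroy the VM afterwards."""
    # The context tells the VM which task to launch once it has booted.
    vm = create_vm({"command": command})
    try:
        return wait_until_done(vm)
    finally:
        destroy_vm(vm)  # resources are released even if the task fails


class FakeBackend:
    """In-memory stand-in for a VM manager, recording the calls it receives."""

    def __init__(self):
        self.events = []

    def create_vm(self, context):
        self.events.append(("create", context["command"]))
        return 1  # hypothetical VM identifier

    def wait_until_done(self, vm_id):
        self.events.append(("done", vm_id))
        return 0  # hypothetical exit status of the task

    def destroy_vm(self, vm_id):
        self.events.append(("destroy", vm_id))


if __name__ == "__main__":
    backend = FakeBackend()
    status = run_request("simulate circuit.net", backend.create_vm,
                         backend.wait_until_done, backend.destroy_vm)
    print(status, backend.events)
```

Placing the destruction in a `finally` clause is one way to guarantee that a failed task never keeps its resources.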

25/40
Architecture Model
◉ The SaaS Manager on top
of the cluster
– Relies on a virtual
infrastructure manager (VIM)
– VIM relies on hypervisors
◉ Possibility of reusing
existing tools
– Avoids rewriting existing
features
– Benefits from the features of
powerful, proven tools
Contributions : Architecture Model

26/40
Design Driven by Openness, Performance and
Interoperability
◉ OpenNebula enables support
for handling the VMs
– Featuring the
contextualization
◉ Xen manages VMs on each
individual node
– Exploits paravirtualization
for better performance
◉ The different components are
coupled through open APIs
– Ensures better interoperability

27/40
Resource Sharing Strategy : Case study
◉ A situation with three
businesses B1, B2 and B3
– B1 (with green tasks) invested
for 2/7 of resources (R1,
R2...R7)
– B2 (with red tasks) invested for
2/7
– B3 (with blue tasks) for 3/7
◉ On the figure, think of tasks
as the related VMs
Contributions : Resource Management Strategy
[Figure: tasks t1 to t6 of the three businesses and the queue of pending tasks]

28/40
Resource Sharing Strategy : Example 1
◉ Assumes the durations of t1
and t5 are <= D (the chosen
duration threshold)
– B1 and B3 are using ratios of
resources greater than their
investments
– This represents an additional
ratio of 1/14 for each of them
[Figure: tasks t1 to t6 placed on the resources, with the remaining queued tasks]
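The ratios of Example 1 can be checked with exact fractions. The sketch below only restates the arithmetic of the slide. Reading 1/14 of a 7-resource cluster as half of one resource (for instance a VM using half of a node's cores) is an interpretation, not something the slide states.

```python
from fractions import Fraction

# Investments from the case study.
invested = {"B1": Fraction(2, 7), "B2": Fraction(2, 7), "B3": Fraction(3, 7)}
extra = Fraction(1, 14)  # additional ratio used by B1 and by B3

# Ratios actually used by the two businesses exceeding their investment.
used = {name: invested[name] + extra for name in ("B1", "B3")}
remaining_for_b2 = 1 - sum(used.values())
extra_in_resources = 7 * extra  # the same excess expressed in resources

if __name__ == "__main__":
    print(used["B1"], used["B3"], remaining_for_b2, extra_in_resources)
```

So B1 holds 5/14 of the cluster and B3 holds 1/2, which leaves at most 1/7 for B2 while t1 and t5 run, for a duration bounded by D.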

29/40
Resource Sharing Strategy : Example 2
◉ None of the tasks has a
duration <= D, but the task
t2 is of best-effort type
– B1 is using a ratio of resources
1/7 greater than its investment
– t2 can be suspended at any
time
[Figure: tasks t1 to t6 placed on the resources, with the remaining queued tasks]

30/40
About Implementation
◉ Relies on principles of resource leasing
– A lease consists in allocating a virtual machine for running a task
– The duration of a lease depends on the related task
• Its duration and its type (best-effort or not)
◉ Two kinds of leases handled specifically
– Non-preemptive leases
• Assigned to tasks related to the customers
―Non-preemptive tasks
=> Resources only freed up at completion
– Preemptive leases
• Assigned to best-effort tasks
―VMs can be suspended and restarted later
=> No guarantee of completion
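The two kinds of leases can be modelled with a small data structure. This is a minimal sketch under assumptions: the class, state and function names are illustrative, and the reclaim policy (suspend preemptive leases in list order) is chosen for simplicity rather than taken from the prototype.

```python
from dataclasses import dataclass
from enum import Enum


class LeaseState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    DONE = "done"


@dataclass
class Lease:
    task: str
    preemptive: bool  # True for best-effort tasks
    state: LeaseState = LeaseState.RUNNING

    def suspend(self):
        if not self.preemptive:
            raise ValueError("non-preemptive leases run until completion")
        if self.state is LeaseState.RUNNING:
            self.state = LeaseState.SUSPENDED

    def resume(self):
        if self.state is LeaseState.SUSPENDED:
            self.state = LeaseState.RUNNING

    def complete(self):
        self.state = LeaseState.DONE  # the VM's resources are freed here


def reclaim(leases, needed):
    """Suspend running preemptive leases until `needed` VMs have been freed."""
    freed = []
    for lease in leases:
        if len(freed) == needed:
            break
        if lease.preemptive and lease.state is LeaseState.RUNNING:
            lease.suspend()
            freed.append(lease.task)
    return freed


if __name__ == "__main__":
    leases = [Lease("t1", preemptive=False), Lease("t2", preemptive=True)]
    print(reclaim(leases, needed=1), leases[1].state)
```

Only the best-effort lease is suspended when resources must be reclaimed. The customer task keeps its VM until it completes.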

31/40
Prototyping and Overview on Integration
◉ SVMSched (Smart Virtual
Machine Scheduler)
– Drop-in replacement for
OpenNebula's default scheduler
– Dedicated interfaces that provide the
SaaS abstraction
– Deals with allocating and freeing
up VMs dynamically
– Implements the resource sharing
strategy
– Supports contextualization data
stored on Network File Systems
Contributions : Prototyping

32/40
Outline
◉ Introduction
◉ Background
◉ Contributions
– Prototyping
– Results

33/40
Evaluation Protocol
◉ Evaluation of application performance
– Time to set up the VM
– Performance overhead induced by virtualization
◉ Study of the scheduling strategy
– Does it behave well regarding fairness and utilization?
– If not, how can it be improved?
◉ Experimental conditions
– Nodes from Grid'5000: each with 2x4 cores, 2.27 GHz, 8 GB of RAM
– Xen 3.4.2 and OpenNebula 1.4.2 along with VM images of 500 MB
– Applications from the PARSEC benchmark suite (BodyTrack, Blackscholes,
Freqmine)
Evaluation

34/40
Performance of virtualization
◉ Full VMs perform slightly better than contextualized ones
◉ High overhead for applications requiring heavy disk I/O
◉ VMs perform better than native machines
for concurrent tasks requiring heavy memory I/O
◉ Contextualized VMs: setup time is low and constant
– ~15s (<5% of the duration of a task
of 5 mins) with an image of 500 MB
◉ Full VMs: setup times grow linearly

35/40
Analyzing the scheduling strategy
◉ With a well-chosen threshold
– Businesses can benefit from the mutualization
– The temptation for selfish behaviors is prevented
– Best-effort tasks allow better utilization
◉ Mutualization is not relevant when
– The threshold is not suitably chosen
– There are no best-effort tasks
– The strategy leads to static sharing

36/40
Outline
◉ Introduction
◉ Background
◉ Contributions
– Prototyping
– Results

37/40
Conclusion
◉ We studied and set up an environment for enabling HPC
SaaS services on shared computing resources
– Designed an architecture model that relies on virtualization for
executing on-demand requests
– Designed resource management algorithms that share the resources
fairly while maximizing their use
◉ A prototype has been developed to evaluate our contributions
experimentally
– Results showed the feasibility of our approach
– Prototype integrated in the deliverables of the Ciloe Project
◉ We have thus opened the way to addressing the cost problem
that heavily constrains SMBs needing HPC
resources for their applications
Conclusion & Perspectives

38/40
Perspectives
◉ Model for predicting the duration of each task
– Envisioning an approximation model based on reinforcement
learning
◉ Economic model of billing
– What parameters can the invoicing take into account?
• Per-use costs of software licenses and computing resources +
earnings
◉ Dimensioning the platform
– To allow each business to have a suitable view of its needs in
terms of resources

39/40
About this Work
◉ Awards
– 1st Prize Grid'5000 Challenge, Reims 2011
◉ Book Chapter
– Rodrigue Chakode, Jean-François Méhaut, Blaise-Omer Yenke. Scheduling On-demand SaaS Services on a
Shared Virtual Cluster. In Cloud Computing and Services Science. Pages 259-276. ISBN 978-1-4614-2325-6,
Springer-Verlag, April 2012.
◉ International conferences
– Rodrigue Chakode, Blaise-Omer Yenke, Jean-François Méhaut. Resource Management of Virtual Infrastructure
for On-demand SaaS Services. In CLOSER2011 - International Conference on Cloud Computing and Services
Science. Pages 352-361. Netherlands, May 2011.
– Rodrigue Chakode, Jean-François Méhaut, François Charlet. High Performance Computing on Demand:
Sharing and Mutualizing Clusters. In AINA'10 - IEEE International Conference on Advanced Information
Networking and Applications. Pages 126-133. Australia, April 2010.
◉ National conferences
– Rodrigue Chakode, Blaise-Omer Yenke. Utilisation des machines virtuelles comme support de services de
calcul à la demande. In Renpar'20: les actes des Rencontres francophones du Parallélisme, édition 2011.
Saint-Malo, France, May 2011.
◉ Other publications (in the cloud community)
– Rodrigue Chakode. SVMSched: A tool to enable On-demand SaaS and PaaS Services on top of OpenNebula.
In OpenNebula Official Blog, http://blog.opennebula.org/?p=1646.
– Link on the OpenNebula Software Ecosystem : http://opennebula.org/software:ecosystem:svmsched

40/40
Thank you for your attention!
