This thesis studies resource management for on-demand computing services hosted on a shared cluster. The aim was to propose tools that allocate resources automatically to execute on-demand user requests and that share resources proportionally among the services while maximizing their use. Funded by the Minalogic global business cluster through the Ciloe Project (http://ciloe.minalogic.net), this work targets organizations such as SMBs that cannot bear the cost of purchasing and maintaining a dedicated computing infrastructure. First, we carried out an in-depth survey of on-demand computing and high-performance computing. From this survey, we defined a virtualized architecture that enables dynamic execution of user requests through a dedicated resource manager. Finally, we proposed policies and algorithms flexible enough to offer a suitable tradeoff between equity and resource use. Having worked in a context of industrial collaboration, we developed a prototype of our proposal as a proof of concept. Based on open standards, this prototype relies on existing virtualization tools such as OpenNebula to allocate and manipulate virtual machines on the cluster's nodes. Using this prototype with various workloads, we carried out experiments to evaluate our architecture and scheduling algorithms. The results show that our contributions achieve the expected goals while being reliable and efficient.
Execution Environment for On-Demand Computing Services Based on Shared Clusters
1. 1/40
Execution Environment for On-Demand Computing Services Based on Shared Clusters
PhD thesis, Grenoble University
By Rodrigue Chakode
(LIG/INRIA, Equipe Mescal)
Advisors: - Jean-François Méhaut
- Maurice Tchuenté
2. 2/40
Cloud Computing in a Nutshell
◉ Enables computing features as services
◉ Free or commercial services accessible over network
◉ On-demand and elastic accesses, plus a utility billing
– Customers (users of the service) only pay for what they use,
aka pay-as-you-go
– Requests for more or fewer features should be satisfied quickly
◉ Services set up transparently for customers
– They don't have to care about how the service is enabled
3. 3/40
Context Statement on Cloud Computing
◉Various sorts of cloud services
– Infrastructure-as-a-Service, Platform-as-a-Service, Software-
as-a-Service, Data-as-a-Service, Translation-as-a-Service...
– Almost everything could be a service (XaaS)
◉Requires setting up a suitable computing
infrastructure
– Servers, storage, network fabrics, cooling system...
◉May need significant investments
– Out of reach for many small or medium businesses (SMBs)
– Market currently dominated by the largest organizations
Introduction
4. 4/40
Challenges for HPC
◉ Many software applications require
intensive computing capabilities
– E.g. EDA Applications (Ciloe Project)
– Integrated circuits need to be simulated
before manufacturing
◉ Computing architectures are
increasingly parallel
– SMP, NUMA, GPU, Cluster... and
soon many-core architectures
◉ HPC applications run on clusters
of multicore nodes (SMP/NUMA)
◉ Also expensive
Example of a cluster. Credit : CEA
5. 5/40
Bring HPC Services into Clouds
◉Services requiring intensive computations
◉Services enabled from a mutualized cluster
– Cluster supported by several businesses
– Each business providing its own service
– Cluster's resources shared among the services
◉Study carried out in the context of an industrial
collaboration
– The Ciloe Project [http://ciloe.minalogic.net]
– Three SBEs publishing EDA applications are involved
6. 6/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview : Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
7. 7/40
Resource Management for HPC SaaS Services
◉What is a service
–Computes customer data
with a specific application
–Input specifies an
application and the data
–Output retrieved after the
computation
–No more interactions
necessary
Problem Statement
8. 8/40
Related Research Issues
◉Data Management
◉Resilience and Fault Tolerance
◉Security and privacy
◉Resource Management
9. 9/40
Scheduling Problems
◉Share the cluster's resources among the services
– according to the investments of the different businesses
◉Maximize the use of resources
– Use idle resources to run pending requests
– Run miscellaneous tasks on idle resources in a best-effort
way
◉Minimize the impact of selfish behaviors
– A business can under-invest while needing a lot of resources
10. 10/40
Resource Allocation for On-demand Services
◉ Running requests in a dynamic way
– Resources should be allocated dynamically
– Allocated resources should be freed up automatically once
a request has completed
– Handle Input/Output data in a transparent way
◉ Need to think of resource partitioning
– Modern computing nodes have several cores
– The number of cores required by certain tasks can be less
than the number of cores available on a node
11. 11/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview : Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
12. 12/40
Background on Existing SaaS Clouds
◉ Target office and collaborative
applications
– E.g. Google Docs, Salesforce,
Office365...
– Need for interactivity
◉ SaaS cloud as a layer on top of a
PaaS
– PaaS can rely on an IaaS layer
– IaaS enables on-demand resource
allocation
• Virtualization plays an important role
◉ Resources belong to a single
organization
Background on SaaS Clouds
13. 13/40
Services for Intensive Computations
◉ No need for interactivity
◉ Require high dynamicity and
transparency
• Allocation of resources when
executing a task
• Release of resources once a task
has completed
◉ Mutualized resources
=>Need to deal with sharing the
resources among the services
14. 14/40
Scheduling services on mutualized resources
◉ Raises conflicting objectives
– Fairness towards the service providers
– Efficiency in the use of resources
◉ Prioritizing one objective penalizes the other
=> Requires a tradeoff
Background on resource management
15. 15/40
Common resource scheduling strategies
◉ First-come, First-served (FCFS)
+ Fair to users
– Inefficient in terms of utilization
– May be unfair to some businesses in our context
◉ FCFS along with Backfilling (EASY/Conservative)
+ Improves utilization
– May significantly delay the biggest tasks
+ Possible optimization with a conservative backfilling
– Remains unfair in our context
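The difference between plain FCFS and FCFS with EASY backfilling can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not from the talk: all jobs are submitted at time 0, durations are known exactly, and the names `simulate`, `jobs` and `total_cores` are hypothetical.

```python
import heapq

def simulate(jobs, total_cores, backfill=False):
    """Return {job name: start time} for jobs all submitted at t=0.

    jobs: list of (name, cores, duration) tuples in arrival order.
    """
    assert all(cores <= total_cores for _, cores, _ in jobs)
    queue = list(jobs)
    running = []                      # min-heap of (end time, cores)
    free, now, starts = total_cores, 0, {}

    def start(job):
        nonlocal free
        name, cores, duration = job
        starts[name] = now
        heapq.heappush(running, (now + duration, cores))
        free -= cores
        queue.remove(job)

    while queue:
        # FCFS: start jobs strictly in arrival order while the head fits.
        while queue and queue[0][1] <= free:
            start(queue[0])
        if not queue:
            break
        if backfill:
            # Reservation for the blocked head job: the earliest time
            # ("shadow") at which enough cores will be free, and the
            # cores that will be spare at that time.
            avail, shadow = free, now
            for end, cores in sorted(running):
                avail, shadow = avail + cores, end
                if avail >= queue[0][1]:
                    break
            spare = avail - queue[0][1]
            # A later job may jump ahead only if it cannot delay the head.
            for job in queue[1:]:
                _, cores, duration = job
                ends_in_time = now + duration <= shadow
                if cores <= free and (ends_in_time or cores <= spare):
                    if not ends_in_time:
                        spare -= cores
                    start(job)
        # Advance to the next completion and release its cores.
        now, cores = heapq.heappop(running)
        free += cores
    return starts

jobs = [("A", 2, 10), ("B", 4, 5), ("C", 2, 8), ("D", 2, 20)]
fcfs = simulate(jobs, total_cores=4)
easy = simulate(jobs, total_cores=4, backfill=True)
```

With these four jobs on 4 cores, job C waits until t=15 under FCFS but starts at t=0 with backfilling, while the blocked job B still starts at t=10 in both cases. Neither variant looks at which business owns a job, which is why both remain unfair in the shared-cluster context.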
16. 16/40
How Resources are Assigned to Tasks
◉ Simple assignment strategies
– Greedy and round-robin algorithms
◉ Assignments guided by performance requirements
– Notion of match-making (affinities between resources and tasks)
◉ Prioritization
– Higher-priority tasks get access to resources first
• Preemption can be introduced
=> Notion of best-effort when certain tasks only run on idle
resources
◉ Reservation and leasing
– Resources are allocated for a given time slot
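The simple assignment strategies listed above can be sketched in a few lines. This is an illustration only: the function names, the first-fit flavor of the greedy rule and the dictionary-based data model are assumptions, not the schedulers surveyed in the talk.

```python
from itertools import cycle

def round_robin(tasks, nodes):
    """Hand tasks to nodes in turn, ignoring how loaded each node is."""
    return dict(zip(tasks, cycle(nodes)))

def greedy_first_fit(task_cores, node_free):
    """Place each task on the first node with enough free cores.

    Returns {task: node}; a task that fits nowhere maps to None (pending).
    """
    free = dict(node_free)            # do not mutate the caller's dict
    placement = {}
    for task, cores in task_cores.items():
        node = next((n for n, f in free.items() if f >= cores), None)
        placement[task] = node
        if node is not None:
            free[node] -= cores
    return placement

def by_priority(task_priority):
    """Order tasks so that higher-priority tasks get resources first."""
    return sorted(task_priority, key=task_priority.get, reverse=True)

tasks = {"t1": 2, "t2": 2, "t3": 2}   # cores requested per task
nodes = {"n1": 8, "n2": 8}            # free cores per node
spread = round_robin(tasks, nodes)
packed = greedy_first_fit(tasks, nodes)
```

Here round-robin spreads the three tasks over both nodes, whereas first-fit packs them all onto `n1` and leaves `n2` idle. Match-making would replace the "first node that fits" test with an affinity score between the task and each node.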
17. 17/40
Common resource sharing strategies
◉ Static sharing (partitioning)
+ Fair and easy to set up
– Inefficient in terms of utilization in our context
◉ Fair-sharing (no partitioning + dynamic priorities)
+ Tradeoff between the fairness and the utilization
– May still raise unfair situations in our context
[Figure: two panels showing resources R1 to R7 shared among Business 1, Business 2 and Business 3]
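The two sharing strategies can be contrasted with a short sketch. Everything specific here is an assumption made for illustration: the 2/7, 2/7, 3/7 split, the function names, and in particular the exponential decay formula for the fair-share priority, which is one common form and is not taken from the talk.

```python
from fractions import Fraction

def static_partition(resources, shares):
    """Static sharing: give each business a fixed slice of the resources."""
    assert sum(shares.values()) == 1
    partition, first = {}, 0
    for business, share in shares.items():
        count = int(share * len(resources))
        partition[business] = resources[first:first + count]
        first += count
    return partition

def fair_share_priority(share, usage):
    """Fair-sharing: no partition, but a priority that drops as a business
    consumes more than its share (illustrative decay formula).

    share, usage: fractions of the whole cluster.
    """
    return 2.0 ** (-float(usage / share))

resources = ["R1", "R2", "R3", "R4", "R5", "R6", "R7"]
shares = {
    "Business 1": Fraction(2, 7),
    "Business 2": Fraction(2, 7),
    "Business 3": Fraction(3, 7),
}
partition = static_partition(resources, shares)
```

Under static sharing, resources of an idle business stay unused. Under fair-sharing any business may run anywhere, and only the order in which pending tasks are served changes: a business that has used nothing gets priority 1.0, and one that has used exactly its share gets 0.5.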
18. 18/40
Partitioning Individual Node
◉ Requires isolation among tasks
– A task must not access resources allocated to another task
◉ Isolation with containers (cgroups, cpusets, OpenVZ, LXC...)
+ Low level partitioning inducing a low overhead
=> good performance
– Inflexible, since not easy to handle dynamically
◉ Isolation with virtual machines (VMs)
+ High level partitioning
=> High flexibility in terms of automation
– Possible performance overhead
―Several optimizations (e.g. HVM, paravirtualization, PCI passthrough...)
19. 19/40
Synthesis on Partitioning Resources
◉ Virtual Machines enable interesting features
– To partition each individual node with strong isolation
– To allocate and free up resources dynamically
– To suspend/restart best-effort tasks
◉ Powerful and proven VM management tools
– Handle VMs on an individual node
– Xen, KVM, ESXi, Hyper-V...
– Handle VMs on distributed environments
• OpenNebula, Eucalyptus, OpenStack...
―Target IaaS clouds
20. 20/40
Problems to Address With VMs
◉ Deal with performance overhead
– Generic optimizations
• HVM, PCI Passthrough
– Solution-specific optimizations
• Paravirtualization (Xen, Hyper-V)
• Virtio (KVM, Xen)
◉ Allocate custom VMs dynamically on distributed
environments
– Contextualization enables interesting features (OpenNebula)
21. 21/40
Gaps in Existing Work with Respect to Our Aims
◉ On-demand HPC services on a mutualized cluster
– Existing SaaS clouds focus on collaborative or office applications
• Resources owned by a single organization
◉ Existing resource sharing strategies don't suit our needs
=> Necessity to design new approaches
◉ Contributions
– Scheduling strategy for sharing mutualized resources
– Architecture for on-demand HPC services
– Prototyping for evaluation
22. 22/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview : Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
23. 23/40
Ideas for the resource sharing strategy
◉ Combines the advantages...
– of a static sharing, where fairness is easy to maintain
– and those of a fair-sharing strategy, which improves the
utilization
◉ Enables elasticity in resource sharing
– A business may use more resources than its investment:
• When the duration of the task raising such a situation does not
exceed an acceptable duration threshold, denoted D
• Or when the task is of best-effort type
=> Limits the impact of selfish behaviors from certain
businesses
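The elasticity rule above can be read as an admission test applied before starting a task. The sketch below is one possible reading, not the actual SVMSched code: the `Task` fields, the `can_start` signature and the core-based accounting are assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Task:
    business: str
    cores: int
    duration: float            # estimated duration, in seconds
    best_effort: bool = False

def can_start(task, used_cores, invested, total_cores, free_cores, threshold_d):
    """Decide whether a task may start now.

    used_cores: {business: cores currently in use}
    invested:   {business: Fraction of the cluster paid for}
    """
    if task.cores > free_cores:
        return False
    ratio_after = Fraction(used_cores[task.business] + task.cores, total_cores)
    if ratio_after <= invested[task.business]:
        return True                     # still within the investment
    # Beyond the investment: only short or best-effort tasks are admitted.
    return task.best_effort or task.duration <= threshold_d

invested = {"B1": Fraction(2, 7)}       # i.e. 16 of 56 cores
used = {"B1": 16}                       # B1 already uses its whole share
short = Task("B1", cores=4, duration=100)
long_ = Task("B1", cores=4, duration=1000)
opportunistic = Task("B1", cores=4, duration=1000, best_effort=True)
```

With a threshold D of 300 seconds and 8 free cores, `short` and `opportunistic` would be admitted beyond B1's share while `long_` would have to wait, so a business that under-invests cannot hold extra resources for long.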
Contributions : Overview
24. 24/40
Handling Requests Dynamically
◉ Encapsulate each task within a virtual machine (VM)
– Eases the partitioning of nodes and enables dynamicity
◉ Enable a Specific SaaS Manager
– Implements the scheduling strategy to address the resource
sharing issues
– Takes charge of allocating and destroying VMs
◉ Exploit the Contextualization of VMs
– VM created, customized and started dynamically
• VM suitably set to launch the task once started
– VM automatically destroyed once the task is completed
25. 25/40
Architecture Model
◉ The SaaS Manager on top
of the cluster
– Relies on a virtual
infrastructure manager (VIM)
– VIM relies on hypervisors
◉ Possibility of reusing
existing tools
– Avoids rewriting existing
features
– Benefits from the features of
powerful, proven tools
Contributions : Architecture Model
26. 26/40
Design Driven by Openness, Performance and
Interoperability
◉ OpenNebula enables support
for handling the VMs
– Featuring the
contextualization
◉ Xen manages VMs on each
individual node
– Exploits
paravirtualization for better
performance
◉ The different components are
coupled through open APIs
– Ensures better interoperability
27. 27/40
Resource Sharing Strategy : Case study
◉ A situation with three
businesses B1, B2 and B3
– B1 (with green tasks) invested
for 2/7 of resources (R1,
R2...R7)
– B2 (with red tasks) invested for
2/7
– B3 (with blue tasks) for 3/7
◉ On the figure, think of tasks
as the related VMs
Contributions : Resource Management Strategy
[Figure: queued tasks t1 to t6]
28. 28/40
Resource Sharing Strategy : Example 1
◉ Assumes that the durations of t1
and t5 are <= D (the chosen
duration threshold)
– B1 and B3 are using ratios of
resources greater than their
investments
– This represents an additional
ratio of 1/14 for
each of them
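The ratios in this example can be checked with exact fractions. The investments and the 1/14 excess come from the slides; the variable names are hypothetical, and the remaining fraction is plain arithmetic rather than a figure stated in the talk.

```python
from fractions import Fraction

# Investments of the three businesses (case study slide).
invested = {"B1": Fraction(2, 7), "B2": Fraction(2, 7), "B3": Fraction(3, 7)}
extra = Fraction(1, 14)          # additional ratio used by B1 and by B3

# Ratio of the cluster actually used by the two businesses over their share.
used = {b: invested[b] + extra for b in ("B1", "B3")}

# What is left of the cluster once B1 and B3 are served.
remaining = 1 - sum(used.values())

print(used["B1"], used["B3"], remaining)   # prints: 5/14 1/2 1/7
```

So B1 runs on 5/14 of the resources and B3 on half of them, which leaves 1/7 of the cluster for anything else.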
[Figure: tasks t1 to t6 and the queue of pending tasks]
29. 29/40
Resource sharing strategy : Example 2
◉ No task has a
duration <= D, but the task
t2 is of best-effort type
– B1 is using a ratio of resources
1/7 greater than its investment
– t2 can be suspended at any
time
[Figure: tasks t1 to t6 and the queue of pending tasks]
30. 30/40
About Implementation
◉ Relies on principles of resource leasing
– A lease consists in allocating a virtual machine for running a task
– The duration of a lease depends on the related task
• Its duration and its type (best-effort or not)
◉ Two kinds of leases handled specifically
– Non-preemptive leases
• Assigned to tasks related to the customers
―Non preemptive tasks
=> Resources only freed up at completion
– Preemptive leases
• Assigned to best-effort tasks
―VMs can be suspended and restarted later
=> No guarantee of completion
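The two kinds of leases can be modeled as a small state machine. This is a minimal sketch assuming hypothetical class and method names (`Lease`, `suspend`, `resume`, `reclaim`); in the prototype the VM operations themselves are delegated to the virtual infrastructure manager, which is not modeled here.

```python
from enum import Enum

class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    COMPLETED = "completed"

class Lease:
    """One VM allocated to run one task."""

    def __init__(self, task, cores, preemptive):
        self.task, self.cores, self.preemptive = task, cores, preemptive
        self.state = State.RUNNING

    def suspend(self):
        if not self.preemptive:
            raise ValueError("non-preemptive leases run to completion")
        if self.state is State.RUNNING:
            self.state = State.SUSPENDED

    def resume(self):
        if self.state is State.SUSPENDED:
            self.state = State.RUNNING

    def complete(self):
        self.state = State.COMPLETED

def reclaim(leases, cores_needed):
    """Suspend running best-effort leases until enough cores are freed.

    Returns the number of cores actually freed.
    """
    freed = 0
    for lease in leases:
        if freed >= cores_needed:
            break
        if lease.preemptive and lease.state is State.RUNNING:
            lease.suspend()
            freed += lease.cores
    return freed

customer = Lease("t1", cores=4, preemptive=False)
best_effort = Lease("t2", cores=4, preemptive=True)
freed = reclaim([customer, best_effort], cores_needed=4)
```

After the call, only the best-effort lease is suspended: the customer lease keeps its resources until its task completes, which is what makes the guarantee to customers possible.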
31. 31/40
Prototyping and Overview on Integration
◉ SVMSched (Smart Virtual
Machine Scheduler)
– Drop-in replacement for
OpenNebula's default scheduler
– Proper interfaces that provide the
SaaS abstraction
– Deals with allocating and freeing
up VMs dynamically
– Implements the resource sharing
strategy
– Supports contextualization data
stored on Network File Systems
Contributions : Prototyping
32. 32/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview : Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
33. 33/40
Evaluation Protocol
◉ Evaluation of application performance
– Time to set up the VM
– Performance overhead induced by the virtualization
◉ Study of the scheduling strategy
– Does it behave well regarding fairness and utilization?
– If not, how can it be improved?
◉ Experimental conditions
– Nodes from Grid'5000, each with 2x4 cores at 2.27 GHz and 8 GB of RAM
– Xen 3.4.2 and OpenNebula 1.4.2 along with VM images of 500 MB
– Applications from the PARSEC benchmark suite (BodyTrack, Blackscholes,
Freqmine)
Evaluation
34. 34/40
Performance of the virtualization
◉ Full VMs perform slightly better than contextualized
ones
◉ High overhead for applications requiring high
disk I/O
◉ VMs perform better than native machines
for concurrent tasks requiring high memory I/O
◉ Contextualized VMs require a
constant and low setup time
– ~15 s (<5% of the duration of a
5-minute task) with an image of 500 MB
◉ Full VMs: setup times grow linearly
35. 35/40
Analyzing the scheduling strategy
◉ With a well-chosen threshold
– Businesses can benefit from the mutualization
– The temptation for selfish behaviors is prevented
– Best-effort tasks allow better utilization
◉ Mutualization is not relevant when
– The threshold is not suitably chosen
– There are no best-effort tasks
– The strategy then leads to a static sharing
36. 36/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview : Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
37. 37/40
Conclusion
◉ We studied and set up an environment for enabling HPC
SaaS services on shared computing resources
– Designed an architecture model that relies on virtualization for
executing on-demand requests
– Designed resource management algorithms that share the resources
fairly while maximizing their use
◉ A prototype has been developed to evaluate our contributions
experimentally
– Results showed the feasibility of our approach
– Prototype integrated in the deliverables of the Ciloe Project
◉ We have thus opened a way to address the problem
of costs that strongly constrains SMBs needing HPC
resources for their applications
Conclusion & Perspectives
38. 38/40
Perspectives
◉ Model for predicting the duration of each task
– Envisioning an approximation model based on reinforcement
learning
◉ Economic model of billing
– What parameters can the invoicing take into account?
• Per-use costs of software licenses and computing resources +
earnings
◉ Dimensioning the platform
– To allow each business to have a suitable view of its needs in
terms of resources
39. 39/40
About this Work
◉ Awards
– 1st Prize Grid'5000 Challenge, Reims 2011
◉ Book Chapter
– Rodrigue Chakode, Jean-François Méhaut, Blaise-Omer Yenke. Scheduling On-demand SaaS Services on a Shared Virtual Cluster. In Cloud Computing and Services Science. Pages 259-276. ISBN 978-1-4614-2325-6, Springer-Verlag, April 2012.
◉ International conferences
– Rodrigue Chakode, Blaise-Omer Yenke, Jean-François Méhaut. Resource Management of Virtual Infrastructure for On-demand SaaS Services. In CLOSER2011 - International Conference on Cloud Computing and Service Science. Pages 352-361. Netherlands, May 2011.
– Rodrigue Chakode, Jean-François Méhaut, François Charlet. High Performance Computing on Demand: Sharing and Mutualizing Clusters. In AINA'10 - IEEE International Conference on Advanced Information Networking and Applications. Pages 126-133. Australia, April 2010.
◉ National conferences
– Rodrigue Chakode, Blaise-Omer Yenke. Utilisation des machines virtuelles comme support de services de calcul à la demande. In Renpar'20: les actes des Rencontres francophones du Parallélisme, édition 2011. Saint-Malo, France, May 2011.
◉ Other publications (in the cloud community)
– Rodrigue Chakode. SVMSched: A tool to enable On-demand SaaS and PaaS Services on top of OpenNebula. In OpenNebula Official Blog, http://blog.opennebula.org/?p=1646.
– Link on the OpenNebula Software Ecosystem: http://opennebula.org/software:ecosystem:svmsched