1) The document discusses multi-tenancy in cloud computing, including virtualization techniques that allow multiple tenants to share physical hardware resources in an isolated manner.
2) It describes some of the challenges of achieving true isolation and fairness when sharing resources between multiple tenants, such as performance variability and unpredictable failures.
3) The document presents several architectural models for implementing multi-tenancy, including fully shared hardware with virtualization, and scheduling algorithms like Quincy that aim to provide fairness when distributing resources between jobs from different tenants.
Multi-Tenancy and Virtualization in Cloud Computing
1. IN4392 Cloud Computing
Multi-Tenancy, including Virtualization
Cloud Computing (IN4392)
D.H.J. Epema and A. Iosup
2012-2013
1
Parallel and Distributed Systems Group
2. Terms for Today’s Discussion
mul·ti-te·nan·cy noun ˌməl-ti-ˌte-nan(t)-sē
an IT sharing model in which physical and virtual
resources are used by possibly concurrent tenants
2012-2013 2
3. Characteristics of Multi-Tenancy
1. Isolation = separation of services provided to each tenant
(the noisy neighbor)
2. Scaling conveniently with the number and size of tenants
(max weight in the elevator)
3. Meet SLAs for each tenant
4. Support for per-tenant service customization
5. Support for value-adding ops, e.g., backup, upgrade
6. Secure data processing and storage
(the snoopy neighbor)
7. Support for regulatory law (per legislator, per tenant)
2012-2013 3
4. Benefits of Multi-Tenancy (the Promise)
• Cloud operator
• Economy of scale
• Market-share and branding (for the moment)
• Users
• Flexibility
• Focus on core expertise
• Reduced cost
• Reduced time-to-market
• Overall
• Reduced cost of IT deployment and operation
2012-2013 4
5. Agenda
1. Introduction
2. Multi-Tenancy in Practice (The Problem)
3. Architectural Models for Multi-Tenancy in Clouds
4. Shared Nothing: Fairness
5. Shared Hardware: Virtualization
6. Sharing Other Operational Levels
7. Summary
2012-2013 5
6. Problems with Multi-Tenancy [1/5]
A List of Concerns
• Users
• Performance isolation (and variability) for all resources
• Scalability with the number of tenants (per resource)
• Support for value-added ops for each application type
• Security concerns (too many to list)
• Owners
• Up-front and operational costs
• Human management of multi-tenancy
• Development effort and required skills
• Time-to-market
• The law: think health management applications
2012-2013 6
8. Problems with Multi-Tenancy [3/5]
Practical Achievable Utilization
• Enterprise: <15% [McKinsey’12]
• Parallel production environments: 60-70% [Nitzberg’99]
• Grids: 15-30% average cluster,
>90% busy clusters
• Today’s clouds: ???
2012-2013 8
Iosup and Epema: Grid Computing Workloads.
IEEE Internet Computing 15(2): 19-26 (2011)
9. Problems with Multi-Tenancy [4/5]
(Catastrophic) Cascading Failures
• Parallel production environments: one failure kills one or
more parallel jobs
• Grids: correlated failures
• Today’s clouds: Amazon, Facebook, etc. had catastrophic failures in the past 2-3 years
[Figure: CDF of the size of correlated failures; average = 11 nodes, range = 1-339 nodes]
2012-2013 9
Iosup et al. : On the dynamic resource
availability in grids. GRID 2007: 26-33
10. Problems with Multi-Tenancy [5/5]
Economics
• Up-front: a shared approach is more difficult to develop than
an isolated approach; may also require expensive skills
2012-2013 10
Source:
www.capcloud.org/TechGate/Multitenancy_Magic.pptx
11. Agenda
1. Introduction
2. Multi-Tenancy in Practice (The Problem)
3. Architectural Models for Multi-Tenancy in Clouds
4. Shared Nothing: Fairness
5. Shared Hardware: Virtualization
6. Sharing Other Operational Levels
7. Summary
2012-2013 11
13. Agenda
1. Introduction
2. Multi-Tenancy in Practice (The Problem)
3. Architectural Models for Multi-Tenancy in Clouds
4. Shared Nothing: Fairness
5. Shared Hardware: Virtualization
6. Sharing Other Operational Levels
7. Summary
2012-2013 13
14. Fairness
• Intuitively, distribution of goods (distributive justice)
• Different people, different perception of justice
• “Everyone pays the same” vs. “the rich should pay proportionally higher taxes”
• “I only need to pay a few years later than everyone else”
2012-2013 14
15. The VL-e project: application areas
[Diagram: VL-e application areas (Medical Diagnosis & Imaging, Bio-Diversity, Bio-Informatics, Data-Intensive Science, Food Informatics, Dutch Telescience), with partners Philips, IBM, and Unilever, built on the Virtual Laboratory (VL): Application-Oriented Services, management of communication & computing, and Grid Services that harness multi-domain distributed resources. Example workload: Bags-of-Tasks.]
15
16. The VL-e project: application areas
Fairness for all!
[Same VL-e diagram as the previous slide.]
Task (groups of 5, 5 minutes): discuss fairness for this scenario.
Task (inter-group discussion): discuss fairness for this scenario.
16
17. Research Questions
Q1
What is the design space for
BoT scheduling in large-scale, distributed,
fine-grained computing?
Q2
What is the performance of BoT schedulers
in this setting?
2012-2013 17
18. Scheduling Model [1/4]
Overview
• System Model
1. Clusters
execute jobs
2. Resource managers
coordinate job execution
3. Resource management architectures
route jobs among resource managers
4. Task selection policies
create the eligible set (fairness for all!)
5. Task scheduling policies
schedule the eligible set
18
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q1
19. Scheduling Model [2/4]
Resource Management Architectures
route jobs among resource managers
Centralized (csp), Separated Clusters (sep-c), Decentralized (fcondor)
19
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q1
20. Scheduling Model [3/4]
Task Selection Policies
create the eligible set
• Age-based:
1. S-T: Select Tasks in the order of their arrival.
2. S-BoT: Select BoTs in the order of their arrival.
• User priority based:
3. S-U-Prio: Select the tasks of the User with the highest Priority.
• Based on fairness in resource consumption:
4. S-U-T: Select the Tasks of the User with the lowest res. cons.
5. S-U-BoT: Select the BoTs of the User with the lowest res. cons.
6. S-U-GRR: Select the next User Round-Robin; take all tasks of this user.
7. S-U-RR: Select the next User Round-Robin; take one task of this user.
20
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q1
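To make the fairness-oriented selection policies above concrete, here is a minimal sketch (not taken from the HPDC 2008 implementation) of an S-U-T-style policy: always pick the next task from the user with the lowest resource consumption so far. The Task and SelectionState structures and the way consumption is charged are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Task:
    user: str
    runtime_estimate: float  # seconds (used only to charge consumption)

@dataclass
class SelectionState:
    consumed: Dict[str, float] = field(default_factory=dict)  # CPU-seconds per user

def select_next_task_sut(queue: List[Task], state: SelectionState) -> Task:
    """S-U-T-like selection: among queued tasks, pick one belonging to the
    user with the lowest resource consumption so far."""
    # Users that currently have eligible tasks, in arrival order of their first task
    eligible_users = list(dict.fromkeys(t.user for t in queue))
    # The least-served user among them (min is stable, so ties go to earlier arrivals)
    poorest = min(eligible_users, key=lambda u: state.consumed.get(u, 0.0))
    # Earliest-arriving task of that user (queue is kept in arrival order)
    task = next(t for t in queue if t.user == poorest)
    queue.remove(task)
    # Charge the user's consumption when the task is dispatched
    state.consumed[poorest] = state.consumed.get(poorest, 0.0) + task.runtime_estimate
    return task

# Example: user "a" floods the queue, user "b" submits a single small task.
queue = [Task("a", 100), Task("a", 100), Task("b", 10), Task("a", 100)]
state = SelectionState()
order = [select_next_task_sut(queue, state).user for _ in range(4)]
print(order)  # ['a', 'b', 'a', 'a'] -- user "b" is not starved behind "a"'s backlog
```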
21. Context: System Model [4/4]
Task Scheduling Policies
schedule the eligible set
• Information availability (about tasks and resources):
• Known (K)
• Unknown (U)
• Historical records (H)
[Matrix: sample policies placed by the Task Information and Resource Information (K/H/U) they require: ECT, ECT-P, FPF, FPLT, DFPLT, MQD, RR, STFR, WQR]
• Sample policies:
• Earliest Completion Time (with Prediction of Runtimes) (ECT(-P))
• Fastest Processor First (FPF)
• (Dynamic) Fastest Processor Largest Task ((D)FPLT)
• Shortest Task First w/ Replication (STFR)
• Work Queue w/ Replication (WQR)
21
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q1
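As an illustration of a policy that needs both task and resource information, here is a minimal sketch (not the simulator's code) of Earliest Completion Time: each task goes to the processor on which it would finish earliest, given what is already queued there. The data structures and example numbers are assumptions.

```python
from typing import Dict, List

def schedule_ect(task_runtimes: List[float],
                 proc_speeds: Dict[str, float]) -> Dict[str, List[int]]:
    """Earliest Completion Time: assign each task (reference runtime in seconds)
    to the processor on which it would finish earliest. Assumes task runtimes
    and relative processor speeds are known."""
    finish_time = {p: 0.0 for p in proc_speeds}   # when each processor frees up
    assignment = {p: [] for p in proc_speeds}     # task indices per processor
    for i, runtime in enumerate(task_runtimes):
        # Completion time if task i were appended to processor p's queue
        best = min(proc_speeds,
                   key=lambda p: finish_time[p] + runtime / proc_speeds[p])
        finish_time[best] += runtime / proc_speeds[best]
        assignment[best].append(i)
    return assignment

# Example: six tasks on two processors, the second one 1.75x faster
print(schedule_ect([60, 60, 30, 30, 30, 30], {"cpu1": 1.0, "cpu2": 1.75}))
```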
22. Design Space Exploration [1/3]
Overview
• Design space exploration: time to understand
how our solutions fit into the complete system.
s x 7P x I x S x A x (environment) → >2M design points
• Study the impact of:
• The Task Scheduling Policy (s policies)
• The Workload Characteristics (P characteristics)
• The Dynamic System Information (I levels)
• The Task Selection Policy (S policies)
• The Resource Management Architecture (A policies)
22
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q2
23. Design Space Exploration [2/3]
Experimental Setup
• Simulator:
• DGSim [IosupETFL SC’07, IosupSE EuroPar’08]
• System:
• DAS + Grid’5000 [Cappello & Bal CCGrid’07]
• >3,000 CPUs: relative perf. 1-1.75
• Metrics:
• Makespan
• Normalized Schedule Length ~ speed-up
• Workloads:
• Real: DAS + Grid’5000
• Realistic: system load 20-95% (from workload model)
23
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q2
24. Design Space Exploration [3/3]
Task Selection, including Fair Policies
• Task selection policy matters only for busy systems
• Naïve user priority can lead to poor performance
• Fairness, in general, reduces performance
[Figure: performance of S-U-Prio vs. the fair policies S-U-* (S-U-T, S-U-BoT, …)]
24
Iosup et al.: The performance of bags-of-tasks in large-
scale distributed systems. HPDC 2008: 97-108 Q2
25. Quincy: Microsoft’s Fair Scheduler
• Fairness in Microsoft’s Dryad data centers
• Large jobs (30 minutes or longer) should not monopolize the
whole cluster (Similar: Bounded Slowdown [Feitelson et al.’97])
• A job that takes t seconds in an exclusive-access run should require at
most J × t seconds when J jobs run concurrently in the cluster.
• Challenges
1. Support fairness
2. Improve data locality: use data center’s network and storage
architecture to reduce job response time
2012-2013 25
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
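A quick numeric illustration (not from the Quincy paper) of the fairness goal stated above: with J concurrent jobs, a job whose exclusive-access runtime is t seconds should finish within J × t seconds.

```python
def respects_fairness_bound(exclusive_runtime_s: float,
                            observed_runtime_s: float,
                            concurrent_jobs: int) -> bool:
    """Quincy-style fairness goal: with J concurrent jobs, a job that takes
    t seconds with exclusive cluster access should take at most J * t seconds."""
    return observed_runtime_s <= concurrent_jobs * exclusive_runtime_s

# Example: a 30-minute job sharing the cluster with 3 other jobs (J = 4)
# may take up to 2 hours; 90 minutes is within the bound, 3 hours is not.
print(respects_fairness_bound(30 * 60, 90 * 60, 4))   # True
print(respects_fairness_bound(30 * 60, 180 * 60, 4))  # False
```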
26. Dryad Workloads
2012-2013 26
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
28. Dryad Workloads Q: Is this fair?
2012-2013 28
Source:
http://sigops.org/sosp/sosp09/slides/quincy/QuincyTestPage.html
29. Dryad Workloads Q: Is this fair?
2012-2013 29
Source:
http://sigops.org/sosp/sosp09/slides/quincy/QuincyTestPage.html
30. Dryad Workloads Q: Is this fair?
2012-2013 30
Source:
http://sigops.org/sosp/sosp09/slides/quincy/QuincyTestPage.html
31. Quincy
Cluster Architecture: Racks and Computers
2012-2013 31
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
32. Quincy
Main Idea: Graph Min-Cost Flow
• From scheduling to Graph Min-Cost Flow
• Feasible schedule = min-cost flow
• Graph construction
• Graph from job tasks to computers, passing through cluster
headnodes and racks
• Edges weighted by cost function (scheduling constraints, e.g.,
fairness)
• Pros and Cons
• From per-job (local) decisions to workload (global) decisions
• Complex graph construction
• Edge weight assumes all constraints can be normalized
2012-2013 32
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
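To show only the scheduling-as-min-cost-flow idea, here is a heavily simplified sketch using the networkx library (an assumed dependency); the real Quincy graph also contains unscheduled nodes, rack and cluster aggregators, and preemption costs, none of which are modeled here.

```python
import networkx as nx

# Toy "scheduling as min-cost flow": each task pushes one unit of flow to the
# sink through exactly one computer; edge weights encode placement cost
# (e.g., data locality). All numbers below are illustrative.
G = nx.DiGraph()
tasks = ["t1", "t2", "t3"]
computers = ["c1", "c2"]

for t in tasks:
    G.add_node(t, demand=-1)             # each task is a unit source
G.add_node("sink", demand=len(tasks))    # all flow ends at the sink

# Placement costs: lower = better locality
cost = {("t1", "c1"): 1, ("t1", "c2"): 5,
        ("t2", "c1"): 4, ("t2", "c2"): 1,
        ("t3", "c1"): 2, ("t3", "c2"): 2}
for (t, c), w in cost.items():
    G.add_edge(t, c, capacity=1, weight=w)
for c in computers:
    G.add_edge(c, "sink", capacity=2, weight=0)  # each computer runs at most 2 tasks

flow = nx.min_cost_flow(G)               # feasible schedule = min-cost flow
placement = {t: c for t in tasks for c in computers if flow[t].get(c, 0) == 1}
print(placement)  # e.g. {'t1': 'c1', 't2': 'c2', 't3': 'c1'} (t3 may land on 'c2')
```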
34. Quincy Operation [2/6]
unscheduled
2012-2013 34
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
35. Quincy Operation [3/6]
Q: How easy is it to encode heterogeneous resources?
Weighted edges
2012-2013 35
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
36. Quincy Operation [4/6]
Root task gets
one computer
2012-2013 36
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
37. Quincy Operation [5/6]
Dynamic Schedule for One Job
2012-2013 37
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
38. Quincy Operation [6/6]
Dynamic Schedule for Two Jobs
Q: How compute-intensive is the
Quincy scheduler,
for many jobs and/or computers?
2012-2013 38
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
39. Quincy
Experimental Setup
• Schedulers
• Encoded two fair variants (w/ and w/o pre-emption)
• Encoded two unfair variants (w/ and w/o pre-emption)
• Comparison with Greedy Algorithm (Queue-Based)
• Typical Dryad jobs
• Workload includes worst-case scenario
• Environment
• 1 cluster
• 8 racks
• 240 nodes
2012-2013 39
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
40. Quincy
Experimental Results [1/5]
2012-2013 40
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
41. Quincy
Experimental Results [2/5]
2012-2013 41
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
42. Quincy, Experimental Results [3/5]
No Fairness
2012-2013 42
Isard et al.: Quincy: fair scheduling for distributed
computing clusters. SOSP 2009: 261-276
46. Agenda
1. Introduction
2. Multi-Tenancy in Practice (The Problem)
3. Architectural Models for Multi-Tenancy in Clouds
4. Shared Nothing: Fairness
5. Shared Hardware: Virtualization
6. Sharing Other Operational Levels
7. Summary
2012-2013 46
47. Virtualization
• Merriam-Webster
• Popek and
Goldberg, 1974
2012-2013 47
Source: Waldspurger, Introduction to Virtual Machines
http://labs.vmware.com/download/52/
48. Characteristics of Virtualization
Q: Why not do all these in the OS?
1. Fidelity* = ability to run application unmodified
2. Performance* close to hardware ability
3. Safety* = all hardware resources managed by virtualization
manager, never directly accessible to application
4. Isolation of performance, or failures, etc.
5. Portability = ability to run VM on any hardware
(support for value-adding ops, e.g., migration)
6. Encapsulation = ability to capture VM state
(support for value-adding ops, e.g., backup, clone)
7. Transparency in operation
2012-2013 48
* Classic virtualization
(Popek and Goldberg 1974)
49. Benefits of Virtualization (the Promise)
• Simplified management of physical resources
• Increased utilization of physical resources (consolidation)
• Better isolation of (catastrophic) failures
• Better isolation of security leaks (?)
• Support for multi-tenancy
• Derived benefit: reduced cost of IT deployment and operation
2012-2013 49
50. A List of Concerns
• Users
• Performance isolation
• Owners
• Performance loss vs native hardware
• Support for exotic devices, especially on the versatile x86
• Porting OS and applications, for some virtualization flavors
• Implement VMM—application integration?
(Loss of portability vs increased performance.)
• Install hardware with support for virtualization?
(Certification of new hardware vs increased performance.)
• The Law: security, reliability, …
2012-2013 50
51. Depth of Virtualization
• NO virtualization (actually, virtual memory)
• Most grids, enterprise data centers until 2000
• Facebook now
Q: Are all our machines virtualized anyway, by the modern OS?
• Single-level virtualization
(we zoom into this next)
• Nested virtualization
• VM embedded in a VM embedded in a VM emb …
• Q: Why is this virtualization model useful?
It’s all turtles all the way down…
Ben-Yehuda et al.: The Turtles Project: Design and Implementation
of Nested Virtualization. OSDI 2010: 423-436
2012-2013 51
52. Single-Level Virtualization and The Full IaaS Stack
[Diagram: the full IaaS stack. Each VM Instance (Virtual Machine) consists of Applications on a Guest OS on Virtual Resources; Virtual Machines run on a Virtual Machine Monitor; a Virtual Infrastructure Manager manages the Virtual Machine Monitors on top of the Physical Infrastructure.]
February 20, 2013 52
53. Single-Level Virtualization
[Diagram: two Virtual Machines (Applications on a Guest OS on Virtual Resources) and native applications (MusicWave, OtherApp, OtherApp) running on a Virtual Machine Monitor (Hypervisor), above a Host OS that may not exist. Q: What to do now?]
February 20, 2013 53
54. Three VMM Models
Classic VMM* Hosted VMM Hybrid VMM
[Diagram: in the Classic VMM, guest applications (MWave, App2) and the Guest OS run on a VMM that runs directly on the hardware; in the Hosted VMM, the VMM runs on top of a Host OS; in the Hybrid VMM, the VMM runs at the same level as the Host OS, with an I/O VMM handling I/O.]
2012-2013 54
* Classic virtualization
(Popek and Goldberg 1974)
55. Single-Level Virtualization
Implementing the Classic Virtualization Model
• General technique*, similar to simulation/emulation
• Code for computer X runs on general-purpose machine G.
• If X=G (virtualization), slowdown in software simulation may be
20:1. If X≠G (emulation), slowdown may be 1000:1.
• If X=G (virtualization), code may execute directly on hardware
• Privileged vs user code*
• Trap-and-emulate as main (but not necessary) approach
• Ring deprivileging, ring aliasing, address-space compression, other niceties**
• Specific approaches for each virtualized resource***
• Virtualized CPU, memory, I/O (disk, network, graphics, …)
2012-2013 55
* (Goldberg 1974) ** (Intel 2006)
*** (Rosenblum and Garfinkel 2005)
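A toy sketch of the trap-and-emulate control loop described above; the instruction names, traps, and virtual-CPU state are illustrative assumptions, not any real VMM's interface.

```python
# Trap-and-emulate, illustrated: guest code runs deprivileged; privileged or
# sensitive operations trap to the VMM, which emulates them against virtual state.

class Trap(Exception):
    def __init__(self, kind, operand=None):
        self.kind, self.operand = kind, operand

def run_guest_instruction(instr, guest_cpu):
    """Deprivileged guest execution: normal instructions run directly;
    privileged ones raise a trap instead of touching real hardware."""
    op = instr[0]
    if op == "add":
        _, reg, value = instr
        guest_cpu["regs"][reg] += value        # safe: runs "directly"
    elif op in ("out", "load_cr3", "cli"):
        raise Trap(op, instr[1:])              # privileged: must be emulated
    else:
        raise Trap("undefined_instruction", instr)

def vmm_run(guest_code, guest_cpu):
    """The VMM executes guest code and emulates whatever traps."""
    for instr in guest_code:
        try:
            run_guest_instruction(instr, guest_cpu)
        except Trap as t:
            # Emulate the privileged operation against the *virtual* machine state
            if t.kind == "out":
                port, value = t.operand
                guest_cpu["io_log"].append((port, value))
            elif t.kind == "load_cr3":
                guest_cpu["virt_page_table_base"] = t.operand[0]
            elif t.kind == "cli":
                guest_cpu["virt_interrupts_enabled"] = False
            else:
                guest_cpu["pending_fault"] = t.kind

guest_cpu = {"regs": {"r0": 0}, "io_log": [], "virt_page_table_base": None,
             "virt_interrupts_enabled": True, "pending_fault": None}
vmm_run([("add", "r0", 5), ("cli",), ("out", 0x3F8, 65), ("load_cr3", 0x1000)], guest_cpu)
print(guest_cpu)
```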
56. Single-Level Virtualization
Refinements to the Classic Virtualization Model*
• Enhancing VMM—guest OS interface (paravirtualization)
• Guest OS is re-coded (ported) to the VMM, for performance gains
(e.g., by avoiding some privileged operations)
• Guest OS can provide information to VMM, for performance gains
• Loses or loosens the “Fidelity” characteristic**
• 2010 onwards: paravirtualization other than I/O seems to wane
• Enhancing hardware—VMM interface (HW support)
• New hardware execution modes for Guest OSs, so no need for
VMM to trap all privileged operations, so performance gains
• IBM’s System 370 introduced interpretive execution (1972), Intel VT-x and VT-I (2006)
• Passthrough I/O virtualization with low CPU overhead
• Isolated DMA: Intel VT-d and AMD IOMMU; I/O device partitions: PCI-SIG IOV spec
2012-2013 56
* (Adams and Agesen 2006)
** (Popek and Goldberg 1974)
57. Single-Level Virtualization
Trap-and-Emulate
Guest OS + Application
Unprivileged
Q: What are the challenges?
Page Undef
Q: What are the challenges for
Fault Instr
vIRQ
x86 architectures?
Privileged
MMU CPU I/O
Emulation Emulation Emulation
Virtual Machine Monitor
2012-2013 57
Source: Waldspurger, Introduction to Virtual Machines
http://labs.vmware.com/download/52/
58. Single-Level Virtualization
Processor Virtualization Techniques*
• Binary Translation
• Static BT: execute guest instructions in an interpreter, to prevent unlawful
access to privileged state
• Dynamic/Adaptive BT: detect instructions that trap frequently and adapt their
translation, to eliminate traps from non-privileged instructions accessing
sensitive data (e.g., loads/stores in page tables)
• Hardware virtualization
• Co-design VM and Hardware: HW with non-standard ISA, shadow
memory, optimization of instructions for selected applications
• Intel VT-*, AMD SVM: in-memory data structure for state, guest
mode, a less privileged execution mode + vmrun, etc.
2012-2013 58
* (Adams and Agesen 2006)
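A toy sketch of the binary-translation idea above: scan a basic block, emit innocuous instructions unchanged, and rewrite sensitive ones into calls into the VMM so they never execute natively and never need to trap. The instruction names and translation target are assumptions; an adaptive translator would additionally monitor which translated sites still trap at run time and re-translate them.

```python
# Toy static binary translation (illustrative only; not a real x86 translator).

SENSITIVE = {"cli", "sti", "out", "in", "mov_to_cr"}  # would touch privileged state

def translate_block(guest_block):
    """Translate one basic block: innocuous instructions are emitted unchanged
    ("identical translation"); sensitive ones are rewritten to call the VMM."""
    translated = []
    for instr in guest_block:
        op = instr[0]
        if op in SENSITIVE:
            translated.append(("call_vmm", op) + instr[1:])
        else:
            translated.append(instr)
    return translated

block = [("mov", "r0", 1), ("add", "r0", 2), ("cli",), ("out", 0x3F8, 65)]
print(translate_block(block))
# [('mov', 'r0', 1), ('add', 'r0', 2), ('call_vmm', 'cli'), ('call_vmm', 'out', 1016, 65)]
```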
59. Agenda
1. Introduction
2. Multi-Tenancy in Practice (The Problem)
3. Architectural Models for Multi-Tenancy in Clouds
4. Shared Nothing: Fairness
5. Shared Hardware: Virtualization
6. Sharing Other Operational Levels
7. Summary
2012-2013 59
60. Support for Specific Services and/or Platforms
Database Multi-Tenancy [1/3]
1. Isolation = separation of services provided to each tenant
2. Scaling conveniently with the number and size of tenants
3. Meet SLAs for each tenant
4. Support for per-tenant service customization
5. Support for value-adding ops, e.g., backup, upgrade
6. Secure data processing and storage
7. Support for regulatory law (per legislator, per tenant)
2012-2013 60
* Platform-specific (database-specific) issues
61. Support for Specific Services and/or Platforms
Database Multi-Tenancy [2/3]
2012-2013 61
Source:
http://msdn.microsoft.com/en-us/library/aa479086.aspx
62. Support for Specific Services and/or Platforms
Database Multi-Tenancy [3/3]
• Private tables
• Extension tables
• Rigid, shared table
• Datatype-specific pivot tables
• Universal table with XML document
• Universal table
2012-2013 62
Source: Bobrowski
www.capcloud.org/TechGate/Multitenancy_Magic.pptx
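As a small illustration of the "rigid, shared table" approach listed above (a sketch with an assumed schema and data, not taken from the cited sources): all tenants share one table, every row carries a tenant_id, and every query must filter on it.

```python
import sqlite3

# Shared-table multi-tenancy: one table for all tenants, isolated only by a
# tenant_id column. Schema and data are illustrative assumptions.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE invoices (
        tenant_id TEXT NOT NULL,      -- which tenant owns the row
        invoice_id INTEGER NOT NULL,
        amount REAL NOT NULL,
        PRIMARY KEY (tenant_id, invoice_id)
    )""")
db.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
               [("acme", 1, 100.0), ("acme", 2, 250.0), ("globex", 1, 80.0)])

def invoices_for(tenant_id):
    # Isolation here is purely logical: forgetting the WHERE clause would leak
    # other tenants' data, which is exactly the trade-off of shared tables.
    return db.execute("SELECT invoice_id, amount FROM invoices "
                      "WHERE tenant_id = ? ORDER BY invoice_id",
                      (tenant_id,)).fetchall()

print(invoices_for("acme"))    # [(1, 100.0), (2, 250.0)]
print(invoices_for("globex"))  # [(1, 80.0)]
```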
63. Agenda
1. Introduction
2. Multi-Tenancy in Practice (The Problem)
3. Architectural Models for Multi-Tenancy in Clouds
4. Shared Nothing: Fairness
5. Shared Hardware: Virtualization
6. Sharing Other Operational Levels
7. Summary
2012-2013 63
64. Conclusion Take-Home Message
• Multi-Tenancy = reduced cost of IT
• 7 architectural models for multi-tenancy
• Shared Nothing—fairness is a key challenge
• Shared Hardware—virtualization is a key challenge
• Other levels—optimizing for specific application is a key challenge
• Many trade-offs
• Virtualization
• Enables multi-tenancy + many other benefits
• 3 depth models, 3 VMM models
• A whole new dictionary: hypervisor, paravirtualization, ring deprivileging
• Main trade-off: performance cost vs benefits
• Reality check: virtualization is now (2012) very popular
February 20, 2013 http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/
64
65. Reading Material
• Workloads
• James Patton Jones, Bill Nitzberg: Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization. JSSPP 1999: 1-16
• Alexandru Iosup, Dick H. J. Epema: Grid Computing Workloads. IEEE Internet Computing 15(2): 19-26 (2011)
• Alexandru Iosup, Mathieu Jan, Omer Ozan Sonmez, Dick H. J. Epema: On the dynamic resource availability in grids. GRID 2007: 26-33
• D. Feitelson, L. Rudolph, U. Schwiegelshohn, K. Sevcik, P. Wong: Theory and practice in parallel job scheduling. JSSPP 1997: 1-34
• Fairness
• Alexandru Iosup, Omer Ozan Sonmez, Shanny Anoep, Dick H. J. Epema: The performance of bags-of-tasks in large-scale distributed systems. HPDC 2008: 97-108
• Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, Andrew Goldberg: Quincy: fair scheduling for distributed computing clusters. SOSP 2009: 261-276
• A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, I. Stoica: Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. USENIX NSDI 2011
• Virtualization
• Gerald J. Popek, Robert P. Goldberg: Formal Requirements for Virtualizable Third Generation Architectures. Communications of the ACM, July 1974
• Robert P. Goldberg: Survey of Virtual Machine Research. IEEE Computer Magazine, June 1974
• Mendel Rosenblum, Tal Garfinkel: Virtual Machine Monitors: Current Technology and Future Trends. IEEE Computer Magazine, May 2005
• Keith Adams, Ole Agesen: A comparison of software and hardware techniques for x86 virtualization. ASPLOS 2006: 2-13
• Gil Neiger, Amy Santoni, Felix Leung, Dion Rodgers, Rich Uhlig: Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization. Intel Technology Journal 10(3), Aug 2006
• Muli Ben-Yehuda, Michael D. Day, Zvi Dubitzky, Michael Factor, Nadav Har'El, Abel Gordon, Anthony Liguori, Orit Wasserman, Ben-Ami Yassour: The Turtles Project: Design and Implementation of Nested Virtualization. OSDI 2010: 423-436
2012-2013 65
Editor's Notes
Comparison:
• Classic allows code execution to "run through" to the raw hardware; very efficient for I/O.
• Classic requires a virtualizable CPU, in the sense described by (Popek and Goldberg 1974); see the earlier slide "Characteristics of Virtualization".
• Classic is not possible on x86 architectures without hardware virtualization support.
• Hosted offers better I/O support through use of Host OS drivers, but much worse I/O performance, because I/O ops go through the Guest OS, VMM, and Host OS before reaching the raw hardware.
• Hosted has slow I/O, so it cannot be used for most servers (Web, etc.).
• Hosted is easy to install and maintain (it is a regular app for the Host OS), so it is good for desktops.
• Hosted has problems with maintaining complete isolation.
• (VMware's) Hybrid runs the VMM at the same level as the Host OS; the I/O VMM can perform graphics and other I/O ops for generic I/O devices, which are then translated to real hardware by the Host OS; this works but can be very slow.