Presentation I made at the Sun Network conference in 2003 on how to do capacity planning for virtualized systems, tied into the N1 product that Sun was pushing at the time. This project was structured as a Design for Six Sigma (DFSS) project.
Capacity Planning for Virtualized Datacenters - Sun Network 2003
1. Capacity Planning for N1
Sun Network 2003 Presentation
SunSigma DFSS Project P925
Adrian.Cockcroft@sun.com
Chief Architect - High Performance Technical Computing
August 29, 2003
2. Project: Capacity Planning for N1
ID: P925
What is N1?
Datacenter Automation
Manage “N” systems as if they were “1” system
Solve the Total Cost of Ownership (TCO) problems
Manage all the “fabrics” as one - Network/VLAN, SAN/Zone, power,
consoles, cluster
Heterogeneous Support
Solaris, Linux, AIX, HP-UX, Windows, EMC, etc.
Layered Provisioning
Platform/OS, Application, Service
Roadmap Includes Acquisitions
2001 Sun internal N1 architectural definition
2002 Terraspring platform level virtualization
2003 CenterRun Application level provisioning
3. Project: Capacity Planning for N1
ID: P925
Voice of the Customer
- "We want better performance at a lower price"
- "We want higher utilization"
- "We don't want application performance to degrade at times of peak load"
- "We want more and faster application changes"
- "How do we do capacity planning with N1?"
Scope…
4. DEFINE Project: Capacity Planning for N1
ID: P925
Capacity Planning for N1
- Define: Project goals, scope and plan, VOC, stakeholders
- Measure: Definition of Capacity Planning measurements
- Analyze: Gaps, N1CP Processes Concept Design, Survey
- Design: Prototype Use Cases
- Verify: Stakeholder communication and transition plan
- Monitor: N1 Capacity Planning implementation tracked as a subgroup of the N1 Strategic Working Group
5. MEASURE Project: Capacity Planning for N1
ID: P925
Translate VOC to Measurements
"We want better performance at a lower price"
- Fast, well tuned and efficient systems
- Lower Total Cost of Ownership
- Flexibility - choice of systems by price, performance, reliability, scalability, compatibility and feature set
"We want higher utilization"
- Consistently high utilization of expensive resources
"We don't want application performance to degrade at times of peak load"
- Consistent and fast application or service response times
- Headroom needed to handle peak loads
"We want more and faster application changes"
- Flexible scenario planning, rapid provisioning
Question: "My company already has capacity planning processes and tools" - do you agree or disagree with this statement?
6. MEASURE Project: Capacity Planning for N1
ID: P925
N1 as a Constraint and Opportunity
- Centralized control and monitoring
- Highly replicated hardware configurations
- Well defined workload and capacity characterization
- Arrays of load-balanced systems, structured network
- Large SMP nodes, standardized storage layout
- Web services workloads follow an "open system" queuing model, which is simple to plan against
- Dynamic system domains and virtualized provisioning allow rapid capacity adjustments and pooled resources
- Primary capacity metrics are CPU power and storage; secondary metrics (memory, network and thermal) may be over-provisioned but should be watched
7. MEASURE Project: Capacity Planning for N1
ID: P925
Utilization Definition
- Utilization is the proportion of busy time
- Always defined over a time interval
- Sum over devices
[Charts: OnCPU scheduling for each CPU over microseconds, with a mean CPU utilization (load level) of 0.56; and usr+sys CPU % utilization over time for the peak period]
8. MEASURE Project: Capacity Planning for N1
ID: P925
Headroom Definition
- Headroom is available usable resources
  - Total Capacity minus Peak Utilization and Margin
  - Applies to CPU, RAM, Net, Disk and OS
  - Depends upon workload mixture
  - Can be very complex to determine
[Chart: usr+sys CPU % for the peak period, with Utilization, Headroom and Margin bands stacked from 0 to 100%]
9. MEASURE Project: Capacity Planning for N1
ID: P925
CPU Capacity Measurements
- CPU utilization is defined as busy time divided by elapsed time for each CPU
- The number of CPUs is dynamic, so capacity at "100%" is not constant. Use units of "processors" to measure load.
- CPU type and speed varies, so we need something like MIPS or M-Values for mixed systems
- CPU utilization should be managed within a range that safely minimizes headroom, to give stable performance at minimum cost
- Process level CPU wait time measures the time a process spent on the run queue waiting for a free CPU
  - This allows the response time increase to be observed directly, so that increased capacity can be provisioned before headroom is exhausted
10. MEASURE Project: Capacity Planning for N1
ID: P925
Response Time Definition
- Service time occurs while using a resource
- Queue time waits for access to a resource
- Response Time = Queue time + Service time
Response time curves for random arrival of work from a large unknown user population (e.g. the Internet!):
[Chart: Response Time Increase Factor, R = S / (1 - (U/m)^m), vs. Mean CPU Load Level from 0 to 4, for One, Two and Four CPUs]
11. MEASURE Project: Capacity Planning for N1
ID: P925
Response Time Curves
Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they finally run out of capacity. The headroom margin should be set according to the response time margin and the CPU count.
[Chart: Response Time Increase Factor, R = S / (1 - (U%)^m), vs. Total System Utilization % from 0 to 100, for 1, 2, 4, 8, 16, 32 and 64 CPUs, with the headroom margin marked]
12. MEASURE Project: Capacity Planning for N1
ID: P925
CPU Scalability Differences
SMP allows work to migrate between CPUs; "blades" don't
- A single queue of work gives lower response time for user sessions at high utilization than arrays of uniprocessor "blades"
- The headroom margin on an array of "blades" is constant as the array grows
- Two to four CPU systems need much less margin than uniprocessors
- Measure and calibrate the actual response curve per workload
[Chart: Response Time Increase Factor vs. CPU Demand Level from 0 to 4, comparing SMP R = S / (1 - (U/m)^m) with Blade R = S / (1 - U/m), for 1 CPU/Blade, 2 CPU SMP, 4 CPU SMP, 2 Blades and 4 Blades]
13. MEASURE Project: Capacity Planning for N1
ID: P925
CPU Measurement System Issues
- Clock sampled CPU usage
  - Poor clock resolution at 10ms (optionally 1ms)
  - Biased sample, since the clock schedules jobs
  - Underestimates more at lower utilization
  - Creates an apparent lack of scalability
- Microstate measured CPU usage
  - Measures state changes directly - "microstates"
  - Per-CPU microstate based counters are not available
  - Use microstates at the process based workload level; sum over some or all processes as needed (can take a while on big systems)
  - The microstate method extends simply to measuring services and mixed workloads
14. MEASURE Project: Capacity Planning for N1
ID: P925
N1 Capacity Planning CTQs

CTQ Name                  Pri  Units  LSL           USL              Gauge Acc.  Budget Sigma
CPU Utilization (TCO)     5    CPUs   30% of total  -                99%         3.0
CPU Responsiveness (SLA)  10   CPUs   -             70-98% of total  99%         4.0

Both of these Critical To Quality (CTQ) requirements are measured via the CPU load level, which can be measured accurately with a Gauge accuracy estimated at 99%, and a sigma goal based on defect cost. Using sampled CPU, accuracy is estimated at 90%.
For CPU Utilization, a defect is unacceptable Total Cost of Ownership (TCO) and occurs if the total CPU load drops below the Lower Specification Limit (LSL) of 30% of the total configured, for a sample taken during the peak load period.
For CPU Responsiveness, a defect is overload leading to a Service Level Agreement (SLA) failure and occurs if the total CPU load goes above the Upper Specification Limit (USL), which is 70% of the total configured for uniprocessors, increasing for larger CPU counts.
15. ANALYZE Project: Capacity Planning for N1
ID: P925
Concept Design - N1CP Roles
- Manager
- Application Architect
  - Developers
  - Database Administrators
- Systems Architect
  - Systems Administrators
  - Storage Administrators
  - Network Administrators
- Others?
Question: What roles do you do?
16. ANALYZE Project: Capacity Planning for N1
ID: P925
Scenarios - Top Level Functional Breakdown
- Install N1 Datacenter
- Over-Provision Applications (provision at system level and application level)
- Right-size Applications - repeat infrequently
- Re-Allocate Resources during low load times - repeat on schedule
- Grow or borrow Capacity just before overload occurs - repeat as needed
Each scenario provisions at the system level and the application level.
17. ANALYZE Project: Capacity Planning for N1
ID: P925
Installation Sizing Scenario
This scenario indicates the tasks for each role when an N1 datacenter fabric is created using currently available system level provisioning software. The set of tasks performed by each role in a scenario is called a "use case". Future versions of N1 will configure services and policies during installation. In the original slide, red arrows show the command flow between the roles; each role's tasks are listed here in time order.
- Manager: I want an N1 ready datacenter
- Application Architect: Choose and size generic applications
- Database Admin: Install generic database
- Developer: Install generic application servers
- Systems Architect: Choose systems and platforms
- Systems Admin: Size systems mix and images; then build generic system images; then measure capacity of generic systems
- Network Admin: Size overall network; then setup switches and VLANs for N1
- Storage Admin: Size overall storage; then setup SANs and storage for N1
18. ANALYZE Project: Capacity Planning for N1
ID: P925
Over-Provisioning Scenario
This gives an indication of the tasks performed by each role as a new application is provisioned using the capabilities of today's N1 products. The initial goal is to over-provision the capacity for initial bring-up of the application, then later right-size it as its actual usage pattern becomes better understood. In future releases more and more of this activity will be automated, and more of the work will move to become pre-work related to setting up the overall N1 datacenter infrastructure.
- Manager: Provide an online service
- Application Architect: Use these apps and sizing
- Database Admin: Database versions and sizing; then configure database policies; then populate database
- Developer: App server versions and sizing; then configure app server; then acceptance test
- Systems Architect: Use these platforms and versions; then define operations policies
- Systems Admin: Systems selection and versions; then build replicable system images; then use N1 GUI to over-provision initial system; then enable user access
- Network Admin: Network sizing; then provision Internet connection; then configure access and security
- Storage Admin: Storage sizing; then provision LUNs; then configure backup strategy
19. ANALYZE Project: Capacity Planning for N1
ID: P925
Rightsizing Scenario
Rightsizing adjusts the headroom for each component of the system to make sure that the usage level falls inside the specification limits. Rightsizing can be performed during an offline maintenance window, but all the technologies exist to adjust domain size for tier 3 systems and to adjust the number of tier 1 and tier 2 systems dynamically.
- Manager: Business level and trend plan
- Database Admin: Monitor database headroom (memory and tables); then increase headroom for bottleneck; then reduce headroom for under-utilized database
- Systems Admin: Monitor CPU, Network and memory headroom; then increase headroom for bottleneck; then reduce headroom for under-utilized systems
- Network Admin: Monitor WAN / Internet headroom; then increase headroom for bottleneck; then reduce headroom for under-utilized bandwidth
- Storage Admin: Monitor storage headroom; then increase headroom for bottleneck; then reduce headroom for under-utilized storage
20. ANALYZE Project: Capacity Planning for N1
ID: P925
Re-Allocation Scenario
Load levels vary during the day and the week. Regular times of low utilization can have other work performed - e.g. overnight batch jobs. Batch workloads that cannot run on the same systems due to configuration or security issues can run on systems (or Grids) that are provisioned each night using spare capacity from other systems.
- Manager: Batch workload capacity needed
- Application Architect: Define batch capable applications
- Developer: Build or configure batch capable applications
- Systems Architect: Define batch mechanism
- Systems Admin: Determine timing and depth of capacity to re-allocate; then move resources to Grid after peak load time; then bring resources back before peak load time
21. ANALYZE Project: Capacity Planning for N1
ID: P925
Overload Scenario
Load levels vary during the day and the week in a fairly consistent and predictable manner. Sizing
for the normal load level allows high utilization levels. Higher load levels can be handled as an
exception by watching for abnormally high levels before the load peaks and borrowing capacity
from lower priority applications such as development environments.
Question: “Are dynamic capacity adjustments a mature and reliable technology?”
- Manager: Higher utilization needed to reduce cost of service; then negotiate a victim to steal capacity from
- Systems Admin: Determine the normal load curve for time of day and day of week; then monitor deviations above the normal load level; then provision extra capacity before it is needed
22. ANALYZE Project: Capacity Planning for N1
ID: P925
Rightsizing Scenario
Detailed Design Concept via an Example
- Large scale Internet workload
  - Fairly predictable load shape
  - Peaks every evening (use peak hours)
  - Grows every week
- Key CTQs
  - Performance during peak hour
  - Cost of maintaining performance level
  - Risk of downtime
- Tier 3 backend database server
  - Primary bottleneck; over-provisioned elsewhere
  - Highest cost of CPU headroom (E10K/F15K class)
  - Initially 56 CPUs in domain, average 30 CPUs load
23. ANALYZE Project: Capacity Planning for N1
ID: P925
CPU Load Level
Monitor for days or weeks to establish a baseline and the time of peak load, then track that timeslot daily.
CPU load (units are CPUs, 56 configured) for a busy day:
[Chart: Summed CPU Utilization vs. time of day from 0:00 to 23:59, with a 2-hour peak reaching about 50 CPUs]
24. ANALYZE Project: Capacity Planning for N1
ID: P925
Utilization Distribution
The capability plot for the peak time shows that the system is less than half utilized about 25% of the time - too much headroom. The defect rate corresponds to a Sigma level of 2.18.
[Capability plot: distribution of CPU demand level during the peak period]
25. ANALYZE Project: Capacity Planning for N1
ID: P925
Increase Utilization
Reduce the system to 40 CPUs and assume a linear increase in utilization - predicted sigma = 5.2.
This is over-simplified - the headroom margin and non-linearities are not included in the plan - so add a little extra headroom to compensate.
[Capability plot: predicted CPU demand level distribution with 40 CPUs]
26. DESIGN Project: Capacity Planning for N1
ID: P925
Headroom Tool Prototype
- Solaris specific prototype
  - Rapid prototype using SE Toolkit from http://www.setoolkit.com
  - Shows component level headroom vs. utilization goal
  - Automatic margin calculation based on CPU count
  - Samples every few minutes, reports every 30-60 minutes
  - Microstate based, sums over all processes
  - Headroom predictor uses mean plus two standard deviations
  - Text based, logs data to a daily file, 3.5 sigma headroom
Code: p.=processor, r.=ram, n.=network, d.=disk, .st=status, .cf=configured, .ll=min lsl, .ul=limit usl, .ld=mean load, .h%=headroom, .sd=std deviation, .tco=TCO defect rate, .sla=SLA defect rate, .tK=throughput K, .rm=response time in milliseconds, .rp=response time proportional increase

time     pll pul  pcf pst   ptco psla pld  psd  ph% ptK  prm  prp
17:36:04 3.6 11.6 12  Green 0.00 0.00 5.26 0.28 50  15.8 1.05 1.08
18:06:04 3.6 11.6 12  Green 0.00 0.00 4.90 0.38 51  13.9 1.01 1.06
18:36:04 3.6 11.6 12  Blue  0.40 0.00 4.55 2.19 23  13.0 0.93 1.09
19:06:03 3.6 11.6 12  Blue  1.00 0.00 3.02 0.17 71  12.7 0.86 1.05
19:36:03 3.6 11.6 12  Blue  0.93 0.00 2.82 0.53 67  12.0 0.67 1.04

Notes: samples are taken every two minutes and reported every 30 minutes. With 12 CPUs configured, the lower limit is 30% = 3.6 and the upper limit, based on CPU count, is 11.6. Status is based on the defect proportion of time that the CPU load level is below pll (TCO) or above pul (SLA). The measured mean load level and standard deviation are compared to the upper limit to calculate headroom. Throughput is based on voluntary context switches; prm is very short, but prp defines a response time curve.
27. DESIGN Project: Capacity Planning for N1
ID: P925
Headroom Calculations
Set the configured total to the number of processors online:
conf = sysconf(_SC_NPROCESSORS_ONLN);
Set the lower spec limit to 30% for TCO failures:
lsl = conf * 0.3;
Use a response time goal of 3 times the baseline on the curve to determine the margin for the maximum load level:
rpgoal = 3.0;
Calculate the maximum load level from the theoretical response time curve:
/* rp = R/S = 1/(1-(U/m)^m), so U = m * exp(log((rp-1)/rp)/m) */
usl = conf * exp(log((rpgoal-1.0)/rpgoal)/conf);
Calculate headroom % from the mean plus two standard deviations versus the upper spec limit:
headp = 100.0 * (1.0 - (mean + 2.0*sd) / usl);
Calculate Sigma Zst:
tco_sigma = 1.5 + (mean - lsl) / sd;
sla_sigma = 1.5 + (usl - mean) / sd;
28. DESIGN Project: Capacity Planning for N1
ID: P925
Design Optimization
Compare the "traditional" approach with the new design: run the headroom tool on a big and busy server, collect data, and show how a simplistic approach compares with the method described in this project.
A SunRay timesharing server was monitored for several days. The system is loaded to the limit at peak times but idle out of hours, so focus on a scheduled capacity reallocation scenario.
Simplistic "Traditional" Approach
- Collect data using vmstat, sar, SunMC or 3rd party tools
- Plot CPU % busy - as shown on next slide
- There is spare capacity, but no indication of how many CPUs are unused
- Need the extra information that this is a 12-CPU system
N1CP Approach
- Collect data using the headroom prototype
- Plot CPU load level in CPU units; no need to guess or replot data
- Calculate margin, headroom and sigma levels
- Plan capacity reallocation and recalculate margin, headroom and sigma levels
30. DESIGN Project: Capacity Planning for N1
ID: P925
N1CP View - CPU Counts
There are 12 CPUs; 6 to 8 are free overnight, and the system overloads at peak times.
[Chart: mean+2sd CPU load (pmd+2psd) vs. configured (pcf) and upper limit (pul), over about two days]
Summary: Mean CPU load 7.03, Mean Util 59%, Mean headroom 34%, Mean capacity 12.00
Defects: TCO 110215 DPMO, min Sigma -1.5 Zst; SLA 538 DPMO, min Sigma 2.5 Zst
31. DESIGN Project: Capacity Planning for N1
ID: P925
N1CP - Response Curve
The system is close to overload. This timeshared workload has a flatter curve than internet workloads (a closed rather than open queuing model).
[Chart: measured Response Increase (0 to 2.5) vs. CPU load level (0 to 12 CPUs)]
33. DESIGN Project: Capacity Planning for N1
ID: P925
N1CP View - Dynamic!
Vary the CPU count and times daily, and borrow extra capacity for the peak load.
[Chart: mean+2sd CPU load (pmd+2psd) vs. configured (pcf) and upper limit (pul) over about two days, with the configured CPU count varied during the day; per-segment sigma annotations range from 3.2 to 6.3]
Summary: Mean CPU load 7.03, Mean Util 74%, Mean headroom 16%, Mean capacity 9.52
Predicted defects: TCO min Sigma 2.0 Zst; SLA min Sigma 3.2 Zst
34. DESIGN Project: Capacity Planning for N1
ID: P925
Summary
Performance Impact
- SLA Sigma levels improve from a minimum of 2.5 Zst to 3.2 Zst
- Improvement of 0.7 Sigma by allowing for extra peak load
- Simplistic methods do not allow quality of service prediction
Cost Impact
- TCO Sigma levels improve from a minimum of -1.5 Zst to 2.0 Zst
- Improvement of 3.5 Sigma by reducing capacity from 12 to 9.5 CPUs
Observability Impact
- The headroom tool prototype generates all required statistics
- The Sigma level is simply calculated, or the headroom tool could print it
- Simplistic methods do not show what is going on
Complexity Impact
- Dynamic reconfiguration must be enabled
- One reconfiguration each morning and each evening
Applicability (assertions, out of scope for this project!)
- The CPU based example can be applied to blades, RAM, disk, net, thermal
- The method can be extended from platform level to services
36. GRID Project: Capacity Planning for N1
ID: P925
Capacity for Sale
Uses for Spare Capacity
- Carefully schedule batch work and backups
- Remotely support global timezones
- Run engineering dept. simulation jobs
Grid Oriented Solutions
- Project Grid - departmental cluster (Sun Grid Engine)
- Enterprise Grid - collection of clusters forming a general purpose Grid service (Sun Grid Engine Enterprise Edition)
- The Global Grid - Internet level - GT2.2, OGSA/OGSI/GT3
Provision an Enterprise Grid service using N1; join The Global Grid and sell or share capacity.
37. GRID Project: Capacity Planning for N1
ID: P925
Relationships: N1 and Grid
N1 is about provisioning things you own; Grid is about access to things you don't own.

                             Infrastructure                Business Model
Things you own and control   N1                            Utility Computing
Things you borrow or lease   Grid Services, Web Services   Utility Computing
38. GRID Project: Capacity Planning for N1
ID: P925
Capacity Flows in a Grid Enabled N1 Datacenter
[Diagram: an N1 Virtualized Datacenter with Tier 0 Web Servers, Tier 1 Web Front End, Tier 2 Web App Servers and Tier 3 Database Storage serving User / Web Services. Capacity requests draw from a free pool; additional capacity is purchased on demand (C.O.D.) from Utility Computing. Unused compute and storage resources flow to a Sun Grid Engine Enterprise Edition Cluster Grid serving Grid User / Grid Services. Obsolete capacity is retired, repaired and replaced.]
39. GRID Project: Capacity Planning for N1
ID: P925
IT market segments by "need to share"
- What can be shared: Defense (spooks) - Nothing; Commercial (suits) - Hardware; Technical (geeks) - Operating System; Consumer (users) - Everything
- What separation is required: Defense - physical separation; Commercial - N1, server domains, VLAN and SAN Zone partitioning; Technical - Grid, VPN, encryption, trusted firewalls; Consumer - P2P apps (SETI, Kazaa, Limewire), People!
- What is visible: Defense - Local systems, Project Grids; Commercial - Local systems and Internet; Technical - Everything in The Global Grid; Consumer - Everything, including other community users
- Primary constraints: Defense - Organizational, legal, contractual, national security issues; Commercial - Organizational issues; Technical - CPU cycles, Latency; Consumer - Storage, CPU cycles, Network bandwidth, Know-how