Dynamic Fractional Resource Scheduling Practical Issues and Future Directions -- 2011, Cetraro

Introduction Implementation Concluding Remarks Appendix
Dynamic Fractional Resource Scheduling
Practical Issues and Future Directions
Mark Stillwell Frédéric Vivien Henri Casanova
INRIA, Université de Lyon, LIP
International Research Workshop on Advanced High
Performance Computing Systems
Cetraro, Italy
June 27–29, 2011
M. Stillwell, F. Vivien, H. Casanova INRIA
DFRS in Practice

Outline
Introduction
Implementation
Concluding Remarks
DFRS in Practice

Dynamic Fractional Resource Scheduling
HPC Scheduling Problem:
homogeneous nodes with high-speed interconnect
network ﬁle system
release date and number of tasks known at submission
computation time known only after completion
goal: minimize maximum stretch
approach:
based on virtual machine technology
fractional resource allocations
preemption / migration
on-line scheduling problem → sequence of off-line resource
allocation problems
computable, on-line performance metric related to max
stretch
heuristics for task placement/resource allocation
DFRS in Practice

Task Placement and Resource Allocation Problem
nodes comprise a number of resources (CPU cores,
memory, bandwith, etc...)
tasks have rigid requirements and fluid needs
job tasks have the same requirements and needs
rigid requirements specify minimum resource allocations
fluid needs specify minimum allocations without
performance degradation
yield of a job is a value between 0 and 1
correlated with performance
jobs allocated product of yield and need in each resource
Goal: find an allocation that maximizes the minimum yield
DFRS in Practice

Task Placement Heuristics
Greedy Task Placement – incremental, puts each task on
the node with the lowest computational load on which it
can ﬁt without violating memory constraints
MCB Task Placement – global, iteratively applies
multi-capacity (vector) bin-packing heuristics during a
binary search for the maximized minimum yield
achieves higher minimum yield values than Greedy
can potentially cause lots of migration
what if the system is oversubscribed?
use a priority function to decide which jobs to run
DFRS in Practice

Priority Function?
Virtual Time: subjective time experienced by a job
ﬁrst idea: 1
VIRTUAL TIME
informed by ideas about fairness
lead to good results
theoretically prone to starvation
second idea: FLOW TIME
VIRTUAL TIME
addresses starvation problem
lead to poor performance
Third Idea: FLOW TIME
(VIRTUAL TIME)2
combines idea #1 and idea #2
addresses starvation
performs about the same as ﬁrst priority function
DFRS in Practice

When to apply Heuristics
We consider a number of different options:
Job Submission – heuristics can use greedy or bin packing
approaches
Job Completion – as above, can help with throughput
when there are lots of short running jobs
Periodically – some heuristics periodically apply vector
packing to improve overall job placement
DFRS in Practice

Methodology
experiments conducted using discrete event simulator
mix of synthetic and real trace data
considered only memory requirements and CPU needs
preemption/migration of jobs causes 300 second penalty to
runtime
periodic approaches use a 600 second (10 minute) period
absolute bound on max stretch computed for each instance
performance comparison based on degradation from max
stretch bound
DFRS in Practice

Simulation Algorithms
FCFS
EASY
Greedy*
GreedyPM*
/per
GreedyPM*/per
MCB*/per
GreedyPM*/per/mvt
MCB*/per/mvt
/stretch-per
DFRS in Practice

Max Stretch Degradation vs. Load, 5 minute penalty
1
10
100
1000
10000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
MaxstretchdegradationFromBound
Load
FCFS
EASY
Greedy*
GreedyPM*
/per
GreedyPM*/per
MCB*/per
/stretch-per
MCB*/per/mvt
GreedyPM*/per/mvt
DFRS in Practice

Bandwidth vs. Period
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
2000 4000 6000 8000 10000 12000
AverageBandwidth(GB/s)
Period (seconds)
DFRS in Practice

Max Stretch Degradation vs. Period
5
6
7
8
9
10
11
12
13
14
2000 4000 6000 8000 10000 12000
AverageMaxstretchDegradation
Period (seconds)
DFRS in Practice

Development Platform
Grid5000 (http://www.grid5000.fr)
Kadeploy images
Genepi Nodes: Quad Core Xeon 2.5GHz w/ 8GB ram
Open-Source Software
Xen 4.0
Debian Squeeze / Linux 2.6.32
DFRS in Practice

Goals
“small” experiment topics:
VM overhead
resource requirement/needs estimation
time sharing performance impact
preemption/migration costs
bandwidth consumption
performance impact
migration planning strategies
“large” experiment topic: DFRS in practice
DFRS in Practice

Estimating Resource Needs
Techniques:
repeated runs for benchmarking / model building
online model building
introspection / conﬁguration variation
Models:
constant values
time-varying values
random variables / probability distributions
Questions:
how accurate can we be?
how accurate do we need to be?
DFRS in Practice

Benchmark Task VM Memory Requirements
bench/tasks 1 2 3 4
CG 477M 251M – 147M
EP <64M <64M <64M <64M
FT 1930M 977M – 499M
IS 637M 329M – 176M
LU 218M 125M – 80M
MG 485M 259M – 147M
SP 369M – – 123M
DFRS in Practice

Time Sharing Performance Impact
Process thrashing!!!
Gang Scheduling
Flexible Co-scheduling
Uncoordinated might not be so bad...
DFRS in Practice

VM Consolidation
2
4
6
8
10
12
14
1 2 3 4 5 6 7 8
slowdown
consolidation factor (maximum tasks/core)
y = x
CG
EP
FT
IS
LU
MG
DFRS in Practice

Parallel Task Preemption/Migration
preemption of parallel distributed jobs is theoretically
difﬁcult
in the general case this may require coordinated shutdown
on the target platform everything happens quickly enough
that it isn’t an issue
migration can be done by preempt/restore or “live”
DFRS in Practice

Preemption/Migration Timing Results
0
10
20
30
40
50
60
70
80
90
100
400 600 800 1000 1200 1400 1600 1800 2000
Time(seconds)
Memory Allocation (MB)
Preempt
Restore
Preempt+Restore
Migrate
DFRS in Practice

Migration Planning
using only live migration is NP-hard
could fail (circular dependencies)
heuristic: try to move the highest priority job
if it can’t move, try next highest
if none can move, preempt the lowest priority job scheduled
for migration and try again
restore any preempted jobs at the end
DFRS in Practice

Planned Implementation Validation Experiment
generate traces using an accepted model
annotate them with CPU/memory values
”ﬁll in” the traces using benchmarks...
NAS parallel benchmarks (HPC)
TCPP benchmarks (Service Hosting)
DFRS in Practice

Future Work
complete development of practical implementation
extend the model
nonlocal resources
heterogeneous nodes
varying communications delays
new or arbitrary optimization targets
fault tolerance
energy or other costs
formal deﬁnitions of fairness
DFRS in Practice

Summary
DFRS performs well in simulation
orders of magnitude better than batch
close to a theoretical bound
prototype in development
preliminary results are promising
DFRS in Practice

References I
Stillwell, M., Schanzenbach, D., Vivien, F., and Casanova,
H. (2010).
Resource allocation algorithms for virtualized service
hosting platforms.
Journal of Parallel and Distributed Computing,
70(9):962–974.
DFRS in Practice

Dynamic Fractional Resource Scheduling Practical Issues and Future Directions -- 2011, Cetraro

Recomendados

Recomendados

Más contenido relacionado

Similar a Dynamic Fractional Resource Scheduling Practical Issues and Future Directions -- 2011, Cetraro

Similar a Dynamic Fractional Resource Scheduling Practical Issues and Future Directions -- 2011, Cetraro (20)

Último

Último (20)

Dynamic Fractional Resource Scheduling Practical Issues and Future Directions -- 2011, Cetraro