Power consumption has become a major concern for almost every digital system: from the smallest embedded devices to the biggest data centers, energy and power budgets constantly constrain the performance of the system. Moreover, the actual power consumption of these systems is strongly affected by their current “working regime” (e.g., from idle to heavy-load conditions, and every level in between), which depends on the guest applications they host, as well as on the external interactions these are subject to. It is therefore difficult to make accurate predictions of the power consumed by the whole system over time, when it is subject to constantly changing operating conditions: a self-aware and goal-oriented approach to resource allocation may improve the instantaneous performance of the system, but the definition of energy-saving policies remains non-trivial as long as the system is unable to learn from experience in real-world scenarios.
In this context, this thesis proposes a holistic power modeling framework that a wide range of energy- and power-constrained systems can use to profile their energy and power consumption. Starting from the preliminary experience developed on power consumption models for mobile devices during my M.Sc. thesis, I designed a general methodology that can be tailored to the actual system's features, extracting a specific power model able to describe and predict the future behavior of the observed entity. This methodology is meant to be provided in an “as-a-service” fashion: at first, the target system is instrumented to collect power metrics and workload statistics in its real usage context; then, the collected measurements are sent to a remote server, where the data is processed using well-known techniques (e.g., Principal Component Analysis, Markov Decision Processes, ARX models, etc.); finally, an accurate power model is built as a function of the metrics monitored on the instrumented system. The generalized approach has been validated in the context of power consumption models for multi-tenant virtualized infrastructures, outperforming results from the state of the art. Finally, the experience developed on power consumption models for server infrastructures led me to the design of a power-aware and QoS-aware orchestrator for multi-tenant systems. On the one hand, I propose a performance-aware power capping orchestrator for a virtualized environment, which aims at maximizing performance under a power cap. On the other hand, I bring the same concepts into a different approach to multi-tenancy, i.e., containerization, thus taking the first steps towards power-awareness for Docker container orchestration and laying the basis for further research work.
Full thesis: https://www.politesi.polimi.it/handle/10589/132112
[February 2017 - Ph.D. Final Dissertation] Enabling Power-awareness For Multi-tenant Systems
1. Enabling Power-Awareness
For Multi-Tenant Systems
Candidate: Matteo FERRONI
Advisor: Marco D. Santambrogio
Tutor: Donatella Sciuto
Ph.D. Cycle: XXIX
Ph.D. in Information Technology: Final Dissertation
Politecnico di Milano, February 17th, 2017
3. The battery of your smartphone does not last a day.
Credits: http://www.mobileworld.it/2016/01/07/smartphone-ricarica-camminata-62171/
4. A data center needs to deal with power grid limits.
Credits: https://resources.workable.com/systems-engineer-job-description
5. Context definition
Common features
(1) hardware heterogeneity
(2) software multi-tenancy
(3) input variability
Key facts:
• Energy budgets and power caps constrain the performance of the system
• The actual power consumption is affected by a plethora of different actors
(0) A bird's eye view
6. Problem definition and proposed approach
• Problem definition:
A. How much power is a system going to consume, given certain working conditions?
B. How to control a system to consume less power, still satisfying its requirements?
• Assumption: the system will behave as it did in the past
• High-level approach:
1. Observe the behavior of the system during its real working conditions
2. Build accurate models to describe and predict it
3. Use them to refine decisions and meet goals efficiently
Idea: learn from experience
(0) A bird's eye view
7. Pragmatic methodology
Data-driven power-awareness through a holistic approach
We start from raw data (power measurements, load traces, system stats, etc.)
We are not interested in the physical components of the system: it is a black box
We help users and systems to learn and predict their power needs
This should be done automatically throughout the whole lifetime of the system
(0) A bird's eye view
8. Outline
1. A first case study: power models for Android devices
2. Generalization: Model and Analysis of Resource Consumption (MARC)
3. Virtual guests monitoring: towards power-awareness for Xen
4. Modeling power consumption in multi-tenant virtualized systems
5. Maximizing performance under a power cap: a hybrid approach
6. Moving forward: containerization, challenges and opportunities
7. Conclusion and future work
9. The need for a model
• We need to observe and model the phenomenon
[Figure: a power model maps the energy budget (%) and the observed energy behavior over time into a predicted Time-To-Live (s) from now]
(1) A first case study: power models for Android devices
10. Model as-a-Service
• Requirements:
• no monitoring and modeling overheads on the power-constrained system itself
• adapt to different systems/users, as well as to changes over time
• Proposed solution: Model-as-a-Service
a. send raw traces to a remote server
b. compute power models
c. send back predictions and model parameters
(1) A first case study: power models for Android devices
11. Pragmatic approach
• Modeling approach: “divide et impera”
We observed a piecewise linear behavior and tried to attribute it to domain-specific features
[Figure: working regimes A, B and C, connected by actions on controllable variables and by uncontrollable exogenous inputs]
(1) A first case study: power models for Android devices
12. Prediction performance w.r.t. SoA approaches
• Baseline
• Android L and Battery Historian (early 2015)
• Makes use of power models to estimate TTLs
• Performance reported for different models
• SM - one model for the user behavior for the whole day
• HM - one model for the user behavior for every hour of the day
• DM - subset of HM, merging similar hours of the day
• I% - Improvements w.r.t. Android L (AL)
(1) A first case study: power models for Android devices
average error values are reported ± standard deviations
13. Outline
1. A first case study: power models for Android devices
2. Generalization: Model and Analysis of Resource Consumption (MARC)
3. Virtual guests monitoring: towards power-awareness for Xen
4. Modeling power consumption in multi-tenant virtualized systems
5. Maximizing performance under a power cap: a hybrid approach
6. Moving forward: containerization, challenges and opportunities
7. Conclusion and future work
14. A general methodology: the MARC approach
• MARC (Model and Analysis of Resource Consumption) is a REST platform that is able to build resource consumption models in an “as-a-service” fashion
• PHASE 1: Data Conditioning
• PHASE 2A: Signal Models; PHASE 2B: Markov Models; PHASE 2C: ARX Models
• PHASE 3: Integration
[Figure: traced battery level over time (32,000s to 40,000s) with its linear battery-discharge approximation]
[Figure: traced power and energy consumption over a 1200s run, with two linear approximations separated by a sudden slope change]
[Figure: the same power/energy trace with a linear approximation per working regime (IDLE, I/O, MEM, CPU)]
(2) Generalization: Model and Analysis of Resource Consumption (MARC)
15. A model for each configuration
• PHASE 2C: Autoregressive Models with Exogenous Variables (ARX)
• FOR EACH WORKING REGIME, a model is computed to characterize the process
[Figure: traced power and energy consumption over a 1200s run, with a linear approximation per working regime (IDLE, I/O, MEM, CPU)]
(2) Generalization: Model and Analysis of Resource Consumption (MARC)
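A minimal sketch of what such a per-regime ARX fit can look like, with plain least squares; the orders (na, nb), the single exogenous input and the toy trace are illustrative assumptions, not MARC's actual code:

```python
# Hedged sketch: fitting a per-regime ARX model by least squares.
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Fit y(k) ~ sum_i a_i*y(k-i) + sum_j b_j*u(k-j), i=1..na, j=1..nb.
    y: output trace (e.g., power), u: exogenous input (e.g., a PMC rate)."""
    lag = max(na, nb)
    X = [[y[k - i] for i in range(1, na + 1)] + [u[k - j] for j in range(1, nb + 1)]
         for k in range(lag, len(y))]
    theta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y[lag:]), rcond=None)
    return theta[:na], theta[na:]   # AR coefficients, exogenous coefficients

# Toy trace: an autoregressive process driven by a synthetic input.
rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 500)
y = np.zeros(500)
for k in range(2, 500):
    y[k] = 0.6 * y[k-1] + 0.2 * y[k-2] + 1.5 * u[k-1] + rng.normal(0, 0.01)
a, b = fit_arx(y, u)
print(a, b)   # expected close to [0.6, 0.2] and [1.5, ~0]
```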
16. Predicting configuration switches
• PHASE 2B: Hidden Markov Models
• BY OBSERVING PERIODICITY, a predictive configuration-switching model is computed
[Figure: traced power and energy consumption over time; a sudden slope change marks the switch between two linear approximations]
(2) Generalization: Model and Analysis of Resource Consumption (MARC)
17. Tackling the residual non-linearity
• PHASE 2A: Signal Models and Time Series Analysis
• WITHIN EACH WORKING REGIME, the residual non-linearity is addressed by exploiting time-series analyses
[Figure: traced battery level over time (32,000s to 40,000s) with its linear battery-discharge approximation]
(2) Generalization: Model and Analysis of Resource Consumption (MARC)
18. Outline
1. A first case study: power models for Android devices
2. Generalization: Model and Analysis of Resource Consumption (MARC)
3. Virtual guests monitoring: towards power-awareness for Xen
4. Modeling power consumption in multi-tenant virtualized systems
5. Maximizing performance under a power cap: a hybrid approach
6. Moving forward: containerization, challenges and opportunities
7. Conclusion and future work
19. Use case: Power consumption models for Xen domains
• Question: “how much is a virtual tenant consuming?”
• THE XEN HYPERVISOR: a Type 1 hypervisor currently employed in many production environments
[Figure: Xen architecture: hardware (CPU, memory, I/O); the Xen hypervisor (config, scheduler, MMU, timers, interrupts); Dom0 with kernel, drivers, PV backends and the toolstack; paravirtualized guest domains (Dom1, Dom2, ..., DomU), each running a guest OS and applications]
(3) Virtual guests monitoring: towards power-awareness for Xen
20. Use case: Power consumption models for Xen domains
• ASSUMPTION: “The power consumption of a system depends on what the hardware is doing”
• Proposed solution: model virtual tenants' power consumption exploiting hardware event traces, collected and attributed to each one of them
[Figure: the same Xen architecture diagram as the previous slide]
(3) Virtual guests monitoring: towards power-awareness for Xen
21. Tracing the Domains’ behavior
XEMPOWER: collect and account hardware events to virtual tenants in two steps:
1. In the Xen scheduler (kernel-level)
• At every context switch, trace the interesting hardware events
• e.g., INST_RET, UNHALTED_CLOCK_CYCLES, LLC_REF, LLC_MISS
2. In Domain 0 (privileged tenant)
• Periodically acquire the event traces and aggregate them on a domain basis
[Figure: per-core timelines of domains being context-switched; the Xen kernel traces hardware events per core and energy per socket, collected by the XeMPowerDaemon and XeMPowerCLI in Dom0]
(3) Virtual guests monitoring: towards power-awareness for Xen
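A minimal sketch of the Dom0-side aggregation step, assuming per-context-switch records of (domain, event, delta); the record layout is an illustrative placeholder, not XeMPower's actual trace format:

```python
# Hedged sketch: fold per-context-switch PMC samples into per-domain totals
# over a reporting window.
from collections import defaultdict

def aggregate_window(samples):
    """samples: iterable of (domain_id, event_name, delta_count) tuples,
    one per context switch, as read from the kernel-level trace buffer."""
    per_domain = defaultdict(lambda: defaultdict(int))
    for domain_id, event, delta in samples:
        per_domain[domain_id][event] += delta
    return per_domain

window = [(1, "INST_RET", 120_000), (2, "INST_RET", 80_000),
          (1, "LLC_MISS", 3_000), (1, "INST_RET", 95_000)]
print(dict(aggregate_window(window)[1]))  # domain 1 totals for this window
```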
22. Outline
1. A first case study: power models for Android devices
2. Generalization: Model and Analysis of Resource Consumption (MARC)
3. Virtual guests monitoring: towards power-awareness for Xen
4. Modeling power consumption in multi-tenant virtualized systems
5. Maximizing performance under a power cap: a hybrid approach
6. Moving forward: containerization, challenges and opportunities
7. Conclusion and future work
23. Power models: state-of-the-art approaches
Workload classes:
(a) idle
(b) weak I/O intensive
(c) memory intensive
(d) CPU intensive
(e) strong I/O intensive
Use a single power model, built on different hardware events:
A. INST_RET, UNHALTED_CLOCK_CYCLES, LLC_REF, LLC_MISS
B. INST_RET, UNHALTED_CLOCK_CYCLES, LLC_REF
C. UNHALTED_CLOCK_CYCLES, LLC_REF
Configuration | Model A (RMSE / rel. error) | Model B (RMSE / rel. error) | Model C (RMSE / rel. error)
(a) | ±17.63 W / 35.56% | ±16.44 W / 32% | ±17.68 W / 35%
(b) | ±4.7 W / 9.4% | ±5.86 W / 11.7% | ±7.17 W / 14%
(c) | ±19.11 W / 38% | ±34.54 W / 70% | ±18.7 W / 37%
(d) | ±0.44 W / 0.08% | ±0.6 W / 1.2% | ±0.42 W / 0.08%
(e) | ±2.98 W / 5.9% | ±38.57 W / 77% | ±3.29 W / 6.5%
average | ±8.97 W / 17.79% | ±19.20 W / 38.38% | ±9.45 W / 18.52%
Table 6.9: the modelling errors (Root MSE and mean relative error) obtained with state-of-the-art models
• The best average model is the worst on a single configuration
• No model is better than the others consistently w.r.t. all the configurations
(4) Modeling power consumption in multi-tenant virtualized systems
24. Power modeling flow
(4) Modeling power consumption in multi-tenant virtualized systems
[Figure: the power modeling flow, from the collected traces to the models' exploitation]
25. Experimental evaluation
• Goals of the experiments:
A. assess the precision of the modeling methodology
B. explore model portability on different hardware platforms
C. evaluate colocation of different tenants
• Benchmarks
– Apache Spark (SVM and PageRank)
– Redis (memory-intensive)
– MySQL and Cassandra (I/O-intensive)
– FFmpeg (CPU-intensive)
• Experimental setup
– A. WRK: Intel Core i7 @ 3.40GHz, 8GB DDR3 RAM
– B. SRV1: Intel Xeon @ 2.80GHz, 16GB DDR3 RAM
– C. SRV2: two Intel Xeon @ 2.30GHz, 128GB DDR4 RAM
(4) Modeling power consumption in multi-tenant virtualized systems
26. (4) Modeling power consumption in multi-tenant virtualized systems
• RMSE around 1W on average,
under 2W in almost all the cases;
• only three results present a
worse behavior (still under 5W)
• Relative error around 2% on
average, under 4% in almost all
the cases
• only three results present a
worse behavior (still under 10%)
Results generally outperform the
works in literature [1,2,3], even in
the worst cases
[1] Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, Albert Zomaya, et al. A taxonomy and survey of energy-efficient data centers and cloud computing systems. Advances in Computers, 82(2):47–111, 2011.
[2] W. Lloyd Bircher and Lizy K. John. Complete system power estimation: A trickle-down approach based on performance events. In IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pages 158–168. IEEE, 2007.
[3] Hailong Yang, Qi Zhao, Zhongzhi Luan, and Depei Qian. iMeter: An integrated VM power model based on performance profiling. Future Generation Computer Systems, 36:267–286, 2014.
27. Outline
1. A first case study: power models for Android devices
2. Generalization: Model and Analysis of Resource Consumption (MARC)
3. Virtual guests monitoring: towards power-awareness for Xen
4. Modeling power consumption in multi-tenant virtualized systems
5. Maximizing performance under a power cap: a hybrid approach
6. Moving forward: containerization, challenges and opportunities
7. Conclusion and future work
28. Problem definition
(5) Maximizing performance under a power cap: a hybrid approach
• Two points of view:
A. minimize power consumption given a minimum performance requirement
B. maximize performance given a limit on the maximum power consumption
• Requirements:
– work in a virtualized environment
– avoid instrumentation of the guest workloads
• Steps towards the goal:
1. identify a performance metric for all the hosted tenants
2. define a resource allocation policy to deal with the requirements
3. extend the hypervisor to provide the right knobs
29. Power capping approaches
• SOFTWARE APPROACH: ✓ efficiency, ✖ timeliness
• HARDWARE APPROACH: ✖ efficiency, ✓ timeliness
[Figure: spectrum of power capping techniques between the two extremes: model-based monitoring [3], thread migration [2], resource management and CPU quota on the software side; DVFS [4] and RAPL [1] on the hardware side]
(5) Maximizing performance under a power cap: a hybrid approach
[1] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. Rapl: Memory power estimation and capping. In International Symposium on Low Power Electronics and Design (ISPLED), 2010.
[2] R. Cochran, C. Hankendi, A. K. Coskun, and S. Reda. Pack & cap: adaptive dvfs and thread packing under power caps. In International Symposium on Microarchitecture (MICRO), 2011.
[3] M. Ferroni, A. Cazzola, D. Matteo, A. A. Nacci, D. Sciuto, and M. D. Santambrogio. MPower: gain back your android battery life! In Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication, pages 171–174. ACM, 2013.
[4] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-end delay control. In Computers, IEEE Transactions. IEEE, 2007.
30. Power capping approaches
• HYBRID APPROACH [5]: ✓ efficiency, ✓ timeliness
[Figure: the same spectrum of software and hardware techniques as the previous slide, now bridged by the hybrid approach]
[5] H. Zhang and H. Hoffmann. Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
(5) Maximizing performance under a power cap: a hybrid approach
32. System design
• The workloads run in paravirtualized domains
(5) Maximizing performance under a power cap: a hybrid approach
33. System design
• XeMPUPiL spans all the layers
(5) Maximizing performance under a power cap: a hybrid approach
34. System design
• Instruction Retired (IR) metric gathered and accounted to each domain, thanks to XeMPower
• The aggregation is done over a time window of 1 second
(5) Maximizing performance under a power cap: a hybrid approach
35. System design
• Observation of both hardware events (i.e., IR) and power consumption (whole CPU socket)
(5) Maximizing performance under a power cap: a hybrid approach
36. System design
– given a workload with M virtual resources and an assignment of N physical resources, to each pCPUi we assign: [formula on slide]
(5) Maximizing performance under a power cap: a hybrid approach
37. System design
• Hybrid actuation:
– enforce power cap via RAPL
– define a CPU pool for the workload and pin the workload’s vCPUs over pCPUs
(5) Maximizing performance under a power cap: a hybrid approach
40. Experimental evaluation
• Goals of the experiments:
A. how do different workloads perform under a power cap?
B. can we achieve higher efficiency w.r.t. the RAPL power cap?
• Benchmarks
– Embarrassingly Parallel (EP)
– IOzone
– cachebench
– Block Tri-diagonal solver (BT)
• Three power caps explored: 40W, 30W and 20W
• Results are normalized w.r.t. the performance obtained with no power cap
• Experimental setup
– 2.8-GHz quad-core Intel Xeon
– 32GB of RAM
– Xen hypervisor version 4.4
(5) Maximizing performance under a power cap: a hybrid approach
41. Preliminary evaluation: how do the benchmarks perform under a power cap?
[Figure: normalized performance (0 to 1.0) of EP, cachebench, IOzone and BT with no cap (NO RAPL) and under RAPL 40, RAPL 30 and RAPL 20]
(5) Maximizing performance under a power cap: a hybrid approach
42. Preliminary evaluation (cont.)
• For CPU-bound benchmarks (i.e., EP and BT), the differences between power caps are significant
[Figure: the same normalized-performance chart as the previous slide]
(5) Maximizing performance under a power cap: a hybrid approach
43. Preliminary evaluation (cont.)
• With I/O- and/or memory-bound workloads, the performance degradation between different power caps is less significant
[Figure: the same normalized-performance chart as the previous slide]
(5) Maximizing performance under a power cap: a hybrid approach
44. Performance of the workloads with XeMPUPiL, for different power caps:
– higher performance than RAPL, in general
– not always true on a pure CPU-bound benchmark (i.e., EP)
[Figure: normalized performance of EP, cachebench, IOzone and BT under PUPiL vs. RAPL, at 40W, 30W and 20W caps]
(5) Maximizing performance under a power cap: a hybrid approach
47. Outline
1. A first case study: power models for Android devices
2. Generalization: Model and Analysis of Resource Consumption (MARC)
3. Virtual guests monitoring: towards power-awareness for Xen
4. Modeling power consumption in multi-tenant virtualized systems
5. Maximizing performance under a power cap: a hybrid approach
6. Moving forward: containerization, challenges and opportunities
7. Conclusion and future work
48. Containerization: opportunities and challenges
(6) Moving forward: containerization, challenges and opportunities
A different road to multi-tenancy
• Group the application and all its dependencies in a single container
• The host operating system sees a container as a group of processes
Proposed solution
• A power-aware orchestrator for Docker containers
Manage resources to meet the power consumption goal
• A policy-based system
Guarantee performance of the containers while staying under the power cap
54. Experimental evaluation
• Goals of the experiments:
A. is the software-level power cap stable and precise?
B. are we able to meet the performance requirements of the containers?
• Benchmarks
– fluidanimate (fluid simulation)
– x264 (video encoding)
– dedup (compression)
• Three power caps explored: 40W, 30W and 20W
• All the benchmark containers run simultaneously on the same node
• Baseline: Intel RAPL power capping solution
• Experimental setup
– 2.8-GHz quad-core Intel Xeon
– 32GB of RAM
– Docker 1.11.2
(6) Moving forward: containerization, challenges and opportunities
55. Performance: Fair Partitioning vs. RAPL
• Comparison between performance-agnostic approaches: Fair partitioning policy vs. RAPL
• Performance metric: Time To Completion (lower is better)
• Comparable performance, better results on lower power caps
[Figure: Time To Completion of dedup, fluidanimate and x264 under 40W, 30W and 20W power caps]
(6) Moving forward: containerization, challenges and opportunities
56. Performance: all policies
• Comparing fair and performance-aware approaches
• Performance metric: Time To Completion (lower is better)
[Figure: Time To Completion of dedup, fluidanimate and x264 under 40W, 30W and 20W power caps, for all partitioning policies]
(6) Moving forward: containerization, challenges and opportunities
58. Performance: all policies (cont.)
• Comparing fair and performance-aware approaches
• fluidanimate is set to High Priority with an SLO of 400s
• Performance metric: Time To Completion (lower is better)
[Figure: Time To Completion of dedup, fluidanimate and x264 under 40W, 30W and 20W power caps, for all partitioning policies]
(6) Moving forward: containerization, challenges and opportunities
59. Conclusion
1. A first case study: power models for Android devices
Better performance w.r.t. Android L predictions
2. Generalization: Model and Analysis of Resource Consumption (MARC)
Modeling pipeline has been generalized and provided “as-a-service”
3. Virtual guests monitoring: towards power-awareness for Xen
HW events are traced with negligible overhead on the system
4. Modeling power consumption in multi-tenant virtualized systems
Better performance w.r.t. SoA approaches
5. Maximizing performance under a power cap: a hybrid approach
Better performance w.r.t. standard RAPL power cap
6. Moving forward: containerization, challenges and opportunities
Promising results towards a performance-aware and power-aware orchestration
60. Future Work
• We want to validate the modeling methodology on different resources
• Time-to-Completion of Hadoop jobs
• We want to exploit these models to:
• detect anomalies in a distributed microservice infrastructure
• perform better resource allocation and consolidation
64. Working Regime identification
A single model is not enough: we explored the MARC approach
Question: What is a working regime in this case study?
Identified a posteriori, by looking at the different slopes on the trace graph
[Figure: traced power (25W to 75W) and cumulative energy consumption over a 4000s run; the different slopes of the energy trace mark the working regimes]
(4) Modeling power consumption in multi-tenant virtualized systems
65. Working Regime identification: how many are they?
KERNEL DENSITY ESTIMATION (KDE)
By observing the local minima of the reconstructed distribution of power consumption, we identify the points where a Working Regime change happens
LINEAR RANGES
0: [0W, 42W)
1: [42W, 57W)
2: [57W, +∞)
[Figure: the traced power/energy run, and the KDE-reconstructed probability density of power consumption (10W to 90W), whose local minima at 42W and 57W separate the three linear ranges]
(4) Modeling power consumption in multi-tenant virtualized systems
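A minimal sketch of this KDE step; the bandwidth (scipy's default) and the toy trace are illustrative assumptions:

```python
# Hedged sketch: reconstruct the power distribution with KDE and use its
# local minima as boundaries between working regimes.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelextrema

def regime_boundaries(power_samples, grid_points=512):
    kde = gaussian_kde(power_samples)            # density over the power trace
    grid = np.linspace(power_samples.min(), power_samples.max(), grid_points)
    density = kde(grid)
    minima = argrelextrema(density, np.less)[0]  # indices of local minima
    return grid[minima]                          # power values splitting regimes

# Toy trace mixing three "regimes" around 35W, 50W and 65W:
rng = np.random.default_rng(0)
trace = np.concatenate([rng.normal(35, 2, 1000),
                        rng.normal(50, 2, 1000),
                        rng.normal(65, 2, 1000)])
print(regime_boundaries(trace))  # expected: boundaries near 42W and 57W
```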
66. From hardware events to Working Regimes (1)
RELIEFF + KDE
1. ReliefF is used to identify which features best induce the Working Regime classification identified before
[Figure: ReliefF weights (0 to 0.16) assigned to each of the 32 candidate features]
(4) Modeling power consumption in multi-tenant virtualized systems
67. From hardware events to Working Regimes (2)
RELIEFF + KDE
2. For each Working Regime, the distribution of the values of that feature is reconstructed using KDE
3. The distributions are compared to obtain discriminant values
[Figure: per-class (CLASS 0, CLASS 1, CLASS 2) KDE densities of the selected feature's PMC values, from 0 to 8×10⁹]
(4) Modeling power consumption in multi-tenant virtualized systems
68. From hardware events to Working Regimes (3)
RELIEFF + KDE
RESULT: a Working Regime classifier that is able to determine in which Working Regime the system is, starting from the sampled features
[Figure: per-class KDE densities of the INST_RET values and the resulting discriminant ranges, e.g., [0, 1.235e9] for regime 0, [3.61e9, 5.58e9) for regime 1 and [5.58e9, +∞) for regime 2, with (1.235e9, 3.61e9) left as an uncertain zone]
(4) Modeling power consumption in multi-tenant virtualized systems
69. From hardware events to Working Regimes (4)
RELIEFF + KDE
• In case of uncertainty, repeat from ReliefF:
• eliminating the already selected features
• eliminating all the data that are not part of the uncertain zone
[Figure: the discriminant table extended with a second feature (L1_HIT), whose ranges [0, 2.36362e8], (2.36362e8, 5.672e8) and [5.672e8, +∞) resolve the uncertain INST_RET zone (1.235e9, 3.61e9)]
(4) Modeling power consumption in multi-tenant virtualized systems
76. Resource control
With the feedback-control-loop logic, we find the allocation of resources that ensures the power cap
[Figure: the CPU quota cap carved out of the 100% available resource]
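A minimal sketch of such a feedback loop; the proportional gain, the starting quota and the simulated plant are illustrative placeholders for the real RAPL readings and `docker update` calls:

```python
# Hedged sketch: adjust the total CPU quota until measured power settles
# under the cap.
CAP_W = 30.0       # power cap in Watts
KP = 5_000         # proportional gain: quota units per Watt of error
quota = 600_000    # total CPU quota (us of CPU time per 100ms cgroup period)

def read_socket_power(q):
    # Placeholder plant: real code would derive Watts from RAPL energy counters.
    return 10.0 + q / 20_000

for _ in range(20):                            # one iteration per control period
    error = CAP_W - read_socket_power(quota)   # >0: headroom, <0: over the cap
    quota = max(10_000, quota + int(KP * error))
    # here: partition `quota` across containers and apply via `docker update`
print(quota, read_socket_power(quota))         # settles near the 30W cap
```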
78. Resource partitioning
We explore three different partitioning policies:
• Fair resource partitioning
• Priority-aware resource partitioning
• Throughput-aware resource partitioning
[Figure: the CPU quota cap, to be partitioned across containers C1, C2, C3, C4]
79. 1. Fair resource partitioning
• The quota Q is evenly partitioned across all the containers
• No control over the throughput of a single container
[Figure: quota Q split as Q/4 to each of the containers C1 to C4]
80. 2. Priority-aware partitioning
• The quota Q is partitioned following the priority of each container
• The quota of the single container is estimated through a weighted mean, where every priority has its own associated weight
[Figure: container C1 (HIGH priority) and containers C2, C3, C4 (LOW priority) receiving weighted shares of Q]
81. 3. Throughput-aware resource partitioning
• The quota Q is partitioned following the priority of each container and its Service Level Objectives (SLO)
• SLO is here defined as the Time-To-Completion (TTC) of the task
[Figure: high-priority containers C1 and C2 with SLO1 and SLO2, low-priority containers C3 and C4 served best-effort (BE)]
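A minimal sketch of the three policies; the priority weights and the SLO-driven correction are illustrative choices, not necessarily DockerCap's exact formulas:

```python
# Hedged sketch: three ways of partitioning a total quota Q across containers.
def fair(q, containers):
    return {c: q / len(containers) for c in containers}

def priority_aware(q, priorities, weights=None):
    """priorities: {container: 'HIGH' | 'LOW'}; weighted split of the quota."""
    weights = weights or {"HIGH": 3, "LOW": 1}
    total = sum(weights[p] for p in priorities.values())
    return {c: q * weights[p] / total for c, p in priorities.items()}

def throughput_aware(q, slo_ttc, predicted_ttc, best_effort):
    """Scale each high-priority container's fair share by how far it is from
    its SLO (predicted TTC / SLO TTC), then split the rest best-effort."""
    n = len(slo_ttc) + len(best_effort)
    shares, remaining = {}, q
    for c, slo in slo_ttc.items():
        share = min(remaining, (q / n) * predicted_ttc[c] / slo)
        shares[c] = share
        remaining -= share
    for c in best_effort:
        shares[c] = remaining / len(best_effort)
    return shares

print(priority_aware(400_000, {"C1": "HIGH", "C2": "LOW", "C3": "LOW", "C4": "LOW"}))
```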
82. Experimental setup
All the benchmark containers run simultaneously on the same node
HW: Intel Xeon E5-1410, 32GB RAM | OS: Ubuntu 14.04, Linux 3.19.0-42 | CONTAINER ENGINE: Docker 1.11.2 | RUNTIME: Python 2.7.6
BENCHMARK CONTAINERS (PARSEC):
• fluidanimate: fluid dynamics simulation (generic CPU-bound)
• x264: video streaming encoding (e.g., video surveillance)
• dedup: compression (cloud-fog communication)
83. Goals of the experiments
• PRECISION OF THE POWER CAPPING: manage machine power consumption
• PERFORMANCE OF THE CONTAINERS: allocate resources to meet containers’ requirements
The comparison is done with the state-of-the-art power capping solution RAPL by Intel [1]
84. Precision of the power capping
• Comparable results in terms of average power consumption under the power cap
• As expected, RAPL provides a more stable power capping
[Figure: measured power over time under the Fair, Priority-aware and Throughput-aware policies vs. RAPL]
85. Performance: Fair Partitioning vs. RAPL
• Comparison between the performance-agnostic approaches
• Performance metric: Time To Completion (lower is better)
[Figure: Time To Completion of dedup, fluidanimate and x264 under 40W, 30W and 20W power caps]
86. Performance: all policies
• Comparison with the performance-aware approaches
• fluidanimate is set to High Priority with an SLO of 400s
• Performance metric: Time To Completion (lower is better)
[Figure: Time To Completion of dedup, fluidanimate and x264 under 40W, 30W and 20W power caps, for all policies]
87. Conclusion and future work
✓ We presented DockerCap, a power-aware orchestrator that manages containers’ resources
✓ We showed how DockerCap is able to limit the power consumption of the machine
✓ We discussed three distinct partitioning policies and compared their impact on containers’ SLO
FUTURE DIRECTIONS
• Exploit both HW and SW power capping
• Improve the precision of the power capping with
more refined modeling techniques [2]
• Compute the right allocation of resources online by
observing the performance of the containers
[2] Andrea Corna and Andrea Damiani. A scalable framework for resource consumption modelling: the MARC approach.
Master’s thesis. Politecnico di Milano, 2016.
89. Experimental settings and benchmarks

| | “XARC1” Dell OptiPlex 990 | “SANDY” Dell PowerEdge T320 |
| Processor | Intel Core i7-2600 @ 3.40GHz | Intel Xeon CPU E5-1410 @ 2.80GHz |
| Memory | 4 banks of Synchronous 2GB DIMM DDR3 RAM @ 1.33GHz | 2 banks of Synchronous 16GB DIMM DDR3 RAM @ 1.60GHz |
| Storage | Seagate 250GB 7200rpm 8MB cache SATA 3.5” HDD | Western Digital 250GB 7200rpm 16MB cache SATA 3.5” HDD |
| Network | Intel 82579LM Gigabit Network Connection | Broadcom NetXtreme BCM5720 Gigabit Ethernet PCIe |

TRAIN SET: Micro Benchmarks [1]
• NAS Parallel Benchmarks (CPU/memory features)
• Cachebench (cache hierarchy)
• IOzone (disk I/O operations)

TEST SET: Realistic Benchmarks
• Redis Server (non-relational DBMS interrogations)
• MySQL Server (relational DBMS queries)
• FFMPEG (audio/video transcoding and compression)

[1] YANG, Hailong, et al. iMeter: An integrated VM power model based on performance profiling. Future Generation Computer Systems, 2014, 36: 267-286.
90. Power models: the MARC approach (1)
LOWER BOUND IN THE STATE OF THE ART: 5% of relative error [1]
TRAIN AND TEST ON THE SAME PHYSICAL MACHINE:

SANDY | RMSE | Relative error | Coverage
Redis | ±0.58W | 1.10% | 100.00%
MySQL | ±1.94W | 3.80% | 100.00%
FFMPEG | ±0.51W | 1.00% | 100.00%

XARC1 | RMSE | Relative error | Coverage
Redis | ±2.07W | 4.14% | 100.00%
MySQL | ±9.27W | 18.5% | 100.00%
FFMPEG | ±1.32W | 2.64% | 99.90%

[1] YANG, Hailong, et al. iMeter: An integrated VM power model based on performance profiling. Future Generation Computer Systems, 2014, 36: 267-286.
91. Power models: the MARC approach (2)
TRAIN AND TEST ON THE SAME PHYSICAL MACHINE:

SANDY | RMSE | Relative error | Coverage
Redis | ±0.58W | 1.10% | 100.00%
MySQL | ±1.94W | 3.80% | 100.00%
FFMPEG | ±0.51W | 1.00% | 100.00%

TRAIN ON XARC1, TEST ON SANDY:

XARC1 | RMSE | Relative error | Coverage
Redis | ±0.61W | 1.23% | 99.70%
MySQL | ±1.97W | 3.86% | 100.00%
FFMPEG | ±0.63W | 1.26% | 100.00%
93. The concept of Working Regime
• Domain-specific feature: hardware modules currently used
• We defined the concept of working regime:
“Given the controllable hardware modules on a device, a working regime is a combination of their internal state”
[Figure: working regimes A, B and C]
(1) A first case study: power models for Android devices
94. MISO Model for every configuration
• We tackle the problem of power model estimation in a fixed configuration with a linear Multiple Input Single Output (MISO) model
• The battery prediction is computed from the previous battery levels and the exogenous input values, through the model parameters
(1) A first case study: power models for Android devices
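A hedged sketch of such a linear MISO (ARX) structure, with illustrative orders n_a, n_b and m exogenous inputs (the exact regressors used in the thesis may differ):

```latex
% \hat{y}(k): predicted battery level, y(k-i): previous battery levels,
% u_j(k-l): exogenous inputs, a_i and b_{j,l}: model parameters
\hat{y}(k) = \sum_{i=1}^{n_a} a_i \, y(k-i) + \sum_{j=1}^{m} \sum_{l=1}^{n_b} b_{j,l} \, u_j(k-l)
```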
95. Actions on controllable variables
• They are determined by the user’s behavior
• We model the evolution of the smartphone’s configuration as a Markov Decision Process
• A state for every configuration
• Transitions’ weights represent the probability to go from one configuration to another
[Figure: configurations A, B and C as states, connected by weighted transitions]
(1) A first case study: power models for Android devices
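A minimal sketch of estimating those transition weights from an observed configuration trace; this is a plain row-normalized Markov-chain estimate, on top of which the thesis' MDP formulation adds the actions on controllable variables:

```python
# Hedged sketch: transition probabilities as normalized transition counts.
from collections import Counter

def transition_matrix(states):
    pair_counts = Counter(zip(states, states[1:]))   # (from, to) occurrences
    out_counts = Counter(states[:-1])                # outgoing transitions per state
    labels = sorted(set(states))
    return {s: {t: pair_counts[(s, t)] / (out_counts[s] or 1) for t in labels}
            for s in labels}

trace = ["A", "A", "B", "A", "C", "C", "A", "B", "B", "A"]
for s, row in transition_matrix(trace).items():
    print(s, {t: round(p, 2) for t, p in row.items()})
```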
97. Proposed Approach
• At each context switch, start counting the hardware events of interest
• The configured PMC registers store the counts associated with the domain that is about to run
[Figure: per-core timeline in the Xen kernel; counting starts (A markers) for the domains being scheduled]
98. Proposed Approach
• At the next context switch, read and store PMC values, accounted to the domain that was running
• Counters are then cleared
[Figure: per-core timeline; counts are read back (B markers) at each context switch]
99. Proposed Approach
• Steps A and B are performed at every context switch on every system CPU (i.e., physical core or hardware thread)
• The reason is that each domain may have multiple virtual CPUs (VCPUs)
[Figure: per-core timelines (Core 0 to Core N) with A/B markers at every context switch]
100. Proposed Approach
• Finally, the PMC values are aggregated by domain and reported or used for other estimations
• Expose the collected data to a higher level: how?
[Figure: the per-core traces flow from the Xen kernel to the XeMPowerDaemon in Dom0]
101. Proposed Approach
xentrace:
• a lightweight trace-capturing facility present in Xen
• we tag every trace record with the ID of the scheduled domain and its current VCPU
• a timestamp is kept to later reconstruct the trace flow
[Figure: the XeMPowerDaemon in Dom0 collecting hardware events per core and energy per socket from the Xen kernel]
102. Use Case: Power Consumption Attribution
Use case:
• Enable real-time attribution of CPU power consumption to each guest
• Socket-level energy measurements are also read (via the Intel RAPL interface) at each context switch
[Figure: the full XeMPower stack: kernel-level tracing, XeMPowerDaemon and XeMPowerCLI in Dom0]
103. Use Case: Power Consumption Attribution
Power models from PMC traces:
• High correlation between hardware events and power consumption [28]
• The non-halted cycles count is the best metric to correlate power consumption (linear correlation coefficient above 0.95)
• Such correlation suggests that the higher the rate of non-halted cycles for a domain is, the more CPU power the domain consumes
[Figure: the full XeMPower stack, as in the previous slide]
104. Use Case: Power Consumption Attribution
(same correlation observations as the previous slide)
Idea:
• Split system-level power consumption and account it to virtual guests
[Figure: the full XeMPower stack, as in the previous slide]
105. Use Case: Power Consumption Attribution
Proposed approach to account power:
1. For each tumbling window, the XeMPower daemon calculates the total number of non-halted cycles (one of the PMCs traced)
2. We estimate the percentage of non-halted cycles for each domain over the total number of non-halted cycles; this represents the contribution of each domain to the whole CPU power consumption
3. Finally, we split the socket power consumption proportionally to the estimated contributions of each domain
[Figure: the full XeMPower stack, as in the previous slide]
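A minimal sketch of step 3, the proportional split (the window data is illustrative):

```python
# Hedged sketch: split the measured socket power across domains according to
# their share of non-halted cycles in the tumbling window.
def attribute_power(socket_power_w, nonhalted_cycles_by_domain):
    total = sum(nonhalted_cycles_by_domain.values())
    if total == 0:
        return {d: 0.0 for d in nonhalted_cycles_by_domain}
    return {d: socket_power_w * c / total
            for d, c in nonhalted_cycles_by_domain.items()}

# One tumbling window: Dom0 plus two guests, 40W measured at the socket.
print(attribute_power(40.0, {"Dom0": 1.2e9, "Dom1": 2.4e9, "Dom2": 0.4e9}))
# -> Dom1 gets 60% of 40W = 24W, etc.
```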
106. Experimental evaluation
• Back to the XeMPower requirements:
1. provide precise attribution of hardware events to virtual tenants ✓
2. agnostic to the mapping between virtual and physical resources, hosted applications and scheduling policies ✓
3. add negligible overhead
• Goals of the experimental evaluation:
– show how XeMPower monitoring components incur very low overhead under different configurations and workload conditions
107. Experimental evaluation
• Overhead metric:
– the difference in the system’s power consumption
while using XeMPower versus an off-the-shelf Xen 4.6
installation
• Experimental setup:
– 2.8 GHz quad-core Intel Xeon E5-1410 processor (4
hardware threads)
– a Watts up? PRO meter to monitor the entire
machine’s power consumption
– Each guest repeatedly runs a multi-threaded
compute-bound microbenchmark on three VCPUs
and uses a stripped-down Linux 3.14 as the guest OS
108. Experimental evaluation
• Three system configurations:
1. the baseline configuration uses off-the-shelf Xen 4.4
2. the patched configuration introduces the kernel-level
instrumentation without the XeMPower daemon
3. the monitoring configuration is the patched one, with the XeMPower daemon running and reporting statistics
• Four running scenarios:
– an idle scenario in which the system only runs Dom0
– 3 running-n scenarios, where n = {1, 2, 3} indicates the number of
guest domains in addition to Dom0
• The idea is to stress the system with an increasing number of
CPU-intensive tenant applications
• This increases the amount of data traced and collected by
XeMPower
109. Experimental Results
• Mean power consumption (μ), in Watts, for scenarios idle and running-{1,2,3}, and configurations baseline (b), patched (p), and monitoring (m)
• Mean power values are reported with their 95% confidence interval
• At a glance, we can see how measurements are pretty close
[Table on slide: mean power for the pinned-VCPU and unpinned-VCPU cases]
110. Experimental Results
• We estimate an upper bound ϵ for the maximum overhead using a
hypothesis test:
• A rejection of the null hypothesis means that there is strong statistical
evidence that the power consumption overhead is lower than ϵ
• We compute ϵ for the considered test cases and scenarios, ensuring
average values of power consumption (μ) with confidence: α = 5%
• We want to compare the overhead with the one measured for XenMon, a
performance monitoring tool for Xen
• unlike XeMPower, XenMon does not collect PMC reads
• it is still a reference design in the context of runtime monitoring for
the Xen ecosystem
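A minimal sketch of such a one-sided test, here done with a Welch t-test over a grid of candidate ε values on synthetic samples; the thesis' exact test statistic and search may differ:

```python
# Hedged sketch: find the smallest eps such that H0 (overhead >= eps) is
# rejected at alpha = 5%, i.e., evidence that the overhead is below eps.
import numpy as np
from scipy import stats

def overhead_upper_bound(baseline_w, monitoring_w, alpha=0.05):
    for eps in np.arange(0.05, 5.0, 0.05):
        # One-sided Welch test: is mean(monitoring - eps) < mean(baseline)?
        _, p = stats.ttest_ind(monitoring_w - eps, baseline_w,
                               equal_var=False, alternative="less")
        if p < alpha:
            return eps
    return None

rng = np.random.default_rng(1)
base = rng.normal(74.0, 0.3, 200)   # baseline power samples (W)
mon = rng.normal(74.5, 0.3, 200)    # monitoring power samples (W)
print(overhead_upper_bound(base, mon))  # eps slightly above the 0.5W true overhead
```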
111. Experimental Results
• Estimated upper bound ϵ for the power consumption overhead, in Watts
• Parenthetical values are the overheads w.r.t. mean power consumption
• XeMPower introduces an overhead not greater than 1.18W (1.58%),
observed for the [unpinned-VCPU, running-3, patched] case
• In all the other cases, the overhead is less than 1W (and less than 1%)
• This result is satisfactory compared to an overhead of 1-2% observed for
XenMon, the reference implementation for XeMPower
113. Related work: PUPiL [5]
[5] H. Zhang and H. Hoffmann. Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques. In International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS), 2016.
• PUPiL, a hybrid framework that aims at achieving both timeliness and efficiency
• Proposed approach:
– both hardware (i.e., the Intel RAPL interface [10]) and software (i.e.,
resource partitioning and allocation) techniques
– exploits a canonical ODA control loop, one of the main building blocks of
self-aware computing
• Limitations
– the applications running on the system need to be instrumented with the Heartbeat framework, to provide a uniform metric of throughput
– applications running bare-metal on Linux
• These conditions might not hold in the context of a multi-tenant
virtualized environment
114. The Xen Hypervisor
Slides from: http://www.slideshare.net/xen_com_mgr/xpds16-porting-xen-on-arm-to-a-new-soc-julien-grall-arm
115. 1. Performance metric identification
• Hardware event counters as low level metrics of
performance
• We exploit the Intel Performance Monitoring Unit (PMU)
to monitor the number of Instruction Retired (IR)
accounted to each domain in a certain time window
– an insight on how many instructions were completely executed (i.e., that successfully reached the end of the pipeline)
– it represents a reasonable indicator of performance, as the manufacturer itself suggests [6]
[6] Clockticks per instructions retired (cpi). https://software.intel.com/en-us/node/544403. Accessed: 2016-06-01.
116. 2. Decision phase and virtualization
• Evaluation criterion: the average IR rate over a certain time
window
– the time window allows the workload to adapt to the actual
configuration
– the comparison of IR rates of different configurations highlights
which one makes the workload perform better
• Resource allocation granularity: core-level
– each domain owns a set virtual CPUs (vCPUs)
– a set of physical CPUs (pCPU) present on the machine
– each vCPU can be mapped on a pCPU for a certain amount of
time, while multiple vCPUs can be mapped on the same pCPU
• We wanted our allocation to cover the whole set of pCPUs, if
possible
116
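A minimal sketch of this decide step, with injected apply/observe callbacks; names and the window length are illustrative, and real code would read the rates from the XeMPower traces:

```python
# Hedged sketch: try candidate resource configurations, observe the average
# IR rate over a time window for each, and keep the best one.
import time

def decide(candidates, apply_config, read_ir_rate, window_s=1.0):
    best_config, best_rate = None, float("-inf")
    for config in candidates:
        apply_config(config)       # e.g., resize the workload's CPU pool
        time.sleep(window_s)       # let the workload adapt to the configuration
        rate = read_ir_rate()      # avg Instructions Retired per second
        if rate > best_rate:
            best_config, best_rate = config, rate
    return best_config

# Toy usage: IR rate saturates once the workload has 3 pCPUs.
rates = {1: 1.0e9, 2: 1.9e9, 3: 2.5e9, 4: 2.5e9}
state = {}
print(decide(list(rates), lambda n: state.update(n=n),
             lambda: rates[state["n"]], window_s=0.0))   # -> 3
```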
117. 3. Extending the hypervisor - RAPL
• Working with the Intel RAPL interface:
– harshly cutting the frequency and the voltage of the whole CPU socket
• On a bare-metal operating system:
– reading and writing data into the right Model Specific Register (MSR)
• MSR_RAPL_POWER_UNIT: read processor-specific time, energy and power
units, used to scale each value read or written
• MSR_PKG_RAPL_POWER_LIMIT: write to set a limit on the power
consumption of the whole socket
• In a virtualized environment:
– the Xen hypervisor does not natively support the RAPL interface
– we developed custom hypercalls, with kernel callback functions and
memory buffers
– we developed a CLI tool that performs some checks on the input parameters, as well as instantiating and invoking the Xen command interface to launch the hypercalls
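For the bare-metal path described above, a minimal sketch of the MSR access through Linux's `msr` module (requires root; the power-limit bit packing for MSR_PKG_RAPL_POWER_LIMIT is omitted):

```python
# Hedged sketch: read RAPL MSRs via /dev/cpu/N/msr.
import os, struct

MSR_RAPL_POWER_UNIT = 0x606      # scaling units for power/energy/time
MSR_PKG_ENERGY_STATUS = 0x611    # cumulative package energy counter

def read_msr(register, cpu=0):
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)

units = read_msr(MSR_RAPL_POWER_UNIT)
energy_unit_j = 0.5 ** ((units >> 8) & 0x1F)   # Joules per energy-counter tick
energy_j = read_msr(MSR_PKG_ENERGY_STATUS) * energy_unit_j
print(f"package energy so far: {energy_j:.2f} J")
```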
118. 3. Extending the hypervisor - Resources
• cpupool tool:
– allows clustering the physical CPUs in different pools
– the pool scheduler will schedule the domain’s vCPUs only
on the pCPUs that are part of that cluster
– as a new resource allocation is chosen by the decide phase,
we increase or decrease the number of pCPUs in the pool
– pin the domain’s vCPUs to these, to increase workload
stability
• NO xenpm:
– set a maximum and minimum frequency for each pCPU
– it may interfere with the actuation made by RAPL
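A minimal sketch of this actuation through the `xl` toolstack; pool and domain names are illustrative, and it assumes the pool already exists and that this runs as root in Dom0:

```python
# Hedged sketch: resize a Xen CPU pool and pin a domain's vCPUs onto its pCPUs.
import subprocess

def xl(*args):
    subprocess.run(["xl", *args], check=True)

def resize_pool(pool, pool_cpus, target_size, free_cpus):
    """Add/remove pCPUs until the pool holds target_size of them."""
    while len(pool_cpus) > target_size:
        xl("cpupool-cpu-remove", pool, str(pool_cpus.pop()))
    while len(pool_cpus) < target_size and free_cpus:
        cpu = free_cpus.pop()
        xl("cpupool-cpu-add", pool, str(cpu))
        pool_cpus.append(cpu)

def pin_domain(domain, n_vcpus, pool_cpus):
    """Pin each vCPU to one pool pCPU (round-robin), to increase stability."""
    for vcpu in range(n_vcpus):
        xl("vcpu-pin", domain, str(vcpu), str(pool_cpus[vcpu % len(pool_cpus)]))
```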
121. Motivation - Modeling approaches (2)
OFF-LINE MODELING
• PROS: controllable environment; ad-hoc instrumentation; relies on reasonable simulations
• CONS: does not evolve with the target; requires ex-novo modeling for new targets
ON-LINE MODELING
• PROS: intrinsic ability to evolve with the target; tackles new targets; does not require in-lab phases
• CONS: noisy real-world environment
129. 4. MARC PLATFORM: Scalability
SCALE-IN: INTRA-MODULE PARALLELISM
[Figure: a load balancer dispatching to communication actors, each wrapping the module-specific functional logic]
Technologies: Scala - Akka
130. 4. MARC PLATFORM: Scalability
SCALE-OUT: MODULE DISTRIBUTION
[Figure: the same load-balanced actor module, packaged as a Docker container for distribution]
Technologies: Scala - Akka - Docker