Virtualization allows simultaneous execution of multi-tenant workloads on the same platform, whether a server or an embedded system. Unfortunately, attributing hardware events to multiple virtual tenants is non-trivial, as some system metrics relate to the whole system (e.g., RAPL energy counters). Virtualized environments therefore have a rather incomplete picture of how tenants use the hardware, limiting their optimization capabilities. Thus, we propose XeMPower, a lightweight monitoring solution for Xen that precisely accounts hardware events to guest workloads and enables attribution of CPU power consumption to individual tenants. We show that XeMPower introduces negligible power-consumption overhead, aiming to be a reference design for power-aware virtualized environments.
Full paper: http://ceur-ws.org/Vol-1697/EWiLi16_10.pdf
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
1. Enabling power-awareness for the Xen Hypervisor
Matteo Ferroni, Juan A. Colmenares, Steven Hofmeyr,
John D. Kubiatowicz, Marco D. Santambrogio
EWiLi’16, 10/06/2016, Pittsburgh PA, USA.
Co-located with the Embedded Systems Week.
2. Outline
• Introduction on virtualization
• Problem definition and proposed solution
• Requirements and related work
• Proposed approach and system design
• Use case: power consumption attribution
• Experimental evaluation and results
• Conclusion and future work
3. Introduction
• Embedded ecosystems
– from microcontrollers to multi-core processors: cheaper, smaller, and less power-hungry
• Two main advantages:
– multiple embedded applications on the same System-on-Chip (SoC)
– application concurrency and parallelism to obtain better performance
4. Virtualization
• Hardware-assisted and software virtualization enter the
context of embedded systems
– Features:
• applications do not need to be changed
• physical resources shared between applications
• strong security and isolation guarantees
• Heterogeneity supported and fostered:
– software heterogeneity, i.e., memory-bound, I/O-bound, or
CPU-bound;
– hardware heterogeneity, i.e., different SoCs and platforms;
5. The Xen Hypervisor
Slides from: http://www.slideshare.net/xen_com_mgr/xpds16-porting-xen-on-arm-to-a-new-soc-julien-grall-arm
6. The Xen Hypervisor
7. The Xen Hypervisor
8. Problem Definition
• Problem definition:
– optimization and consolidation are not easy tasks
– power consumption is a major concern
• A strong requirement:
– an online monitoring system to accurately observe application and system behavior
9. Proposed Solution
• Proposed solution:
– XeMPower, a lightweight hardware and resource
monitoring solution for the Xen hypervisor
• Basic idea:
– monitor hardware events and precisely account them to
each virtual guest
• Use case:
– enable real-time attribution of CPU power consumption
to each guest
10. Requirements
• XeMPower requirements:
1. provide precise attribution of hardware events to virtual tenants;
2. be agnostic to the mapping between virtual and physical resources, to the hosted applications, and to the scheduling policies;
3. add negligible overhead.
• Proposed approach:
– kernel-level instrumentation to gather Performance Monitoring Counter (PMC) values
– Dom0-level aggregation and processing
11. Performance Monitoring Counters (PMC)
• Example of the counters of interest
[Table: PMC events of interest]
12. Related Work: PMC monitoring
• Every monitoring tool is affected by a tradeoff:
– accuracy vs. overhead
• Different approaches to hardware events monitoring:
– Code instrumentation (e.g., Valgrind and IgProf)
• Inject extra code in the applications at compile time and/or runtime,
allowing complex analysis
• High overhead, not suitable for runtime analysis in production
– Performance counter tools (e.g., Perf, OProfile, PAPI
libraries)
• Sample the system’s events at different granularities (e.g., thread level,
process level, a set of processors, or the entire system)
• Use kernel modules to access different categories of events:
hardware events, software events (context switches or minor faults),
and tracepoint events (disk I/O and TCP events).
13. Related Work: PMC monitoring in Xen
• Using the Xen hypervisor:
– Xenoprof
• a system-wide statistical profiling toolkit based on OProfile
• allows the domain itself to collect its own hardware event
counters (active mode)
• passive mode profiling (i.e., domain treated as a “black
box”) is limited, not agnostic to hosted applications
– Perfctr-Xen
• supports performance counter virtualization and reprograms the Performance Monitoring Unit (PMU) configuration registers (e.g., event selectors) at every context switch
• good for workload profiling inside a domain, but provides no centralized runtime monitoring
14. Proposed Approach
• At each context
switch, start counting
the hardware events
of interest
• The configured PMC
registers store the
counts associated
with the domain that
is about to run
[Figure: timeline of Core 0 … Core N in the Xen kernel; PMCs armed (A) at the context switch]
15. Proposed Approach
• At the next context
switch, read and
store PMC values,
accounted to the
domain that was
running
• Counters are then
cleared
[Figure: per-core timeline; at each context switch the PMC values (B) of the outgoing domain are read and the counters re-armed (A) for the incoming one]
16. Proposed Approach
• Steps A and B are
performed at every
context switch on
every system’s CPU
(i.e., physical core or
hardware thread).
• The reason is that
each domain may
have multiple virtual
CPUs (VCPUs).
[Figure: steps A and B repeated at every context switch on every core]
17. Proposed Approach
• Finally, the PMC values are aggregated by domain and then reported or used for other estimations
• Expose the collected data to a higher level – how?
[Figure: per-core A/B steps in the Xen kernel; the collected values are sent to the XeMPowerDaemon in Dom0]
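The per-core A/B accounting described on the preceding slides can be sketched in a few lines of Python (a minimal illustration, not XeMPower's actual code: the class and names are invented here, and real PMC reads happen in the hypervisor's context-switch path):

```python
from collections import defaultdict

class PmcAccountant:
    """Per-core accounting of one hardware counter across context switches.

    At each context switch, the hypervisor reads the PMC for the outgoing
    domain (step B), clears it, and re-arms it for the incoming domain
    (step A). Here the per-slice counter value is simply passed in.
    """

    def __init__(self):
        self.per_domain = defaultdict(int)  # domain id -> accumulated events
        self.running = None                 # domain currently "on core"

    def context_switch(self, next_domain, pmc_value):
        # Step B: account the counted events to the domain that was running.
        if self.running is not None:
            self.per_domain[self.running] += pmc_value
        # Step A: from now on the PMC counts on behalf of the incoming domain.
        self.running = next_domain

core0 = PmcAccountant()
core0.context_switch("dom1", 0)      # dom1 scheduled in
core0.context_switch("dom2", 1200)   # dom1 ran, accrued 1200 events
core0.context_switch("dom1", 800)    # dom2 accrued 800
core0.context_switch(None, 300)      # dom1 accrued 300 more
print(dict(core0.per_domain))        # {'dom1': 1500, 'dom2': 800}
```

One instance of this bookkeeping would run per physical CPU, since each domain may have VCPUs scheduled on several cores at once.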
18. Proposed Approach
xentrace
• a lightweight trace
capturing facility
present in Xen
• we tag every trace
record with the ID of
the scheduled
domain and its
current VCPU
• a timestamp is kept
to later reconstruct
the trace flow
[Figure: trace records flow from the Xen kernel to the XeMPowerDaemon in Dom0: hardware events per core, energy per socket]
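The daemon's reconstruction of the trace flow can be sketched as follows (a hypothetical record layout: the field names are illustrative only, since real xentrace records have their own binary format):

```python
from collections import namedtuple

# Illustrative stand-in for a tagged trace record: scheduled domain,
# its VCPU, a timestamp, and the per-slice PMC value.
Record = namedtuple("Record", "timestamp core domain vcpu pmc_value")

# Each core produces its own stream of records.
core0 = [Record(10, 0, "dom1", 0, 500), Record(30, 0, "dom2", 0, 700)]
core1 = [Record(20, 1, "dom1", 1, 400)]

# The daemon merges the per-core streams and restores the global
# order of events using the timestamps kept in each record.
trace = sorted(core0 + core1, key=lambda r: r.timestamp)
print([r.domain for r in trace])  # ['dom1', 'dom1', 'dom2']
```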
19. Use Case: Power Consumption Attribution
Use case
• Enable real-time
attribution of CPU
power consumption
to each guest
• Socket-level energy
measurements are
also read (via Intel
RAPL interface) at
each context switch
[Figure: the XeMPowerDaemon in Dom0 collects hardware events per core and energy per socket, exposed via the XeMPowerCLI]
20. Use Case: Power Consumption Attribution
Power models from PMC traces
• High correlation between hardware events
and power consumption [28]
• Non-halted cycle is the best metric to
correlate power consumption (linear
correlation coefficient above 0.95)
• Such correlation suggests that the higher
the rate of non-halted cycles for a domain
is, the more CPU power the domain
consumes
[Figure: same XeMPower architecture diagram as in the previous slide]
21. Use Case: Power Consumption Attribution
Idea
• Split the system-level power consumption and account it to the virtual guests
[Figure: same XeMPower architecture diagram as in the previous slide]
22. Use Case: Power Consumption Attribution
Proposed accounting approach
1. For each tumbling window, the XeMPower daemon calculates the total number of non-halted cycles (one of the traced PMCs)
2. We estimate the percentage of non-halted cycles of each domain over the total number of non-halted cycles; this represents the contribution of each domain to the whole CPU power consumption
3. Finally, we split the socket power consumption proportionally to the estimated contribution of each domain
[Figure: same XeMPower architecture diagram as in the previous slide]
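The three-step attribution above reduces to a proportional split per tumbling window; a minimal Python sketch (the function name and the sample numbers are illustrative, not from the paper):

```python
def attribute_power(socket_power_watts, cycles_by_domain):
    """Split socket power proportionally to each domain's non-halted
    cycles in one tumbling window (steps 1-3 of the accounting approach)."""
    total = sum(cycles_by_domain.values())
    if total == 0:
        # Nothing ran in this window; attribute zero to every domain.
        return {dom: 0.0 for dom in cycles_by_domain}
    return {dom: socket_power_watts * cycles / total
            for dom, cycles in cycles_by_domain.items()}

# One window: dom1 accounts for 3/5 of the non-halted cycles.
window = {"dom0": 1_000_000, "dom1": 3_000_000, "dom2": 1_000_000}
print(attribute_power(50.0, window))  # {'dom0': 10.0, 'dom1': 30.0, 'dom2': 10.0}
```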
23. Experimental evaluation
• Back to the XeMPower requirements:
1. provide precise attribution of hardware events to virtual tenants
2. be agnostic to the mapping between virtual and physical resources, hosted applications, and scheduling policies
3. add negligible overhead
• Goals of the experimental evaluation:
– show that the XeMPower monitoring components incur very low overhead under different configurations and workload conditions
24. Experimental evaluation
• Overhead metric:
– the difference in the system’s power consumption
while using XeMPower versus an off-the-shelf Xen 4.6
installation
• Experimental setup:
– 2.8 GHz quad-core Intel Xeon E5-1410 processor (4
hardware threads)
– a Watts up? PRO meter to monitor the entire
machine’s power consumption
– Each guest repeatedly runs a multi-threaded
compute-bound microbenchmark on three VCPUs
and uses a stripped-down Linux 3.14 as the guest OS
25. Experimental evaluation
• Three system configurations:
1. the baseline configuration uses off-the-shelf Xen 4.4
2. the patched configuration introduces the kernel-level
instrumentation without the XeMPower daemon
3. the monitoring configuration is the patched one with the XeMPower
daemon running and reporting statistics
• Four running scenarios:
– an idle scenario in which the system only runs Dom0
– 3 running-n scenarios, where n = {1, 2, 3} indicates the number of
guest domains in addition to Dom0
• The idea is to stress the system with an increasing number of
CPU-intensive tenant applications
• This increases the amount of data traced and collected by
XeMPower
26. Experimental Results
• Mean power consumption (μ), in Watts, for scenarios idle and running-{1,2,3}, and configurations baseline (b), patched (p), and monitoring (m)
• Mean power values are reported with their 95% confidence interval
• At a glance, we can see that the measurements are quite close
[Table: results for the pinned-VCPU and unpinned-VCPU setups]
27. Experimental Results
• We estimate an upper bound ϵ for the maximum overhead using a hypothesis test: H0, the mean power overhead is ≥ ϵ, against H1, it is < ϵ
• A rejection of the null hypothesis means that there is strong statistical evidence that the power consumption overhead is lower than ϵ
• We compute ϵ for the considered test cases and scenarios, using the mean power consumption values (μ) at significance level α = 5%
• We compare the overhead with the one measured for XenMon, a performance monitoring tool for Xen
– unlike XeMPower, XenMon does not collect PMC reads
– it is still a reference design in the context of runtime monitoring for the Xen ecosystem
28. Experimental Results
• Estimated upper bound ϵ for the power consumption overhead, in Watts
• Parenthetical values are the overheads w.r.t. the mean power consumption
• XeMPower introduces an overhead not greater than 1.18 W (1.58%), observed for the [unpinned-VCPU, running-3, patched] case
• In all the other cases, the overhead is less than 1 W (and less than 1%)
• This result is satisfactory compared to the 1-2% overhead observed for XenMon, the reference design XeMPower is compared against
29. Conclusion and Future Work
• We presented XeMPower, a lightweight monitoring solution for Xen that precisely accounts hardware events to virtual guests
• We described its use in online attribution of CPU power consumption to individual domains
• Our results show that XeMPower can provide continuous statistics with very low overhead compared to an off-the-shelf Xen installation
• Future work:
– explore the complementary use of offline characterization of both hardware and guest workloads, towards power consumption predictions