2. Origins
Formalized by
‣ R. Goldberg. Architectural Principles for Virtual Computer Systems. Ph.D. thesis, Harvard University, Cambridge, MA, 1972.
‣ G. Popek and R. Goldberg. Formal Requirements for Virtualizable Third Generation Architectures. Communications of the ACM, 17(7):412–421, 1974.
By their definitions,
‣ Virtual Machine: an efficient, isolated duplicate of the real machine.
‣ Virtual Machine Monitor (VMM): software that meets the following requirements
• Equivalent execution. Programs running in a virtual environment run identically to
running natively, barring differences in resource availability and timing.
• Performance. A “statistically dominant” subset of instructions must be executed
directly on the CPU.
• Safety. A VMM must completely control system resources.
3. Origins
Instruction types
‣ Privileged
• an instruction traps in unprivileged (user) mode but not in privileged
(supervisor) mode.
‣ Sensitive
✓ Control sensitive
• attempts to change the memory allocation or privilege mode
✓ Behavior sensitive
• Location sensitive – execution behavior depends on location in memory
• Mode sensitive – execution behavior depends on the privilege mode
‣ Innocuous – an instruction that is not sensitive
Theorem
For any conventional third generation computer, a virtual machine monitor may be
constructed if the set of sensitive instructions for that computer is a subset of the set of
privileged instructions.
The IA-32/x86 architecture is not classically virtualizable: it has sensitive instructions (e.g., POPF, SGDT) that are not privileged and therefore execute in user mode without trapping.
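A minimal sketch of the problem, assuming an x86 Linux host with GCC (illustrative only): POPF is sensitive because it can change the interrupt-enable flag (IF), yet executed in ring 3 it does not trap; the CPU silently drops the IF update, so a trap-and-emulate VMM never gets control.

/* POPF is sensitive (it can modify IF) but not privileged: in ring 3
 * with IOPL=0 it raises no fault, the CPU just ignores the IF bits,
 * so a trap-and-emulate VMM cannot intercept it. */
#include <stdio.h>

int main(void) {
    unsigned long flags;
    __asm__ volatile(
        "pushf\n\t"
        "pop %0\n\t"
        "or $0x200, %0\n\t"  /* set IF (bit 9) in our copy of EFLAGS */
        "push %0\n\t"
        "popf\n\t"           /* sensitive, yet no trap in user mode */
        "pushf\n\t"
        "pop %0\n\t"
        : "=r"(flags));
    /* Reaching this line proves no fault occurred; that silence is
     * exactly what breaks trap-and-emulate on x86. */
    printf("EFLAGS = %#lx, POPF completed without faulting\n", flags);
    return 0;
}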
4. Full virtualization vs. paravirtualization
Full virtualization (direct execution)
‣ Exact hardware exposed to the OS
‣ Efficient execution
‣ OS runs unchanged
‣ Requires a “virtualizable” architecture
‣ Example: VMware ESX
Paravirtualization
‣ OS modified to execute under the VMM
‣ Requires porting OS code
‣ Execution overhead
‣ Necessary for some (popular) architectures (e.g., x86)
‣ Example: Xen
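To make the contrast concrete, here is a toy sketch (all names invented; not Xen's or VMware's actual interfaces). Under full virtualization the VMM intercepts the guest's sensitive MOV to CR3 transparently; under paravirtualization the guest is ported to call the VMM explicitly, in the style of a Xen hypercall.

/* Toy contrast between the two approaches (invented names). */
#include <stdio.h>

static unsigned long shadow_cr3;                 /* VMM-maintained state */

static void vmm_set_cr3(unsigned long val) {     /* stand-in for a hypercall */
    shadow_cr3 = val;
    printf("VMM: guest page-table base -> %#lx\n", val);
}

/* Full virtualization: the unmodified guest executes `mov %cr3`; the
 * VMM intercepts it (trap or binary translation) and lands here. */
static void on_intercepted_mov_cr3(unsigned long val) { vmm_set_cr3(val); }

/* Paravirtualization: the ported guest invokes the VMM explicitly. */
static void guest_paravirt_switch_mm(unsigned long val) { vmm_set_cr3(val); }

int main(void) {
    on_intercepted_mov_cr3(0x1000);      /* unmodified guest, intercepted */
    guest_paravirt_switch_mm(0x2000);    /* modified guest, explicit call */
    return 0;
}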
5. Binary Translation
[Figure: instruction classification during translation: innocuous guest instructions are kept IDENT(ical), while sensitive instructions are SIMULATE(d).]
Properties of the translator:
Binary – input is machine-level code
Dynamic – occurs at runtime
On demand – code translated when needed for execution
System level – makes no assumption about guest code
Subsetting – translates from the full instruction set to a safe subset
Adaptive – adjusts translated code based on guest behavior to improve efficiency
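A toy model of this translate-on-demand loop (the mini-ISA and every name below are invented for illustration): innocuous instructions run IDENT(ically), the sensitive STI is SIMULATE(d) against virtual state, and each instruction is translated only the first time it is reached.

/* Toy model of on-demand binary translation. */
#include <stdio.h>

enum { ADD, PRINT, HALT, STI };      /* STI is the sensitive one */

static int guest_mem[] = { ADD, ADD, STI, PRINT, HALT };
static int acc, virtual_if;          /* guest state kept by the "VMM" */
static int translated[8];            /* translation cache: 1 = cached */

static int exec_from(int pc) {       /* run until the guest halts */
    for (;;) {
        if (!translated[pc]) {       /* on demand: translate only once */
            translated[pc] = 1;
            printf("translating instruction at %d\n", pc);
        }
        switch (guest_mem[pc++]) {
        case ADD:   acc++;                   break;  /* IDENT(ical) */
        case PRINT: printf("acc=%d\n", acc); break;  /* IDENT(ical) */
        case STI:   virtual_if = 1;          break;  /* SIMULATE(d) */
        case HALT:  return -1;
        }
    }
}

int main(void) {
    exec_from(0);
    printf("virtual IF=%d (the real CPU flag was never touched)\n", virtual_if);
    return 0;
}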
6. Intel® Virtualization Technology
What is Intel VT? (formerly known as Vanderpool)
- Silicon level virtualization support to eliminate virtualization holes
- Unmodified guest OSes can be executed.
- VT-x : for the IA-32 architecture
- VT-i : for the Itanium architecture
- VT-d : for Directed I/O
- cf. AMD-V (formerly known as Pacifica)
Benefits with VT-x
- Reduces the size and complexity of VMM software
- Reduces the need for VMM intervention
- Reduces memory overhead (no side tables…)
- Avoids the need to modify guest OSes, allowing them to run directly on the hardware
7. Intel VT-x Architecture
• Two new forms of CPU operation
- VMX root operation – for use by a VMM
- VMX non-root operation – similar to IA-32 without VT-x
- Both forms of operation support all four privilege levels
- Guest OS can run at its intended privilege level
• Two new transitions
- VM entry – from VMX root operation to non-root operation
- VM exit – from VMX non-root operation to root operation
• Under VMX non-root operation, many instructions/events cause VM exits
8. Intel VT-x Architecture
[Figure: two VMs, each running apps at Ring 3 over a guest OS at Ring 0 in VMX non-root operation; the VMM runs in VMX root operation over the shared physical hardware, with VM exit and VM entry transitions between the two modes.]
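On a Linux host these mechanics can be exercised from user space through KVM, which drives VT-x (or AMD-V) underneath. A minimal sketch, with all error handling omitted: it performs one VM entry into a one-instruction real-mode guest whose HLT forces a VM exit back to the VMM.

/* Minimal KVM sketch (Linux): one VM entry, one VM exit.
 * Requires read/write access to /dev/kvm. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void) {
    int kvm = open("/dev/kvm", O_RDWR);
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);

    /* Guest "OS": a single HLT (0xf4), which forces a VM exit. */
    unsigned char code[] = { 0xf4 };
    void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, code, sizeof code);
    struct kvm_userspace_memory_region region = {
        .slot = 0, .guest_phys_addr = 0x1000,
        .memory_size = 0x1000, .userspace_addr = (unsigned long)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    struct kvm_run *run = mmap(NULL, ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0),
                               PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0);

    struct kvm_sregs sregs;                 /* vCPU starts in real mode */
    ioctl(vcpu, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0; sregs.cs.selector = 0;
    ioctl(vcpu, KVM_SET_SREGS, &sregs);
    struct kvm_regs regs;
    memset(&regs, 0, sizeof regs);
    regs.rip = 0x1000;                      /* start at the HLT */
    regs.rflags = 2;                        /* reserved bit must be set */
    ioctl(vcpu, KVM_SET_REGS, &regs);

    ioctl(vcpu, KVM_RUN, 0);                /* VM entry; returns on VM exit */
    printf("exit_reason=%d (KVM_EXIT_HLT=%d)\n",
           run->exit_reason, KVM_EXIT_HLT);
    return 0;
}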
9. Virtual Machine Control Structure
A new data structure; one VMCS is created for each virtual CPU.
The VMCS includes a guest-state area and a host-state area.
At each transition (VM entry/VM exit), the corresponding state is loaded/saved.
Control fields in the VMCS determine which events cause VM exits.
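A rough sketch of the state a per-vCPU VMCS tracks (field names are illustrative; the real VMCS is opaque and must be accessed with VMREAD/VMWRITE using architected field encodings, not by memory layout):

#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch only, not the architected layout. */
struct vmcs_sketch {
    struct {                  /* guest-state area: saved on VM exit,
                                 loaded on VM entry */
        uint64_t cr0, cr3, cr4, rip, rsp, rflags;
    } guest;
    struct {                  /* host-state area: loaded on VM exit */
        uint64_t cr0, cr3, cr4, rip, rsp;
    } host;
    uint32_t pin_based_controls;   /* which external events force VM exits */
    uint32_t proc_based_controls;  /* which instructions force VM exits */
    uint32_t exit_reason;          /* exit information: why the last exit occurred */
};

int main(void) {
    struct vmcs_sketch vmcs = { .exit_reason = 0 };   /* one per virtual CPU */
    printf("sketch size: %zu bytes\n", sizeof vmcs);
    return 0;
}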
14. VM exit/entry
Instructions such as CPUID and MOV to/from CR3 are intercepted as VM exits.
Exceptions/faults, such as page faults, are intercepted as VM exits, and virtualized exceptions/faults are injected into guests on VM entry.
External interrupts unrelated to guests are intercepted as VM exits, and virtualized interrupts are injected into guests on VM entry.
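A toy model of the resulting exit-dispatch loop (exit names and handler bodies are invented; a real VMM reads the exit reason and qualification from the VMCS):

/* Toy model of a VMM's VM-exit dispatch. */
#include <stdio.h>

enum exit_reason { EXIT_CPUID, EXIT_CR3_ACCESS, EXIT_PAGE_FAULT, EXIT_EXT_INTR };

static void handle_exit(enum exit_reason r) {
    switch (r) {
    case EXIT_CPUID:
        puts("emulate CPUID, advance guest RIP");                 break;
    case EXIT_CR3_ACCESS:
        puts("emulate MOV to/from CR3 on shadow state");          break;
    case EXIT_PAGE_FAULT:
        puts("resolve, or inject a virtual #PF on VM entry");     break;
    case EXIT_EXT_INTR:
        puts("service host interrupt; maybe inject virtual IRQ"); break;
    }
}

int main(void) {
    /* A fabricated exit trace standing in for VMX non-root execution. */
    enum exit_reason trace[] = { EXIT_CPUID, EXIT_PAGE_FAULT, EXIT_EXT_INTR };
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        handle_exit(trace[i]);   /* each VM exit returns control to the VMM */
    return 0;
}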
15. Performance
[Figure 4. Virtualization nanobenchmarks: CPU cycles (smaller is better, log scale) for syscall, in/out, cr8wr, callret, pgfault, divzero, and ptemod, comparing native execution, the software VMM, and the hardware VMM.]
[Figure 5. Sources of virtualization overhead (seconds) in an XP boot/halt workload, software VMM vs. hardware VMM.]

Table 1. Micro-architectural improvements (cycles).
                      3.8GHz P4 672    2.66GHz Core 2 Duo
VM entry              2409             937
Page fault VM exit    1931             1186
VMCB read             178              52
VMCB write            171              44

From the nanobenchmarks: BT slows down indirect control flow (callret), but since the hardware VMM executes calls and returns without modification, the hardware VMM and native both execute the call/return pair in 11 cycles. The software VMM translates %cr8 writes into a short sequence of instructions, completing the %cr8 write in 35 cycles, about four times faster than native.

Between the two VMMs, the hardware VMM induces approximately 4.4 times greater overhead than the software VMM. Still, this program stresses many divergent paths through both VMMs, such as system calls, context switching, creation of address spaces, modification of traced page table entries, and injection of page faults.

System calls were similar in frequency to PTE modifications. However, while the software VMM slows down system calls substantially, on an end-to-end basis system calls were not frequent enough to offset the hardware VMM's penalty for PTE modification (and I/O instructions), and the hardware VMM incurs considerably more total overhead than the software VMM in this workload.

The cost of running the binary translator (vs. executing the translated code) is rarely significant; see again Figure 5. There are two reasons. First, the TC captures the working set, and continued execution amortizes away translation costs for long-running workloads. Second, the translator is quite fast because it does little flow analysis (2300 cycles overhead per x86 instruction, compared with 100–200 kcycles per Java bytecode for some optimizing JITs [1]). High translator throughput ensures good performance even for a worst-case workload like boot/halt that mostly executes cold code.
16. Conclusion
• While the new hardware removes the need
for BT and simplifies VMM design, it rarely
improves performance.
• Hardware overheads will shrink over time
as technology matures.
17. References
• Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex
Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of
virtualization. In Proceedings of the ACM Symposium on Operating Systems
Principles, October 2003.
• Jacob Faber Kloster, Jesper Kristensen, and Arne Mejlholm. Efficient
memory sharing in the xen virtual machine monitor. http://www.cs.aau.dk/
library/cgi-bin/detail.cgi?id=1136884892, January 2006.
• Gil Neiger, Amy Santoni, Felix Leung, Dion Rodgers, Rich Uhlig. Intel
Virtualization Technology:Hardware Support for Efficient Processor
Virtualization. Intel Technology Journal Volume 10, Issue 3, 2006
• J. Fisher-Ogden. Hardware support for efficient virtualization. http://
cseweb.ucsd.edu/~jfisherogden/hardwareVirt.pdf, 2006.
• http://courses.cs.vt.edu/cs5204/fall09-kafura/
18. Definitions
Virtualization
‣ A layer mapping its visible interface and resources onto the interface and
resources of the underlying layer or system on which it is implemented
‣ Purposes
• Abstraction – to simplify the use of the underlying resource (e.g., by
removing details of the resource’s structure)
• Replication – to create multiple instances of the resource (e.g., to
simplify management or allocation)
• Isolation – to separate the uses which clients make of the underlying
resources (e.g., to improve security)
Virtual Machine Monitor (VMM)
‣ A virtualization system that partitions a single physical “machine” into
multiple virtual machines.
Terminology
‣ Host – the machine and/or software on which the VMM is implemented
‣ Guest – the OS which executes under the control of the VMM