This document discusses the design of vSVM (virtual SVM) on the Xen hypervisor. It proposes exposing the hardware SVM extensions (PASID, ATS and PRQ) to guests through virtual IOMMU capabilities in Xen, so that guest VMs can share virtual addresses between the CPU and devices. The design involves shadow extended context entries that point to guest PASID tables, and forwarding of translation cache invalidations from guest to host through queued invalidation. Currently Xen supports device assignment, but not IOVA or the SVM extensions needed for shared virtual addressing.
3. Software & Services Group
High-Level View of SVM
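Figure: address translation for CPU and device accesses.
• CPU: GVA -> GPA through x86 page tables (OS managed), then GPA -> HPA through EPT tables (VMM managed)
• Device: GVA -> GPA through x86 page tables (OS managed), then GPA -> HPA through VT-d tables (VMM managed)
• Both paths end in host/physical memory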
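The two translation stages in this view can be illustrated with a small sketch. It assumes 4 KiB pages and models every table as a flat dictionary from page number to page number; the table contents and the function names (`translate`, `cpu_access`, `device_access`) are hypothetical and only show how the OS-managed and VMM-managed stages compose.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(addr, table):
    """Translate one address through a flat page-number map (illustrative only)."""
    page, offset = addr >> PAGE_SHIFT, addr & PAGE_MASK
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return (table[page] << PAGE_SHIFT) | offset


# Hypothetical table contents: page number -> page number.
x86_page_table = {0x10: 0x200}       # GVA -> GPA, managed by the guest OS
ept_table = {0x200: 0x9000}          # GPA -> HPA for CPU accesses, managed by the VMM
vtd_table = {0x200: 0x9000}          # GPA -> HPA for device DMA, managed by the VMM


def cpu_access(gva):
    """CPU path: x86 page tables, then EPT."""
    return translate(translate(gva, x86_page_table), ept_table)


def device_access(gva):
    """Device path under SVM: x86 page tables, then VT-d tables."""
    return translate(translate(gva, x86_page_table), vtd_table)


if __name__ == "__main__":
    print(hex(cpu_access(0x10ABC)), hex(device_access(0x10ABC)))
```

With the sample tables, the CPU and the device reach the same host physical address for the same guest virtual address, which is the point of sharing the address space.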
4. Software & Services Group
Example
• OpenCL 2.0 supports sharing virtual addresses between CPU and GPU.
• Reduces communication latency between CPU and GPU.
• Simplifies GPU programming.
5. Software & Services Group
SVM Extensions in Hardware
• PASID
– Process Address Space ID
– New extension to the PCI-e spec
– Identifies the targeted virtual address space
• DMA requests without PASID
– Normal memory requests from endpoint devices
• DMA requests with PASID
– Requests to an application's virtual address space (SVM)
• Extended Root Entry
– The upper 64 bits are used
– LCTP: Lower Context Table Pointer
– UCTP: Upper Context Table Pointer
• Extended Context Entry
– Extended to 256 bits
– PASIDPTR: PASID table pointer
– SLPTPTR: Second-Level Page Translation Pointer
• PASID table
– FLPTPTR: First-Level Page Translation Pointer
VT-d DMA Remapping Hierarchy
Figure: RTA -> Ex-Root Table (indexed by bus #) -> Ex-Context Table (indexed by devfn #) -> Second Level Page Table; for requests w/ PASID, Ex-Context Table -> PASID Table (indexed by pasid #) -> First Level Page Table.
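The lookup order in the hierarchy can be sketched as below. This is a simplified model, not the hardware entry layout: tables are Python dictionaries, page tables are represented by placeholder strings, and nested translation is ignored. The class and function names are hypothetical. The split of the lower and upper context tables at devfn 128 follows my reading of the VT-d extended root entry and should be checked against the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PasidEntry:
    flptptr: str                      # stand-in for the first-level page table


@dataclass
class ExtContextEntry:
    slptptr: str                      # stand-in for the second-level page table
    pasidptr: Optional[Dict[int, PasidEntry]] = None   # PASID table, if any


@dataclass
class ExtRootEntry:
    # Assumption: LCTP covers devfn 0-127 and UCTP covers devfn 128-255.
    lctp: Dict[int, ExtContextEntry] = field(default_factory=dict)
    uctp: Dict[int, ExtContextEntry] = field(default_factory=dict)


def lookup(root_table, bus, devfn, pasid=None):
    """Return which page table a DMA request would be translated with."""
    root = root_table[bus]                              # indexed by bus #
    ctx = root.lctp[devfn] if devfn < 128 else root.uctp[devfn - 128]
    if pasid is None:                                   # request without PASID
        return ("second-level", ctx.slptptr)
    if ctx.pasidptr is None:
        raise LookupError("device has no PASID table")
    return ("first-level", ctx.pasidptr[pasid].flptptr)  # indexed by pasid #


# Hypothetical example: one bus with two devices.
example_root_table = {
    0: ExtRootEntry(
        lctp={3: ExtContextEntry("slpt-A", {7: PasidEntry("flpt-7")})},
        uctp={2: ExtContextEntry("slpt-B")},
    )
}

if __name__ == "__main__":
    print(lookup(example_root_table, 0, 3))
    print(lookup(example_root_table, 0, 3, pasid=7))
```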
7. Software & Services Group
vSVM Design: Architecture on Xen
• vSVM depends on the Virtual IOMMU framework developed by Tianyu Lan and Chao Gao from Intel.
• Expose capabilities
• Queued invalidation:
– Shadow context entry: holds the GPA of the guest PASID table pointer, used in nested mode
– Forward 1st-level translation cache invalidation
– PRQ response
• Fault reporting
– Unrecoverable fault
– Recoverable fault reporting/servicing (PRQ)
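One way to picture the shadow context entry is the sketch below. It assumes the shadow entry takes the PASID table pointer from the guest's entry unchanged (a GPA), pairs it with the host's second-level table (GPA -> HPA), and enables nested mode so that first-level structures are translated through the second level. The field names, the `nested` flag and `build_shadow_entry` are hypothetical simplifications, not Xen code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtContextEntry:
    pasidptr: int     # PASID table pointer
    slptptr: int      # second-level page table pointer
    nested: bool      # nested translation enabled


def build_shadow_entry(guest_entry, host_slptptr):
    """Build the entry the hypervisor would install for an assigned device.

    Assumption: the guest's PASID table pointer is a GPA and is copied as-is;
    nested mode makes the hardware translate it through the host's
    second-level (GPA -> HPA) table.
    """
    return ExtContextEntry(
        pasidptr=guest_entry.pasidptr,   # GPA taken from the guest entry
        slptptr=host_slptptr,            # host-owned GPA -> HPA table
        nested=True,
    )


if __name__ == "__main__":
    guest = ExtContextEntry(pasidptr=0x12345000, slptptr=0, nested=False)
    print(build_shadow_entry(guest, host_slptptr=0xABCDE000))
```

In this model the guest never supplies a host physical address; the only guest-controlled field that reaches the shadow entry is the PASID table GPA.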
Figure: vSVM architecture on Xen.
• Layers: Guest, Qemu, Xen Hypervisor, Hardware
• Blocks: IOMMU Driver (shown twice), vIOMMU, IOMMU
• Labeled flows: IOMMU Fault, vIOMMU Fault, PRQ, Response, Translation Cache Invalidation, Cache Invalidation for 1st Translation, Shadow Ex-Context Entry, IOMMU Capabilities, vIOMMU Capabilities, PCIe Support, PCIe Configuration
8. Software & Services Group
vSVM Design: PCI-e Extended Capabilities Exposure
• Expose the following PCI-e extended capabilities to the guest:
– PASID - Process Address Space ID Extended Capability Structure
– ATS - Address Translation Services Extended Capability Structure
– PRQ - Page Request Extended Capability Structure
• More work is required:
– All of these are PCI-e Extended Capabilities, which reside in PCI configuration space at offsets 256-4095
– Current PCI device emulation on Xen supports offsets 0-255 only
– Offsets 256-4095 can only be accessed with MMCFG
– A PCI-e device emulation is required
– In QEMU only Q35 supports PCI-e, while the i440FX/PIIX machine that Xen uses does not
– Finally we need to enable Q35 for Xen
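To make the offset constraint concrete, the sketch below checks whether a configuration-space offset needs MMCFG, computes the memory-mapped address for a given bus/device/function, and walks an extended capability list looking for ATS, PRQ and PASID. The capability IDs (0x000F, 0x0013, 0x001B) and the header layout (ID in bits 15:0, next pointer in bits 31:20, list starting at 0x100) follow the PCI Express extended capability format. The function names, the MMCFG base address and the sample configuration space are hypothetical.

```python
import struct

EXT_CAP_START = 0x100                                   # first extended capability
EXT_CAP_IDS = {0x000F: "ATS", 0x0013: "PRQ", 0x001B: "PASID"}


def needs_mmcfg(offset):
    """Offsets 256-4095 are outside the legacy 256-byte configuration space."""
    if not 0 <= offset <= 4095:
        raise ValueError("offset outside PCI-e configuration space")
    return offset >= 256


def mmcfg_address(base, bus, dev, fn, offset):
    """Memory-mapped config address: 4 KiB per function, 1 MiB per bus."""
    return base + ((bus << 20) | (dev << 15) | (fn << 12) | offset)


def walk_ext_caps(cfg):
    """Return {name: offset} for the SVM-related capabilities found in cfg."""
    found, seen, offset = {}, set(), EXT_CAP_START
    while EXT_CAP_START <= offset <= len(cfg) - 4 and offset not in seen:
        seen.add(offset)                                # guard against loops
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):                   # empty or absent list
            break
        cap_id = header & 0xFFFF
        if cap_id in EXT_CAP_IDS:
            found[EXT_CAP_IDS[cap_id]] = offset
        offset = (header >> 20) & 0xFFC                 # next pointer, dword aligned
    return found


def make_cfg(caps):
    """Build a 4 KiB config space from [(offset, cap_id, next_offset), ...]."""
    cfg = bytearray(4096)
    for offset, cap_id, nxt in caps:
        struct.pack_into("<I", cfg, offset, cap_id | (1 << 16) | (nxt << 20))
    return bytes(cfg)


if __name__ == "__main__":
    cfg = make_cfg([(0x100, 0x000F, 0x140), (0x140, 0x0013, 0x200), (0x200, 0x001B, 0)])
    print(walk_ext_caps(cfg))
    print(hex(mmcfg_address(0xE0000000, 1, 2, 3, 0x100)))
```

A device model limited to offsets 0-255 would never reach `EXT_CAP_START`, which is why the guest cannot discover these capabilities without PCI-e emulation.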
9. Software & Services Group
vSVM Design: Architecture in Detail
Figure: detailed vSVM architecture (legend: newly added / modified components).
• Hardware: Memory, pIOMMU, Device0, Device1 (assigned device); requests w/ PASID (shared VA) and requests w/o PASID
• Xen IOMMU Driver: Trans faults, PRQ service, PRQ queue, QI interface, Invalidate queue, Ext Root Table, Ext Ctx Table, PASID Table, SLP Table
• vIOMMU: Trans faults, QI interface
• Qemu-xen: PCIe support
• Guest Kernel: gIOMMU Driver with gTrans faults, Guest PRQ thread, Guest QI interface, gPRQ queue, gInvalidate queue
• Guest User Space: Guest process1 and Guest process2, each with its own gVAS (shared VA)
• Labeled flows: Trans faults, Inject MSI, RTA/IQT/etc regs handling, MMIO write (including PRQ response), Ctx-cache invalidation to update PASID ptr
RTA: Root Table Address
IQT: Invalidation Queue Tail
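The register-handling path for queued invalidation might look roughly like the sketch below. It assumes the vIOMMU processes guest descriptors from its current head up to the tail value each time the guest writes IQT, refreshes the shadow entry on a context-cache invalidation (to pick up a new PASID pointer), and forwards first-level invalidations to the host queue. The class, the descriptor dictionaries and their `type` strings are hypothetical and do not reflect the real descriptor encoding.

```python
class VIommuInvalidationQueue:
    """Toy model of a guest invalidation queue handled by the vIOMMU."""

    def __init__(self, size=8):
        self.ring = [None] * size     # guest-written descriptors
        self.head = 0                 # next descriptor the vIOMMU will process
        self.shadow_refresh = []      # (bus, devfn) whose shadow entry is rebuilt
        self.forwarded = []           # descriptors passed on to the host queue
        self.ignored = []             # anything this sketch does not model

    def write_iqt(self, tail):
        """Model the guest's MMIO write to the Invalidation Queue Tail register."""
        if not 0 <= tail < len(self.ring):
            raise ValueError("tail outside the queue")
        while self.head != tail:
            self._handle(self.ring[self.head])
            self.head = (self.head + 1) % len(self.ring)

    def _handle(self, desc):
        if desc["type"] == "context_cache":
            # Ctx-cache invalidation: re-read the guest entry, update the PASID ptr.
            self.shadow_refresh.append((desc["bus"], desc["devfn"]))
        elif desc["type"] == "first_level":
            # First-level translation cache invalidation: forward to the host.
            self.forwarded.append(desc)
        else:
            self.ignored.append(desc)


if __name__ == "__main__":
    q = VIommuInvalidationQueue(size=4)
    q.ring[0] = {"type": "context_cache", "bus": 0, "devfn": 8}
    q.ring[1] = {"type": "first_level", "pasid": 5}
    q.write_iqt(2)
    print(q.shadow_refresh, q.forwarded)
```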
10. Software & Services Group
Xen IOMMU Status
• The root table -> context table -> second-level table hierarchy is supported, translating GPA -> HPA.
• Xen does not currently support IOVA or SVM; only device assignment works.
• The Xen IOMMU protection domain is managed per VM.