Xen Schedulers and Interrupt Latency
Dario Faggioli & Stefano Stabellini
The case for embedded virtualization
Galois SMACCMPilot demo, Xen Summit 2014
Why Xen? Why a hypervisor?
• Efficiency and Consolidation
• Isolation and Partitioning
• Componentization
• Resilience
• Scaling
• Portability
Embedded != Cloud
Different requirements:
• short boot times
• small footprint
• small codebase (certifications)
• non-PCI device assignment
• driver domains
• co-processor virtualization
• low, deterministic irq latency
• real time schedulers
Xen supports/enables
Xen Schedulers
[Figure: eight CPUs partitioned among schedulers. Labels: Real Time Scheduler (ARINC 653); Regular VM Scheduler (Credit); "Dedicated to 1 VCPU" (twice).]
Automotive
[Figure: Xen on automotive hardware. Labels: Hardware; Xen; Dom0 (Linux Control Domain); UI Domain (Automotive Grade Android); HW Drivers; GPU Driver; Audio Driver; PV Block & Net frontends; PV Block & Net backends.]
GlobalLogic
EPAM
Xilinx Zynq MPSoC
[Figure: Xen on a Xilinx Zynq MPSoC. Labels: Xen; Dom0 (Linux); Toolstack; Baremetal App (x4); FPGA Driver (x4); FPGA; Dedicated CPU (x4).]
Latency Impact of Schedulers
pCPUs and vCPUs...
[Figure: four pCPUs (pCPU0 to pCPU3) and many vCPUs. Some vCPUs are runnable ("We want to run!!"), some are blocked ("We are blocked..."), and some are on a pCPU ("I’m running").]
Keeping vCPUs “organised”: runqueues
[Figure: pCPU0 to pCPU3, with the runnable vCPUs queued in runqueues (runq).]
Runqueues in Credit1
[Figure: pCPU0 to pCPU3, each with its own runqueue (runq0 to runq3) holding runnable vCPUs.]
1 runqueue per pCPU
Runqueues in Credit2
[Figure: pCPU0 to pCPU3 served by two runqueues holding runnable vCPUs.]
Runqueues are shared
A vCPU Wake-Up in Credit1
[Figure: pCPU0 to pCPU3 with runq0 to runq3; arrows numbered (1) to (4) and (1) to (3) follow the steps of the two cases below.]
Case a:
1. vcpu goes in runq1
2. where can vcpu run? Hey, pCPU2 is idle!
3. put vcpu in runq2
4. pCPU2 picks up vcpu from its runqueue
Case b:
1. vcpu goes in runq3
2. hey, vcpu can preempt what’s running on pCPU3!
3. context switch
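The two Credit1 cases above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not Xen's actual code: the `PCpu` class, the `wake` function, the integer priorities and the order in which the two cases are checked are all hypothetical.

```python
from collections import deque

class PCpu:
    """A physical CPU with its own private runqueue (one runqueue per pCPU)."""
    def __init__(self, cpu_id, running=None):
        self.cpu_id = cpu_id
        self.running = running   # (vcpu name, priority), or None when idle
        self.runq = deque()      # runnable vCPUs waiting for this pCPU

def wake(pcpus, vcpu, prio, home_id):
    """Toy wake-up path; returns the list of steps that were taken."""
    home = pcpus[home_id]
    entry = (vcpu, prio)
    home.runq.append(entry)                      # step 1: queue on the home runqueue
    steps = [f"queue on runq{home_id}"]
    # Case b: the woken vCPU beats what is running on its home pCPU.
    if home.running is not None and prio > home.running[1]:
        home.runq.remove(entry)
        home.runq.appendleft(home.running)       # preempted vCPU goes back in the queue
        home.running = entry                     # context switch
        steps.append(f"preempt on pCPU{home_id}")
        return steps
    # Case a: look for an idle pCPU and move the vCPU to its runqueue.
    for p in pcpus:
        if p.running is None:
            home.runq.remove(entry)
            p.runq.append(entry)
            steps.append(f"move to runq{p.cpu_id}")
            p.running = p.runq.popleft()         # the idle pCPU picks it up
            steps.append(f"run on pCPU{p.cpu_id}")
            return steps
    steps.append("wait")                         # nothing idle, nothing to preempt
    return steps
```

Each extra step (queueing, looking for a pCPU, moving between runqueues) is work done between the wake-up and the moment the vCPU actually runs.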
A vCPU Wake-Up in Credit2
[Figure: pCPU0 to pCPU3 served by two shared runqueues, runqA and runqB, with runnable vCPUs.]
Case a:
1. vcpu goes in runqA
2. load-balancer moves vcpu to a less loaded runq
3. pCPU2 picks up vcpu from its runqueue
Case b:
1. vcpu goes in runqB
2. pCPU3 picks up vcpu
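A comparable toy model for the Credit2 cases above, with runqueues shared by several pCPUs. The names (`SharedRunq`, `wake`), the mapping of runqA to pCPU0/pCPU1 and runqB to pCPU2/pCPU3, and the crude "fewest queued vCPUs" balancing rule are assumptions made for illustration; the real load balancer is considerably more involved.

```python
class SharedRunq:
    """A runqueue shared by a group of pCPUs, ordered by decreasing credit."""
    def __init__(self, name, cpus):
        self.name = name
        self.cpus = list(cpus)    # ids of the pCPUs served by this runqueue
        self.idle = set(cpus)     # pCPUs of this runqueue that are idle
        self.queue = []           # (credit, vcpu) entries

    def insert(self, vcpu, credit):
        self.queue.append((credit, vcpu))
        self.queue.sort(key=lambda e: e[0], reverse=True)

    def pick(self, cpu):
        """An idle pCPU of this runqueue takes the head of the queue."""
        _, vcpu = self.queue.pop(0)
        self.idle.discard(cpu)
        return vcpu

def wake(runqs, home, vcpu, credit):
    """Toy wake-up path; returns the list of steps that were taken."""
    steps = [f"{vcpu} -> {home.name}"]
    target = home
    if not home.idle:
        # Case a: the home runqueue has no idle pCPU, so move the vCPU
        # to the least loaded runqueue that does have one.
        others = [r for r in runqs if r is not home and r.idle]
        if others:
            target = min(others, key=lambda r: len(r.queue))
            steps.append(f"balance to {target.name}")
    target.insert(vcpu, credit)
    if target.idle:
        cpu = min(target.idle)
        steps.append(f"pCPU{cpu} runs {target.pick(cpu)}")
    return steps
```

With a shared runqueue, case b needs no migration at all: any idle pCPU attached to the runqueue can pick the vCPU up directly.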
A vCPU Wake-Up in Credit2
0.000971102 irq_enter
0.000971102 irq_direct, vec fa, handler = apic_timer_interrupt
0.000971649 raise_softirq TIMER_SOFTIRQ
0.000971962 irq_exit, in_irq = 0
0.000974070 softirq_handler TIMER_SOFTIRQ
0.000976010 tasklet_schedule fn=hvm_assert_evtchn_irq, sched_on=6 (softirq)
0.000976010 tasklet_enqueue fn=hvm_assert_evtchn_irq
0.000976510 raise_softirq TASKLET_SOFTIRQ on cpu 6
0.000978213 softirq_handler TASKLET_SOFTIRQ
0.000981562 tasklet_do_work fn=hvm_assert_evtchn_irq
0.000982017 vcpu_wake d1v1
0.000982437 runstate_change d1v1 blocked->runnable
0.000982987 csched2:update_load
0.000983230 csched2:update_rq_load rq# 0, load = 1, avgload = 34.830%
0.000983430 csched2:update_vcpu_load d1v1, vcpu_load = 11.824%
0.000983735 csched2:runq_insert d1v1, position 0
0.000984060 csched2:runq_tickle_new d1v1, processor = 6, credit = 5292567
0.000984490 csched2:runq_tickle cpu 6
0.000984842 raise_softirq SCHEDULE_SOFTIRQ on cpu 6
0.000985500 softirq_handler SCHEDULE_SOFTIRQ
0.000988941 csched2:schedule cpu 6, rq# 0, idle, SMT idle, tickled
0.000989344 csched2:runq_cand_check d1v1
0.000989611 csched2:runq_candidate d1v1, credit = 5292567
0.000990881 sched_switch prev idle, run for 344.6us
0.000991199 sched_switch next d1v1, was runnable for 5.862us, next slice 1000.0us
0.000991377 sched_switch prev idle next d1v1
0.000991697 runstate_change idle running->runnable
0.000991979 runstate_change d1v1 runnable->running
Trace annotations, in order:
• Interrupt arrives
• vcpu wakes up; goes in runq
• Scheduler triggered on CPU 6
• CPU 6 schedules
• vcpu runs
Latency introduced by scheduling: 9.9620 us (BEST CASE)
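The 9.9620 us figure can be reproduced from the trace above by subtracting the `vcpu_wake` timestamp from the timestamp of the `runnable->running` transition. The following is a small sketch; the helper names (`parse`, `first`, `latency_us`) are hypothetical, and it assumes the timestamps are in seconds, which is consistent with the number on the slide.

```python
from decimal import Decimal

# Excerpt of the Credit2 trace above (timestamps assumed to be in seconds).
TRACE = """\
0.000982017 vcpu_wake d1v1
0.000982437 runstate_change d1v1 blocked->runnable
0.000984842 raise_softirq SCHEDULE_SOFTIRQ on cpu 6
0.000988941 csched2:schedule cpu 6, rq# 0, idle, SMT idle, tickled
0.000991979 runstate_change d1v1 runnable->running
"""

def parse(trace):
    """Split each line into (timestamp, event text)."""
    events = []
    for line in trace.splitlines():
        stamp, _, text = line.partition(" ")
        events.append((Decimal(stamp), text))
    return events

def first(events, needle):
    """Timestamp of the first event whose text contains `needle`."""
    for stamp, text in events:
        if needle in text:
            return stamp
    raise ValueError(f"no event matching {needle!r}")

def latency_us(events, start, end):
    """Time between two events, in microseconds."""
    return (first(events, end) - first(events, start)) * 1_000_000

print(latency_us(parse(TRACE), "vcpu_wake", "runnable->running"))
```

`Decimal` is used instead of `float` so that the nanosecond-resolution timestamps subtract exactly.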
_This is all good_ ...
… Because, thanks to this, we can offer VMs/users:
• Overcommitting (i.e., having more vCPUs than pCPUs)
• Weighted fair share of pCPU time
• Hard and soft affinity
• Cache and NUMA awareness
• Caps and reservations on pCPU time
… but it _comes at a price_
The ‘null’ Scheduler
A scheduler that does nothing
If we want features & flexibility, we must pay the price :-(
What if we don’t want (or need) them, e.g.:
• Static environments (some embedded use cases)
• Systems (or cpupools) where we know we’ll never have overcommit
• For testing/benchmarking (as reference)
Then you can use the ‘null’ scheduler
Runqueues in ‘null’. No, wait...
[Figure: pCPU0 to pCPU3, each with exactly one vcpu.]
There are no runqs at all!
• vCPUs are statically assigned to pCPUs
• Only 1 vCPU per pCPU
• Overcommit is possible (i.e.: the system won’t explode), but use only if you really know what you’re doing (i.e.: the VMs will likely explode!)
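The static assignment described above can be sketched as a toy model. Everything here (`NullScheduler`, `add_vcpu`, `schedule`, the `waiting` list) is a hypothetical illustration of the idea rather than Xen's implementation; in particular, parking overcommitted vCPUs on a simple list is an assumption of this sketch.

```python
class NullScheduler:
    """Toy model: each pCPU owns at most one vCPU; there are no runqueues."""
    def __init__(self, n_pcpus):
        self.assigned = [None] * n_pcpus   # assigned[cpu] = vCPU pinned to that pCPU
        self.waiting = []                  # overcommitted vCPUs with no pCPU (assumption)

    def add_vcpu(self, vcpu):
        """Statically assign the vCPU to the first free pCPU, if there is one."""
        for cpu, owner in enumerate(self.assigned):
            if owner is None:
                self.assigned[cpu] = vcpu
                return cpu
        self.waiting.append(vcpu)          # overcommit: this vCPU never gets to run here
        return None

    def schedule(self, cpu, runnable):
        """No decision to make: run the assigned vCPU if runnable, else stay idle."""
        vcpu = self.assigned[cpu]
        return vcpu if vcpu is not None and vcpu in runnable else None
```

Because `schedule` has nothing to choose between, there is no load tracking, no queue manipulation and no balancing on the wake-up path.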
A vCPU Wake-Up in ‘null’
[Figure: pCPU0 to pCPU3; arrow (1) shows the vcpu going straight onto its pCPU.]
1. vcpu wakes up and runs :-)
A vCPU Wake-Up in ‘null’
0.636884641 irq_enter
0.636884641 irq_direct, vec fa, handler = 0xffff82d080267ec4
0.636885492 raise_softirq TIMER_SOFTIRQ
0.636885922 irq_exit, in_irq = 0
0.636889583 softirq_handler TIMER_SOFTIRQ
0.636892021 tasklet_schedule fn=hvm_assert_evtchn_irq, sched_on=5 (softirq)
0.636892021 tasklet_enqueue fn=hvm_assert_evtchn_irq
0.636892836 raise_softirq TASKLET_SOFTIRQ on cpu 5
0.636895074 softirq_handler TASKLET_SOFTIRQ
0.636895607 tasklet_do_work fn=hvm_assert_evtchn_irq
0.636896202 vcpu_wake d1v1
0.636896712 runstate_change d1v1 blocked->runnable
0.636897197 raise_softirq SCHEDULE_SOFTIRQ on cpu 5
0.636898470 softirq_handler SCHEDULE_SOFTIRQ
0.636899465 null:schedule cpu 5, vcpu d1v1
0.636899720 sched_switch prev idle, run for 999936.973us
0.636899970 sched_switch next d1v1, was runnable for 2.411us
0.636900155 sched_switch prev idle next d1v1
0.636900448 runstate_change idle running->runnable
0.636900738 runstate_change d1v1 runnable->running
Trace annotations, in order:
• Interrupt arrives
• vcpu wakes up
• Scheduler triggered on CPU 5
• CPU 5 schedules
• vcpu runs
Latency introduced by scheduling: 4.5360 us (less than half of Credit2) (NORMAL CASE)
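To see where the difference comes from, the wake-to-run interval of each trace can be split into phases. The sketch below uses four timestamps copied from each of the two traces in these slides, converted to nanoseconds; the phase names and the `phases` helper are hypothetical labels chosen for this illustration.

```python
# Timestamps in nanoseconds, copied from the Credit2 and 'null' traces.
CREDIT2 = {
    "vcpu_wake": 982_017,
    "raise SCHEDULE_SOFTIRQ": 984_842,
    "schedule": 988_941,          # csched2:schedule
    "running": 991_979,           # d1v1 runnable->running
}
NULL = {
    "vcpu_wake": 636_896_202,
    "raise SCHEDULE_SOFTIRQ": 636_897_197,
    "schedule": 636_899_465,      # null:schedule
    "running": 636_900_738,       # d1v1 runnable->running
}

def phases(stamps):
    """Duration (ns) between each pair of consecutive events."""
    names = list(stamps)
    return {f"{a} -> {b}": stamps[b] - stamps[a] for a, b in zip(names, names[1:])}

def total(stamps):
    """Wake-up to running, in nanoseconds."""
    return stamps["running"] - stamps["vcpu_wake"]

for name, stamps in (("credit2", CREDIT2), ("null", NULL)):
    print(name, phases(stamps), "total:", total(stamps), "ns")
print("null / credit2 =", round(total(NULL) / total(CREDIT2), 3))
```

In these two samples every phase is shorter under 'null'. The first one, from wake-up to raising the scheduling softirq, is where Credit2 does its load tracking, runqueue insertion and tickling.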
Benchmarks
Hardware and Software configuration
Hardware:
Xilinx Zynq MPSoC: 4 ARM A53 Cores
Physical Timer
Software:
Xen 4.9.0-rc7 (+ phys_timer forwarding patch)
Dom0: Linux 4.9, dom0_mem=1G, dom0_max_vcpus=2
1 vcpu TBM ctest
Fin

XPDDS17: Xen Schedulers and Their Impact on Interrupt Latency - Stefano Stabellini, Aporeto & Dario Faggioli, Citrix
