CPU optimizations in the CERN Cloud
Ops Midcycle - High Performance Computing with OpenStack - Manchester, 2016
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
Arne Wiebalck
Tim Bell
Sean Crosby (Univ. of Melbourne)
Ulrich Schwickerath
What is CERN?
CERN Cloud – LHC and Experiments
CMS detector
https://www.google.com/maps/streetview/#cern
CERN Cloud – AMS
OpenStack at CERN by numbers
~ 5500 Compute Nodes (~140k cores)
•  ~ 5300 KVM
•  ~ 200 Hyper-V
~ 2800 Images ( ~ 44 TB in use)
~ 2000 Volumes ( ~ 800 TB allocated)
~ 2200 Users
~ 2500 Projects
> 17000 VMs running
Number of VMs created (green) and VMs deleted (red) every 30 minutes
The “20% overhead” problem
•  When running the batch system on top of the Cloud Infrastructure
we reach the limit of the total number of hosts in LSF
•  On our batch full node VMs we noticed that the HS06 rating was
~20% lower than on the underlying host
•  Smaller VMs behaved much better: ~8% (sum of simultaneous
HS06 runs on 4x8core VMs on a 32core host)
HS06 on virtual batch workers
HWDB HS06: 357±16

VM size (cores)   Per-VM HS06   Total HS06   Overhead
4x 8              82.3±11       329          7.8%
2x 16             150±5         300          16%
1x 32             284±11        284          20.4%

Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
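
The overhead figures can be reproduced from the per-VM measurements. The following is a minimal sketch, assuming that the HWDB HS06 value of 357 is the reference for the underlying host and that the total is the sum of simultaneous per-VM runs, as described on the previous slide. The function and variable names are illustrative and do not come from the slides.

```python
# Reference HS06 rating of the host, taken from the HWDB column (assumed baseline).
HWDB_HS06 = 357.0

# (number of VMs, cores per VM) -> measured per-VM HS06, copied from the table.
PER_VM_HS06 = {
    (4, 8): 82.3,
    (2, 16): 150.0,
    (1, 32): 284.0,
}


def overhead_percent(n_vms: int, per_vm_hs06: float, reference: float = HWDB_HS06) -> float:
    """Return the HS06 loss of n_vms simultaneous VMs relative to the reference, in percent."""
    total = n_vms * per_vm_hs06
    return 100.0 * (1.0 - total / reference)


for (n_vms, cores), hs06 in PER_VM_HS06.items():
    total = n_vms * hs06
    print(f"{n_vms}x {cores}: total {total:.0f} HS06, "
          f"overhead {overhead_percent(n_vms, hs06):.1f}%")
```

The error bars on the measurements are ignored here, so the printed values are only point estimates.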
Testing Optimizations – KSM off
•  ATLAS T0 batch VMs show an IOwait of 20-30%
•  Compute nodes started to swap even when leaving 2 GB for
the OS
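
Whether KSM is active on a Linux host can be read from sysfs. The sketch below assumes the standard kernel interface under /sys/kernel/mm/ksm; the slides do not say how CERN toggled or monitored KSM, so the helper names are hypothetical.

```python
from pathlib import Path


def read_ksm_state(ksm_dir: str = "/sys/kernel/mm/ksm") -> dict:
    """Read a few KSM counters from sysfs; files that do not exist are skipped."""
    state = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        path = Path(ksm_dir) / name
        if path.is_file():
            state[name] = int(path.read_text().strip())
    return state


def ksm_enabled(state: dict) -> bool:
    """KSM merges pages only while 'run' is 1 (0 = stopped, 2 = stop and unmerge)."""
    return state.get("run") == 1
```

Calling `ksm_enabled(read_ksm_state())` on a compute node reports the current setting; the `ksm_dir` argument exists only so the function can be pointed at a test directory.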
Optimization by numbers – EPT off
HWDB HS06: 357±16

Before:
VM size (cores)   Per-VM HS06   Total HS06   Overhead
4x 8              82.3±11       329          7.8%
2x 16             150±5         300          16%
1x 32             284±11        284          20.4%

After:
VM size (cores)   Per-VM HS06   Total HS06   Overhead   Overhead reduction
4x 8              87±11         348          2.5%       68%
2x 16             163.5±1       327          8.4%       52%
1x 32             311±1         311          12.9%      37%
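
The "overhead reduction" column can be read as the relative drop in overhead between the two measurements. The sketch below assumes that definition, which the slides do not spell out, and uses the rounded overhead percentages from the tables. With these inputs the 4x 8 and 1x 32 rows come out at about 68% and 37%, while the 2x 16 row evaluates to about 47.5% rather than the 52% on the slide; the slides give no detail that would explain the difference.

```python
# Overhead in percent, copied from the "Before" and "After" tables.
BEFORE = {"4x 8": 7.8, "2x 16": 16.0, "1x 32": 20.4}
AFTER = {"4x 8": 2.5, "2x 16": 8.4, "1x 32": 12.9}


def overhead_reduction(before: float, after: float) -> float:
    """Relative reduction of the overhead, in percent of the original overhead."""
    return 100.0 * (before - after) / before


for size in BEFORE:
    reduction = overhead_reduction(BEFORE[size], AFTER[size])
    print(f"{size}: {BEFORE[size]}% -> {AFTER[size]}% (reduction {reduction:.1f}%)")
```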
General virtualization issue?
•  Crosscheck w/ SLC6 VMs on Hyper-V
-  0.8% HS06 loss on 4x 8-core
-  3.3% HS06 loss on 1x 32-core SLC6 VM
•  No general virtualization overhead issue!
-  Rather a feature or configuration issue
•  What’s the difference between the VMs on Hyper-V and KVM?
NUMA
•  Hyper-V VMs have their vCPUs pinned to physical NUMA nodes
-  vCPUs are pinned to CPU sets that correspond to the
physical NUMA nodes
•  Wider OpenStack support for this is available as of Kilo
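
In OpenStack, guest NUMA topology and CPU pinning are typically requested through Nova flavor extra specs. The sketch below builds an `openstack flavor set` command using the `hw:numa_nodes` and `hw:cpu_policy` properties. The slides do not state which properties or values CERN used, so the two-node value, the pinning choice, and the flavor name `batch.fullnode` are assumptions for illustration only.

```python
def numa_flavor_properties(numa_nodes: int, pin_cpus: bool = False) -> dict:
    """Build Nova flavor extra specs for a NUMA-aware guest."""
    props = {"hw:numa_nodes": str(numa_nodes)}
    if pin_cpus:
        # 'dedicated' asks Nova to pin each vCPU to a host CPU.
        props["hw:cpu_policy"] = "dedicated"
    return props


def flavor_set_command(flavor: str, props: dict) -> list:
    """Return the CLI invocation as an argument list (nothing is executed here)."""
    cmd = ["openstack", "flavor", "set"]
    for key, value in sorted(props.items()):
        cmd += ["--property", f"{key}={value}"]
    cmd.append(flavor)
    return cmd


# Hypothetical flavor name; two NUMA nodes assumed for a dual-socket host.
print(" ".join(flavor_set_command("batch.fullnode", numa_flavor_properties(2, pin_cpus=True))))
```

Returning the command as a list rather than running it keeps the sketch free of side effects; it could be handed to `subprocess.run` on a host with the OpenStack client installed.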
NUMA - in the lab
… reduced the overhead to ~3% relative to bare metal
Deploying in production
•  EPT off; KSM on; NUMA-aware
•  System services add ~1-2% overhead
•  We got a total overhead of ~5%
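
On Intel hosts, the EPT setting of the KVM module is exposed as a module parameter in sysfs. The following minimal sketch reads it, assuming the `kvm_intel` module is in use; the slides do not describe how the setting was applied or verified at CERN, and the function name is hypothetical.

```python
from pathlib import Path


def ept_enabled(param_file: str = "/sys/module/kvm_intel/parameters/ept"):
    """Return True/False for the kvm_intel 'ept' parameter, or None if it cannot be read."""
    path = Path(param_file)
    if not path.is_file():
        return None  # module not loaded, or not an Intel KVM host
    return path.read_text().strip().upper() in ("Y", "1")
```

A fleet-wide check along these lines is one way to confirm that every compute node runs with the intended setting before and after a change.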
And then: extremely slow nodes...
•  Small fraction of jobs 10x slower
-  VMs look OK, actually pretty good
-  Hosts: 30-50% system load, >100k IRQ/s
(mostly TLB shoot-downs)
•  Load attributed to qemu-kvm
-  ‘perf top’: 90% in _raw_spin_lock
-  ‘systemtap’: paging64_page_fault
and kvm_mmu_pte* …
[Figures: VM CPU utilization; Compute Node CPU utilization]
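
The rate of TLB shoot-downs on a host can be estimated from two snapshots of /proc/interrupts, which on x86 Linux contains a `TLB:` row with per-CPU counters. This is a sketch of one possible measurement, not the method used in the talk; the function names are illustrative.

```python
import time


def tlb_shootdowns(interrupts_text: str) -> int:
    """Sum the per-CPU counters of the 'TLB:' row in /proc/interrupts content."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TLB:":
            # Numeric fields are per-CPU counts; the trailing words are the description.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError("no TLB row found")


def rate_per_second(first: str, second: str, interval_s: float) -> float:
    """Shoot-downs per second between two snapshots taken interval_s apart."""
    return (tlb_shootdowns(second) - tlb_shootdowns(first)) / interval_s


def sample_rate(path: str = "/proc/interrupts", interval_s: float = 1.0) -> float:
    """Take two snapshots of the live file and return the shoot-down rate."""
    with open(path) as fh:
        first = fh.read()
    time.sleep(interval_s)
    with open(path) as fh:
        second = fh.read()
    return rate_per_second(first, second, interval_s)
```

Running `sample_rate()` on an affected compute node and on a healthy one would give a rough comparison of the two.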
Back to the drawing board
•  Needed to combine optimizations with EPT on
•  Huge pages a way out?
-  Idea: reduce the number of pages to be handled, increase hit ratio
•  1GB huge pages
-  Best HS06 results (with EPT on)
•  2MB huge pages
-  Also one of the default sizes
-  Performance loss around 5% compared to bare metal on batch VMs
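
The idea of reducing the number of pages can be made concrete with some arithmetic. The sketch below counts how many pages are needed to back a guest's memory at each page size. The 64 GiB guest size is a hypothetical value, since the slides do not give the memory size of the batch VMs.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZES = {"4 KiB": 4 * KIB, "2 MiB": 2 * MIB, "1 GiB": GIB}


def pages_needed(memory_bytes: int, page_size: int) -> int:
    """Number of pages needed to back memory_bytes, rounding up."""
    return -(-memory_bytes // page_size)


VM_MEMORY = 64 * GIB  # hypothetical guest memory size, not taken from the slides

for label, size in PAGE_SIZES.items():
    print(f"{label:>6}: {pages_needed(VM_MEMORY, size):>12,} pages")
```

Going from 4 KiB to 2 MiB pages cuts the page count by a factor of 512, and 1 GiB pages by a further factor of 512, which is why fewer translations need to be cached or walked.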
Optimization by numbers
- NUMA + Pinning
- 2MB huge pages
- EPT on
- KSM on

Overhead by VM size:
VM size (cores)   Before   After
4x 8              7.8%     3.3%
2x 16             16%      4.6%
1x 32             20.4%    3-6%
Deploy in production
•  A small fraction can cause a lot of trouble…
Summary
•  Reduced the virtualization HS06 overhead to a few
percent compared to bare metal
-  On full node VMs!
-  NUMA + pinning + huge pages + EPT on + KSM on
•  Pre-deployment testing very difficult
-  EPT off side-effects initially undetected
belmiro.moreira@cern.ch
@belmiromoreira
http://openstack-in-production.blogspot.com

CPU Optimizations in the CERN Cloud - February 2016


Editor's notes

  1. I will do my best to answer your questions.