Achieving the Ultimate Performance with KVM
Venko Moyankov
DevOps.com Webinar
2020-10-06
about me
● Solutions Architect @ StorPool
● Network and System administrator
● 20+ years in telecoms building and operating
infrastructures
linkedin.com/in/venkomoyankov/
venko@storpool.com
about StorPool
● NVMe software-defined storage for VMs and containers
● Scale-out, HA, API-controlled
● Since 2011, in commercial production use since 2013
● Based in Sofia, Bulgaria
● Mostly virtual disks for KVM
● … and bare metal Linux hosts
● Also used with VMware, Hyper-V, XenServer
● Integrations into OpenStack/Cinder, Kubernetes Persistent
Volumes, CloudStack, OpenNebula, OnApp
Why performance
● Better application performance -- e.g. time to load a page, time to rebuild, time to execute a specific query
● Happier customers (in cloud / multi-tenant environments)
● ROI, TCO - Lower cost per delivered resource (per VM) through
higher density
Agenda
● Hardware
● Compute - CPU & Memory
● Networking
● Storage
Usual optimization goal
- lowest cost per delivered resource
- fixed performance target
- calculate all costs - power, cooling, space, server, network,
support/maintenance
Example: cost per VM with 4x dedicated 3 GHz cores and 16 GB
RAM
Unusual
- Best single-thread performance I can get at any cost
- 5 GHz cores, yummy :)
Compute node hardware
Intel
lowest cost per core:
- Xeon Gold 5220R - 24 cores @ 2.6 GHz ($244/core)
lowest cost per 3GHz+ core:
- Xeon Gold 6240R - 24 cores @ 3.2 GHz ($276/core)
- Xeon Gold 6248R - 24 cores @ 3.6 GHz ($308/core)
lowest cost per GHz:
- Xeon Gold 6230R - 26 cores @ 3.0 GHz ($81/GHz)
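The per-core prices above can be compared with the per-GHz figure by dividing each by its listed clock. A minimal sketch using only the numbers on this slide; the function name `cost_per_ghz` is made up for illustration:

```python
def cost_per_ghz(cost_per_core_usd: float, clock_ghz: float) -> float:
    """Cost of one GHz on one core, given the cost per core and its clock."""
    return cost_per_core_usd / clock_ghz


# (model, cost per core in USD, clock in GHz), as listed on the slide
per_core_options = [
    ("Xeon Gold 5220R", 244, 2.6),
    ("Xeon Gold 6240R", 276, 3.2),
    ("Xeon Gold 6248R", 308, 3.6),
]

for model, usd_per_core, ghz in per_core_options:
    print(f"{model}: ${cost_per_ghz(usd_per_core, ghz):.2f}/GHz")
```

All three come out above the $81/GHz listed for the Xeon Gold 6230R, which is why it wins on that metric.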
Compute node hardware
AMD
- EPYC 7702P - 64 cores @ 2.0/3.35 GHz - lowest cost per core
- EPYC 7402P - 24 cores / 1S - low density
- EPYC 7742 - 64 cores @ 2.2/3.4GHz x 2S - max density
- EPYC 7262 - 8 cores @3.4GHz - max IO/cache per core, per $
Compute node hardware
Form factor
[Figure: two server form factors, shown as "from" and "to"]
Compute node hardware
● firmware versions and BIOS settings
● Understand power management -- esp. C-states, P-states,
HWP and “bias”
○ Different on AMD EPYC: "power-deterministic",
"performance-deterministic"
● Think of rack level optimization - how do we get the lowest
total cost per delivered resource?
Tuning KVM
RHEL 7 Virtualization Tuning and Optimization Guide
https://pve.proxmox.com/wiki/Performance_Tweaks
https://events.static.linuxfound.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf
http://www.linux-kvm.org/images/f/f9/2012-forum-virtio-blk-performance-improvement.pdf
http://www.slideshare.net/janghoonsim/kvm-performance-optimization-for-ubuntu
… but don’t trust everything you read. Perform your own benchmarking!
CPU and Memory
Recent Linux kernel, KVM and QEMU
… but beware of the bleeding edge
E.g. qemu-kvm-ev from RHEV (repackaged by CentOS)
tuned-adm profile virtual-host (on the hypervisor)
tuned-adm profile virtual-guest (inside the guest)
CPU
Typical
● (heavy) oversubscription, because VMs are mostly idling
● HT
● NUMA
● route IRQs of network and storage adapters to a core on the
NUMA node they are on
Unusual
● CPU Pinning
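Routing an adapter's IRQs to a NUMA-local core can be sketched as follows. This is an illustration, not a tuning tool: it assumes the standard Linux sysfs/procfs files (`numa_node`, `nodeN/cpulist`, `smp_affinity_list`), and the helper names are made up.

```python
from pathlib import Path
from typing import List


def parse_cpulist(cpulist: str) -> List[int]:
    """Expand a kernel CPU list such as "0-3,8-11" into individual CPU numbers."""
    cpus: List[int] = []
    for chunk in cpulist.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def local_cpus(pci_device: Path) -> List[int]:
    """CPUs on the NUMA node that a PCI device (NIC, NVMe) is attached to."""
    node = int((pci_device / "numa_node").read_text())
    if node < 0:  # -1: the platform reports no NUMA locality for this device
        return parse_cpulist(Path("/sys/devices/system/cpu/online").read_text())
    node_dir = Path(f"/sys/devices/system/node/node{node}")
    return parse_cpulist((node_dir / "cpulist").read_text())


def route_irq(irq: int, cpu: int) -> None:
    """Pin one IRQ to one CPU. Needs root; irqbalance may later override it."""
    Path(f"/proc/irq/{irq}/smp_affinity_list").write_text(str(cpu))
```

For a hypothetical interface `eth0`, `local_cpus(Path("/sys/class/net/eth0/device"))` gives the candidate cores. The IRQ numbers themselves come from `/proc/interrupts`.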
Understanding oversubscription and congestion
Linux scheduler statistics: /proc/schedstat
(linux-stable/Documentation/scheduler/sched-stats.txt)
Next three are statistics describing scheduling latency:
7) sum of all time spent running by tasks on this processor (in ms)
8) sum of all time spent waiting to run by tasks on this processor (in ms)
9) # of tasks (not necessarily unique) given to the processor
* Note: fields 7 and 8 are actually in nanoseconds, not ms.
20% CPU load with large wait time (bursty congestion) is possible
100% CPU load with no wait time, also possible
Measure CPU congestion!
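One way to measure this is to sample /proc/schedstat twice and compare run time with wait time per CPU. The sketch below assumes the per-CPU line layout described above, with fields 7 to 9 being the last three of nine counters, in nanoseconds. The names `CpuSched`, `parse_schedstat`, `congestion` and `sample` are made up for illustration.

```python
import time
from typing import Dict, NamedTuple, Tuple


class CpuSched(NamedTuple):
    run_ns: int      # field 7: time tasks spent running on this CPU
    wait_ns: int     # field 8: time tasks spent waiting to run on this CPU
    timeslices: int  # field 9: number of tasks given to this CPU


def parse_schedstat(text: str) -> Dict[str, CpuSched]:
    """Extract fields 7-9 from every cpuN line of /proc/schedstat."""
    stats: Dict[str, CpuSched] = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumption: cpu lines are "cpuN f1 ... f9"; version, timestamp
        # and domain lines are skipped.
        if len(parts) >= 10 and parts[0].startswith("cpu"):
            stats[parts[0]] = CpuSched(int(parts[7]), int(parts[8]), int(parts[9]))
    return stats


def congestion(before: Dict[str, CpuSched], after: Dict[str, CpuSched],
               interval_s: float) -> Dict[str, Tuple[float, float]]:
    """Per CPU: (load, wait), both as seconds per second of wall-clock time."""
    result: Dict[str, Tuple[float, float]] = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        load = (new.run_ns - old.run_ns) / (interval_s * 1e9)
        wait = (new.wait_ns - old.wait_ns) / (interval_s * 1e9)
        result[cpu] = (load, wait)
    return result


def sample(interval_s: float = 1.0, path: str = "/proc/schedstat"):
    """Take two snapshots interval_s apart and return the per-CPU congestion."""
    with open(path) as f:
        before = parse_schedstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_schedstat(f.read())
    return congestion(before, after, interval_s)
```

A CPU that reports a load of 0.2 with a wait of 1.0 is the bursty-congestion case: only 20% busy on average, yet tasks queued for a full second of accumulated time during the interval.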
Memory
Typical
● Dedicated RAM
● huge pages, THP
● NUMA
● use local-node memory if you can
Unusual
● Oversubscribed RAM
● balloon
● KSM (RAM dedup)
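Dedicated RAM backed by huge pages has to be reserved on the host ahead of time, so it helps to know how many pages a guest needs. A small sketch for the 16 GB example VM from earlier, assuming the two x86-64 huge page sizes (2 MiB and 1 GiB); the helper name is made up:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def hugepages_needed(guest_ram_bytes: int, page_size_bytes: int = 2 * MIB) -> int:
    """Number of huge pages needed to back a guest's RAM, rounded up."""
    return -(-guest_ram_bytes // page_size_bytes)


print(hugepages_needed(16 * GIB))       # with 2 MiB pages
print(hugepages_needed(16 * GIB, GIB))  # with 1 GiB pages
```

The result is per guest. The host-wide reservation (for example via `vm.nr_hugepages`) is the sum over all guests on the node.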
Networking
Virtualized networking
● hardware emulation (rtl8139, e1000)
● paravirtualized drivers - virtio-net
regular virtio vs vhost-net vs vhost-user
Linux Bridge vs OVS in-kernel vs OVS-DPDK
Pass-through networking
SR-IOV (PCIe pass-through)
virtio-net QEMU
● Multiple context switches:
1. virtio-net driver → KVM
2. KVM → qemu/virtio-net device
3. qemu → TAP device
4. qemu → KVM (notification)
5. KVM → virtio-net driver (interrupt)
● Much more efficient than emulated hardware
● Shared memory with the qemu process
● A qemu thread processes the packets
virtio vhost-net
● Two context switches (optional):
1. virtio-net driver → KVM
2. KVM → virtio-net driver (interrupt)
● Shared memory with the host kernel (vhost protocol)
● Allows Linux Bridge zero copy
● qemu / virtio-net device is on the control path only
● A kernel thread [vhost] processes the packets
virtio vhost-user / OVS-DPDK
● No context switches
● Shared memory between the guest and Open vSwitch (requires huge pages)
● Zero copy
● qemu / virtio-net device is on the control path only
● KVM is not in the path
● The ovs-vswitchd process handles the packets
● The poll-mode driver (PMD) takes 1 CPU core at 100%
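With libvirt, the three virtio variants above differ only in a few attributes of the guest's interface definition. The sketch below builds those definitions. The bridge name `br0` and the socket path are placeholders, and the function name is made up. It covers only the interface element: vhost-user additionally needs the guest memory to be shared and backed by huge pages, which is configured elsewhere in the domain XML.

```python
import xml.etree.ElementTree as ET


def virtio_interface(backend: str, bridge: str = "br0",
                     socket_path: str = "/var/run/openvswitch/vhost-user-1") -> str:
    """Return a libvirt <interface> element for one virtio-net backend.

    backend: "qemu" (userspace virtio-net), "vhost" (in-kernel vhost-net)
             or "vhost-user" (for example OVS-DPDK).
    """
    if backend == "vhost-user":
        iface = ET.Element("interface", type="vhostuser")
        # Hypothetical socket path; it must match the vswitch port's socket.
        ET.SubElement(iface, "source", type="unix", path=socket_path, mode="client")
    elif backend in ("qemu", "vhost"):
        iface = ET.Element("interface", type="bridge")
        ET.SubElement(iface, "source", bridge=bridge)
        # name="qemu" keeps packet processing in the qemu process,
        # name="vhost" moves it to the vhost kernel thread.
        ET.SubElement(iface, "driver", name=backend)
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    ET.SubElement(iface, "model", type="virtio")
    return ET.tostring(iface, encoding="unicode")


for name in ("qemu", "vhost", "vhost-user"):
    print(virtio_interface(name))
```

The guest sees the same virtio-net device in all three cases. Only the host-side data path changes.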
PCI Passthrough
● No paravirtualized devices
● Direct access from the guest kernel to the PCI device
● Host, KVM and qemu are on neither the data path nor the control path
● NIC driver in the guest
● No virtual networking
● No live migrations
● No filtering
● No control
● Shared devices via SR-IOV
Virtual Network Performance
All measurements are between two VMs on the same host
# ping -f -c 100000 vm2
[Figure: virtio-net QEMU; label: qemu pid]
[Figure: virtio vhost-net; labels: qemu, vhost thread]
[Figure: virtio vhost-user / OVS-DPDK; labels: qemu, OVS]
Discussion
● Deep dive into Virtio-networking and vhost-net
https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net
● Open vSwitch DPDK support
https://docs.openvswitch.org/en/latest/topics/dpdk/
Storage - virtualization
Virtualized
live migration
thin provisioning, snapshots, etc.
vs. Full bypass
only speed
Storage - virtualization
Virtualized
cache=none -- direct IO, bypass host buffer cache
io=native -- use Linux Native AIO, not POSIX AIO (threads)
virtio-blk vs virtio-scsi
virtio-scsi multiqueue
iothread
vs. Full bypass
SR-IOV for NVMe devices
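The virtualized settings above map onto attributes of a libvirt disk definition, where `io=native` is libvirt's spelling (the QEMU command line calls it `aio=native`). A minimal sketch that builds such a definition; the device path in the usage note is a placeholder and the function name is made up:

```python
import xml.etree.ElementTree as ET


def tuned_disk(source_dev: str, target_dev: str = "vda", iothread: int = 1) -> str:
    """Return a libvirt <disk> element using cache=none, io=native and an iothread."""
    disk = ET.Element("disk", type="block", device="disk")
    ET.SubElement(
        disk, "driver",
        name="qemu", type="raw",
        cache="none",             # direct IO, bypass the host buffer cache
        io="native",              # Linux native AIO instead of a thread pool
        iothread=str(iothread),   # dedicated IO thread for this disk
    )
    ET.SubElement(disk, "source", dev=source_dev)
    # bus="virtio" selects virtio-blk
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")
```

For example, `tuned_disk("/dev/mapper/vm1-disk0")` for a hypothetical block device. The iothread number refers to a thread declared with `<iothreads>` in the same domain definition.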
Storage - vhost
Virtualized with qemu bypass
vhost
before:
guest kernel -> host kernel -> qemu -> host kernel -> storage system
after:
guest kernel -> storage system
[Diagram: StorPool node architecture. Labels: storpool_server instances (1 CPU thread and 2-4 GB RAM each), storpool_block instance (1 CPU thread), NIC, NVMe SSDs, KVM virtual machines, 25GbE links]
● Highly scalable and efficient architecture
● Scales up in each storage node and out with multiple nodes
Storage benchmarks
Beware: lots of snake oil out there!
● Performance numbers from hardware configurations totally unlike what you'd use in production
● Synthetic tests with high iodepth: 10 nodes, 10 workloads * iodepth 256 each (because why not)
● Testing with a ramdisk backend
● Synthetic workloads that don't approximate the real world
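Why a very high iodepth is suspicious follows from Little's law: once the queues are kept full, mean latency equals the number of I/Os in flight divided by the completion rate. The sketch below reads the example above as ten workloads in total, which is an assumption, and the IOPS values are hypothetical, chosen only to show the scale:

```python
def mean_latency_ms(outstanding_ios: int, iops: float) -> float:
    """Little's law: mean latency = I/Os in flight / completion rate."""
    return outstanding_ios / iops * 1000.0


workloads, iodepth = 10, 256      # from the example above
in_flight = workloads * iodepth   # I/Os outstanding at any moment

# Hypothetical benchmark results, not measurements
for iops in (100_000, 1_000_000, 10_000_000):
    print(f"{iops:>10} IOPS -> {mean_latency_ms(in_flight, iops):.2f} ms mean latency")
```

A headline IOPS number obtained this way says little about the latency a single application thread at queue depth 1 would see.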
[Figure: latency plotted against operations per second, with regions labeled "best service", "lowest cost per delivered resource" and "only pain", and a "benchmarks" label]
Benchmarks
Real load
Follow Us Online: @storpool (StorPool Storage)
Q&A
Venko Moyankov
venko@storpool.com
StorPool Storage
www.storpool.com
@storpool
Thank you!