KVM Storage performance with io_uring
CLDIN: CLouDINfra
● CLDIN builds and runs the infrastructure of Total Webhosting Solutions
● TWS is a European company with multiple hosting brands in
○ Netherlands
○ France
○ Spain
● We build our infrastructure with
○ Open source software
○ Apache CloudStack
○ Ceph
○ IPv6
CLDIN cloud deployment
● CloudStack
○ Locations
■ Netherlands: Amsterdam and Haarlem
■ Spain: Valencia
○ Advanced Networking
■ BGP+VXLAN+EVPN
○ Storage
■ Ceph (RBD)
■ TrueNAS Enterprise (ZFS HA)
● Numbers
○ ~10,000 Virtual Machines
○ ~200 physical hosts
○ ~100TB RAM
○ ~15PB storage
● Hypervisors (latest)
○ Dual AMD Epyc 64C
○ 1TB RAM
○ Dell R6525 or SuperMicro AS-1123US-TN10RT
History
● Bare metal with NVMe provides the best performance
○ Lowest latency
○ Highest number of IOps
○ But we want to run our workloads inside Virtual Machines!
● Virtual Machines
○ CPU and memory performance have a small (~5%) overhead
○ Disk I/O has a much higher overhead
● KVM uses the QCOW2 format
○ Usually used with Local Storage and NFS as Primary Storage
● Virtio-blk is a bit slow and is the bottleneck
io_uring
● A Linux kernel interface for asynchronous I/O that Qemu can use on the host for a VM's disk I/O
● Provides lower latency and thus more IOps
○ At a fixed queue depth, IOps are inversely proportional to latency
● Software requirements
○ Kernel >= 5.8
■ I tested with 5.13
○ Qemu >= 5.0
○ Libvirt >= 6.3
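The minimum versions above can be checked mechanically. The following is a minimal sketch, not part of the original talk: the helper names (`parse_version`, `at_least`, `check_requirements`) and the example host values are hypothetical. It compares versions numerically, because a plain string comparison would wrongly rank "5.13" below "5.8".

```python
# Minimum versions as listed on the io_uring slide.
MINIMUMS = {"kernel": "5.8", "qemu": "5.0", "libvirt": "6.3"}


def parse_version(text):
    """Read the leading dotted numeric fields, e.g. '5.13.0-52-generic' -> (5, 13, 0)."""
    parts = []
    for field in text.split("."):
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(field):
            break  # a suffix such as '-52-generic' ends the numeric part
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(installed):
    """Return {component: True/False} for every component in MINIMUMS."""
    return {name: at_least(installed[name], minimum)
            for name, minimum in MINIMUMS.items()}


if __name__ == "__main__":
    # Hypothetical host: the version strings below are illustrative only.
    host = {"kernel": "5.13.0-52-generic", "qemu": "5.0", "libvirt": "6.0.0"}
    for name, ok in check_requirements(host).items():
        print(f"{name}: {'ok' if ok else 'too old'}")
```

How the installed versions are obtained (for example from `uname -r` or the package manager) is left out on purpose, since that differs per distribution.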
QCOW2 vs RAW
● QCOW2 is the most flexible
○ Used by almost all cloud deployments
○ Local Storage and NFS Primary Storage use this format
○ Supports snapshots and cloning
● RAW is the fastest
○ Not used by many deployments
QCOW2 preallocation
By preallocating space within the QCOW2 disk image, performance can be increased.
As data is saved to the QCOW2 image, the physical space used by the image grows.
Growing the QCOW2 image takes time and thus decreases performance.
Preallocation modes:
● preallocation=metadata - allocates the space required by the metadata but doesn’t allocate any space for the data. This is the quickest to provision but the slowest for guest writes.
● preallocation=falloc - allocates space for the metadata and data but leaves the data blocks uninitialized (no zeros are written). This will provision slower than metadata but quicker than full. Guest write performance will be much quicker than metadata and similar to full.
● preallocation=full - allocates space for the metadata and data and will therefore consume all the physical space that you allocate (not sparse). All empty allocated space is written as zeros. This is the slowest to provision and will give similar guest write performance to falloc.
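As an illustration of the modes above, the sketch below builds the `qemu-img create` command line for a QCOW2 image with a chosen preallocation mode. It only constructs and prints the argument list and does not run it. The function name, the image path and the size are hypothetical examples.

```python
import shlex

# Preallocation modes accepted for QCOW2 images; "off" is qemu-img's default.
PREALLOCATION_MODES = ("off", "metadata", "falloc", "full")


def qemu_img_create_argv(path, size, preallocation="metadata"):
    """Build the argument list for creating a QCOW2 image with preallocation."""
    if preallocation not in PREALLOCATION_MODES:
        raise ValueError(f"unknown preallocation mode: {preallocation!r}")
    return [
        "qemu-img", "create",
        "-f", "qcow2",
        "-o", f"preallocation={preallocation}",
        path, size,
    ]


if __name__ == "__main__":
    # Hypothetical image path and size, for illustration only.
    argv = qemu_img_create_argv("/var/lib/libvirt/images/example.qcow2", "100G", "falloc")
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would create the image. With `falloc` or `full` the complete size is reserved on the host filesystem, so check the free space first.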
Test setup
Hypervisor
● AMD Epyc 7351P 16C
● 256GB RAM
● Samsung PM983 NVMe
○ ext4 filesystem
○ No RAID
● Ubuntu 20.04
○ kernel 5.13 (HWE)
○ Qemu 5.0 (PPA)
○ Plain libvirt with manual XML file
Virtual Machine
● 16 Cores
● 64GB RAM
● Ubuntu 20.04 with kernel 5.13 (HWE)
Results: 512-byte writes
Results: 4k writes
Results found on the internet
I’m not able to reach the near bare-metal performance reported elsewhere.
Further testing is needed!
CloudStack & io_uring
● io_uring supported
○ Since version 4.16
○ Enabled automatically if supported by Libvirt and Qemu
● Service Offerings support different provisioning types
○ Thin: preallocation=metadata
○ Sparse: preallocation=falloc
○ Fat: preallocation=full
● https://cloudstack.apache.org/api/apidocs-4.16/apis/createDiskOffering.html
○ provisioningtype = thin/sparse/fat
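The mapping between provisioning types and QCOW2 preallocation modes listed above can be written down as a small lookup table. The sketch below also assembles the query parameters for a `createDiskOffering` call. It is only an assumption-laden illustration: the function name and the offering values are hypothetical, and the API key and request signature that a real CloudStack call requires are deliberately left out.

```python
from urllib.parse import urlencode

# provisioningtype -> QCOW2 preallocation mode, as listed on the slide.
PROVISIONING_TO_PREALLOCATION = {
    "thin": "metadata",
    "sparse": "falloc",
    "fat": "full",
}


def disk_offering_params(name, size_gb, provisioningtype):
    """Build unsigned query parameters for a createDiskOffering request."""
    if provisioningtype not in PROVISIONING_TO_PREALLOCATION:
        raise ValueError(f"unknown provisioningtype: {provisioningtype!r}")
    return {
        "command": "createDiskOffering",
        "name": name,
        "displaytext": name,
        "disksize": size_gb,  # size in GB
        "provisioningtype": provisioningtype,
        "response": "json",
    }


if __name__ == "__main__":
    # Hypothetical offering; a real request also needs apikey and signature.
    params = disk_offering_params("nvme-sparse-100", 100, "sparse")
    print(urlencode(params))
    print("preallocation mode:", PROVISIONING_TO_PREALLOCATION["sparse"])
```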
Libvirt
<iothreads>16</iothreads>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' io='io_uring'/>
  <source file='/var/lib/libvirt/images/vm1-data-2.qcow2'/>
  <backingStore/>
  <target dev='sdc' bus='scsi'/>
</disk>
<controller type='scsi' index='0' model='virtio-scsi'>
  <driver queues='16' iothread='16'/>
</controller>
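To verify that a domain definition really uses io_uring, the XML can be inspected with a few lines of Python. This is a minimal sketch using only the standard library. It assumes the fragments above sit in their usual places inside a libvirt `<domain>` (disks and controllers under `<devices>`). The second disk (`sdd`) and its file name are hypothetical additions to show a disk without the `io` attribute.

```python
import xml.etree.ElementTree as ET

# The slide's fragments wrapped in a minimal <domain>; 'sdd' is a made-up extra disk.
DOMAIN_XML = """
<domain type='kvm'>
  <iothreads>16</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' io='io_uring'/>
      <source file='/var/lib/libvirt/images/vm1-data-2.qcow2'/>
      <target dev='sdc' bus='scsi'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/example.qcow2'/>
      <target dev='sdd' bus='scsi'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='16' iothread='16'/>
    </controller>
  </devices>
</domain>
"""


def disk_io_modes(domain_xml):
    """Map each disk's target device to its driver 'io' attribute (None if unset)."""
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.iter("disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        if driver is None or target is None:
            continue
        modes[target.get("dev")] = driver.get("io")
    return modes


if __name__ == "__main__":
    for dev, io in disk_io_modes(DOMAIN_XML).items():
        print(f"{dev}: {io or 'default (io attribute not set)'}")
```

The XML of a running VM can be obtained with `virsh dumpxml` and passed to `disk_io_modes` in the same way.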
Conclusion
● Virtio-blk is currently the limiting factor
○ Local Storage should and can be much faster than it is right now
● 50% lower latency
● 2x performance increase with io_uring
● Other benchmarks suggest 80~90% of bare-metal performance
○ Still need to investigate why we don’t reach that performance
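The "50% lower latency" and "2x performance" figures are two views of the same relationship: at a fixed queue depth, IOps equal the queue depth divided by the average latency per request. The sketch below works through that arithmetic. The latency values are hypothetical and are not the measurements from this talk.

```python
def iops(queue_depth, latency_us):
    """IOps sustained at a fixed queue depth, given average latency in microseconds."""
    return queue_depth * 1_000_000 / latency_us


if __name__ == "__main__":
    # Hypothetical latencies: 100 us before, 50 us (50% lower) after.
    before = iops(queue_depth=1, latency_us=100)
    after = iops(queue_depth=1, latency_us=50)
    print(f"before: {before:.0f} IOps")
    print(f"after:  {after:.0f} IOps")
    print(f"speed-up: {after / before:.1f}x")
```

Halving the latency doubles the IOps only while the queue depth stays the same. Raising the queue depth is the other way to increase IOps, until the device or the virtualization layer saturates.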
Looking forward
● Supported in CloudStack 4.16
● Ubuntu 22.04 LTS (Jammy) has all the right packages
○ Qemu 6.2
○ Libvirt 8.0
● More performance testing is welcome
● More real-life experiences are welcome
● Small enhancements to the Libvirt XML can be made
○ You can also manually make changes using the Libvirt Qemu hooks
○ https://libvirt.org/hooks.html
Questions?
@widodh
wido@denhollander.io
Please send feedback to users@cloudstack.apache.org
Appendix
Useful links
● https://www.jamescoyle.net/how-to/1810-qcow2-disk-images-and-performance
● https://blog.programster.org/qcow2-performance
● https://techpiezo.com/tech-insights/raw-vs-qcow2-disk-images-in-qemu-kvm/

Boosting I/O Performance with KVM io_uring
