4. Openstack Summit Sydney November 2017
An SBC is a compute-, network- and I/O-intensive NFV.
The SBC sits at the border of networks and acts as an interworking element, demarcation point, centralized routing database, firewall and traffic cop.
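PPS for a Telco NFV
Minimum-size Ethernet frame, bytes per field:
  IFG                12   (on wire only, stripped)
  Preamble            8   (on wire only, stripped)
  Ethernet Header    14
  IP Header          20
  Transport           8
  Packet Payload     18
  CRC                 4
  Frame size         64
  Size on wire       84
Maximum MPPS: 1.5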
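The 1.5 Mpps figure is consistent with the byte counts above if one assumes a 1 Gbit/s link; the slide itself does not state the link speed. A minimal sketch of that arithmetic follows. The function and constant names are illustrative and do not come from the presentation.

```python
# Bytes that occupy the wire for every frame but are not part of the frame itself.
IFG_BYTES = 12       # inter-frame gap
PREAMBLE_BYTES = 8   # preamble plus start-of-frame delimiter


def max_pps(frame_bytes: int, link_bps: float) -> float:
    """Theoretical maximum packets per second for a frame size and link rate."""
    wire_bits = (frame_bytes + IFG_BYTES + PREAMBLE_BYTES) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    # Assumption: 1 Gbit/s and 10 Gbit/s links, chosen only for illustration.
    for gbps in (1, 10):
        mpps = max_pps(64, gbps * 1e9) / 1e6
        print(f"{gbps} Gbit/s, 64-byte frames: {mpps:.2f} Mpps")
```

For 64-byte frames at 1 Gbit/s this gives about 1.49 Mpps, which rounds to the 1.5 Mpps quoted on the slide.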
Performance Requirements of an SBC NFV
● Guarantee
Ensure application response time.
● Low Latency and Jitter
Pre-defined constraints dictate throughput and capacity for a given VM configuration.
● Deterministic
Real-time communication (RTC) demands predictable performance.
● Optimized
Tuning OpenStack parameters to reduce latency has a positive impact on throughput and capacity.
● Packet Loss
Zero packet loss, so that the quality of real-time traffic is maintained.
Performance Bottlenecks in OpenStack
The major attributes that govern performance and deterministic behavior
○ CPU - Sharing with variable VNF loads
The virtual CPUs of the guest VM run as QEMU threads on the compute host, where they are treated as normal processes. These threads can be scheduled on any physical core, which increases cache misses and hampers performance. Features like CPU pinning help reduce the hit.
○ Memory - Small memory pages coming from different sockets
Virtual memory can be allocated from any NUMA node. When the memory and the CPU/NIC belong to different NUMA nodes, data has to traverse the QPI links, which increases I/O latency. TLB misses caused by small kernel memory pages also increase hypervisor overhead. NUMA awareness and hugepages help minimize these effects.
○ Network - Throughput and latency for small packets
Network traffic arriving at the compute host's physical NICs has to be copied to the tap devices by the emulator threads before it is passed to the guest. This increases network latency and induces packet drops. SR-IOV and OVS-DPDK help here.
○ Hypervisor/BIOS Settings - Overhead, eliminate interrupts, prevent preemption
Any interrupt raised by the guest to the host results in VM exit and entry calls, which increases hypervisor overhead. Host OS tuning helps reduce this overhead.
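The NUMA point above can be checked on a compute host by comparing the CPUs a guest is pinned to with the CPUs of the NUMA node that owns the NIC. The sketch below is an illustration only: the helper names, the interface name and the CPU numbers are assumptions, while the two sysfs paths are the standard Linux locations for this information.

```python
from pathlib import Path


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_local_to_node(pinned: set, node_cpus: set) -> bool:
    """True if every pinned CPU belongs to the given NUMA node."""
    return pinned <= node_cpus


def nic_numa_node(ifname: str) -> int:
    """Read the NUMA node of a physical NIC from sysfs (-1 means unknown)."""
    return int(Path(f"/sys/class/net/{ifname}/device/numa_node").read_text())


def node_cpus(node: int) -> set:
    """Read the CPUs that belong to a NUMA node from sysfs."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


if __name__ == "__main__":
    # Hypothetical layout: node 0 owns CPUs 0-9 and 20-29, guest pinned to 2-5.
    local = cpus_local_to_node({2, 3, 4, 5}, parse_cpulist("0-9,20-29"))
    print("pinned CPUs are NUMA-local:", local)
```

On a real host, `node_cpus(nic_numa_node("eth0"))` would replace the hard-coded cpulist, with `eth0` standing in for whichever physical interface carries the VNF traffic.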
Performance tuning for VNF (Guest)
● Isolate cores for fast path traffic, slow path traffic and OAM.
● Use poll mode drivers for network traffic
○ DPDK
○ PF_RING
● Use hugepages for DPDK threads
● Size the VNF properly based on workload.
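One way to picture the core isolation described above is as a fixed split of the guest's vCPUs into three groups. This is a hedged sketch, not the presenters' implementation: the function names and the default of one core each for OAM and slow path are assumptions. `os.sched_setaffinity` is the standard Linux-only Python call for restricting a process to a set of CPUs.

```python
import os


def partition_vcpus(vcpus, oam=1, slow_path=1):
    """Split guest vCPU ids into OAM, slow-path and fast-path groups."""
    ordered = sorted(vcpus)
    if len(ordered) <= oam + slow_path:
        raise ValueError("need at least one vCPU left for the fast path")
    return {
        "oam": ordered[:oam],
        "slow_path": ordered[oam:oam + slow_path],
        "fast_path": ordered[oam + slow_path:],
    }


def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, set(cpus))


if __name__ == "__main__":
    # Hypothetical 8-vCPU guest: vCPU 0 for OAM, 1 for slow path, 2-7 for fast path.
    plan = partition_vcpus(range(8))
    for role, cpus in plan.items():
        print(role, cpus)
```

A poll-mode fast-path worker would call `pin_current_process(plan["fast_path"])` before entering its busy loop, so that OAM and slow-path work never competes with it for a core.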
PERFORMANCE GAIN WITH CONFIG CHANGES and Optimized NFV
● Enable CPU pinning
● Configure libvirt to expose the host CPU features to the guest
● Enable the ComputeFilter Nova scheduler filter
● Remove CPU overcommit
● Define the CPU topology of the guest
● Segregate real-time and non-real-time workloads onto different computes using host aggregates
● Keep host processes off the pinned CPUs
● Enable NUMA awareness
● Enable hugepages on the host for guest memory
● Extend the Nova scheduler with the NUMA topology filter
● Remove memory overcommit
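Several of the items above are requested per flavor through Nova extra specs. The sketch below builds an `openstack flavor set` command for a subset of them. The flavor name `sbc.rt` and the chosen values are illustrative assumptions, not the presenters' settings. Host-side settings such as overcommit ratios and the scheduler filter list live in nova.conf and are not shown.

```python
import shlex

# Flavor extra specs understood by Nova's libvirt driver.
# Values are illustrative assumptions, not taken from the presentation.
REALTIME_FLAVOR_SPECS = {
    "hw:cpu_policy": "dedicated",   # pin each vCPU to its own host CPU
    "hw:numa_nodes": "1",           # keep the guest on a single NUMA node
    "hw:mem_page_size": "large",    # back guest memory with hugepages
    "hw:cpu_sockets": "1",          # guest CPU topology: one socket
}


def flavor_set_command(flavor: str, specs: dict) -> str:
    """Build an `openstack flavor set` command applying the given extra specs."""
    args = ["openstack", "flavor", "set"]
    for key, value in specs.items():
        args += ["--property", f"{key}={value}"]
    args.append(flavor)
    return shlex.join(args)


if __name__ == "__main__":
    # "sbc.rt" is a hypothetical flavor name.
    print(flavor_set_command("sbc.rt", REALTIME_FLAVOR_SPECS))
```

Generating the command as a string keeps the sketch free of any dependency on a running cloud; the printed line can be reviewed before it is run.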
Networks in OpenStack
[Figure: three datapath diagrams showing user space, kernel space and the physical functions PF1 and PF2]
● VNF with Open vSwitch (kernel datapath): up to 50 kpps
● VNF with OVS-DPDK (DPDK datapath): up to 4 Mpps per socket*
● VNF with SR-IOV (Single-Root I/O Virtualization): up to 21 Mpps per core
*Lack of NUMA awareness
Host Tunables for Performance - Kernel configuration
● Kernel Tuning
○ The “cpu-partitioning” profile will also tune the kernel to
■ Remove read-copy-update (RCU) work from isolated CPUs
■ Reduce the timer tick on isolated CPUs (when busy) from 1000 to 1 per second
○ For the best zero-packet-loss performance, also use the “isolcpus” boot parameter
○ Disable KSM (Kernel Samepage Merging)
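The kernel settings above correspond to well-known Linux boot parameters (`isolcpus`, `nohz_full`, `rcu_nocbs`) and to the KSM control file in sysfs. The sketch below only formats the boot arguments and reads the KSM state. The helper names and CPU numbers are assumptions, and how the tuned profile itself applies these parameters is not shown.

```python
from pathlib import Path


def format_cpulist(cpus) -> str:
    """Render CPU ids as a compact kernel cpulist, e.g. {2, 3, 4, 8} -> '2-4,8'."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def isolation_boot_args(isolated_cpus) -> str:
    """Kernel command-line arguments that isolate the given CPUs."""
    cpulist = format_cpulist(isolated_cpus)
    return f"isolcpus={cpulist} nohz_full={cpulist} rcu_nocbs={cpulist}"


def ksm_is_running(path: str = "/sys/kernel/mm/ksm/run") -> bool:
    """True if the KSM scanner is currently enabled on this host."""
    ksm = Path(path)
    return ksm.exists() and ksm.read_text().strip() == "1"


if __name__ == "__main__":
    # Hypothetical host: CPUs 2-11 reserved for guests, 0-1 left to the host.
    print(isolation_boot_args(range(2, 12)))
    print("KSM running:", ksm_is_running())
```

Writing `0` to the same sysfs file stops the KSM scanner. That step is left out here because it requires root and changes host state.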
Future/Roadmap Items
● Configuring the txqueuelen of tap devices in case of OVS ML2 plugins:
○ https://blueprints.launchpad.net/neutron/+spec/txqueuelen-configuration-on-tap
● Isolate emulator threads to different cores than the vCPU pinned cores:
○ https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy
● SR-IOV Trusted VF:
○ https://blueprints.launchpad.net/nova/+spec/sriov-trusted-vfs
● Accelerated devices (GPU/FPGA/QAT) and Smart NICs:
○ https://blueprints.launchpad.net/horizon/+spec/pci-stats-in-horizon
○ https://blueprints.launchpad.net/nova/+spec/pci-extra-info
● SR-IOV NUMA Awareness:
○ https://blueprints.launchpad.net/nova/+spec/reserve-numa-with-pci
Q & A
More Details :
https://www.openstack.org/summit/sydney-2017/summit-schedule/events/20538/secrets-for-approaching-bare-metal-performance-with-real-time-virtual-network-functions-in-openstack
Thank You
Contact:
skarmarkar@sonusnet.com
sodey@sonusnet.com
Editor's Notes
Core segregation of network/signaling/OAM
Workload - network (virtio, etc.)
virtio, SR-IOV, OVS-DPDK
RCU work should be removed from the cores running the DPDK busy loop. Enabling dynamic ticks for those cores also helps.
halt_poll_ns is a newer kernel feature that reduces VM entry/exit calls. Tuning it to the VNF's requirements is beneficial.
Isolate CPUs for host processes. Do not let them use the CPUs assigned to the guest.
Kernel same-page merging (KSM) is a technology that finds identical memory pages inside a Linux system and merges them so that only a single copy remains, saving memory. However, the scanning process adds overhead, which may cause applications to run more slowly.