pps Matters
Muhammad Moinur Rahman

moin@bofh.im
What is a switch/router?
• A switch forwards frames based on MAC address

• A router forwards packets based on IP address
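The two lookups can be sketched in Python as follows. This is only an illustration: the table contents, port names and next-hop names are hypothetical, and real devices do these lookups in hardware tables rather than dictionaries. The switch does an exact match on the destination MAC; the router does a longest-prefix match on the destination IP.

```python
import ipaddress

# Hypothetical MAC table: destination MAC -> egress port (exact match).
MAC_TABLE = {
    "aa:bb:cc:00:00:01": "port1",
    "aa:bb:cc:00:00:02": "port2",
}

# Hypothetical routing table: prefix -> next hop (longest prefix wins).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("10.0.0.0/8"): "core",
    ipaddress.ip_network("10.1.0.0/16"): "edge",
}


def switch_lookup(dst_mac: str) -> str:
    """Exact-match lookup on the MAC address; unknown MACs are flooded."""
    return MAC_TABLE.get(dst_mac.lower(), "flood")


def route_lookup(dst_ip: str) -> str:
    """Longest-prefix-match lookup on the IP address."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if addr in net]
    # The default route 0.0.0.0/0 guarantees at least one match.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```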
What is a Software Switch/Router?
• Software based implementations

• Routers

• BIRD, FRR, Zebra, Quagga, ExaBGP

• Switches

• Open vSwitch

• Mostly installable in a Virtualized Environment or on a *nix environment
What is Hardware Switch/Router?
• Manufactured by big names like Cisco, Juniper, ARISTA, Extreme, Nokia

• Comes with a price tag

• Sometimes comes in a really big size

• Has different and multiple ports

• * x 1/10/25/40/50/100/400 GbE

• So much jargon

• ASIC/Merchant Silicon

• Gbps/Tbps backplane capacity

• Gbps/Tbps forwarding capacity

• Kpps/Mpps forwarding

• line rate forwarding
What is ASIC/merchant Silicon?
• ASIC Miners - Just one example

• Application Specific Integrated Circuits

• Some applications

• Bitcoin Miner

• Voice Recorder

• Cryptographic Accelerator

• Network Switches

• Firewalls

• New Lingo for DC Switches is Silicon

• Off the shelf or Custom Built ASICs

• Broadcom and Cavium are some of the silicon manufacturers

• Broadcom Tomahawk is the flagship ASIC
The BIG Questions
1. If there are open source switches/routers, why do we need to buy
price-tagged vendor devices?

2. Why use silicon or chips instead of generic x86 processors?

3. *nix OS can do anything. Why don’t we install those apps and get rid of
Hardware Vendors?
x86 vs ASIC
• x86

• Jack of all, master of none

• CPU and PCI interrupts

• PCIe bandwidth is limited and depends on the CPU architecture

• ASIC

• Master of one

• No interrupts

• Sky is the limit for PCIe bandwidth
POSIX poses
• POSIX sockets evolved from Berkeley sockets

• BSD sockets have been the de facto standard since 4.2BSD Unix

• Adopted everywhere from Linux to Windows

• Basic life cycle

• socket(), bind(), listen(), accept(), sendmsg(), recvmsg()

• Network stacks are implemented in-kernel

• So these functions are system calls

• High overhead from context switches and CPU cache pollution

• Back-and-forth game between multi-core CPUs and multi-queue NICs

• Socket buffers (skb) or network memory buffers (mbuf) stress the OS memory
allocators
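The basic life cycle can be walked once over loopback with Python's `socket` module, which wraps the same BSD calls. This is a minimal sketch, assuming a Unix-like host (`sendmsg()` and `recvmsg()` are not available on Windows); the function name `echo_once` is made up for the illustration. Every commented call below crosses into the kernel, which is where the per-packet overhead comes from.

```python
import socket


def echo_once(payload: bytes) -> bytes:
    """Send payload from a client to a server socket over loopback TCP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    try:
        server.bind(("127.0.0.1", 0))   # bind(); port 0 lets the OS pick a free port
        server.listen(1)                # listen()
        client = socket.create_connection(server.getsockname())
        conn, _peer = server.accept()   # accept()
        try:
            client.sendmsg([payload])   # sendmsg()
            received = b""
            # TCP is a byte stream, so loop until the whole payload arrived.
            while len(received) < len(payload):
                data, _anc, _flags, _addr = conn.recvmsg(4096)  # recvmsg()
                if not data:
                    break
                received += data
            return received
        finally:
            conn.close()
            client.close()
    finally:
        server.close()
```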
Mind the GAP
• A minimal pause is required between packets or frames

• Interpacket gap / interframe spacing / interframe gap

• The standard is 96 bit times

• 9.6 µs for 10 Mbit/s Ethernet

• 0.96 µs for 100 Mbit/s (Fast) Ethernet

• 96 ns for Gigabit Ethernet

• 38.4 ns for 2.5 Gigabit Ethernet

• 19.2 ns for 5 Gigabit Ethernet

• 9.6 ns for 10 Gigabit Ethernet

• 2.4 ns for 40 Gigabit Ethernet

• 0.96 ns for 100 Gigabit Ethernet
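The figures above all follow from dividing 96 bit times by the link rate. A small sketch of that arithmetic (the function and constant names are chosen for this example only):

```python
IPG_BIT_TIMES = 96  # the standard gap, in bit times


def interpacket_gap_ns(link_bps: float) -> float:
    """Duration of the interpacket gap in nanoseconds for a given link rate."""
    return IPG_BIT_TIMES / link_bps * 1e9


if __name__ == "__main__":
    for name, rate in [("10 Mbit/s", 10e6), ("1 Gbit/s", 1e9),
                       ("10 Gbit/s", 10e9), ("100 Gbit/s", 100e9)]:
        print(f"{name}: {interpacket_gap_ns(rate):.2f} ns")
```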
run KERNEL run
• Kernel processing time budget for a full-size frame of 1538 bytes on the wire (1518-byte frame + 20 bytes of preamble and interframe gap)

• at 10Gbit/s == 1230.4 ns between packets (813Kpps)

• at 40Gbit/s == 307.6 ns between packets (3.25Mpps)

• at 100Gbit/s == 123.0 ns between packets (8.13Mpps)

• Smallest frame size of 84 bytes on the wire (64-byte frame + 20 bytes of preamble and interframe gap)

• at 10Gbit/s == 67.2 ns between packets (14.88Mpps)

• CPU budget

• 67.2ns => 201 cycles (@3GHz)
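The per-packet budget is plain arithmetic on the on-wire frame size, the link rate and the CPU clock. A hedged sketch, with the function name and the 3 GHz default picked for this example:

```python
def packet_budget(wire_bytes: int, link_bps: float, cpu_hz: float = 3e9):
    """Return (ns between packets, packets per second, CPU cycles per packet).

    wire_bytes is the size on the wire, i.e. the frame plus 20 bytes of
    preamble and interframe gap (84 for the smallest frame, 1538 for the largest).
    """
    bits = wire_bytes * 8
    ns_between = bits / link_bps * 1e9
    pps = link_bps / bits
    cycles = ns_between * 1e-9 * cpu_hz
    return ns_between, pps, cycles


if __name__ == "__main__":
    for size in (1538, 84):
        ns, pps, cycles = packet_budget(size, 10e9)
        print(f"{size} B @ 10G: {ns:.1f} ns, {pps / 1e6:.2f} Mpps, {cycles:.0f} cycles")
```

At 84 bytes and 10 Gbit/s this leaves roughly 200 cycles per packet on one 3 GHz core, which is the number the whole talk revolves around.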
OS Limitation
• Most OSes are jacks of all trades and masters of none

• Desktop, Mail Server, Web Server, DNS Server

• Graphics Rendering, Gaming, Day to Day work

• They are not designed for high-performance packet processing

• Not optimized for line rate packet processing

• Vyatta and BSDRP, to name a few

• Lots of other commercial OSes

• That is not the END GAME
kernel bypass
zero-copy
• The CPU skips the task of copying data from one memory area to another

• Saves CPU cycles

• Saves memory bandwidth

• OS elements

• Device Driver

• File Systems

• Network Protocol Stack

• zero-copy versions

• Reduce the number of mode switches between kernel space and user space
applications

• Mostly use raw sockets with mmap (memory map)

• Kernel bypass uses zero-copy, but the two are not the same thing
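The idea of skipping the copy can be shown in user space with Python's `memoryview`. This is only an analogy, not kernel zero-copy I/O: the frame contents and the 4-byte header length are invented for the example. The copying path allocates a second buffer, while the view shares memory with the original frame, so a later change to the frame is visible through it.

```python
def payload_copy(frame: bytearray, header_len: int) -> bytes:
    """Copying path: slicing allocates and fills a new buffer."""
    return bytes(frame[header_len:])


def payload_view(frame: bytearray, header_len: int) -> memoryview:
    """Zero-copy path: the view shares memory with the original frame."""
    return memoryview(frame)[header_len:]


if __name__ == "__main__":
    frame = bytearray(b"HDR!payload")
    copied = payload_copy(frame, 4)
    view = payload_view(frame, 4)
    frame[4] = ord("P")          # modify the frame in place
    print(copied)                # the copy does not see the change
    print(bytes(view))           # the view does, because no copy was made
```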
RDMA
• Remote Direct Memory Access

• Implemented over high speed, low-latency networks(fabrics)

• Direct access to remote host’s memory

• Dramatically reduces latency and CPU overhead

• Requires specialized hardware, especially a NIC with support for RDMA

• Bypasses the remote and local operating systems

• Transfers data directly between the wire and application memory

• Bypasses CPU, cache and context switching

• Transfer continues in parallel with OS operations without affecting OS
performance

• Applications may or may not be RDMA-aware
RDMA(continued)
• Link Layer protocol can be

• Ethernet

• iWARP (Internet Wide Area RDMA Protocol) combines with TCP Offload Engine

• NVMe over Fabrics (NVMe-oF)

• iSCSI Extensions for RDMA (iSER)

• SMB Direct

• Sockets Direct Protocol (SDP)

• SCSI RDMA Protocol (SRP)

• NFS over RDMA

• GPUDirect

• InfiniBand

• Oldest RDMA implementation

• Main manufacturers were Intel and Mellanox

• Mostly used in supercomputing environments

• Ethernet can be run over InfiniBand

• Omni-Path

• Low-latency networking architecture by Intel
RoCE
• RDMA over Converged Ethernet

• Two versions

• RoCEv1 works at the Ethernet link layer, identified by Ethertype 0x8915

• RoCEv2 works at the Internet layer, over UDP/IPv4 and UDP/IPv6

• Routable RoCE is the other name for v2, because it can be routed

• Also runs over non-converged Ethernet

• RoCE vs InfiniBand

• RoCE requires lossless Ethernet

• RoCE vs iWARP

• RoCE performs RDMA over Ethernet/UDP whereas iWARP uses TCP

• Some of the vendors are

• Nvidia -> Mellanox

• Broadcom -> Emulex

• Cavium -> QLogic/Marvell Technology
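The difference between the two RoCE versions shows up in the headers, which a short classifier can illustrate. This is a sketch under assumptions: it handles only untagged Ethernet frames and IPv4, and it relies on UDP destination port 4791, the IANA-assigned RoCEv2 port, which is not stated on the slide. The function name is made up for the example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ROCE_V1 = 0x8915   # RoCEv1 Ethertype
IPPROTO_UDP = 17
ROCE_V2_UDP_PORT = 4791      # assumption: IANA-assigned RoCEv2 destination port


def classify_roce(frame: bytes) -> str:
    """Return 'RoCEv1', 'RoCEv2' or 'other' for an untagged Ethernet frame."""
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ROCE_V1:
        return "RoCEv1"                      # identified at the link layer
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= 14 + 20:
        ihl = (frame[14] & 0x0F) * 4         # IPv4 header length in bytes
        proto = frame[14 + 9]
        udp_off = 14 + ihl
        if proto == IPPROTO_UDP and len(frame) >= udp_off + 4:
            (dport,) = struct.unpack_from("!H", frame, udp_off + 2)
            if dport == ROCE_V2_UDP_PORT:
                return "RoCEv2"              # identified at the UDP/IP layer
    return "other"
```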
The Cool People of Internet
• Connection Establishment (SYN;SYN-ACK;ACK)

• Acknowledgement of traffic receipt

• Checksum and Sequence

• Sliding Window Calculation

• Congestion Control

• Connection Termination
TOE(TCP Offload Engine)
• Offloads the kernel TCP stack to the NIC

• Frees up host CPU cycles

• Reduces traffic between the PCI bus and the host CPU

• Types

• Parallel-Stack Full Offload

• Host OS TCP/IP stack and parallel stack with “vampire tap”

• HBA full Offload

• Host Bus Adapter used mainly in iSCSI host adapters

• Besides TCP it also offloads iSCSI functions

• TCP chimney partial Offload

• Mainly Microsoft lingo, but widely used as the generic term

• Selected parts of the TCP stack are offloaded
tso/lro
• TCP Segmentation Offload

• Big chunks of data are split into multiple packets by NIC before
transmission

• The segment size depends on the MTU of the link between networking devices

• NIC calculates and splits the data when offloaded from host OS

• Large Receive Offload

• Just the opposite

• Multiple packets of a single stream are aggregated into a single buffer
before being handed over to the host OS, saving CPU cycles
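What the NIC does for TSO and LRO can be approximated in a few lines. This is a simplified sketch that handles payload bytes only: a real NIC also replicates and fixes up the TCP/IP headers for each segment. The function names are invented here, and the 1460-byte segment size in the usage loop assumes a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers.

```python
from typing import List


def tso_segment(payload: bytes, mss: int) -> List[bytes]:
    """Split one large buffer into segments of at most mss bytes (TSO side)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def lro_aggregate(segments: List[bytes]) -> bytes:
    """Merge in-order segments of one stream into a single buffer (LRO side)."""
    return b"".join(segments)


if __name__ == "__main__":
    big = bytes(4000)
    segments = tso_segment(big, 1460)
    print([len(s) for s in segments])
    print(lro_aggregate(segments) == big)
```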
chksum
• A weak check compared to modern checksum methods, but TCP still needs error checking

• Uses the one’s complement algorithm

• This is CPU intensive work

• But can be offloaded to the NIC if supported

• And it has some disadvantages:

• If used along with packet analyzers on the host, they will report invalid checksums for
outgoing packets, because the capture happens before the NIC fills the checksum in

• Breaks with virtualization platforms whose virtual NIC driver has no checksum
offload capability
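The one's complement checksum itself is short. The sketch below follows the general algorithm described in RFC 1071 and runs over an arbitrary byte string; a real TCP checksum also covers a pseudo-header, which is left out here. The function name is chosen for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    csum = internet_checksum(sample)
    print(hex(csum))
    # A receiver that checksums data plus checksum gets zero when nothing changed.
    print(internet_checksum(sample + csum.to_bytes(2, "big")))
```

This per-byte loop over every packet is the work the host CPU avoids when the NIC computes the checksum.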
eco systems for fast packet processing
• There are lots of frameworks

• From open source to commercial

• Sometimes tightly coupled with a vendor

• Especially Network Interface Card vendors

• But there are open standards too

• Some ecosystems are VNF-friendly or offer application development APIs
for building new solutions

• Commercial ones are really costly considering the price of NIC
xdp (eXpress Data Path)
• In Linux Kernel since 4.8

• eBPF based high performance Data path

• A new address family, AF_XDP, similar to AF_PACKET

• Only supported in Intel and Mellanox cards

• The eBPF program can be offloaded to the NIC; where driver support is unavailable
it is processed by the CPU and performs slower

• A drop test of 26 Mpps per core has been run successfully on
commodity hardware

• Designed for programmability

• This is not kernel bypass but rather integrated fast-path in kernel

• Works seamlessly with kernel TCP stack
pf_ring
• Available for Linux kernels 2.6.32 and newer

• Loadable kernel module

• 10 Gbit Hardware Packet Filtering using commodity network adapters

• Device driver independent

• Libpcap support for seamless integration with existing pcap-based applications.

• ZC version requires a commercial license per MAC address

• User-space ZC (new generation DNA, Direct NIC Access) drivers for extreme packet capture/transmission speed, as
the NIC NPU (Network Process Unit) pushes/gets packets to/from userland without any kernel intervention.
Using the 10 Gbit ZC driver you can send/receive at wire speed at any packet size.

• PF_RING ZC library for distributing packets in zero-copy across threads, applications, Virtual Machines.

• Support of Accolade, Exablaze, Endace, Fiberblaze, Inveatech, Mellanox, Myricom/CSPI, Napatech, Netcope and
Intel (ZC) network adapters

• Kernel-based packet capture and sampling

• Ability to specify hundreds of header filters in addition to BPF

• Content inspection, so that only packets matching the payload filter are passed

• PF_RING™ plugins for advanced packet parsing and content filtering

• Works pretty well within ntop ecosystem
DPDK(Data Plane Development Kit)
• Set of Data Plane libraries and NIC drivers

• Maintained by the Linux Foundation but BSD licensed

• Programming framework for x86, ARM and PowerPC

• An Environment Abstraction Layer (EAL) is created for each specific
hardware/software environment

• Supports lots of hardware

• AMD, Amazon, Aquantia, Atomic Rules, Broadcom, Cavium, Chelsio,
Cisco, Intel, Marvell, Mellanox, NXP, Netcope, Solarflare

• Extensible to different architectures and systems like Intel IA-32 and
FreeBSD
fd.io (Fast Data Input/Output)
• Run by LFN - The LF(Linux Foundation) Networking Fund

• Cisco has donated VPP(Vector Packet Processing) library to fd.io

• This library has been in production by Cisco since 2003

• Leverages DPDK capabilities

• Aligned to support NFV and SDN

• OPNFV is a sister project of fd.io under LFN
netmap
• A novel framework which utilizes known techniques to reduce packet-processing costs

• A fast packet I/O mechanism between the NIC and user-space

• Removes unnecessary metadata (e.g. sk_buff) allocation

• Amortizes system call costs, reduces or removes data copies

• Supported both in FreeBSD and Linux as a loadable kernel module

• Comes as default from FreeBSD 11.0

• Released under the BSD 2-Clause license; FreeBSD is the primary development platform

• Supported with Intel, Realtek and Chelsio cards

• 14.8 Mpps achieved on a 10G NIC with a 900 MHz CPU

• Chelsio has tested 100G traffic in netmap mode with 99.99% success rate
Other ecosystems
• OpenOnload by Solarflare

• Napatech
References
• pf_ring https://www.ntop.org

• DPDK https://www.dpdk.org

• fd.io https://fd.io

• netmap http://info.iet.unipi.it/~luigi/netmap/
Questions
Thank You

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

pps Matters

  • 1. pps Matters Muhammad Moinur Rahman moin@bofh.im
  • 2. What is a switch/router? • A switch forwards frames based on MAC addresses • A router forwards packets based on IP addresses
  • 3. What is a Software Switch/Router? • Software-based implementations • Routers • BIRD, FRR, Zebra, Quagga, ExaBGP • Switches • Open vSwitch • Mostly installable in a virtualized environment or on a *nix environment
  • 4. What is Hardware Switch/Router? • Manufactured by big names like Cisco, Juniper, Arista, Extreme, Nokia • Comes with a price tag • Sometimes comes in a really big size • Has multiple ports of different speeds • * x 1/10/25/40/50/100/400 Gbps • So much jargon • ASIC/Merchant Silicon • Gbps/Tbps backplane capacity • Gbps/Tbps forwarding capacity • k/K/m/M pps forwarding • Line-rate forwarding
  • 5. What is ASIC/merchant Silicon? • ASIC Miners - Just one example • Application Specific Integrated Circuits • Some applications • Bitcoin Miner • Voice Recorder • Cryptographic Accelerator • Network Switches • Firewalls • The new lingo for DC switches is "Silicon" • Off-the-shelf or custom-built ASICs • Broadcom and Cavium are some silicon manufacturers • Broadcom Tomahawk is the flagship ASIC
  • 7. The BIG Questions 1. If there are open source switches/routers, why do we need to buy price-tagged vendor devices? 2. Why use silicon or chips instead of generic x86 processors? 3. A *nix OS can do anything. Why don’t we install those apps and get rid of hardware vendors?
  • 8. x86 vs ASIC • x86 • Jack of all trades, master of none • CPU and PCI interrupts • PCIe bandwidth is limited and depends on the CPU architecture • ASIC • Master of one • No interrupts • Sky is the limit for PCIe bandwidth
  • 9. POSIX poses • POSIX sockets evolved from Berkeley sockets • BSD sockets have been the de facto standard since 4.2BSD Unix • Adopted everywhere from Linux to Windows • Basic life cycle • socket(), bind(), listen(), accept(), sendmsg(), recvmsg() • Network stacks are implemented in-kernel • So these functions are system calls • Higher overhead from context switches and CPU cache pollution • Back-and-forth game between multi-core CPUs and multi-queue NICs • Socket buffers (skb) or network memory buffers (mbuf) stress the OS memory allocators
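The life cycle listed on this slide can be walked through on a loopback connection. This is only a minimal sketch: the function name `echo_once` is hypothetical, the loopback address and single-threaded layout are assumptions made for brevity, and `sendmsg()`/`recvmsg()` are available in Python on Unix-like systems only.

```python
import socket


def echo_once(message: bytes) -> bytes:
    """Send one message through the socket life cycle and return what arrives."""
    # socket(): create the listening TCP socket.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # bind(): port 0 asks the kernel for any free port on loopback.
        server.bind(("127.0.0.1", 0))
        # listen(): the kernel starts queueing completed handshakes.
        server.listen(1)
        address = server.getsockname()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # The handshake is completed in-kernel, so connect() returns
            # even though the server has not called accept() yet.
            client.connect(address)
            # accept(): take the established connection off the queue.
            connection, _peer = server.accept()
            with connection:
                # sendmsg(): one system call, data is copied into the kernel.
                client.sendmsg([message])
                # recvmsg(): another system call, data is copied back out.
                data, _ancdata, _flags, _addr = connection.recvmsg(4096)
                return data


if __name__ == "__main__":
    print(echo_once(b"ping"))
```

Every call in the sketch crosses the user/kernel boundary, which is the per-packet overhead the slide is pointing at.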
  • 10. Mind the GAP • A minimal pause is required between packets or frames • Known as interpacket gap, interframe spacing or interframe gap • The standard is 96 bit times • 9.6 µs for 10 Mbit/s Ethernet • 0.96 µs for 100 Mbit/s (Fast) Ethernet • 96 ns for Gigabit Ethernet • 38.4 ns for 2.5 Gigabit Ethernet • 19.2 ns for 5 Gigabit Ethernet • 9.6 ns for 10 Gigabit Ethernet • 2.4 ns for 40 Gigabit Ethernet • 0.96 ns for 100 Gigabit Ethernet
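The gap durations above all follow from dividing 96 bit times by the link rate. A small sketch of that arithmetic (the function and constant names are hypothetical):

```python
IPG_BITS = 96  # interpacket gap in bit times, as stated on the slide


def interpacket_gap_ns(link_bits_per_second: float) -> float:
    """Duration of the 96-bit interpacket gap in nanoseconds."""
    return IPG_BITS / link_bits_per_second * 1e9


if __name__ == "__main__":
    rates = [
        ("10 Mbit/s", 10e6),
        ("100 Mbit/s", 100e6),
        ("1 Gbit/s", 1e9),
        ("10 Gbit/s", 10e9),
        ("100 Gbit/s", 100e9),
    ]
    for label, rate in rates:
        print(f"{label:>10}: {interpacket_gap_ns(rate):.2f} ns")
```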
  • 11. run KERNEL run • Kernel processing time available for a 1538-byte frame • at 10 Gbit/s == 1230.4 ns between packets (813 Kpps) • at 40 Gbit/s == 307.6 ns between packets (3.25 Mpps) • at 100 Gbit/s == 123.0 ns between packets (8.13 Mpps) • Smallest frame size of 84 bytes • at 10 Gbit/s == 67.2 ns between packets (14.88 Mpps) • CPU budget • 67.2 ns => 201 cycles (@ 3 GHz)
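The budget on this slide is plain arithmetic on the on-wire frame size, which includes the 8-byte preamble and the 12-byte interpacket gap (1500 + 14 + 4 + 8 + 12 = 1538 bytes for a full frame, 64 + 8 + 12 = 84 bytes for the smallest one). A sketch with hypothetical function names:

```python
def ns_per_frame(wire_bytes: int, link_bps: float) -> float:
    """Time one frame occupies the wire, in nanoseconds."""
    return wire_bytes * 8 / link_bps * 1e9


def frames_per_second(wire_bytes: int, link_bps: float) -> float:
    """Maximum frame rate at line rate for a given on-wire frame size."""
    return link_bps / (wire_bytes * 8)


def cycles_per_frame(wire_bytes: int, link_bps: float, cpu_hz: float) -> int:
    """Whole CPU cycles available per frame on a single core."""
    return int(wire_bytes * 8 / link_bps * cpu_hz)


if __name__ == "__main__":
    for wire_bytes in (1538, 84):
        for gbps in (10, 40, 100):
            rate = gbps * 1e9
            print(
                f"{wire_bytes:>4} B @ {gbps:>3} Gbit/s: "
                f"{ns_per_frame(wire_bytes, rate):8.1f} ns, "
                f"{frames_per_second(wire_bytes, rate) / 1e6:6.2f} Mpps"
            )
    print("cycles per 84 B frame at 10 Gbit/s, 3 GHz:",
          cycles_per_frame(84, 10e9, 3e9))
```

The 201-cycle figure has to cover everything done per packet, which is why a single cache miss or system call already consumes a large share of the budget.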
  • 12. OS Limitation • Most operating systems are jacks of all trades and masters of none • Desktop, Mail Server, Web Server, DNS Server • Graphics Rendering, Gaming, Day-to-Day work • They are not designed for high-performance packet processing • Not optimized for line-rate packet processing • Vyatta and BSDRP, to name a few • Lots of other commercial OSes • That is not the END GAME
  • 14. zero-copy • The CPU skips the task of copying data from one memory area to another • Saves CPU cycles • Saves memory bandwidth • OS elements • Device drivers • File systems • Network protocol stack • zero-copy versions • Reduce the number of mode switches between kernel space and user-space applications • Mostly use raw sockets with mmap (memory map) • Kernel bypass utilizes zero-copy, but the two are not the same
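The core idea, handing out a reference to a buffer instead of a copy of it, can be illustrated entirely in user space. This is only an analogy for what zero-copy drivers and stacks do with shared memory, not the kernel mechanism itself; the buffer size, the 14-byte header offset and the function name are assumptions for illustration.

```python
ETH_HEADER_LEN = 14  # assumed Ethernet header length for this illustration


def payload_view(frame: bytearray) -> memoryview:
    """Return the payload as a view on the frame buffer, without copying."""
    return memoryview(frame)[ETH_HEADER_LEN:]


if __name__ == "__main__":
    frame = bytearray(1514)          # hypothetical receive buffer
    copied = bytes(frame[ETH_HEADER_LEN:])   # allocates and copies 1500 bytes
    shared = payload_view(frame)             # no allocation of payload bytes

    # Writing through the view changes the original buffer, which shows
    # that both names refer to the same memory.
    shared[0] = 0x45
    print(frame[ETH_HEADER_LEN] == 0x45, copied[0] == 0x45)
```

The copied payload costs CPU cycles and memory bandwidth proportional to its size, while the view costs the same small amount regardless of frame length.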
  • 15. RDMA • Remote Direct Memory Access • Implemented over high-speed, low-latency networks (fabrics) • Direct access to a remote host’s memory • Dramatically reduces latency and CPU overhead • Requires specialized hardware, especially NICs with RDMA support • Bypasses the remote or local operating system • Transfers data directly between the wire and application memory • Bypasses CPU, cache and context switching • Transfers continue in parallel with OS operations without affecting OS performance • Applications may or may not be RDMA-aware
  • 16. RDMA (continued) • Link Layer protocol can be • Ethernet • iWARP (internet Wide Area RDMA Protocol) combines with TCP Offload Engine • NVMe over Fabrics (NVMe-oF) • iSCSI Extensions for RDMA (iSER) • SMB Direct • Sockets Direct Protocol (SDP) • SCSI RDMA Protocol (SRP) • NFS over RDMA • GPUDirect • Link Layer protocol can be • InfiniBand • One of the oldest RDMA implementations • Main manufacturers were Intel and Mellanox • Mostly used in supercomputing environments • Ethernet can be run over InfiniBand • Omni-Path • Low-latency networking architecture by Intel
  • 17. RoCE • RDMA over Converged Ethernet • Two versions • RoCEv1 works at the Ethernet link layer, using Ethertype 0x8915 • RoCEv2 works at the internet layer, over UDP/IPv4 and UDP/IPv6 • v2 is also called Routable RoCE because it can be routed • Also runs over non-converged Ethernet • RoCE vs InfiniBand • RoCE requires lossless Ethernet • RoCE vs iWARP • RoCE performs RDMA over Ethernet/UDP whereas iWARP uses TCP • Some of the vendors are • Nvidia -> Mellanox • Broadcom -> Emulex • Cavium -> QLogic/Marvell Technology
  • 18. The Cool People of Internet • Connection Establishment (SYN;SYN-ACK;ACK) • Acknowledgement of traffic receipt • Checksum and Sequence • Sliding Window Calculation • Congestion Control • Connection Termination
  • 19. TOE (TCP Offload Engine) • Offloads the kernel TCP stack to the NIC • Frees up host CPU cycles • Reduces traffic between the PCI bus and the host CPU • Types • Parallel-Stack Full Offload • Host OS TCP/IP stack plus a parallel stack attached with a “vampire tap” • HBA Full Offload • Host Bus Adapter, used mainly in iSCSI host adapters • Besides TCP, it also offloads iSCSI functions • TCP Chimney Partial Offload • Mainly Microsoft lingo, but often used as the general term • Only selected parts of the TCP stack are offloaded
  • 20. tso/lro • TCP Segmentation Offload • Big chunks of data are split into multiple packets by the NIC before transmission • The segment size depends on the MTU of the link between networking devices • The NIC calculates and splits the data once it is handed over by the host OS • Large Receive Offload • Just the opposite • Multiple packets of a single stream are aggregated into a single buffer before being handed over to the host OS, reducing CPU cycles
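What the NIC does for TSO and LRO can be mimicked in software to make the data flow concrete. This is a simplified sketch that ignores headers, sequence numbers and checksums; the function names are hypothetical, and the MSS of 1460 bytes assumes a 1500-byte MTU with 20-byte IPv4 and 20-byte TCP headers and no options.

```python
from typing import Iterable, List

ASSUMED_MSS = 1460  # 1500-byte MTU minus 20 B IPv4 and 20 B TCP headers


def segment(payload: bytes, mss: int = ASSUMED_MSS) -> List[bytes]:
    """TSO direction: split one large buffer into MSS-sized segments."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[offset:offset + mss] for offset in range(0, len(payload), mss)]


def coalesce(segments: Iterable[bytes]) -> bytes:
    """LRO direction: merge in-order segments of one stream into one buffer."""
    return b"".join(segments)


if __name__ == "__main__":
    data = bytes(range(256)) * 20            # 5120 bytes handed down by the host
    parts = segment(data)
    print([len(part) for part in parts])     # sizes of the packets on the wire
    print(coalesce(parts) == data)           # receiver sees the original buffer
```

With offload enabled, the host stack handles one large buffer per call instead of one packet per call, which is where the CPU savings come from.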
  • 21. chksum • The TCP checksum is weak compared to modern checksum methods, but TCP still needs error checking • Uses a one’s complement algorithm • This is CPU-intensive work • But it can be offloaded to the NIC if supported • And it has some disadvantages: • Packet analyzers will report invalid checksums for locally transmitted packets, because they are captured before the NIC fills in the checksum • Problems arise on virtualization platforms whose virtual NIC drivers lack checksum offload capability
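The one’s complement algorithm mentioned above is the 16-bit Internet checksum. The sketch below covers only the arithmetic; a real TCP checksum is computed over a pseudo-header plus the TCP header and payload, which is left out here. The function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for index in range(0, len(data), 2):
        total += (data[index] << 8) | data[index + 1]
    # Fold carries out of the low 16 bits back in (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    checksum = internet_checksum(sample)
    print(f"checksum: {checksum:#06x}")
    # A receiver sums the data together with the checksum and expects zero.
    print(internet_checksum(sample + checksum.to_bytes(2, "big")) == 0)
```

The loop touches every byte of every packet, which is why handing this work to the NIC saves host CPU cycles.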
  • 22. ecosystems for fast packet processing • There are lots of frameworks • From open source to commercial • Sometimes tightly coupled with a vendor • Especially a Network Interface Card vendor • But there are open standards too • Some ecosystems are VNF-friendly or offer an application development API for building new solutions • Commercial ones are really costly considering the price of the NIC
  • 23. xdp (eXpress Data Path) • In the Linux kernel since 4.8 • eBPF-based high-performance data path • AF_XDP is a new address family, similar to AF_PACKET • Only supported on Intel and Mellanox cards • The eBPF program is offloaded to the NIC; if driver support is unavailable, it is processed by the CPU and performs slower • A 26 Mpps per-core drop test has been run successfully on commodity hardware • Designed for programmability • This is not kernel bypass but rather an integrated fast path in the kernel • Works seamlessly with the kernel TCP stack
  • 24. pf_ring • Available for Linux kernels 2.6.32 and newer • Loadable kernel module • 10 Gbit hardware packet filtering using commodity network adapters • Device driver independent • Libpcap support for seamless integration with existing pcap-based applications • The ZC version requires a commercial license per MAC address • User-space ZC (new generation DNA, Direct NIC Access) drivers for extreme packet capture/transmission speed, as the NIC NPU (Network Process Unit) pushes/gets packets to/from userland without any kernel intervention. Using the 10 Gbit ZC driver you can send/receive at wire speed at any packet size • PF_RING ZC library for distributing packets in zero-copy across threads, applications and virtual machines • Support for Accolade, Exablaze, Endace, Fiberblaze, Inveatech, Mellanox, Myricom/CSPI, Napatech, Netcope and Intel (ZC) network adapters • Kernel-based packet capture and sampling • Ability to specify hundreds of header filters in addition to BPF • Content inspection, so that only packets matching the payload filter are passed • PF_RING™ plugins for advanced packet parsing and content filtering • Works pretty well within the ntop ecosystem
  • 25. DPDK (Data Plane Development Kit) • Set of data plane libraries and NIC drivers • Maintained by the Linux Foundation but BSD licensed • Programming framework for x86, ARM and PowerPC • An Environment Abstraction Layer (EAL) is created for each specific hardware/software environment • Supports lots of hardware • AMD, Amazon, Aquantia, Atomic Rules, Broadcom, Cavium, Chelsio, Cisco, Intel, Marvell, Mellanox, NXP, Netcope, Solarflare • Extensible to other architectures and systems such as Intel IA-32 and FreeBSD
  • 26. fd.io (Fast Data Input/Output) • Run by LFN - the LF (Linux Foundation) Networking Fund • Cisco has donated the VPP (Vector Packet Processing) library to fd.io • This library has been in production at Cisco since 2003 • Leverages DPDK capabilities • Aligned to support NFV and SDN • OPNFV is a fellow LFN project
  • 27. netmap • A novel framework which utilizes known techniques to reduce packet-processing costs • A fast packet I/O mechanism between the NIC and user space • Removes unnecessary metadata (e.g. sk_buff) allocation • Amortized system call costs, reduced/removed data copies • Supported on both FreeBSD and Linux as a loadable kernel module • Included by default since FreeBSD 11.0 • Released under the BSD 2-Clause license; FreeBSD is the primary development platform • Supported on Intel, Realtek and Chelsio cards • 14.8 Mpps achieved on a 10G NIC with a 900 MHz CPU • Chelsio has tested 100G traffic in netmap mode with a 99.99% success rate
  • 29. Other ecosystems • OpenOnload by Solarflare • Napatech
  • 30. References • pf_ring https://www.ntop.org • DPDK https://www.dpdk.org • fd.io https://fd.io • netmap http://info.iet.unipi.it/~luigi/netmap/