This presentation walks through the Linux kernel networking stack, covering the essentials and recent developments a developer needs to know. Our starting point is the network card driver as it feeds a packet into the stack. We will follow the packet as it traverses various subsystems such as packet filtering, routing, protocol stacks, and the socket layer. Along the way we will pause to look into concepts such as segmentation offloading, TCP small queues, and low latency polling. We will also cover APIs exposed by the kernel that go beyond the use of write()/read() on sockets, and look into how they are implemented on the kernel side.
2. Agenda
● How does a packet get in and out of the net stack?
  ● NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO
● How does a packet get through the net stack?
  ● RX Handler, IP Processing, TCP Processing, TCP Fast Open
● How to account for memory and do flow control?
  ● Socket Buffers, Flow Control, TCP Small Queues
● Q&A
Kernel Networking Walkthrough
4. How does a packet get in and out of the Network Stack?
5. Receive & Transmit Process
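[Figure: receive and transmit path between the NIC, the network stack (kernel space) and the process (user space). Receive: NIC → DMA → Ring Buffer → Parse IP, then either forward to another device or, if the packet is local, Parse TCP/UDP → Socket Buffer → read() by the task. Transmit: the task's write() → Socket Buffer → Construct TCP/UDP → Construct IP → Ring Buffer → NIC.]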
6. The 3 ways into the Network Stack
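[Figure: the three ways a packet moves from the ring buffer into the network stack]
● Interrupt driven: the NIC raises an interrupt for each received packet, and the packet is pushed from the ring buffer into the network stack.
● NAPI based polling: the kernel calls the driver's poll() to pull packets from the ring buffer in batches.
● Busy polling: the task itself calls busy_poll() to pull packets from the ring buffer into the network stack.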
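The NAPI pattern can be sketched in user space. After the first interrupt, further interrupts are masked and poll() drains at most a budget of packets per call. Only when a call finishes under budget, meaning the ring is empty, is the interrupt re-enabled. The ring layout and the names napi_poll_sim(), rx_irq() and nic_dma_packet() are hypothetical. In the kernel, the driver registers its poll callback with netif_napi_add() and ends polling with napi_complete_done(). The budget of 64 used below matches the kernel's default NAPI weight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 256

/* Hypothetical RX ring: 'head' is the next slot the driver reads,
 * 'tail' is the next slot the (simulated) NIC fills via DMA. */
struct rx_ring {
    int pkts[RING_SIZE];
    size_t head;
    size_t tail;
    bool irq_enabled;
    size_t delivered;       /* packets handed to the stack so far */
};

/* Simulated NIC DMA: place one packet in the ring; false if the ring is full. */
static bool nic_dma_packet(struct rx_ring *r, int pkt)
{
    if ((r->tail + 1) % RING_SIZE == r->head)
        return false;       /* ring full: a real NIC would drop the packet */
    r->pkts[r->tail] = pkt;
    r->tail = (r->tail + 1) % RING_SIZE;
    return true;
}

/* Stand-in for passing a packet up to the network stack. */
static void deliver_to_stack(struct rx_ring *r, int pkt)
{
    (void)pkt;
    r->delivered++;
}

/* First interrupt: mask further IRQs and switch to polling. */
static void rx_irq(struct rx_ring *r)
{
    r->irq_enabled = false;
}

/* Process at most 'budget' packets and return the work done. If the work
 * is below the budget, the ring is empty, so polling stops and the
 * interrupt is re-enabled (the role of napi_complete_done()). */
static int napi_poll_sim(struct rx_ring *r, int budget)
{
    int work = 0;

    assert(budget > 0);
    while (work < budget && r->head != r->tail) {
        deliver_to_stack(r, r->pkts[r->head]);
        r->head = (r->head + 1) % RING_SIZE;
        work++;
    }
    if (work < budget)
        r->irq_enabled = true;
    return work;
}
```

The budget bounds how long one device can occupy the CPU per poll. Under sustained load, every call uses its full budget, so the interrupt stays masked and the kernel keeps polling instead of taking one interrupt per packet.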
7. RSS – Receive Side Scaling
● The NIC distributes packets across multiple RX queues, allowing them to be processed in parallel.
● Each RX queue has its own IRQ, so the IRQ's CPU affinity selects the CPU that runs the hardware interrupt handler for that queue.
[Figure: a filter in the NIC spreads packets over RX-queue-1 to RX-queue-4, serviced by CPU 1, CPU 3, CPU 1 and CPU 5 respectively]
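The filter that picks the queue usually hashes the flow tuple with the Toeplitz hash and uses the low bits of the hash to index an indirection table. The following is a sketch under stated assumptions. The 40-byte key is only an example, since real NICs are programmed with their own key (shown by ethtool -x). The 128-entry table size and the helper names are hypothetical. The round-robin fill mirrors the default that Linux drivers get from ethtool_rxfh_indir_default().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example RSS key (assumption: any 40-byte key works for the sketch). */
static const uint8_t example_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
    0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
    0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
    0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set input bit (MSB first), XOR in the 32-bit
 * window of the key that starts at that bit position. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    size_t next = 4;        /* next key byte to shift into the window */

    assert(keylen >= 4);
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                result ^= window;
            window <<= 1;   /* slide the window one key bit to the right */
            if (next < keylen && (key[next] & (1u << bit)))
                window |= 1;
        }
        next++;
    }
    return result;
}

/* Pack src addr, dst addr, src port, dst port in network byte order. */
static void pack_tuple(uint8_t out[12], uint32_t saddr, uint32_t daddr,
                       uint16_t sport, uint16_t dport)
{
    const uint32_t words[2] = { saddr, daddr };
    for (int w = 0; w < 2; w++)
        for (int b = 0; b < 4; b++)
            out[w * 4 + b] = (uint8_t)(words[w] >> (24 - 8 * b));
    out[8] = (uint8_t)(sport >> 8);
    out[9] = (uint8_t)sport;
    out[10] = (uint8_t)(dport >> 8);
    out[11] = (uint8_t)dport;
}

#define INDIR_SIZE 128      /* hypothetical indirection table size */

/* Fill the indirection table round-robin over the RX queues. */
static void fill_indir(uint8_t indir[INDIR_SIZE], unsigned int nqueues)
{
    for (unsigned int i = 0; i < INDIR_SIZE; i++)
        indir[i] = (uint8_t)(i % nqueues);
}

/* The low bits of the hash index the table; the entry names the RX queue. */
static unsigned int rss_queue(const uint8_t indir[INDIR_SIZE], uint32_t hash)
{
    return indir[hash % INDIR_SIZE];
}
```

Because the hash depends only on the tuple, every packet of a flow lands on the same queue, which preserves in-order processing per flow. Rewriting the indirection table rebalances queues without changing the key.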
8. RPS – Receive Packet Steering
● Software filter that selects the CPU (by number) that processes each packet
● Use it to:
  ● distribute a single queue across multiple CPUs
  ● redo the queue-to-CPU mapping
[Figure: packets from RX-queue-1 to RX-queue-4 steered in software onto CPU 1, CPU 2 and CPU 3]
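RPS is configured per RX queue through a hex CPU mask written to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. For example, "e" selects CPUs 1 to 3. The kernel then picks one CPU from that set based on the packet's flow hash. The sketch below parses such a mask and maps a hash onto the CPU list with a multiply-and-shift, the same idea as the kernel's reciprocal_scale() helper. The function names and the 64-CPU limit are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64         /* sketch limit: masks wider than 64 bits are truncated */

/* Parse an rps_cpus-style hex mask such as "e" or "00000000,0000000e".
 * Commas separate 32-bit groups. Returns -1 on an invalid character. */
static int parse_cpu_mask(const char *s, uint64_t *mask)
{
    uint64_t m = 0;

    for (; *s; s++) {
        int c = (unsigned char)*s;
        if (c == ',' || c == '\n')
            continue;
        if (!isxdigit(c))
            return -1;
        int nibble = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        m = (m << 4) | (uint64_t)nibble;
    }
    *mask = m;
    return 0;
}

/* Expand the mask into an ordered list of CPU numbers. */
static size_t mask_to_cpus(uint64_t mask, int cpus[MAX_CPUS])
{
    size_t n = 0;

    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        if (mask & ((uint64_t)1 << cpu))
            cpus[n++] = cpu;
    return n;
}

/* Scale the 32-bit flow hash into [0, n) and return that CPU. */
static int rps_select_cpu(const int *cpus, size_t n, uint32_t hash)
{
    assert(n > 0);
    size_t idx = (size_t)(((uint64_t)hash * n) >> 32);
    return cpus[idx];
}
```

Scaling with a multiply-and-shift avoids a division and spreads hashes evenly over the CPUs. As with RSS, all packets of one flow are processed on the same CPU.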
9. Hardware Offload
● RX/TX checksumming
  ● Perform CPU-intensive checksumming in hardware.
● Virtual LAN filtering and tag stripping
  ● Strip the 802.1Q header and store the VLAN ID in the packet's metadata.
  ● Filter out unsubscribed VLANs.
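To show what these offloads save the CPU from doing, here is a software version of both. It covers the RFC 1071 Internet checksum (the one's-complement sum used by IPv4, TCP and UDP) and 802.1Q tag stripping with VLAN filtering. This is a simplified sketch. TCP/UDP checksums also cover a pseudo-header, which is omitted here. struct pkt_meta is a hypothetical stand-in for the skb metadata where the kernel keeps the stripped tag (see __vlan_hwaccel_put_tag()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TPID_8021Q 0x8100   /* EtherType value that marks an 802.1Q tag */

/* RFC 1071 Internet checksum: one's-complement sum of 16-bit words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(len <= 65535);   /* keeps the 32-bit accumulator from overflowing */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len)                /* an odd trailing byte is padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)       /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical per-packet metadata (the kernel keeps this in the skb). */
struct pkt_meta {
    bool vlan_present;
    uint16_t vlan_tci;      /* priority (3 bits), DEI (1), VLAN ID (12) */
};

/* Strip an 802.1Q tag in place and save the TCI in the metadata.
 * Returns the new frame length (unchanged if the frame is untagged). */
static size_t vlan_strip(uint8_t *frame, size_t len, struct pkt_meta *meta)
{
    meta->vlan_present = false;
    if (len < 18)
        return len;
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q)
        return len;
    meta->vlan_present = true;
    meta->vlan_tci = (uint16_t)((frame[14] << 8) | frame[15]);
    /* Close the 4-byte gap so the inner EtherType moves up to offset 12. */
    memmove(frame + 12, frame + 16, len - 16);
    return len - 4;
}

/* VLAN filter: accept untagged frames and subscribed VLAN IDs only. */
static bool vlan_accept(const struct pkt_meta *meta, const bool subscribed[4096])
{
    if (!meta->vlan_present)
        return true;
    return subscribed[meta->vlan_tci & 0x0fff];
}
```

A useful property for the receive side is that summing a header which already contains a correct checksum yields 0. Hardware RX checksumming reports exactly this verification result to the driver.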
10. Generic Receive Offload
[Figure: NAPI based GRO. poll() pulls MTU-sized packets from the ring buffer, and GRO merges them into packets of up to 64K before they enter the network stack.]
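The core GRO idea is to coalesce in-order segments of the same flow into one large packet, capped at 64K, so the stack processes one big packet instead of many small ones. In the minimal sketch below, struct seg and gro_merge() are hypothetical. Real GRO (entered via napi_gro_receive()) keeps per-flow lists within a NAPI poll and also checks headers such as TCP flags and options. This simplification only merges back-to-back segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRO_MAX_SIZE 65536u /* "up to 64K" per merged packet */

/* Hypothetical, simplified TCP segment: flow id, sequence number, payload length. */
struct seg {
    uint32_t flow;
    uint32_t seq;
    uint32_t len;
};

/* Merge runs of contiguous segments of the same flow, as long as the merged
 * payload stays within GRO_MAX_SIZE. Works in place and returns the number
 * of packets that would be handed up the stack. */
static size_t gro_merge(struct seg *segs, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0) {
            struct seg *last = &segs[out - 1];
            if (last->flow == segs[i].flow &&
                last->seq + last->len == segs[i].seq &&
                last->len + segs[i].len <= GRO_MAX_SIZE) {
                last->len += segs[i].len;   /* coalesce the payload */
                continue;
            }
        }
        segs[out++] = segs[i];              /* start a new merged packet */
    }
    return out;
}
```

For example, 50 in-order 1448-byte segments of one flow collapse into two packets: 45 segments (65160 bytes) fit under the 64K cap, and the remaining 5 form a second packet.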
11. Segmentation Offload
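[Figure: the network stack passes down packets of up to 64K. With Generic Segmentation Offload (GSO) the kernel splits them into MTU-sized packets just before the ring buffer; with TCP Segmentation Offload (TSO) the NIC does the splitting in hardware.]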
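The splitting that GSO does in software and TSO does in hardware comes down to simple arithmetic: the MSS is the MTU minus the IP and TCP headers, and each segment advances the sequence number by its payload length. The sketch assumes IPv4 and TCP without options (20 + 20 bytes of headers). The names gso_segment() and struct segment are hypothetical. Real segmentation also copies and fixes up the headers (lengths, IP ID, checksums) for every segment, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN 20     /* assumption: no IP options */
#define TCP_HDR_LEN  20     /* assumption: no TCP options */

struct segment {
    uint32_t seq;           /* TCP sequence number of the first payload byte */
    uint32_t len;           /* payload bytes in this segment */
};

/* Split one large TCP payload into MTU-sized segments.
 * Returns the number of segments written (at most 'max'). */
static size_t gso_segment(uint32_t start_seq, uint32_t payload_len,
                          uint32_t mtu, struct segment *out, size_t max)
{
    assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
    assert(payload_len <= 65536);       /* "up to 64K" */
    uint32_t mss = mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
    size_t n = 0;

    for (uint32_t off = 0; off < payload_len && n < max; off += mss) {
        uint32_t left = payload_len - off;
        out[n].seq = start_seq + off;   /* wraps modulo 2^32, as TCP does */
        out[n].len = left < mss ? left : mss;
        n++;
    }
    return n;
}
```

With a 1500-byte MTU the MSS is 1460, so a 64000-byte payload becomes 44 segments: 43 full ones and a final 1220-byte segment. Doing this once at the bottom of the stack, or in the NIC, is what makes passing 64K packets through the stack cheap.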
12. How does a packet get through the Network Stack?
(c) Karen Sagovac
13. Packet Processing
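[Figure: link-layer packet processing, in order: packet sockets bound to ETH_P_ALL (e.g. tcpdump) see every packet; then ingress QoS; then the device's RX handler (Bridge, Open vSwitch, Team, Bonding, macvlan, macvtap); then the protocol handler (IPv4, IPv6, ARP, IPX); packets that nobody claims are dropped. Caption: "Feast of the hungry chicks"]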
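The dispatch order on this slide can be modelled as a small function. Every frame is first counted by the ETH_P_ALL taps. An attached RX handler may then consume it, and otherwise the EtherType selects a protocol handler, with unknown types dropped. All names here are hypothetical. The verdict enum is loosely modelled on the kernel's rx_handler_result (RX_HANDLER_CONSUMED, RX_HANDLER_PASS), and ingress QoS is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX handler verdicts. */
enum rx_verdict { RX_CONSUMED, RX_PASS };

/* Where the simulated frame ended up. */
enum delivered_to { TO_RX_HANDLER, TO_IPV4, TO_IPV6, TO_ARP, TO_IPX, TO_DROP };

/* An RX handler is attached per device by Bridge, Bonding, macvlan, ... */
typedef enum rx_verdict (*rx_handler_fn)(const uint8_t *frame, size_t len);

struct proto_handler {
    uint16_t ethertype;
    enum delivered_to target;
};

static const struct proto_handler proto_table[] = {
    { 0x0800, TO_IPV4 },
    { 0x86DD, TO_IPV6 },
    { 0x0806, TO_ARP },
    { 0x8137, TO_IPX },
};

static size_t taps_seen;    /* frames seen by ETH_P_ALL taps such as tcpdump */

static enum delivered_to netif_receive_sim(const uint8_t *frame, size_t len,
                                           rx_handler_fn rx_handler)
{
    if (len < 14)
        return TO_DROP;     /* shorter than an Ethernet header */

    taps_seen++;            /* packet sockets bound to ETH_P_ALL get a copy first */

    if (rx_handler && rx_handler(frame, len) == RX_CONSUMED)
        return TO_RX_HANDLER;

    uint16_t type = (uint16_t)((frame[12] << 8) | frame[13]);
    for (size_t i = 0; i < sizeof(proto_table) / sizeof(proto_table[0]); i++)
        if (proto_table[i].ethertype == type)
            return proto_table[i].target;
    return TO_DROP;         /* no protocol handler registered */
}

/* Example RX handler: a bridge-like port that consumes every frame. */
static enum rx_verdict consume_all(const uint8_t *frame, size_t len)
{
    (void)frame;
    (void)len;
    return RX_CONSUMED;
}
```

This ordering explains why tcpdump still sees frames on a bridged or bonded port: the taps run before the RX handler gets a chance to consume the frame.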
...
16. TCP Fast Open
(net.ipv4.tcp_fastopen)
Messages alternate between client and server, starting with the client.
Regular:
● 1st request: SYN → SYN+ACK → ACK+HTTP GET → Data (2x RTT)
● 2nd request: SYN → SYN+ACK → ACK+HTTP GET → Data (2x RTT)
Fast Open:
● 1st request: SYN → SYN+ACK+Cookie → ACK+HTTP GET → Data (2x RTT)
● 2nd request: SYN+Cookie+HTTP GET → SYN+ACK+Data → ACK (1x RTT)
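Here is a hedged sketch of the socket API behind this slide, assuming Linux with glibc. The server opts in with setsockopt(IPPROTO_TCP, TCP_FASTOPEN) on the listening socket, where the value is the queue length for pending Fast Open requests. The client replaces connect() + send() with sendto(..., MSG_FASTOPEN, ...). The net.ipv4.tcp_fastopen sysctl is a bitmask: 1 enables the client side and 2 the server side. The helper names listen_fastopen() and send_fastopen(), the loopback address and the queue length of 16 are illustrative choices. The fallback to connect() + send() when client-side Fast Open is disabled is a design choice of this sketch.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Server: listen on 127.0.0.1 (kernel-chosen port) with TFO enabled. */
static int listen_fastopen(uint16_t *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t alen = sizeof(a);
    int qlen = 16;          /* max pending TFO requests (arbitrary) */
    if (bind(fd, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &alen) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(a.sin_port);
    return fd;
}

/* Client: sendto(MSG_FASTOPEN) connects and sends in one call. Without a
 * cached cookie the kernel requests one and sends the data after the
 * handshake; with a cookie the data rides in the SYN. */
static int send_fastopen(uint16_t port, const char *msg)
{
    assert(msg != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    size_t len = strlen(msg);
    ssize_t n = sendto(fd, msg, len, MSG_FASTOPEN,
                       (struct sockaddr *)&a, sizeof(a));
    if (n < 0 && errno == EOPNOTSUPP) {     /* client-side TFO disabled */
        if (connect(fd, (struct sockaddr *)&a, sizeof(a)) == 0)
            n = send(fd, msg, len, 0);
    }
    if (n != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The first connection to a server only obtains the cookie, which matches the 2x RTT "1st request" row above. Later connections from the same client can carry the request in the SYN and save one round trip.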