SlideShare una empresa de Scribd logo
1 de 44
Descargar para leer sin conexión
Simple is better
Building fast IPv6 transition mechanisms
on Snabb Switch
31 January 2016 – FOSDEM 2016
Katerina Barone-Adesi kbarone@igalia.com
Andy Wingo wingo@igalia.com
Snabb Switch
A toolkit for building network functions
High performance, flexible, hackable
data plane
The Tao of Snabb
Simple > Complex
Small > Large
Commodity > Proprietary
Simple > Complex
How do we compose network functions
from smaller parts?
Build inside of network function like
composing UNIX pipelines
intel10g | reassemble | lwaftr | fragment
| intel10g
Apps independently developed, linked
together at run-time
Simple > Complex
What is a packet?
struct packet {
unsigned char data[10*1024];
uint16_t length;
};
Small > Large
Early code budget: 10000 lines
Build in a minute
Constraints driving creativity
Small > Large
Secret weapon: LuaJIT
High performance with minimal fuss
Small > Large
Minimize dependencies
1 minute build budget includes LuaJIT
and all deps
Deliverable is single binary
Small > Large
Writing our own drivers, in Lua
User-space networking
The data plane is our domain, not the
kernel’s
❧
Not DPDK’s either!❧
Fits in 10000-line budget❧
Commodity > Proprietary
Open source (Apache 2.0)
Commodity > Proprietary
Open data sheets
Intel 82599 10Gb, soon up to 100Gb
Soon: Mellanox (they agree to release
data sheet!)
Also Linux tap interfaces, virtio host and
guest
Commodity > Proprietary
Double down on 64-bit x86 servers
Prefer CPU over NIC where possible
Embrace the memory hierarchy
Storytime!
“We need to do work on data... but
there’s just so much of it and it’s really
far away.”
Storytime!
Modern x86: who’s winning?
Clock speed same since years ago
Main memory just as far away
HPC people are winning
“We need to do work on data... but
there’s just so much of it and it’s really
far away.”
Three primary improvements:
CPU can work on more data per cycle,
once data in registers
❧
CPU can load more data per cycle,
once it’s in cache
❧
CPU can make more parallel fetches
to L3 and RAM at once
❧
Networking folks can win
too
Instead of chasing zero-copy, tying
yourself to ever-more-proprietary
features of your NIC, just take the hit
once: DDIO into L3.
Copy if you need to – copies with L3 not
expensive.
Software will eat the world!
Networking folks can win
too
Once in L3, you have:
wide loads and stores via AVX2 and
soon AVX-512 (64 bytes!)
❧
pretty good instruction-level
parallelism: up to 16 concurrent L2
misses per core on haswell
❧
wide SIMD: checksum in software!❧
software, not firmware❧
</storytime>
So what about the lwAFTR
IPv6 transition on Snabb: a lwAFTR
Why IPv6?
● The IPv4 address space is exhausted
- IANA top level exhaustion in 2011
- 4/5 Regional Internet Registries exhausted
- September 2012 in Europe
- September 2015 in the US
- AfriNIC within the next few years
● The internet is still growing
● Moving to IPv6 helps
IPv6 transition mechanisms
● Users want everything to continue working
… including IPv4 websites, networked games,
etc
● Some user equipment cannot do IPv6
● Several options: NAT64, 464XLAT, DS-Lite...
Why Lightweight 4over6?
● Similar to DS-Lite, but less centralized state
● Share IPv4 addresses between users
● Each user gets a port range
● Allows providers to have a simpler architecture:
pure IPv6, not dual-stacked IPv4 and IPv6, in
their internal network
● Standardized as RFC 7596 in 2015
Two main parts: B4 and AFTR
● Both encapsulate and decapsulate IPv4-in-IPv6
● Each user (subscriber) has a B4
● The network provider has one or more AFTRs,
which store per-subscriber (not per-flow)
information
● The information: The B4's IPv6 address, IPv4
address, and port range.
lw4o6 architecture
Lw4o6 address sharing
IPv4 is tunnelled in IPv6
Snabb lwAFTR
● Started July 2015
● Proof of concept data plane October 2015
● It's already usable and fast.
● http://github.com/igalia/snabbswitch/
- lwaftr* branches
- Apache License v2
Performance
● Hardware: two 10-gigabit NICs
- Intel 82599ES, SFI/SFP+
● Xeon processor: E5-2620 v3 @ 2.40GHz
● Snabb-lwaftr alpha release
● 550-byte packets
● Over 4 million packets/second
→ over 17 gigabit/second handled on one core
Challenges
● Correctly handling ICMP
- conveying failure information, for instance to
an IPv4 host if a failure occurs within the tunnel
● Speed
● Speed with a lot of subscribers
● Correctness
● Hairpinning
Hairpinning: client-to-client traffic
And we’re back
Implementation challenges
Binding table lookup - Port partition
When to hairpin?
Virtualization
Policy
Configuration
Binding table lookup
Say, Belgium: millions of tunnels
Per-tunnel: IPv4, IPv6 of B4, port set ID
At least 4 + 16 + 2 = 22 bytes
2M entries: 44MB
You can’t fit it in L3.
Binding table lookup
So always budget for an L3 cache miss –
but only one!
4 MPPS in: 250 ns/packet
One cache miss RTT (80 ns) within
budget
Many fetches can happen in that RTT
Binding table lookup: v1
Open-addressed robin-hood hash table
with linear probing
Result probably right where we first look
for it, otherwise in adjacent memory,
might fetch adjacent cache lines
Binding table lookup: v2
Maximum probe length around 8 for 2e6
entries, 40% occupancy
Stream in all 8 entries at once in parallel
Branchless binary search over those 8
entries
Binding table lookup: v3
Stream in all 8 entries at once in parallel
for multiple packets in parallel❧
32 packets at a time: amortized 50ns/
lookup
Worst-case bounds!
Port partitioning
Different IPv4 addresses can have their
ports partitioned in different ways
Need f(ipv4, port) -> params
Current solution: partition IPv4 space
into ranges with same parameters, use
binary search
Hairpinning
Problem: after decapsulating IPv4
packet, send to internet or re-tunnel back
to IPv6?
Answer: Use port partition as quick
check, if so do the hairpinning
Yay software
Virtualization
Want to make a virtualized lwaftr
Missing virtio-net implementation
Work by Virtual Open Systems; thanks!
Usual workload: One Snabb-NFV per
interface on the host
Same performance
Yay software!
Policy
Ingress/egress filtering
Pflua! https://github.com/Igalia/pflua
As an app!
Configuration
Compile binding table from text
Update/control plane TBD
Future work
Yang
Smaller packets
Integrate ILP binding table fetch
40Gb
Thanks!
kbarone@igalia.com
wingo@igalia.com
https://github.com/SnabbCo/snabbswitch
https://github.com/Igalia/snabbswitch
ps. We are hiring!

Más contenido relacionado

La actualidad más candente

Open stack networking vlan, gre
Open stack networking   vlan, greOpen stack networking   vlan, gre
Open stack networking vlan, gre
Sim Janghoon
 

La actualidad más candente (20)

rtnetlink
rtnetlinkrtnetlink
rtnetlink
 
iptables 101- bottom-up
iptables 101- bottom-upiptables 101- bottom-up
iptables 101- bottom-up
 
Load Balancing 101
Load Balancing 101Load Balancing 101
Load Balancing 101
 
Notes on Netty baics
Notes on Netty baicsNotes on Netty baics
Notes on Netty baics
 
IP Virtual Server(IPVS) 101
IP Virtual Server(IPVS) 101IP Virtual Server(IPVS) 101
IP Virtual Server(IPVS) 101
 
Building GUI App with Electron and Lisp
Building GUI App with Electron and LispBuilding GUI App with Electron and Lisp
Building GUI App with Electron and Lisp
 
Writing a fast HTTP parser
Writing a fast HTTP parserWriting a fast HTTP parser
Writing a fast HTTP parser
 
Netty - a pragmatic introduction
Netty - a pragmatic introductionNetty - a pragmatic introduction
Netty - a pragmatic introduction
 
A Kong retrospective: from 0.10 to 0.13
A Kong retrospective: from 0.10 to 0.13A Kong retrospective: from 0.10 to 0.13
A Kong retrospective: from 0.10 to 0.13
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
 
Open stack networking vlan, gre
Open stack networking   vlan, greOpen stack networking   vlan, gre
Open stack networking vlan, gre
 
CNTUG x SDN Meetup #33 Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
CNTUG x SDN Meetup #33  Talk 1: 從 Cilium 認識 cgroup ebpf - RuianCNTUG x SDN Meetup #33  Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
CNTUG x SDN Meetup #33 Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
 
Building scalable network applications with Netty (as presented on NLJUG JFal...
Building scalable network applications with Netty (as presented on NLJUG JFal...Building scalable network applications with Netty (as presented on NLJUG JFal...
Building scalable network applications with Netty (as presented on NLJUG JFal...
 
How fluentd fits into the modern software landscape
How fluentd fits into the modern software landscapeHow fluentd fits into the modern software landscape
How fluentd fits into the modern software landscape
 
OSDC 2018 - Distributed monitoring
OSDC 2018 - Distributed monitoringOSDC 2018 - Distributed monitoring
OSDC 2018 - Distributed monitoring
 
Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015
 
Building Netty Servers
Building Netty ServersBuilding Netty Servers
Building Netty Servers
 
Netty Notes Part 3 - Channel Pipeline and EventLoops
Netty Notes Part 3 - Channel Pipeline and EventLoopsNetty Notes Part 3 - Channel Pipeline and EventLoops
Netty Notes Part 3 - Channel Pipeline and EventLoops
 
Driving containerd operations with gRPC
Driving containerd operations with gRPCDriving containerd operations with gRPC
Driving containerd operations with gRPC
 

Similar a Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSDEM 2016)

Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Igalia
 

Similar a Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSDEM 2016) (20)

Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
 
IPv4aaS tutorial and hands-on
IPv4aaS tutorial and hands-onIPv4aaS tutorial and hands-on
IPv4aaS tutorial and hands-on
 
Tutorial: IPv6-only transition with demo
Tutorial: IPv6-only transition with demoTutorial: IPv6-only transition with demo
Tutorial: IPv6-only transition with demo
 
OpenStack Scale-out Networking Architecture
OpenStack Scale-out Networking ArchitectureOpenStack Scale-out Networking Architecture
OpenStack Scale-out Networking Architecture
 
Io t hurdles_i_pv6_slides_doin
Io t hurdles_i_pv6_slides_doinIo t hurdles_i_pv6_slides_doin
Io t hurdles_i_pv6_slides_doin
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Rapid IPv6 Deployment for ISP Networks
Rapid IPv6 Deployment for ISP NetworksRapid IPv6 Deployment for ISP Networks
Rapid IPv6 Deployment for ISP Networks
 
3hows
3hows3hows
3hows
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDP
 
IPv6 transition and coexistance - Jordi Palet
IPv6 transition and coexistance - Jordi PaletIPv6 transition and coexistance - Jordi Palet
IPv6 transition and coexistance - Jordi Palet
 
Snabb, a toolkit for building user-space network functions (ES.NOG 20)
Snabb, a toolkit for building user-space network functions (ES.NOG 20)Snabb, a toolkit for building user-space network functions (ES.NOG 20)
Snabb, a toolkit for building user-space network functions (ES.NOG 20)
 
IPv6 Transition & Deployment, including IPv6-only in cellular and broadband
IPv6 Transition & Deployment, including IPv6-only in cellular and broadbandIPv6 Transition & Deployment, including IPv6-only in cellular and broadband
IPv6 Transition & Deployment, including IPv6-only in cellular and broadband
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Practical Guide to Run an IEEE 802.15.4 Network with 6LoWPAN Under Linux
Practical Guide to Run an IEEE 802.15.4 Network with 6LoWPAN Under LinuxPractical Guide to Run an IEEE 802.15.4 Network with 6LoWPAN Under Linux
Practical Guide to Run an IEEE 802.15.4 Network with 6LoWPAN Under Linux
 
Compatibility between IPv4 and IPv6
Compatibility between IPv4 and IPv6Compatibility between IPv4 and IPv6
Compatibility between IPv4 and IPv6
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
There and back again
There and back againThere and back again
There and back again
 
Run Your Own 6LoWPAN Based IoT Network
Run Your Own 6LoWPAN Based IoT NetworkRun Your Own 6LoWPAN Based IoT Network
Run Your Own 6LoWPAN Based IoT Network
 

Más de Igalia

Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
Igalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
Igalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Igalia
 

Más de Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSDEM 2016)

  • 1. Simple is better Building fast IPv6 transition mechanisms on Snabb Switch 31 January 2016 – FOSDEM 2016 Katerina Barone-Adesi kbarone@igalia.com Andy Wingo wingo@igalia.com
  • 2. Snabb Switch A toolkit for building network functions High performance, flexible, hackable data plane
  • 3. The Tao of Snabb Simple > Complex Small > Large Commodity > Proprietary
  • 4. Simple > Complex How do we compose network functions from smaller parts? Build inside of network function like composing UNIX pipelines intel10g | reassemble | lwaftr | fragment | intel10g Apps independently developed, linked together at run-time
  • 5. Simple > Complex What is a packet? struct packet { unsigned char data[10*1024]; uint16_t length; };
  • 6. Small > Large Early code budget: 10000 lines Build in a minute Constraints driving creativity
  • 7. Small > Large Secret weapon: LuaJIT High performance with minimal fuss
  • 8. Small > Large Minimize dependencies 1 minute build budget includes LuaJIT and all deps Deliverable is single binary
  • 9. Small > Large Writing our own drivers, in Lua User-space networking The data plane is our domain, not the kernel’s ❧ Not DPDK’s either!❧ Fits in 10000-line budget❧
  • 10. Commodity > Proprietary Open source (Apache 2.0)
  • 11. Commodity > Proprietary Open data sheets Intel 82599 10Gb, soon up to 100Gb Soon: Mellanox (they agree to release data sheet!) Also Linux tap interfaces, virtio host and guest
  • 12. Commodity > Proprietary Double down on 64-bit x86 servers Prefer CPU over NIC where possible Embrace the memory hierarchy
  • 13. Storytime! “We need to do work on data... but there’s just so much of it and it’s really far away.”
  • 14. Storytime! Modern x86: who’s winning? Clock speed same since years ago Main memory just as far away
  • 15. HPC people are winning “We need to do work on data... but there’s just so much of it and it’s really far away.” Three primary improvements: CPU can work on more data per cycle, once data in registers ❧ CPU can load more data per cycle, once it’s in cache ❧ CPU can make more parallel fetches to L3 and RAM at once ❧
  • 16. Networking folks can win too Instead of chasing zero-copy, tying yourself to ever-more-proprietary features of your NIC, just take the hit once: DDIO into L3. Copy if you need to – copies with L3 not expensive. Software will eat the world!
  • 17. Networking folks can win too Once in L3, you have: wide loads and stores via AVX2 and soon AVX-512 (64 bytes!) ❧ pretty good instruction-level parallelism: up to 16 concurrent L2 misses per core on haswell ❧ wide SIMD: checksum in software!❧ software, not firmware❧
  • 19. IPv6 transition on Snabb: a lwAFTR
  • 20. Why IPv6? ● The IPv4 address space is exhausted - IANA top level exhaustion in 2011 - 4/5 Regional Internet Registries exhausted - September 2012 in Europe - September 2015 in the US - AfriNIC within the next few years ● The internet is still growing ● Moving to IPv6 helps
  • 21. IPv6 transition mechanisms ● Users want everything to continue working … including IPv4 websites, networked games, etc ● Some user equipment cannot do IPv6 ● Several options: NAT64, 464XLAT, DS-Lite...
  • 22. Why Lightweight 4over6? ● Similar to DS-Lite, but less centralized state ● Share IPv4 addresses between users ● Each user gets a port range ● Allows providers to have a simpler architecture: pure IPv6, not dual-stacked IPv4 and IPv6, in their internal network ● Standardized as RFC 7596 in 2015
  • 23. Two main parts: B4 and AFTR ● Both encapsulate and decapsulate IPv4-in-IPv6 ● Each user (subscriber) has a B4 ● The network provider has one or more AFTRs, which store per-subscriber (not per-flow) information ● The information: The B4's IPv6 address, IPv4 address, and port range.
  • 26. IPv4 is tunnelled in IPv6
  • 27. Snabb lwAFTR ● Started July 2015 ● Proof of concept data plane October 2015 ● It's already usable and fast. ● http://github.com/igalia/snabbswitch/ - lwaftr* branches - Apache License v2
  • 28. Performance ● Hardware: two 10-gigabit NICs - Intel 82599ES, SFI/SFP+ ● Xeon processor: E5-2620 v3 @ 2.40GHz ● Snabb-lwaftr alpha release ● 550-byte packets ● Over 4 million packets/second → over 17 gigabit/second handled on one core
  • 29. Challenges ● Correctly handling ICMP - conveying failure information, for instance to an IPv4 host if a failure occurs within the tunnel ● Speed ● Speed with a lot of subscribers ● Correctness ● Hairpinning
  • 32. Implementation challenges Binding table lookup - Port partition When to hairpin? Virtualization Policy Configuration
  • 33. Binding table lookup Say, Belgium: millions of tunnels Per-tunnel: IPv4, IPv6 of B4, port set ID At least 4 + 16 + 2 = 22 bytes 2M entries: 44MB You can’t fit it in L3.
  • 34. Binding table lookup So always budget for an L3 cache miss – but only one! 4 MPPS in: 250 ns/packet One cache miss RTT (80 ns) within budget Many fetches can happen in that RTT
  • 35. Binding table lookup: v1 Open-addressed robin-hood hash table with linear probing Result probably right where we first look for it, otherwise in adjacent memory, might fetch adjacent cache lines
  • 36. Binding table lookup: v2 Maximum probe length around 8 for 2e6 entries, 40% occupancy Stream in all 8 entries at once in parallel Branchless binary search over those 8 entries
  • 37. Binding table lookup: v3 Stream in all 8 entries at once in parallel for multiple packets in parallel❧ 32 packets at a time: amortized 50ns/ lookup Worst-case bounds!
  • 38. Port partitioning Different IPv4 addresses can have their ports partitioned in different ways Need f(ipv4, port) -> params Current solution: partition IPv4 space into ranges with same parameters, use binary search
  • 39. Hairpinning Problem: after decapsulating IPv4 packet, send to internet or re-tunnel back to IPv6? Answer: Use port partition as quick check, if so do the hairpinning Yay software
  • 40. Virtualization Want to make a virtualized lwaftr Missing virtio-net implementation Work by Virtual Open Systems; thanks! Usual workload: One Snabb-NFV per interface on the host Same performance Yay software!
  • 42. Configuration Compile binding table from text Update/control plane TBD
  • 43. Future work Yang Smaller packets Integrate ILP binding table fetch 40Gb