Daniel Firestone and Gabriel Silva's presentation from the 2017 Open Networking Summit.
SDN is at the foundation of all large scale networks in the public cloud, such as Microsoft Azure - at past ONSes, Microsoft has detailed how all of Azure's virtual networks, load balancing, and security operate on SDN. But how do we make a software network scale to an era of 40, 50, and 100 gigabit networks on servers, providing great performance to end customers with ever increasing VM and container scale and density?
In this presentation, Daniel Firestone and Gabriel Silva will detail Azure Accelerated Networking, using Azure's FPGA-based SmartNICs. They will show how using FPGAs, we can achieve the programmability of a software network with the performance of a hardware one. They will detail how this and other host SDN advances have led to huge performance increases for Linux VMs in particular, and Linux-based NFV appliances, giving Azure industry-leading network performance.
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Accelerated SDN in Azure
1. Accelerated SDN in Azure
Daniel Firestone, Tech Lead and Manager
Gabriel Silva, Product Manager
Azure Networking Host SDN Team
2. Agenda
• Azure scale
• SDN in Azure
• Scaling SDN with SmartNIC
• Linux Networking
• Demo
3. mediacaching identity service bus
mobile
services
cloud
services
virtual
machines
Data
Services tableData Lake
blob
storage
SQL
database
App
Services
media
hpcintegration analytics
caching identity service bus
web apps
mobile
services
cloud
services
Infrastructure
Services cdn
virtual
machines
virtual
network vpn
traffic
manager
Microsoft Azure
5. Fortune 500 using
Microsoft Cloud
>85%
TRILLION
Azure storage
objects
MILLION
Azure Active
Directory Orgs
1 out of 3
Azure VMs
are Linux VMs
TRILLION
requests/day
> 3 TRILLION
Azure Event Hubs
events/week
>110 BILLION
Azure DB requests/day
Azure Scale & Momentum
>18
BILLION
Azure Active Directory
authentications/week
New Azure customers a month
>120,000
6. How Do We Build Software
Networks in the Cloud?
7. SDN: Building the right abstractions for Scale
Abstract by separating management,
control, and data planes
REST APIs
Controller
Virtual Switch
Management Plane
Control Plane
Management plane Create a tenant
Control plane Plumb these tenant
ACLs to these
switches
Data plane Apply these ACLs to
these flows
Example: ACLs
Data plane needs to apply per-flow policy
to millions of VMs
How do we apply billions of flow policy
actions to packets?
8. Virtual Filtering Platform (VFP)
Azure’s SDN Dataplane
• Acts as a virtual switch inside Hyper-V
VMSwitch
• Provides core SDN functionality for Azure
networking services, including:
• Address Virtualization for VNET
• VIP -> DIP Translation for SLB
• ACLs, Metering, and Security Guards
• Uses programmable rule/flow tables to
perform per-packet actions
• Supports all Azure dataplane policy at
40GbE+ with offloads
9. Flow Tables are the right abstraction for the Host
Node: 10.4.1.5
VFP
Blue VM1
10.1.1.2
NIC
Controller
Tenant Description
VNet Description
Flow Action
VNet Routing
Policy ACLsNAT
Endpoints
Flow ActionFlow Action
TO: 10.2/16 Encap to GW
TO: 10.1.1.5 Encap to 10.5.1.7
TO: !10/8 NAT out of VNET
Flow ActionFlow Action
TO: 79.3.1.2 DNAT to 10.1.1.2
TO: !10/8 SNAT to 79.3.1.2
Flow Action
TO: 10.1.1/24 Allow
10.4/16 Block
TO: !10/8 Allow
• VMSwitch exposes a typed Match-
Action-Table API to the controller
• One table per policy
• Key insight: Let controller tell the
switch exactly what to do with
which packets (e.g. encap/decap),
rather than trying to use existing
abstractions (Tunnels, …)
VNET LB NAT ACLS
11. Traditional Approach to Scale: ASICs
• We’ve worked with network ASIC vendors over the years to accelerate
many functions, including:
• TCP offloads: Segmentation, checksum, …
• Steering: VMQ, RSS, …
• Encapsulation: NVGRE, VXLAN, …
• Direct NIC Access: DPDK, PacketDirect, …
• RDMA
• Is this a long term solution?
12. Host SDN Scale Challenges in Practice
• Hosts are Scaling Up: 1G 10G 40G 50G 100G
• Reduces COGS of VMs (more VMs per host) and enables new workloads
• Need the performance of hardware to implement policy without CPU
• Not enough to just accelerate to ASICs – need to move entire stacks to HW
• Need to support new scenarios: BYO IP, BYO Topology, BYO Appliance
• We are always pushing richer semantics to virtual networks
• Need the programmability of software to be agile and future-proof –
12-18 month ASIC cycle + time to roll new HW is too slow
How do we get the performance of hardware
with programmability of software?
13. Our Solution – Azure SmartNIC
• HW is needed for scale, perf, and COGS at 40G+
• 12-18 month ASIC cycle + time to roll new HW is too slow
• To compete and react to new needs, we need agility – SDN
• SmartNIC combines agility of SDN with speed+COGS of HW
Roll out Hardware as we do Software
Blade
SmartNIC
NIC
ASIC
FPGA
CPU
ToR
Bump in the Wire:
Reconfigurable FPGA +
NIC ASIC
14. SmartNIC Design
• Use an FPGA for reconfigurable functions
• Same FPGAs already used in Bing
• Roll out Hardware as we do software
• Programmed using Generic Flow Tables
• Language for programming SDN to hardware
• Uses connections and structured actions as
primitives
• SmartNIC can also do Crypto, QoS, storage
acceleration, and more…
Blade
SmartNIC
NIC
ASIC
FPGA
CPU
ToR
15. 2015-17 FPGA Deployments:
40G Bump in the Wire
CPU CPU FPGA
NIC
DRAM DRAM DRAM
Server Blade FPGA board
Gen3 2x8
Gen3 x8
QPI Switch
QSFP
QSFPQSFP
40Gb/s
40Gb/s
OCS Blade with NIC and FPGA
FPGA
Tray
Backplane
Option Card
Mezzanine
Connectors
SmartNIC FPGA Card
All new Azure Compute servers since
late 2015 ship with FPGAs!
16. Azure Accelerated Networking: Fastest Cloud Network!
• Highest bandwidth VMs of any cloud
• DS15v2 & D15v2 VMs get 25Gbps
• Consistent low latency network performance
• Provides SR-IOV to the VM
• 10x latency improvement
• Increased packets per second (PPS)
• Reduced jitter means more consistency in workloads
• Enables workloads requiring native performance to run in cloud VMs
• >2x improvement for many DB and OLTP applications
18. Controller Controller Controller
VFP
VFP APIs
GFT Offload API (NDIS)
VMSwitch
VM
ARM APIs
SR-IOV
(Host Bypass)
GFT
Table
First Packet
GFT Offload Engine
SmartNIC
50G
QoSCrypto RDMA
GFT
19
SmartNIC - Accelerating SDN
19.
20. Linux on Azure Momentum
• Over 1 in 3 Azure VMs run Linux, and growing
• Over 40% of new VMs created today in Azure are Linux!
• Powershell is now open source and available on Linux
• Azure Container Service is GA – Integration with Docker Swarm,
Kubernetes, DC/OS
Our goal is that everything in Azure works for Linux VMs and Windows
VMs equally
21. VM Boost for Linux
VM Boost is a combination of Linux and Azure optimizations:
• Linux network driver improvements
• Microsoft Hyper-V Network adapter
• Fewer CPU interrupts
• Take advantage of segmentation offload in Azure
• Azure Platform optimizations
• VNET offloads recently deployed to production
• VFP performance improvements
Result is up to 200% improvement on larger VM sizes!
22. Announcing Accelerated Networking For
Linux (Preview)
• Now offered on Dv2 Series VMs
• Preview available in select regions
• West US 2
• South Central US
• Supported on multiple Linux distributions in the Azure Marketplace
• Ubuntu
• CentOS
Get the highest bandwidth VMs of any cloud (25Gbps) with ultra-low
latency on Linux!
24. Linux Network Virtual Appliances on Azure
• Dedicated to getting the best performance for 1st and 3rd party
appliances running on Azure
• Accelerated Networking for Linux allows appliance vendors to get low
latency, high throughput, high packets per second, and low jitter, for a
best in class experience
• Working with multiple partners to enable your favorite appliances
• SRIOV is just the first step, more improvements coming!
25. Microsoft’s Contributions to Linux Networking
• All Hyper-V and Azure integration, including Accelerated Networking, is
now upstream in the kernel
• SR-IOV problem – how do you live migrate a VM or service the host
network stack if the VM has a virtual function?
• Previously solved in Hyper-V for Windows VMs with transparent failover to synthetic
path
• Now contributed upstream to Linux – support for Live Migration for SR-IOV
Linux VMs running over Hyper-V (potentially others)
• First industry support for SR-IOV Live Migration in Linux!
• Supports serviceability in Azure with Accelerated Networking