Analogous to server virtualization, network virtualization decouples and isolates virtual networks (i.e., tenants) from the underlying network hardware. One of the key value propositions of Software-Defined Networking (SDN) is to enable the provisioning and operation of virtual networks. This tutorial motivates the need for network virtualization, describes the high-level requirements, surveys the architectural approaches, and gives you a clear picture of the vendor landscape.
Previously presented at ONUG Fall 2013 and Spring 2014.
2. Agenda
• THEORY
‒ Why Virtualize Networks?
‒ What is Software-Defined Networking?
‒ Buzzwords: OpenFlow, Open vSwitch, OVSDB, OpenStack, OpenDaylight
‒ Deploying Network Virtualization
‒ Vendor solution survey and landscape
MY GOAL FOR TODAY: Make you dream of network containers.
5. Introducing the Multi-tenant Cloud
• Corporate priority for managing internal workloads
‒ Supporting multiple tenants with virtualized resources (computing, storage, and networking)
‒ Speeding up configuration to allow earlier workload rollout:
   Immediate feedback on service rollout
   Reduced time to revenue
• Hybrid clouds with bursting
‒ Leveraging guarantees of public CSPs → reduced upfront CapEx and OpEx
‒ Bursting to the public cloud for peak loads → reduced overprovisioning
‒ Lossless live migration → improved disaster recovery
The network is a serious blocker of this vision.
6. Multi-tenant Cloud DC Today
1. Efficiency limited by VLANs and/or subnets
   Reality: 25% utilization; Goal: 80% utilization
7. Multi-tenant Cloud DC Today
2. Network design limited by poor isolation
   a) Separate physical networks for different loads
   b) 'N' VLANs allocated to each tenant
[Figure: VMs on a hypervisor host connected through duplicated switch tiers (Switch-1/2/3) to the WAN, with VLAN-101-x trunked on every link]
8. Multi-tenant Cloud DC Today
3. Scaling infrastructure is problematic
‒ L2 switching does not scale because of the need to track a large number of MAC addresses
‒ L3 routing scales, but the traditional architecture does not support IP address overlap between tenants
[Figure: leaf-spine fabric (Leaf SW1/SW2, Spine SW3) connecting six servers hosting VMs 1-11]
9. Multi-tenant Cloud DC Today
4. Poor orchestration of virtualized L4-L7 appliances
[Figure: Internet-facing NFV appliance chains]
10. Multi-tenant DC Today
5. VMs are not treated as first-class citizens
‒ Over 70% of today's servers are virtual machines
‒ But:
   East-west traffic is poorly managed
   Prioritization and rate limiting are lacking at the VM level
   Traffic between VMs on the same server is often unsupervised
6. Dynamic workloads over multiple clouds are tricky
‒ Provisioning the network takes forever
‒ Flat L2 networks require L2VPNs and other complex entities that are not easily created on the fly
11. The Basic Problem Underlying All This
• Lack of an abstraction that decouples infrastructure from the policy framework
• Lack of ways to define an application container with its dependencies on resources
14. Current Internet
• Closed to innovations in the infrastructure
[Figure: five boxes of specialized packet-forwarding hardware, each with its own closed operating system and bundled services]
Current mode of operation: high complexity and cost, coarse traffic management, not easy to innovate on top.
15. “Software-defined Networking” Approach
[Figure: the same specialized packet-forwarding hardware boxes, but with services (LB service, FW service, IP routing service) lifted out of the individual operating systems into a shared Network Operating System]
16. “Software-defined Network”
[Figure: simple packet-forwarding hardware (plus a legacy router) controlled by a Network Operating System via OpenFlow or another API; LB, FW, and IP routing services sit on top through a north-bound interface API; the management API is unchanged]
Future mode of operation: lower complexity and cost, granular traffic management, dynamic and automated.
17. Modes of SDN Deployment
1. In-network: existing or green-field network fabrics upgraded to support OpenFlow
2. Overlay: WITHOUT changing the fabric, intelligence is added to edge devices,
‒ as an additional appliance (e.g., a bump-in-the-wire managed by the controller)
‒ as an enhanced server kernel bridge (e.g., Open vSwitch in x86 hypervisors)
[Figure: OpenFlow control path to a hardware switch whose data path stays in hardware; courtesy of Martin Casado @ VMware]
18. Publicly Announced SDN Deployments
• Google (Mode #1):
‒ Uses OpenFlow controllers and OpenFlow-enabled switches to interconnect its data centers
• AT&T, eBay, Fidelity Investments, NTT, and Rackspace (Mode #2):
‒ Use OpenStack Quantum and the Nicira NVP controller to manage virtual networks within their cloud environments
• Genesis Hosting (Hybrid: Mode #1 + Mode #2):
‒ Uses the NEC controller in an intra-data-center scenario in a production setting
19. Business Potential of SDN

Business potential               How?
Reduced time to revenue          Speed-up of service provisioning
New revenue                      New business models centered around on-demand usage
Improved policy compliance       Ensuring that cloud workloads comply with enterprise policies (e.g., access control)
OpEx savings                     Automated operations and easier management of resources
Reduced OpEx during upgrades     Introducing new functions and services by replacing just the software stack
21. A Quick Primer on OpenFlow
• OpenFlow offloads control intelligence to remote software
[Figure: a controller (Alice's code) on a PC pushes Alice's rules over the OpenFlow protocol to three OpenFlow switches, which consult the controller for forwarding decisions]

Match    L1: tunnel ID, switch port
         L2: MAC address, VLAN ID, Ethertype
         L3: IPv4/IPv6 fields, ARP
         L4: TCP, UDP
Action   • Output to zero or more ports
         • Encapsulate
         • Header rewriting
         • Send to controller
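The match/action model above can be sketched in a few lines of Python. This is an illustrative toy model, not the wire protocol: field names, priorities, and action strings are invented for readability, and a table miss falls back to the packet_in path (send to controller).

```python
# Toy model of an OpenFlow flow table: each entry pairs a match
# (a subset of header fields) with a list of actions and a priority.

def lookup(flow_table, packet):
    """Return the actions of the highest-priority matching entry,
    or ["send_to_controller"] on a table miss (the packet_in path)."""
    best = None
    for entry in flow_table:
        # Wildcard semantics: every field named in the match must
        # equal the packet's value; unnamed fields are "don't care".
        if all(packet.get(k) == v for k, v in entry["match"].items()):
            if best is None or entry["priority"] > best["priority"]:
                best = entry
    return best["actions"] if best else ["send_to_controller"]

flow_table = [
    {"priority": 10,
     "match": {"vlan_id": 101, "ip_dst": "10.0.0.5"},    # L2 + L3 match
     "actions": ["set_field:eth_dst=0x2", "output:3"]},  # rewrite, then forward
    {"priority": 5,
     "match": {"eth_type": 0x0806},                      # all ARP traffic
     "actions": ["output:flood"]},
]

print(lookup(flow_table, {"vlan_id": 101, "ip_dst": "10.0.0.5"}))
print(lookup(flow_table, {"ip_dst": "192.0.2.1"}))  # miss -> controller
```

The key architectural point survives even in this sketch: the switch only evaluates pre-installed rules, while all decision logic lives in the remote controller.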
22. Sample OpenFlow Physical Switches

Model: HP ProCurve 5400zl or 6600
Virtualization: 1 OF instance per VLAN
Notes: LACP, VLAN, and STP processing before OpenFlow; wildcard rules and non-IP packets processed in software; header rewriting in software; CPU protects management during loops

Model: NEC IP8800
Virtualization: 1 OF instance per VLAN
Notes: OpenFlow takes precedence; most actions processed in hardware; MAC header rewriting in hardware

Model: Brocade MLX routers
Virtualization: multiple OF instances per switch
Notes: hybrid OpenFlow switch with legacy protocols and OpenFlow coexisting; OpenFlow commands can override state created by legacy protocols

Model: Pronto 3290 or 3780 with Pica8 or Indigo firmware
Virtualization: 1 OF instance per switch
Notes: no legacy protocols (like VLAN, STP); most actions processed in hardware; MAC header rewriting in hardware
23. Open vSwitch (OVS)
• A kernel module that replaces the standard Linux bridge to provide significant packet-matching and processing flexibility
[Figure courtesy of Thomas Graf @ Red Hat]
24. OVSDB
• An API that is an alternative to OpenFlow:
‒ Lightweight
‒ Transactional
‒ Not SQL
‒ Persistent
‒ No packet_in events
• Includes configuration and control
• Also manages slow-moving state:
‒ VM placement (via VMM integration)
‒ Tunnel setup
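To make "lightweight and transactional" concrete, here is a sketch of the message shape of an OVSDB JSON-RPC "transact" request (per RFC 7047) that inserts a bridge row. It is deliberately incomplete: a real client must also reference the new row from the Open_vSwitch root table within the same transaction, and the bridge name "br-int" is just an example.

```python
import json

# Shape of an OVSDB "transact" request: JSON-RPC carrying a database
# name and a list of operations executed atomically as one transaction.
request = {
    "method": "transact",
    "id": 0,
    "params": [
        "Open_vSwitch",               # database name
        {"op": "insert",
         "table": "Bridge",
         "row": {"name": "br-int"},
         "uuid-name": "new_bridge"},  # handle usable by later ops in the txn
    ],
}

wire = json.dumps(request)
print(wire)
```

Note the contrast with OpenFlow: this is declarative state manipulation (rows in tables, applied transactionally), not per-packet event handling, which is why OVSDB has no packet_in equivalent.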
25. Open-source OpenFlow Controllers

Controller                        Notes
Ryu (NTT)                         Apache license; Python
NOX/POX (ONRC)                    GPL; C++ and Python
Beacon (Stanford Univ.)           BSD-like license; Java-based
Maestro (Rice Univ.)              GPL; Java-based
Trema (NEC)                       GPL 2.0; written in C and Ruby
Floodlight (Big Switch)           Apache license; Java-based
OpenDaylight (Linux Foundation)   Eclipse Public License; Java-based
26. OpenDaylight Controller
• A vendor-driven consortium (with Cisco, IBM, and others) developing an open-source SDN controller platform
27. Stack for Networking with OpenStack
Typical workflow (Neutron):
1. Create a network
2. Associate a subnet with the network
3. Boot a VM and attach it to the network
4. Delete the VM
5. Delete any ports
6. Delete the network
[Figure: the Neutron API and plugin drive a Network Virtualization App on an SDN controller, which programs the virtual and physical switches]
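The first three workflow steps map onto REST calls. The sketch below shows plausible request bodies against the Networking (Neutron) v2.0 API; the network name, CIDR, and the "<uuid from step 1>" placeholder are illustrative values, and no actual HTTP calls are made.

```python
# Steps 1-3 of the typical workflow as (endpoint, body) pairs.
create_network = ("POST /v2.0/networks",
                  {"network": {"name": "tenant-a-net",
                               "admin_state_up": True}})

# The subnet references the network UUID returned by step 1.
create_subnet = ("POST /v2.0/subnets",
                 {"subnet": {"network_id": "<uuid from step 1>",
                             "ip_version": 4,
                             "cidr": "10.1.1.0/24"}})

# Booting the VM is a Nova call that names the network; Neutron then
# creates the port binding the VM's vNIC to that network.
boot_vm = ("POST /v2.1/servers",
           {"server": {"name": "vm1",
                       "networks": [{"uuid": "<uuid from step 1>"}]}})

for endpoint, body in (create_network, create_subnet, boot_vm):
    print(endpoint, body)
```

The plugin architecture shown in the figure means these same API calls can be backed by different SDN controllers without changing the tenant-facing workflow.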
30. Requirements/Challenges
1. Traffic isolation across virtual networks
‒ No VLANs and their 4094-ID limit
‒ Flexible containerization and switching of traffic
‒ Clear state management
‒ IP address overlap allowed
2. Scalably identifying individual VMs' traffic
‒ Intercepting traffic
‒ Virtual network identification
‒ Tracking hosts with minimal state
3. Integration with legacy
‒ Encapsulation and tunneling
‒ VLAN-to-VxLAN gateways
‒ Support for bare-metal servers
4. Chaining and orchestrating virtual L4-L7 services
‒ Placement, number of instances, offloading
5. Troubleshooting support
‒ End-to-end visibility
‒ Virtual-to-physical mapping for troubleshooting
31. Deployment Mode #1: Underlay
• Tenant membership decided based on the {switch-port, MAC, IP} tuple in each flow
• VNets identified using VLANs, VxLANs, or GRE
• Custom routing by the controller; VPN termination and L3 routing at the edge toward the Internet
• Controller cluster exposed via CLI, REST, and GUI
[Figure: racks of VMs with overlapping addresses (e.g., IP 192.168.1.2 appearing with MAC 0x1, 0x2, and 0x3) attached to an OpenFlow-controlled underlay]
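The underlay's classification rule can be sketched directly: the controller keys tenant membership on the {switch-port, MAC, IP} tuple, which is exactly what lets two tenants reuse the same IP (as in the figure, where 192.168.1.2 appears three times). Tenant names and bindings below are invented for illustration.

```python
# Controller-side binding table: (switch_port, mac, ip) -> tenant.
bindings = {
    (1, "0x1", "192.168.1.2"): "tenant-A",
    (2, "0x2", "192.168.1.2"): "tenant-B",  # same IP, different tenant
    (3, "0x1", "192.168.2.2"): "tenant-A",
}

def classify(switch_port, mac, ip):
    """Classify a new flow into a tenant; unknown tuples would
    typically be punted to the controller for a policy decision."""
    return bindings.get((switch_port, mac, ip), "unknown")

print(classify(1, "0x1", "192.168.1.2"))  # tenant-A
print(classify(2, "0x2", "192.168.1.2"))  # tenant-B despite identical IP
```

Because the switch port disambiguates the tuple, overlapping tenant address spaces never collide in the controller's view, even on a shared fabric.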
32. Performance Limitations
• Problem: OpenFlow switches have resource limitations
‒ Weak CPUs incapable of traffic summarization, frequent statistics reporting, and packet marking
‒ Flow-table limitations in switches (e.g., 1500 exact-match entries)
‒ Switch-controller communication limits (e.g., 200 packet_in/sec)
‒ Firmware does not always expose the full capabilities of the chipset
• Solutions:
‒ Next generation of hardware customized for OpenFlow
‒ New TCAMs with larger capacity
‒ Intelligent traffic aggregation
‒ Minimal offloading to vSwitches
33. Deployment Mode #2: Overlay
• vDP (Virtual Data Plane) on each host; VM addressing is masked from the fabric
• Tenant membership decided by the virtual interface on the vSwitch
• Tunnels (logical links) between vDPs run over legacy L2 switching and L3 routing; a v/p-gateway connects the overlay to the Internet
• Controller cluster exposed via CLI, REST, and GUI
[Figure: VMs in subnets 10.1.1.0/24, 10.1.2.0/24, and 10.2.1.0/24 attached to per-host vDPs, tunneled across an unchanged legacy fabric]
34. Overlay-based Network Virtualization
• Uses tunneling techniques such as STT, VxLAN, and GRE
• Functionality implemented at the vDP includes:
‒ Virtual network switching, rate limiting, distributed ACLs, flow marking, policy enforcement
• Functionality implemented at the gateway can include:
‒ NAT, tunnel termination, designated broadcast, VLAN interfaces
• The network core is not available for innovation
• The topology acts like a single switch
35. Typical Insertion Choices
• Bare-metal mode
‒ Running a native OS with baked-in containerization
• Hypervisor mode
‒ Typically supported with KVM, Xen, or Hyper-V
• Appliance mode
‒ Typically with VMware ESX
[Figure: three host-server layouts hosting tenant VMs (A1, B1, A3, D1): an SDN engine beside a DVS under the hypervisor; a custom vSwitch; and a dedicated SDN-engine VM]
36. VxLAN Tunneling
• Tunnels run between VxLAN Tunnel End Points (VTEPs) in each host server
• The outer UDP source port allows better ECMP hashing
• In the absence of an SDN control plane, IP multicast is used for layer-2 flooding (broadcasts, multicasts, and unknown unicasts)
• Encapsulation: VTEP outer MAC header | outer IP header | outer UDP header (source port, VxLAN port, UDP length, checksum) | VxLAN header (flags, reserved, 24-bit VN ID, reserved) | original L2 packet
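The 8-byte VxLAN header and the ECMP trick can be shown concretely. The sketch below follows the RFC 7348 layout (flags byte 0x08 setting the I bit, then the 24-bit VNI); the source-port hash function is an illustrative choice, since the RFC only requires the port to be derived from the inner frame.

```python
import struct
import zlib

def vxlan_header(vni):
    """Pack the 8-byte VxLAN header: flags(8) | reserved(24) |
    VNI(24) | reserved(8), per RFC 7348."""
    word1 = 0x08 << 24             # I bit set: VNI field is valid
    word2 = (vni & 0xFFFFFF) << 8  # 24-bit VNI, low byte reserved
    return struct.pack("!II", word1, word2)

def outer_udp_src_port(inner_frame):
    """Derive the outer UDP source port from a hash of the inner
    frame so different inner flows spread across ECMP paths."""
    return 49152 + (zlib.crc32(inner_frame) % 16384)  # ephemeral range

hdr = vxlan_header(vni=5001)
print(hdr.hex())                         # 0800000000138900
print(int.from_bytes(hdr[4:7], "big"))   # 5001: VNI occupies bytes 4-6
```

The 24-bit VNI is the point of the exercise: about 16 million virtual networks versus the 4094 usable VLAN IDs that motivated the requirements slide.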
37. MPLS over GRE Tunneling
• Encapsulation: transport header of the Authoritative Edge Device | MPLS-over-GRE header | original L2 packet
38. Performance Limitations
• Problem:
‒ Overlay mode is CPU-hungry at high line rates and has anecdotally fared poorly in the real world
• Solution:
‒ Offload to the top-of-rack leaf switch
‒ Use a hardware gateway

               Throughput   Recv-side CPU   Send-side CPU
Linux Bridge   9.3 Gbps     85%             75%
OVS Bridge     9.4 Gbps     82%             70%
OVS-STT        9.5 Gbps     70%             70%
OVS-GRE        2.3 Gbps     75%             97%
Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/
39. Deployment Mode #3: Hybrid
• Combines overlay and underlay (fabric) to achieve:
‒ End-to-end visibility
‒ Complete control
‒ The best mix of both worlds
• The integration may need 1) link-local VLANs or 2) integration with the VM manager to detect VM profiles
41. Four Types of SDN Solutions
1. SDN-Dataplane
‒ Traffic-handling devices (physical and virtual)
2. SDN-Control
‒ Decoupled control plane (OpenFlow++ or overlay)
3. SDN-Fabric
‒ Combined data and control plane
4. SDN-Mgmt
‒ Extensible management software and API
[Figure: core/aggregation/edge topology with racks, virtual switches, a controller cluster, a server manager, and a management/orchestration layer]
42. Vendor Ecosystem: L2-L4 Routing*
• Data plane (elements used for traffic handling): SDN-D-PSwitch, SDN-D-VSwitch
• Controller solutions (decoupled control plane): SDN-C-OpenFlow, SDN-C-Overlay
• Fabric (combined data and control plane): SDN-D-Fabric
• Management (extensible mgmt software and API): SDN-N-Mgmt
(*Not necessarily complete)
43. Vendor Ecosystem: L4-L7 Services*
• Data plane (elements used for traffic handling): SDN-S-Dataplane
• Controller solutions (decoupled control plane): SDN-S-Control
• Fabric (combined data and control plane): SDN-S-Fabric
• Management (extensible mgmt software and API): SDN-S-Orchestrator
(*Not necessarily complete)
44. Converging Architecture for L2-L4
• P+V, or overlay-underlay
‒ Vendors are converging toward an architecture where the overlay provides flexibility and the underlay provides performance
‒ Goal: end-to-end visibility and control
• Vendor options
‒ Same vendor for overlay and underlay (e.g., Cisco Insieme + Cisco 1KV, Big Switch SwitchLight, HP, Juniper)
‒ Different vendor for each:
   Overlay: VMware, IBM, PLUMgrid, Nuage/ALU
   Underlay: Arista, Brocade, Pica8, Cumulus
45. Overlay: VMware NSX
• VxLAN and STT tunneling
• Partnerships with several hardware vendors for VTEPs
46. Overlay: Juniper Contrail System
• An open-source solution that uses MPLS/GRE/VxLAN in the data plane and XMPP for control-plane signaling
47. Overlay: Nuage Networks VSP
Nuage Networks Virtualized Services Platform (VSP):
• Virtualized Services Directory (VSD): cloud service management plane
‒ Policy engine that abstracts complexity
‒ Service templates and analytics
• Virtualized Services Controller (VSC): datacenter control plane
‒ SDN controller that programs the network
‒ Rich routing feature set
• Virtual Routing & Switching (VRS): datacenter data plane
‒ Distributed switch/router with L2-L4 rules
‒ Integration of bare-metal assets
• Tunnel encapsulation using VXLAN or VPN-over-GRE
• Hardware integration for the gateway through MP-BGP
[Figure: VSD, VSC, and VRS instances on hypervisors in a datacenter zone, over an IP fabric with MP-BGP to an edge router and a hardware gateway for bare metal]
48. Hybrid: HP-VMware Partnered NSX
• Virtual switches from VMware or HP
• Physical switches from HP
• Integration via OVSDB and OpenFlow
49. Hybrid: Big Switch “P+V” Fabric
• A fabric combining physical and virtual OpenFlow switches
‒ Support for end-to-end network virtualization
‒ Support for integrating L4-L7 and other legacy devices
50. Underlay: NEC ProgrammableFlow
• Multi-tenant logical networks: 1000 Virtual Tenant Networks
• Multipath fabric with traffic engineering: 200 switches per controller
• End-to-end resiliency: millisecond link failover
[Figure: Virtual Tenant Networks (VTN1 at Layer 3 with a vRouter, VTN2 at Layer 2 with a vBridge) mapped by a controller cluster via the OpenFlow protocol onto the physical switch pool and server pool]
51. L4-L7: Embrane Heleos
• Elastically rolls out virtual L4-L7 appliances on x86 hardware based on metrics
• An approach complementary to L2-L4 network virtualization solutions
52. L4-L7: Cisco vPath
• Like SDN, the vPath architecture decouples the control plane and data plane, but for L4-L7 services
‒ Intelligence in the Virtual Service Gateway (VSG)
‒ Enforcement by the vPath agent in the vSwitch
53. L4-L7: vArmour Virtual Distributed Firewall
• Physical or virtual multi-enforcement points that integrate with a single policy
• Pre-configured security groups for different apps are applied automatically through Nova integration
• EP = security and forwarding
• An L2-L4 SDN is not essential
[Figure: a director cluster and the vArmour FWaaS plugin driving physical/virtual enforcement points (EPs)]