NSX for vSphere Logical Routing Deep Dive
1. NSX for vSphere Logical Routing Deep Dive
Pooja Patel, VMware, Inc.
NET8131R
#NET8131R
2. Growing NSX Momentum
A rapid journey of customer adoption across industries
1700+ customers
8 of VMware's top 10 deals in Q2'16 included NSX
100% YoY growth, consistent year over year (as of Q2'16)
3. Security
Inherently secure infrastructure
Automation
IT at the speed of business
Application continuity
Data center anywhere
NSX customer use cases
Micro-segmentation
DMZ anywhere
Secure end user
IT automating IT
Multi-tenant infrastructure
Developer cloud
Disaster recovery
Cross cloud
Multi data center pooling
4. • This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these
features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or
sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not
been determined.
Disclaimer
10. NSX Logical Routing Component – Distributed Logical Router
Optimized for E-W traffic.
A Distributed Logical Router instance is instantiated on each ESXi host.
LIFs (Logical Interfaces) are defined on the Distributed Router to handle VM default-gateway traffic.
Multiple LIFs are supported per DLR instance.
Multiple DLR instances are supported to isolate separate tenant domains.
The DLR Control VM peers with the Edge VM and exchanges routing information.
The Control VM is also used in software L2 bridging, handling placement and high availability of the bridge instance.
[Diagram: DLR Control VM alongside the DLR instance; hypervisor kernel modules (VIBs) on each ESXi host implement the Distributed Logical Router with LIF1, LIF2, and LIF3.]
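The per-host forwarding model above can be pictured with a small Python sketch. This is a toy model for illustration only, not NSX code; the `DLRInstance` class and its method names are invented. The point is that every host carries an identical copy of the instance, so the egress LIF for a destination is decided locally.

```python
import ipaddress

# Toy model of a DLR instance (illustrative only -- not actual NSX code).
# Each LIF is a logical interface with a connected network; every ESXi
# host runs an identical copy, so the lookup below happens on the host
# where the source VM lives.
class DLRInstance:
    def __init__(self, name):
        self.name = name
        self.lifs = {}                      # LIF name -> connected network

    def add_lif(self, lif, cidr):
        self.lifs[lif] = ipaddress.ip_network(cidr)

    def egress_lif(self, dst_ip):
        """Return the LIF whose directly connected network contains dst_ip."""
        addr = ipaddress.ip_address(dst_ip)
        for lif, net in self.lifs.items():
            if addr in net:
                return lif
        return None                         # not directly connected -> toward the Edge

dlr = DLRInstance("tenant-1")
dlr.add_lif("LIF1", "172.16.1.0/24")        # Web tier
dlr.add_lif("LIF2", "172.16.2.0/24")        # App tier
dlr.add_lif("LIF3", "172.16.3.0/24")        # DB tier
```

A destination on any attached logical switch resolves to a local LIF; anything else would be sent toward the Edge as the next hop.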
12. Edge Services Gateway - Sizing
VM form factor with 4 sizing options.
The Edge form factor can be changed after initial deployment via the UI or API.
From NSX 6.2.3, CPU/memory resource reservations are made for NSX Edge VMs at creation time.
NSX Edge Services Gateway sizing:

Form Factor   vCPU   Memory (MB)   Usage
X-Large       6      8192          L7 load balancing (*dedicated core for LB)
Quad-Large    4      2048          High-throughput ECMP or high-performance firewall
Large         2      1024          Small/medium DC or multi-tenant
Compact       1      512           Small deployments, POCs, and single-service use
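The sizing table above can also be expressed as data, which is handy when automating Edge deployments. The dictionary values come from the table; the `resources_for` helper is invented here for illustration and is not an NSX API.

```python
# The Edge sizing table expressed as data (values from the slide above).
# resources_for is a hypothetical helper, not part of any NSX API.
EDGE_FORM_FACTORS = {
    "Compact":    {"vcpu": 1, "memory_mb": 512},    # small deployments, POCs
    "Large":      {"vcpu": 2, "memory_mb": 1024},   # small/medium DC
    "Quad-Large": {"vcpu": 4, "memory_mb": 2048},   # ECMP / high-perf firewall
    "X-Large":    {"vcpu": 6, "memory_mb": 8192},   # L7 load balancing
}

def resources_for(form_factor):
    """Return the (vCPU, memory MB) reservation a given form factor implies."""
    spec = EDGE_FORM_FACTORS[form_factor]
    return spec["vcpu"], spec["memory_mb"]
```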
13. NSX Logical Routing – Topology View
[Diagram, logical view: VM1 and VM2 on VXLAN 5001/5002 attached to the DLR, with an NSX Edge VM connecting over the VXLAN 5003 transit to the external VLAN-based network. Physical view: the same DLR instance (LIF1/LIF2/LIF3) instantiated on ESX Hosts A, B, and C, with the DLR Control VM peering with the NSX Edge VM.]
14. Centralized Routing for East-West Communication – Hair-Pinning
[Diagram: VM1 (172.16.1.10, Green Logical Switch, VXLAN 5001) and VM2 (172.16.2.10, Red Logical Switch, VXLAN 5002) on a vSphere host in Compute Rack 1 (VTEP 10.10.10.10/24); the NSX Services Edge GW VM in the Edge/Mgmt rack (VTEP 20.20.20.20/24); both racks connected over the transport network.]
1. VM1 on the Green Logical Switch (172.16.1.10) communicates with VM2 on the Red Logical Switch (172.16.2.10).
2. The frame is sent over the VXLAN transport network to the gateway IP of the Green Logical Switch.
3. The frame is delivered to the destination VTEP (20.20.20.20).
4. The packet is delivered to the gateway interface for routing.
5. After the routing decision, the frame is sent to VM2 on the Red Logical Switch.
6. The frame is delivered to the destination VTEP (10.10.10.10).
7. The packet is delivered to the destination.
15. NSX Logical Routing: Components Interaction
[Diagram: NSX Manager, Controller Cluster, DLR Control VM (192.168.2.11), Distributed Logical Router instance (forwarding address 192.168.2.2), and the NSX Edge (192.168.2.1) acting as next-hop router toward the external network; Web/App/DB workloads on VXLAN attached to the DLR; OSPF/BGP peering between Edge and Control VM.]
1. The Distributed Logical Router is created using the NSX Manager UI or REST API.
2. The Controller pushes the logical router LIF configuration to the ESXi hosts.
3. OSPF/BGP peering is established between the NSX Edge and the logical router Control VM.
4. Routes learnt from the NSX Edge are pushed to the Controller for distribution.
5. The Controller sends the route updates to all ESXi hosts.
6. Routing kernel modules on the hosts handle the data-path traffic.
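The route-distribution portion of this interaction can be sketched as a tiny fan-out model. This is illustrative Python only; the classes are invented and bear no relation to NSX internals. It shows the shape of the flow: routes learnt by the Control VM reach the Controller, which pushes identical updates into every host's kernel FIB.

```python
# Toy sketch of the control-plane fan-out (invented classes, not NSX code):
# the Controller receives learnt routes and distributes them to the routing
# kernel module on every registered ESXi host.
class HostKernelModule:
    def __init__(self, name):
        self.name = name
        self.fib = {}                       # prefix -> next hop

class Controller:
    def __init__(self):
        self.hosts = []

    def register(self, host):
        self.hosts.append(host)

    def push_routes(self, routes):
        # Every host receives the same update, so all DLR instances
        # converge on an identical forwarding table.
        for host in self.hosts:
            host.fib.update(routes)

controller = Controller()
hosts = [HostKernelModule(f"esxi-{i}") for i in range(1, 4)]
for h in hosts:
    controller.register(h)

# A default route learnt from the NSX Edge, relayed via the Controller:
controller.push_routes({"0.0.0.0/0": "192.168.2.1"})
```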
16. Distributed Routing Traffic Flow – Same Host
[Diagram: VM1 (172.16.1.10, MAC1, VXLAN 5001) and VM2 (172.16.2.10, MAC2, VXLAN 5002) on Host 1; Host 1 VTEP 10.10.10.10/24, Host 2 VTEP 20.20.20.20/24 on the transport network; DLR with LIF1 172.16.1.1 and LIF2 172.16.2.1; internal LIFs share the vMAC.]

DLR routing table:
Destination   Mask            Gateway   Connectivity
172.16.1.0    255.255.255.0   0.0.0.0   Direct
172.16.2.0    255.255.255.0   0.0.0.0   Direct

LIF2 ARP table:
VM IP         VM MAC
172.16.2.10   MAC2

The frame leaves VM1 with SA MAC1 and DA vMAC (the DLR LIF MAC), inner IP SA 172.16.1.10 and DA 172.16.2.10; the local DLR instance routes it onto LIF2 and forwards it to MAC2.
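The two lookups in this flow, a route lookup against the DLR table and then an ARP lookup on the egress LIF, can be sketched in a few lines of Python. This is a toy model of the decision, not NSX code; table contents mirror the slide.

```python
import ipaddress

# Toy model of the same-host forwarding decision (not NSX code):
# first a route lookup in the DLR routing table, then an ARP lookup
# on the egress LIF to find the destination VM's MAC.
ROUTING_TABLE = [
    # (destination network, gateway, egress LIF) -- 0.0.0.0 = directly connected
    ("172.16.1.0/24", "0.0.0.0", "LIF1"),
    ("172.16.2.0/24", "0.0.0.0", "LIF2"),
]
ARP_TABLES = {"LIF2": {"172.16.2.10": "MAC2"}}   # per-LIF ARP tables

def forward(dst_ip):
    """Return (egress LIF, destination MAC) for a directly connected dst."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, gateway, lif in ROUTING_TABLE:
        if addr in ipaddress.ip_network(prefix):
            # Directly connected: resolve the MAC via the LIF's ARP table.
            # (A real DLR would ARP for a missing entry; we just return None.)
            mac = ARP_TABLES.get(lif, {}).get(dst_ip)
            return lif, mac
    return None, None
```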
17. Distributed Routing Traffic Flow – From External Networks (Ingress)
[Diagram: NSX Edge GW uplinked to the external networks; VXLAN 5003 transit network 192.168.2.0/24 (Edge .1, DLR LIF .2); VM1 (172.16.1.10, MAC1) on VXLAN 5001 on Host 1; VTEPs 10.10.10.10/24 and 20.20.20.20/24 on the VXLAN transport network; DLR LIF2 172.16.1.1 (internal), LIF1 192.168.2.2 (uplink).]

Edge forwarding table:
Destination Network   Next Hop
172.16.1.0/24         192.168.2.2

1. A device on the external network (192.168.100.10) communicates with VM1 on the Green Logical Switch (172.16.1.10).
2. The Edge GW routes the traffic to the next-hop router interface, 192.168.2.2.
3. The packets are forwarded to the transit-network LIF configured on the Logical Router.
4. After the route lookup, the packet is encapsulated in a VXLAN header and sent to the VTEP where VM1 (172.16.1.10) resides.
5. The packet is delivered to the destination.
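The VXLAN encapsulation in this flow is why the underlay MTU matters: the extra headers add about 50 bytes to each inner frame, so the physical network must carry at least 1550-byte frames to avoid fragmenting a standard 1500-byte payload. The breakdown below follows the standard VXLAN-over-IPv4 framing (outer Ethernet without a VLAN tag).

```python
# VXLAN-over-IPv4 encapsulation overhead, header by header.
# This is the standard framing; an outer 802.1Q tag would add 4 more bytes.
OUTER_ETHERNET = 14   # outer Ethernet header (no VLAN tag)
OUTER_IPV4 = 20       # outer IPv4 header
OUTER_UDP = 8         # outer UDP header
VXLAN_HEADER = 8      # VXLAN header (flags + 24-bit VNI)

VXLAN_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER

def required_underlay_mtu(inner_mtu=1500):
    """Minimum transport-network MTU to carry an inner frame intact."""
    return inner_mtu + VXLAN_OVERHEAD
```

This matches the deployment requirement mentioned later in the speaker notes: a 50-byte overhead, so the transport-network MTU must be 1550 or above.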
19. Logical Routing High Availability (HA)
Two HA models protect the logical routing components:
1. Active/Standby HA model – applies to both the DLR Control VM and the NSX Edge: an Active appliance (E1) is backed by a Standby (E2).
2. ECMP HA model – for the NSX Edge: up to eight active Edges (E1 … E8) forward concurrently toward the physical router.
[Diagram: Web/App/DB VMs behind the DLR; NSX Edges and the DLR Control VM peering toward the physical router and the external network.]
20. Active/Standby HA Model
How does Active/Standby HA work?
Edge high availability – configurable on Edge Services Gateways and DLR Control VMs.
Keepalives + state-sync information are exchanged between the Active and Standby Edges on a designated HA interface.
The Declare Dead timer is configurable.
HA is non-preemptive.
Stateful failover for services:
• FW – connection tracking
• LB – sticky table
• DHCP – lease information
• Routing – FIB information
Failure sequence: the Standby stops receiving keepalives from its peer and waits for the Declare Dead timer to expire; it then sends probes on its interfaces to guard against split brain; with no response on any interface, it takes over as Active and sends out GARPs.
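The failure-detection sequence above can be sketched as a small state machine. This is an illustrative model only, not NSX code; the 15-second default Declare Dead timer is from the speaker notes, and the probe check stands in for the split-brain prevention step.

```python
# Toy state machine for the Standby Edge's failure detection
# (illustrative only; real NSX behavior involves more state sync).
class StandbyEdge:
    def __init__(self, declare_dead_timer=15):    # default 15 s, tunable to 6 s
        self.declare_dead_timer = declare_dead_timer
        self.last_keepalive = 0.0
        self.role = "standby"

    def on_keepalive(self, now):
        self.last_keepalive = now

    def tick(self, now, peer_answers_probes):
        """Periodic check: promote to Active only after the Declare Dead
        timer expires AND split-brain probes go unanswered."""
        if now - self.last_keepalive < self.declare_dead_timer:
            return self.role                      # peer still alive
        if peer_answers_probes:
            return self.role                      # split-brain guard: stay standby
        self.role = "active"                      # take over config, send GARPs
        return self.role

edge = StandbyEdge()
edge.on_keepalive(now=0)
```

Note the non-preemptive behavior described above: once promoted, the new Active stays Active even after the old peer recovers.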
21. Active/Standby HA Model
[Diagram: Active and Standby NSX Edges (interfaces E1-0/E1-1) between the physical router (192.168.100.0/24, VLAN) and the DLR (192.168.2.0/24 transit, VXLAN); Web 172.16.1.0/24, App 172.16.2.0/24, and DB 172.16.3.0/24 networks behind the DLR; routing peering from the Active Edge only.]
All N-S traffic is handled by the Active NSX Edge.
Only the Active NSX Edge establishes routing adjacencies to the DLR Control VM and the physical router.
Anti-affinity and Graceful Restart are enabled by default.
Stateful services are supported on the NSX Edge pair: FW, load balancing, NAT, DHCP.
HA recommendations:
• Dynamic routing timers – OSPF 30/120, BGP 60/180.
• Dedicate a logical switch as the HA interface for DLR Control VMs/ESGs.
• The Declare Dead timer is configurable and can be tuned down to 6 seconds.
22. Active/Standby HA Model
[Diagram: same Active/Standby topology as the previous slide, now showing the ESXi host DLR instance route table with the Web (172.16.1.0/24), App (172.16.2.0/24), and DB (172.16.3.0/24) networks.]
24. Logical Routing High Availability (HA)
[Build of the earlier HA overview slide, now highlighting the second model: the ECMP HA model with up to eight active NSX Edges (E1 … E8) peering with the physical router, shown alongside the Active/Standby model for the DLR Control VM and NSX Edge.]
25. ECMP HA Model (Up to 8 NSX Edges)
[Diagram: Edges E1 … E8 (transit addresses .4, .5, .6, …) with routing peerings up to the physical routers (VLAN) and down to the DLR (VXLAN); Web/App/DB networks behind the DLR; external network above.]
North-south traffic is handled by all active NSX Edges.
Multiple equal-cost paths are installed in the DLR FIB.
Traffic is hashed based on source/destination IP address values.
26. ECMP HA Model (Up to 8 NSX Edges)
[Diagram: same ECMP topology with Edge E1 failed; traffic re-hashes across the remaining Edges.]
North-south traffic is handled by all active NSX Edges.
• Multiple equal-cost paths in the DLR FIB.
• Traffic is hashed based on source/destination IP address values.
HA recommendations:
• No need to enable Edge HA for each active Edge.
• Use aggressive routing timers for fast failover.
• Asymmetric routing paths – stateful services are not supported (stateful Edge firewall, NAT).
• DFW is supported.
• Set the uRPF setting to loose.
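The per-flow path selection described on this slide can be sketched as a hash over the source/destination IP pair. The hash function below is invented for illustration; the actual kernel hash differs, but the property it demonstrates is the real one: a given flow always maps to the same Edge, while different flows spread across all active Edges.

```python
import ipaddress

# Illustrative ECMP path selection (the real DLR hash is different):
# hash the src/dst IP pair to pick one of up to eight active Edges.
EDGES = [f"E{i}" for i in range(1, 9)]          # E1 .. E8

def pick_edge(src_ip, dst_ip, edges=EDGES):
    """Deterministically map a flow (src, dst) to one active Edge."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return edges[key % len(edges)]
```

Because the selection is per flow, a single TCP connection never changes path mid-stream, but return traffic may land on a different Edge. That asymmetry is exactly why stateful Edge services are not supported in the ECMP model.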
27. DLR Control VM – Failure and Recovery
[Diagram: ECMP Edges E1/E2 (transit .2/.3) peering with the Active/Standby DLR Control VM pair (.3/.4); the Active Control VM fails; Web/App/DB networks behind the DLR; physical router on 192.168.100.0/24, transit 192.168.2.0/24.]
HA recommendations when using aggressive OSPF/BGP routing timers for ECMP:
• Configure a static summary route to reach the logical networks, with the DLR forwarding address as next hop.
• Inject the floating static route, with a higher admin distance, on each ESG.
• Have the ESG redistribute the static route to the physical network.
Dual-failure scenario:
• Ensure DLR Control VMs and ESGs are on separate ESX hosts.
• With aggressive OSPF/BGP routing timers, adjacencies will flap in case of a DLR Control VM failure.
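The floating-static recommendation works because route selection prefers the lowest administrative distance: while the Control VM is up, the dynamically learnt route wins; if the adjacency flaps, the static route (still pointing at the DLR forwarding address, since the distributed data path is unaffected) takes over. The sketch below is illustrative; the admin-distance values are typical defaults, not NSX-specific settings.

```python
# Illustrative admin-distance-based route selection on an ESG.
# Both entries point at the DLR forwarding address; the floating static
# only wins when the dynamic route is withdrawn (e.g. Control VM failure).
ROUTES = [
    # (prefix, next_hop, protocol, admin_distance)
    ("172.16.0.0/16", "192.168.2.2", "ospf",   110),  # learnt from DLR Control VM
    ("172.16.0.0/16", "192.168.2.2", "static", 240),  # floating static (backup)
]

def best_route(prefix, routes, failed_protocols=()):
    """Pick the valid route with the lowest administrative distance."""
    candidates = [r for r in routes
                  if r[0] == prefix and r[2] not in failed_protocols]
    return min(candidates, key=lambda r: r[3]) if candidates else None
```

Forwarding continues toward 192.168.2.2 either way; only the protocol that supplies the route changes, which is the whole point of the recommendation.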
28. Comparison of Edge HA Models
Active/Standby HA Model:
• Bandwidth: single path (~10 Gbps/tenant)
• Stateful services: supported – NAT, LB, FW, DHCP
• Availability: convergence with stateful services enabled
ECMP Model:
• Bandwidth: up to 8 paths (~80 Gbps/tenant)
• Stateful services: not supported (*DFW is supported)
• Availability: high, ~3-4 s with (1, 3 s) timer tuning
[Diagram: Active/Standby Edge pair vs. ECMP Edges E1 … E8, each peering with the physical router above a DLR (with its Active/Standby Control VM) fronting Web/App/DB networks.]
30. Enterprise Routing Topology
[Diagram: ECMP NSX Edges E1 … E8 on VLAN 20 (Edge uplink) with routing peerings to the physical routers; VXLAN 5020 transit link down to the DLR instance; Active/Standby DLR Control VMs peering with the Edges and pushing FIB updates; Web1/App1/DB1 … WebN/AppN/DBN VMs behind the DLR; external network above.]
See NET7857 – Reference Design for SDDC with NSX & vSphere.
31. Enterprise Routing Topology + L2 Bridging
[Diagram: the same enterprise topology (ECMP Edges E1 … E8 on VLAN 20, VXLAN 5020 transit, DLR Control VMs), with an L2 bridge connecting a VXLAN logical switch to a VLAN where physical DB servers reside.]
L2 Bridging:
• The hypervisor on which the Active DLR Control VM is placed is designated as the bridge host, running the active bridge instance.
• The DLR Control VM is only used for placement and HA of the bridge instance (it is not in the data path).
• L2 bridging provides VNI-to-VLAN mapping.
32. Multi-Tenant Routing Topology
[Diagram: one NSX Edge with ECMP routing peerings to the external network and per-tenant transit links (VXLAN 5021 … VXLAN 5029) down to DLR Instances 1 … 9, each fronting a tenant's Web/App/DB networks.]
Can be deployed by enterprises, SPs, and hosting companies.
Up to 9 tenants (the Edge VM has 10 vNICs).
No support for overlapping IP addresses between tenants connected to the same NSX Edge.
33. Multi-Tenant Routing Topology
[Diagram: NSX Edge with a VXLAN trunk on a single vNIC carrying sub-interfaces for Tenants 1 … 200, each peering with its own DLR instance fronting Web/App/DB networks; external network above.]
Uses a Trunk interface on the NSX Edge (in addition to Internal and Uplink interfaces).
Allows up to 200 sub-interfaces on a single vNIC, each able to establish peering with a separate DLR instance.
Routing protocols are supported over sub-interfaces.
34. High-Scale Multi-Tenant Topology – 2-Tier
[Diagram: an ECMP NSX Edge X-Large layer (E1 … E8) acting as the route aggregation layer on the VXLAN 5100 transit, with per-tenant NSX Edges below it – either ECMP tenant Edges, or HA tenant Edges with NAT/LB features – connected over VXLAN uplinks (or a VXLAN trunk) to per-tenant DLR instances fronting Web/App/DB networks; external network above.]
35. Cross-VC Multi-Site Topology
[Diagram: Sites A and B, each with its own vCenter Server; a Universal Transport Zone with a Universal Distributed Logical Router instance spanning both sites; Universal Logical Switches ULS Web1, ULS App1, ULS Transit A, and ULS Transit B; a Universal Controller Cluster; and a Control VM with Local Egress at each site, connecting to the external network.]
See NET7854R – Multi-Site Networking and Security with Cross-VC NSX, Parts 1 & 2.
36. Topologies Comparison

Topology                 Characteristics
Enterprise               One DLR for all apps; DFW for VM-to-VM security; typically no NAT; ECMP Edges
Multitenant              Up to 9 tenants w/o trunk, up to 200 w/ trunk; DLR per tenant; no overlapping IP
High-scale multitenant   DLR and Edge per tenant; 2 tiers of Edges; tenant IP schemes can overlap

Note: These topologies can be stretched across VC boundaries by using Cross-VC NSX.
38. Key Takeaways
Logical Routing in NSX enables communication between workloads belonging to different subnets.
• Distributed Logical Routers optimize traffic flows for E-W communication.
• Edge Services Gateways handle N-S communication to the physical network and provide network services.
Two models for high availability: Active/Standby and ECMP.
Multiple logical topologies can be built, via the UI or programmatically, by combining the NSX Distributed Routing and Edge Services Gateway components.
Logical Routing now extends across vCenter boundaries.
39. Relevant Sessions and References
Sessions:
• NET9029 – NSX Logical Load Balancing: From Basics to Fine Art
• NET7857 / NET7858 – Reference Design for SDDC with NSX and vSphere: Parts 1 & 2
• NET7865 – Operational Best Practices for VMware NSX
• NET7854 / NET7861 – Multisite Networking and Security with Cross-vCenter NSX: Parts 1 & 2
References:
• NSX for vSphere Network Virtualization Design Guide (Ver 3.0) – https://communities.vmware.com/docs/DOC-27683
41. NSX Partner Ecosystem
Dynamic insertion of partner services across: physical infrastructure, security, application delivery, and operations and visibility.
42. Learn
Connect & Engage
communities.vmware.com
NSX Product Page & Technical Resources
vmware.com/products/nsx
Network Virtualization Blog
blogs.vmware.com/networkvirtualization
VMware NSX on YouTube
youtube.com/user/vmwarensx
Where to get started
Experience
30+ Unique NSX Sessions
Breakouts, quick talks & group discussions
Visit the VMware Booth
Use case demos, chat with NSX experts
Visit NSX Technical Partner Booths
Integration demos – EPSec & NetX, Hardware VTEP,
Ops & Visibility
Test Drive NSX with free Hands-on Labs
Expert-led or Self-paced. labs.hol.vmware.com
Use
NSX Proactive Support Service
Optimize performance based on data monitoring
and analytics to help resolve problems, mitigate
risk and improve operational efficiency.
vmware.com/consulting
Take
Training and Certification
Several paths to professional certifications. Learn
more at the Education & Certification Lounge.
vmware.com/go/nsxtraining
Editor's notes
Let us take a look at the agenda:
- We will first go through a quick NSX introduction and component overview. We will spend 5-7 minutes on this. In this presentation we will talk specifically about NSX for vSphere.
- Then we will get into the meat of the presentation and introduce Logical Routing. We will spend the bulk of our time on Logical Routing concepts, high-availability models, and deployment topologies.
- We will try to dedicate the last 10 minutes to wrap-up and Q&A.
- I really wanted to cover the different routing connectivity options, but we would run out of time, so I have added them in the annex for your reference.
So what is NSX?
NSX is a networking and security platform which provides a variety of services in software.
Some of the services we provide in software are:
- Logical switching
- Logical routing, both E-W and N-S
- Services like NAT
- Load balancing (both inline and one-arm); we also have DLB in tech preview
- Firewalling (both E-W distributed firewall and N-S perimeter firewalling)
- Very rich activity-monitoring capabilities built in, where you can monitor which VMs are being accessed by which users, etc.
- For remote connectivity, VPN services such as SSL VPN, IPsec VPN, and L2 VPN
We have a dedicated session on load balancing and another on leveraging NSX for remote and branch offices.
- DHCP services (DHCP relay/DHCP server)
Connectivity to physical: we have virtual L3 gateways which provide on/off-ramp functionality from the logical overlay network to the physical network.
- Support for software bridges as well as the newly introduced HW VTEP functionality.
The first requirement is a physical network which can provide a stable, resilient backplane to forward traffic. We are underlay-topology agnostic; the only requirement is the ability to carry VXLAN packets. We have a 50-byte overhead, so the MTU needs to be 1550 or above.
We use the VXLAN overlay technology to extend L2 over L3.
Now coming to the NSX architecture, we have a data plane, a control plane, and a management plane.
So what comprises the data plane? Let's start with the ESX hypervisor and the VDS. When we enable network virtualization on an NSX cluster, we install what we call VIBs, or VMware Installation Bundles, on each hypervisor.
These VIBs provide logical switching, distributed routing, and firewalling at the hypervisor level. So instead of one giant centralized router, we have a set of distributed routers which run inside each ESX host at the kernel level.
We also have the NSX Edge: a VM form factor that provides a variety of services. Its main function is to act as the on-ramp/off-ramp from the logical network to the physical network. It acts as a router and also provides other functionality like NAT, firewall, load balancing, VPN services, DHCP services, etc.
So the ESXi hosts, the VDS, and the Edge make up a high-performance data plane to forward traffic.
Control plane: consists of a cluster of 3 controllers, the brains of the solution. The controllers build a clear picture of who is connected where, keep track of the VM / logical switch / VTEP associations, and help distribute ARP and routing information. The DLR Control VM peers with the upstream router and exchanges routing information.
Management plane: consists of NSX Manager, the single configuration point and the entry point to the system for the REST APIs. It is tightly coupled with vCenter and accessible via the vSphere Web GUI. It is also used to install NSX and make hypervisors network-virtualization ready with the VIBs and VTEPs.
On top of NSX you can have a cloud management platform to consume and orchestrate NSX's networking and security services. Some examples are VMware vRealize Automation and OpenStack.
So what challenges do we have in the legacy way of doing things, and what are the benefits of NSX Logical Routing over it?
Big centralized router: in traditional network designs, every time a VM on one segment wants to communicate with a VM on a different segment, traffic needs to go to a big centralized router in the physical network to get routed. That is sub-optimal.
Paradigm: NSX Distributed Routing is built on the paradigm of pushing services as close to the application as possible. In our case, we are pushing the services and performing lookups right in the ESX hypervisors where the application is running. So essentially, we have small distributed router instances running in each hypervisor, the load is distributed among several hypervisors, and together they form one huge distributed router. This gives us better scalability.
Optimization of the data path: the other advantage is optimization of the data path. Let's assume you have VM1 on subnet 1 in the Web tier and VM2 on subnet 2 in the DB tier, on the same hypervisor. There is no need to go to the aggregation switch to get the packet routed and back to the same host; that is sub-optimal and eats bandwidth.
Programmatic consumption: you can now create virtualized networks programmatically, very quickly, without touching the physical infrastructure. All you do is go to your CMP (like vRealize or OpenStack) and say: spin up a tenant, spin up some apps for this tenant, and, by the way, also spin up the virtual network. NSX Logical Routing, VXLAN-based logical switching, and our other services make this possible.
So now let's look at the first building block: the Distributed Logical Router.
It has two components: a data-plane component and a control-plane component. The data-plane component is the DLR instance which is pushed to each hypervisor; the control-plane component is the DLR Control VM. We will look at both in detail.
When a distributed router is provisioned in NSX Manager, a DLR instance is pushed to all the hypervisors in the NSX domain. This DLR instance provides E-W routing between the logical switches attached to it.
What are LIFs? LIFs are logical interfaces defined on DLR instances and assigned IP addresses. They handle VM gateway traffic for the appropriate logical network. There are multiple LIFs on each DLR, and an ARP table is maintained per LIF. Exactly the same DLR instance, with the same LIFs, is pushed to each hypervisor.
The vMAC is the MAC address of the LIF. The vMAC is the same across all hosts and is never seen by the physical network. VMs use the vMAC as their default-gateway MAC address. The ARP/MAC timeout on the ESX host is 180/300 seconds if there is no activity.
Now, what if you want to create multiple tenants and want multiple DLRs to isolate tenant domains? That is possible too: we can create multiple DLR instances.
DLR Control VM: what if VMs need to communicate N-S with the physical network? We need a way to talk to the edge router, which communicates N-S, and exchange routing information. That is the function of the DLR Control VM: it establishes routing adjacencies with the edge router and exchanges routing information. This way you don't need each ESX hypervisor to peer with the physical network and maintain thousands of adjacencies.
Its other function is L2 bridging.
The DLR Control VM is optional in small, static topologies.
Notes: the 3-minute timeout is controlled by the switch security module, which has a 180-second expiry for any addresses learnt through ARP snooping.
Now let us have a look at the second building block: the Edge Services Gateway.
The Edge Services Gateway is an appliance deployed in VM form factor. It provides N-S on-ramp/off-ramp connectivity between the logical networks and the physical networks.
It is very versatile and supports a plethora of network services in software: routing (static and dynamic), NAT, load balancing, firewalling, VPN, DHCP, DNS forwarding, etc.
On the routing side, it supports dynamic routing (OSPF, BGP) as well as static routing. NAT can also be performed on traffic passing through the Edge. The NSX Edge supports DHCP relay functionality and can also act as a DHCP server. We also have DNS-forwarding capabilities.
If your sizing needs change, you can change the form factor of an already deployed Edge from the NSX Manager UI or REST API.
Compact: suitable for small POC environments. Large: small DCs. Quad-Large: ECMP and LB. If you need high L7 LB performance, X-Large is more suited.
More details can be found in our design guides and the reference design session.
Notes: there is a 1000 MHz per-vCPU reservation. There is a single dedicated core for haproxy: core 4 in X-Large. Processes such as the config engine and event manager have affinity to core 5. Cores 0-4 are for network traffic; from 6.2.0, the nagios processes run on cores 0-4 (before that version they also ran on core 5). L4 LB (IPVS) is in the kernel and runs on cores 0-4. From 6.2.3, the L2VPN process also has affinity to core 4 for X-Large.
Now let's put both these pieces together in a topology.
Let's look at the logical view first. We have a set of VMs connected to the DLR for E-W routing. If these VMs want to communicate N-S, they have to go through the Edge, which does VXLAN-VLAN translation and presents the traffic to the external networks.
Now let's look at the physical topology. We have 3 hosts which are part of our NSX domain, and 3 logical switches. We have a Distributed Router with 3 logical interfaces: LIF1 connects to LS1, LIF2 to the green network (VXLAN 5002), and LIF3 to the purple VXLAN used for transit between the DLR and the Edge. This DLR is instantiated on all the ESX hosts in the NSX domain, so we now have a little router running on each host which can route traffic E-W between these 3 networks.
Now we create the Edge Gateway and connect it to two networks: the purple VXLAN transit network on its internal interface, and the external world on its uplink interface. We establish dynamic routing between the Edge and the external network, and a peering relationship between the DLR Control VM and the Edge.
We have 3 ESX hosts: ESX Hosts A and B are compute; ESX Host C hosts the Edge and the DLR Control VM and connects to the outside.
So now let's look at what would happen if there were no distributed routing and we did centralized routing: it is sub-optimal, adds latency, and uses bandwidth for no reason.
Now let's take a deeper look at the interaction and communication between the various components.
The DLR is created using the NSX Manager UI or REST API. As a result of this step, a DLR Control VM is created and the DLR instance is instantiated on all hypervisors in the transport zone, but no LIFs are pushed yet. Then the controller pushes the LIF configuration to the ESXi hosts.
(Talk about the Logical Routing topology in which the DLR Control VM is optional.)
1) The external network needs to know about the subnets reachable in the logical space. The Edge advertises routes to the physical network, so the physical network knows which subnets are reachable in the logical space. You can selectively advertise, for example, only the Web-tier subnet to the physical network.
2) The logical space needs to know how traffic should return to a physical endpoint. It needs to know the gateway: all the little routers know that for destinations outside, traffic is sent to the IP address of the NSX Edge.
The data path completely bypasses the control plane.
With DR enabled: the packet reaches the distributed router. The DR has two LIFs, LIF1 and LIF2. The routing table shows the destination network as directly connected to the router, so a routing lookup is performed and the packet is put onto LIF2.
1) Traffic is received at the DLR.
2) It is routed between LIF1 and LIF2.
3) When routed onto LIF2: there is an associated ARP table at every LIF. If there had been no ARP entry for 172.16.2.10, the DR would have generated an ARP request to populate the ARP table. (An ARP timeout applies.)
Ingress: when an IP on the external network wants to communicate with a VM on the logical network, it knows that to reach the logical network it needs to get to the Edge uplink. There are 2 routing lookups. Once the traffic is handed to the transport network, it is just switched over VXLAN. In the reverse direction, the first lookup would happen here, and the second lookup at the services VM.
The fundamental thing to remember: routing always happens at the ESX host where the source VM is located. Routing happens at the entry point, and after that we just do switching.
Now that we have looked at all the building blocks in detail, let's look at deployment and high availability.
High-availability design is important so that components can resume services rapidly in the event of an outage. Since the Edge and the DLR Control VM run on ESX hosts, there is always the possibility of a host outage, so if convergence matters, we should design for HA.
In NSX, we have high-availability options for both routing components: the DLR Control VM and the Edge Services Gateway VMs. (The DLR Control VM isn't in the data path, but it is still responsible for exchanging routing updates.) So if for any reason the DLR Control VM goes down, we have another one in standby taking its place.
For the ESG VM, we have 2 options: the Active/Standby model, just like the DLR, and the ECMP model. The goal there is Active-Active HA, with all Edges active at the same time.
We will discuss each model in detail in the subsequent slides. The nice thing is that enabling/disabling HA is a configurable option in NSX, so it is easy to configure and embedded in the product.
Just explain this at a high level; the next slides give the details.
The logical topologies don't need to be designed with HA in mind; NSX takes care of it.
Now lets take a deeper look at the Active-Standby model.
In this model, a pair of edge VMs are deployed, one is the Active Edge and the second one is the Standby Edge.
By default Anti-affinity rules are configured by the system, so that the Active and the Standby land on different ESX hosts.
Keep-alives are exchanged between the Active VM and the Standby VM on a designated HA interface.
HA interface - can be a dedicated interface or a shared interface. By default, it is the first internal interface(non Uplink interface) on the ESG.
In addition to keep-alives, State information is also synchronized over this HA interface. Information like Firewall Connection tracking, NAT Tables, LB persistence tables and Routing tables, DHCP leases etc are synced.
Now lets have a look at what happens in failure
Here keep-alies are being exchangec between active/standby.
Hypveriso with Active VM fails.
Standby see that it isnt receiving keep-alives from peer…so it waits until the declar dead timer expires.
Declare Dead Timer is 15 seconds. If the Declare dead timer expires, the peer is declared unreachable.
For split brain prevention, Probes are sent on internal and uplink interfaces.
If no one replies, standby assumes peer is unreachable.
Standby takes over and becomes active and takes over the configuration and sends out GARPs.
Now what happens when the old active recovers?
The keep-alives resume, and the old active becomes the standby and syncs state information from the new active. HA is non-preemptive: we don't switch back, to avoid another traffic outage.
On the DLR Control VM the mechanics are the same: you have to pick an HA interface.
Best practice is to choose an HA interface on the transport network for both the DLR Control VM and the ESG.
Now let's take a look at this topology.
We have a DLR with three logical switches.
We have a DLR Control VM which is peering with the Edge Services Gateway.
The ESG is configured with HA enabled, so an active ESG and a standby ESG are deployed.
You can go to NSX Manager to find out which ESG is active and which is standby.
Only the active ESG establishes peering relationships with the DLR and the physical network.
The standby ESG doesn't have any routing protocol configured, or even an IP address.
Let's take a look at the routing table on the DR.
In 6.2 we introduced the centralized CLI, so you can execute commands against different components without SSHing into them. In this screenshot we are looking at the Distributed Router table on a particular hypervisor.
You can see the three directly connected networks: Web, App, and DB.
You can see the default gateway pointing to 192.168.2.1 (the forwarding address of the ESG).
Now let's look at the failure scenario.
What happens when Edge E1 fails?
When the active Edge fails, the standby Edge takes over the IP addresses of the active Edge, and routing peering now happens with the standby Edge. From the perspective of the DLR and the physical network, the adjacency is exactly the same, with no routing reconfiguration needed.
Gratuitous ARP:
The new active Edge sends out gratuitous ARPs so that the network learns that the IP address has moved from a VM on one host to a VM on another host.
State information like route entries, DHCP leases, and connection-tracker table entries is synced.
This is valuable when you are running services like firewall/NAT, so there is no disruption.
Note that the HA addresses themselves are not taken over.
Let's take a look at the NSX Manager view now. We can also execute a show service highavailability and see that HA index 1 is now active and the peer is unreachable.
The keep-alive HA addresses are on the 140 subnet, which the user picked. You can leave that configuration blank and an auto-configured address from the 169.254.0.0/16 range will be used.
Graceful restart is enabled by default. So leave the routing protocol timers at their defaults, or higher than the time it takes to complete the transition, so that the adjacency doesn't flap.
Now let's look at the second model, Active-Active.
In the ECMP HA model, all Edges are active (up to 8 Edges are supported in ECMP mode).
There are no keep-alives between E1 and E2; they are totally independent of each other. Edge HA is not enabled, and graceful restart is not needed since nothing has to take over, so we recommend aggressive routing timers.
We can support equal-cost paths from the DLR to 8 Edges.
If you look at the data path, we have 8 paths from the DR to the ESGs.
The DLR instance sees 8 entries in its routing table.
How do we know which path to take? The DR determines that based on a hash of the source and destination IP addresses (like any router in the industry).
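Per-flow hashing like this can be sketched as follows. The real DLR hash algorithm is not public; this only illustrates the property that matters: any deterministic hash of (source IP, destination IP) keeps a given flow pinned to one path, avoiding per-flow packet reordering.

```python
import hashlib


def pick_ecmp_path(src_ip: str, dst_ip: str, paths: list) -> str:
    """Pick one equal-cost next hop from a hash of (src, dst).

    All packets of the same flow hash to the same value, so the
    flow always traverses the same Edge."""
    digest = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
    return paths[digest[0] % len(paths)]


edges = [f"ESG-{i}" for i in range(1, 9)]   # up to 8 active Edges
chosen = pick_ecmp_path("10.0.1.5", "203.0.113.9", edges)
# The same flow always lands on the same Edge:
assert chosen == pick_ecmp_path("10.0.1.5", "203.0.113.9", edges)
```

Different flows spread across the Edges statistically, which is why the model scales out aggregate bandwidth rather than accelerating any single flow.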
This DLR-to-ESG hashing is the first layer of ECMP.
The second layer of ECMP is typically at the Edges: at least 2 NICs per host, for redundancy to two different ToRs.
On the return path we see exactly the same thing: 8 different entries for the different paths. We have OSPF configured in this topology, hence you see N2 routes.
Failure detection is based on the routing timers.
As soon as the DLR detects that one Edge is down, it rehashes the traffic and re-load-balances it across the remaining Edges.
So with aggressive timers, it takes about 3 seconds for the physical network to detect that an Edge is down.
The same applies to the DLR.
North-south traffic is handled by all active NSX Edges.
There are multiple equal-cost paths in the DLR FIB.
Traffic is hashed based on source/destination IP address values.
Other recommendations:
There is no need to enable Edge HA for each active Edge.
Graceful restart on the NSX Edge Services Gateways taking part in ECMP can be disabled.
Routing timers can be made aggressive for faster convergence.
vSphere HA should be enabled for the NSX Edge VMs.
Active/Active ECMP currently implies stateless behavior.
There is no support for stateful Edge firewall, load balancing, or NAT.
The DFW can be used to protect the tenant VMs in the logical network.
Make sure the uRPF setting is set to loose, as return traffic can arrive via a different ESG.
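The difference between strict and loose uRPF, and why ECMP requires loose mode, can be sketched like this. It is a simplified host-route lookup rather than a real longest-prefix-match FIB, and the interface names are made up for the example:

```python
def urpf_pass(fib: dict, src_ip: str, in_iface: str, mode: str = "loose") -> bool:
    """Illustrative uRPF check.

    strict: a route back to src must exist AND point out the receiving
            interface.
    loose:  any route back to src is enough. This is what ECMP needs,
            because return traffic may arrive via a different ESG than
            the one that forwarded the original packet."""
    out_iface = fib.get(src_ip)          # simplified: exact-match "route lookup"
    if out_iface is None:
        return False                     # no route back to source: drop either way
    if mode == "strict":
        return out_iface == in_iface     # must arrive where we would reply
    return True                          # loose: reachability alone is enough


fib = {"10.1.1.5": "uplink-to-ESG1"}     # route back to 10.1.1.5 points at ESG1
# Return traffic arrives via ESG2 instead (asymmetric path under ECMP):
assert not urpf_pass(fib, "10.1.1.5", "uplink-to-ESG2", "strict")  # strict drops it
assert urpf_pass(fib, "10.1.1.5", "uplink-to-ESG2", "loose")       # loose forwards it
```

So with strict uRPF on the Edges, legitimate asymmetric return traffic would be discarded as a spoofing suspect; loose mode keeps the anti-spoofing check without breaking ECMP.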
Now let's talk about DLR Control VM failure and recovery.
The DLR Control VM is not in the data path; however, it does peer with the ESGs and exchanges routing information.
The DLR Control VM can be configured in HA (Active/Standby model).
There is a special failure scenario we need to consider for DLR Control VMs, especially when aggressive timers come into play with ECMP Edges.
If the DLR Control VM goes down, the routing adjacencies between the DLR Control VM and the ESGs will flap. The Edges will then flush their tables and remove all prefixes learnt from the DLR, causing N-S traffic to be blackholed.
To mitigate this failure, we add a static route with a higher administrative distance, with the forwarding address of the DLR as the next hop, and redistribute this route upstream to the physical network. In my case, I have a static route that summarizes all my logical networks with a /16. So if the adjacency between the DLR and the ESG flaps, we still have N-S traffic flowing.
More discussion can be found in the NSX Design Guide.
The aggressive setting of routing protocol timers applied in ECMP mode for faster recovery has an important implication for the specific failure scenario of the active Control VM. This failure would now cause the ECMP Edges to bring down the routing adjacencies previously established with the Control VM in less than 3 seconds. The ECMP Edges then flush their forwarding tables, removing all the prefixes originally learned from the DLR, which would stop north-south communications. To mitigate this, a static route with a higher administrative distance than the dynamic routing protocol used between the ESG and the DLR is needed. This configuration is shown in Figure 113.
If routing timers are aggressive for ECMP, we may see a routing flap during a DLR Control VM failure event.
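The floating-static-route mitigation relies on plain administrative-distance route selection, which can be sketched as follows. The /16 summary prefix follows the example in the talk; OSPF's distance of 110 is its standard default, while 250 for the floating static is an illustrative choice (any value above the dynamic protocol's works):

```python
def best_route(rib: list) -> dict:
    """Select the candidate route with the lowest administrative distance.
    The floating static (higher distance) only wins once the dynamic
    route has been withdrawn."""
    return min(rib, key=lambda r: r["distance"])


ospf = {"proto": "ospf", "prefix": "172.16.0.0/16", "distance": 110,
        "next_hop": "dlr-forwarding-addr"}
floating_static = {"proto": "static", "prefix": "172.16.0.0/16", "distance": 250,
                   "next_hop": "dlr-forwarding-addr"}

# Steady state: the dynamic route learned via the DLR Control VM wins.
assert best_route([ospf, floating_static])["proto"] == "ospf"

# Control VM down: OSPF prefixes are flushed, the floating static takes
# over, and north-south traffic keeps flowing toward the DLR.
assert best_route([floating_static])["proto"] == "static"
```

Because both routes point at the DLR forwarding address, the data path is unchanged when the static route takes over; only the control-plane source of the prefix differs.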
Type 7: N2
Type 5: E2
The basic difference between an N-type and an E-type route is that they come from two different types of LSAs: Type 5 is E and Type 7 is N. An external route (redistributed from another routing protocol, a static route, or a connected route) is tagged as a Type 5 LSA (E route). This LSA is flooded throughout the OSPF domain except into Stub, Totally Stubby, and Not-So-Stubby areas.
Stub areas are not allowed to have external routes, which means there should be no ASBR in a Stub area. If you do need an ASBR in your Stub area, the only solution is to configure the area as a Not-So-Stubby Area (NSSA). A route redistributed (from a connected, static, or other routing protocol) inside an NSSA is a Type 7 LSA (N route). This LSA is flooded only within the NSSA.
The NSSA ABR converts Type 7 LSAs into Type 5 LSAs.
It may propagate only the summarized logical switch address space.
Now let's recap and summarize both models we saw.
The first model is the Active/Standby model. Here you have a single path per Edge. Keep-alives are exchanged between active and standby, and state is synchronized, so stateful services are supported.
Important point: routing timers are left at default or higher, graceful restart is enabled, and all service tables are synchronized.
The second model is the ECMP model:
If you deploy this model you get 8 paths to the physical network, so 80 G of throughput.
You get fast failover using aggressive timers.
What you lose is the ability to enable stateful services at the ESG, as return paths may be asymmetric and break the stateful firewall.
DFW at the workload vNIC/VM level will still work.
Certain stateful services like NAT and Edge firewalling will break, as the return traffic paths may be asymmetric and the stateful firewall will drop the packets.
Hence, it is recommended not to configure stateful services like NAT or Edge Firewall on an Edge that is in ECMP mode.
The result is much better load balancing and very fast convergence.
The Distributed Firewall is enforced at the vNIC level, so it will work fine.
Let's look at some deployment topologies now.
This is a typical enterprise topology. It optimizes as much E-W traffic as possible by adding as many LIFs as possible on a single DLR instance.
There is a single DLR for multiple logical switches/applications, to optimize E-W communication for the apps.
This doesn't mean that every VM can communicate with every other VM; we use security policies (DFW/micro-segmentation) to limit this communication.
For the Edge topology in this example, we are using Active-Active.
Services like load balancing run in one-arm mode for the apps.
The goal is bandwidth and fast convergence.
There is no NAT in this topology.
DHCP relay is supported on the DLR.
Why are there Edges at all? Physical-to-logical routing is complex and not easy to automate, so we set it up only once.
The DHCP service is on the DLR (the first hop).
This is the same enterprise topology with a variation for L2 bridging. Let's say you have some physical database hosts that are not on the VXLAN logical switch but need L2 connectivity to your workloads. You can use the L2 bridging feature to bridge an overlay VNI to a VLAN.
So how do you configure L2 bridging? You configure it on the DLR Control VM to bridge a logical switch to a VLAN. The hypervisor where the DLR Control VM is placed becomes the bridge host and handles the bridging traffic; the Control VM is only used for placement.
You still get optimized E-W routing along with bridging.
Now let's look at the second topology, a multi-tenant topology.
Here we use multiple DLR instances.
If you have multiple tenants or silos and want to separate them without lots of DFW rules, you can dedicate one DLR instance per tenant.
Each tenant gets its own DLR, and each DLR has its own transit logical switch to connect to the Edges.
Why only 9 tenants per NSX Edge? Because the Edge is a VM, and a VM is limited to 10 vNICs.
You only have 9 vNICs available on the Edge to interconnect DLR instances.
Another limitation of this topology: no overlapping IP addresses.
Each tenant is a 3-tier app.
You can create as many tenants as needed, on demand and dynamically (destroy and create without making any changes to the physical network).
We can still do ECMP between the Edge and the ToR (with the Edges themselves in Active/Standby).
Now let's say you want to support more than 10 tenants on a single Edge: you can use what we call a trunk interface on the ESG.
Trunk interface:
We can have multiple sub-interfaces on a single vNIC.
Up to 200 sub-interfaces are supported.
We do not support overlapping IP addresses, so we can't use NAT between tenants.
NAT is available only on the Edge uplink.
Now let's look at a topology where you want to provide services per tenant but still leverage equal-cost multi-pathing for high throughput to the physical network.
You can have a dedicated pair of Edges per tenant.
Two tiers of Edges allow scaling with administrative control.
High-scale multi-tenancy is enabled with multiple tiers of Edges interconnected via a VXLAN transit uplink.
The top-tier Edge acts as a provider Edge managed by the cloud (central) admin. The provider Edge can scale up to 8 ECMP Edges for scalable routing.
Second-tier Edges are provisioned and managed by the tenant.
Based on tenant requirements, a tenant Edge can be ECMP or stateful.
There is support for overlapping IP addresses between tenants connected to different first-tier NSX Edges.
Now what if you want to stretch logical networks and security policies across multiple vCenters and potentially multiple sites?
We support the notion of universal objects: the Universal Transport Zone (UTZ) and the Universal Logical Switch (ULS), which can stretch L2 over L3 across different management boundaries.
For distributed routing across vCenters/sites, we introduced the concept of a Universal DLR instance, which can provide distributed routing across multiple sites. There is also a notion of Local Egress, so that egress from the logical networks to the physical network can be chosen based on a Locale ID.
We have two full sessions covering this topic which you can attend, but the gist is that you have a Universal DLR with Local Egress capabilities.
Edges still have to be deployed per site.
Now let's compare and contrast the various models we saw.
This brings us to the end of our presentation, so let's go over some key takeaways:
The different components for logical routing.
The different HA models.
The different topologies.
More information can be found in the design session.
Think about NSX as a platform; it is not a point product.
Finally, a true platform requires successful participation of a third-party ecosystem. NSX has developed a rich ecosystem of partners that spans physical to virtual, operations and visibility, app delivery services, and security services. This extensible, distributed service platform supports the novel concept of a dynamic service chain that provides multiple platform integration points and automates the deployment, orchestration, and scale-out of partner services.