Contents
The Conventional Architecture & Problem
The New Architecture
The Monsoon Architecture
The VL2 Architecture
The SEATTLE Architecture
The PortLand Architecture
The TRILL
Related Works
Summary
The CDCN (Cloud Data Center Network) Architecture Proposal
Trend
Confidential
The Problems of a Conventional DC
Ethernet is hard to scale out
- STP (Spanning Tree Protocol)
- Broadcast (ARP, RARP, DHCP…)
- Packet flooding in switches (for MAC learning)
Fragmentation of resources
No Performance Isolation
Poor server to server connectivity
Need very high reliability near top of the tree (Single Point of Failure)
The Problems of a Conventional DC
Fragmentation of Resources
- VLANs used to isolate properties from each other
- IP addresses topologically determined by ARs (Access Routers)
- Reconfiguration of IPs and VLAN trunks
• painful, error-prone, slow, often manual
The Problems of a Conventional DC
No Performance Isolation
- VLANs typically provide only reachability isolation
- One service sending/receiving too much traffic hurts all services sharing its
subtree
The Problems of a Conventional DC
Poor server to server connectivity
- Data centers run two kinds of applications:
• Outward facing (serving web pages to users)
• Internal computation
- 70~80% of the packets stay inside the data center
The Monsoon Architecture
Monsoon
- A new network architecture, which scales and commoditizes data center networking.
Abstract
- Scale-out instead of Scale-up
- A single large Layer 2 domain
- Using programmable commodity layer 2 switches and servers.
- The hierarchy has two levels:
• TOR (Top-Of-Rack) Switch => Access Switch
• LB (Load Balancing) Switch => Core Switch
- Scale to 100,000 servers or more.
The Monsoon Architecture
Objectives
- Low-Cost & Scale-out
- Uniform high capacity
• Capacity between two servers limited only by their NICs
• No need to consider topology when adding servers
- Performance isolation
• Traffic of one service should be unaffected by others
- Layer-2 semantics
• Flat addressing, so any server can have any IP address
• Server configuration is the same as in a LAN
• Legacy applications depending on broadcast must work
The Monsoon Architecture
Server-to-Server Forwarding
- An Example Monsoon Topology (Clos Network)
• A scale-out design with broad layers
- Same bisection BW at each layer -> no oversubscription
- Extensive path diversity -> Graceful degradation under failure
Switch      Up-link ports   Down-link ports   # of switches
Inter. SW   N/A             10 Gbps x 144     72
Aggr. SW    10 Gbps x 72    10 Gbps x 72      144
TOR SW      10 Gbps x 2     1 Gbps x 20       5,184
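The "no oversubscription" claim can be checked with a little arithmetic on the port counts in the table. The sketch below assumes one server per 1 Gbps TOR down-link port; the dictionary layout and helper names are illustrative and do not come from the slides.

```python
# Port counts and speeds (Gbps) copied from the table above.
# The dictionary layout and helper names are illustrative, not from the slides.
LAYERS = {
    "TOR":   {"count": 5184, "up": (2, 10),  "down": (20, 1)},
    "Aggr":  {"count": 144,  "up": (72, 10), "down": (72, 10)},
    "Inter": {"count": 72,   "up": (0, 0),   "down": (144, 10)},
}


def capacity_gbps(layer: str, direction: str) -> int:
    """Aggregate capacity of one layer in one direction, in Gbps."""
    spec = LAYERS[layer]
    ports, speed = spec[direction]
    return spec["count"] * ports * speed


# Assumption: every TOR down-link port hosts exactly one server.
servers = LAYERS["TOR"]["count"] * LAYERS["TOR"]["down"][0]

# Capacity on each side of every layer boundary, bottom to top.
boundaries = {
    "servers -> TOR": capacity_gbps("TOR", "down"),
    "TOR -> Aggr (TOR side)": capacity_gbps("TOR", "up"),
    "TOR -> Aggr (Aggr side)": capacity_gbps("Aggr", "down"),
    "Aggr -> Inter (Aggr side)": capacity_gbps("Aggr", "up"),
    "Aggr -> Inter (Inter side)": capacity_gbps("Inter", "down"),
}

if __name__ == "__main__":
    print(f"servers: {servers:,}")
    for name, gbps in boundaries.items():
        print(f"{name}: {gbps:,} Gbps")
```

Under these assumptions every boundary carries the same aggregate capacity, which is what "same bisection BW at each layer" means, and the server count lands just above the 100,000 target.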
The Monsoon Architecture
Clos Network Topology
- A multistage (e.g., 3-stage) switching network.
- The advantage
• Connections between a large number of input and output ports can be made
using only small switches.
• With n inputs per ingress switch and k middle-stage switches, a Clos network with
k ≥ n is rearrangeably non-blocking: any set of connections can be carried, though
existing ones may have to be rerouted.
- Clos Theorem: if k ≥ 2n−1, a new connection can always be added
without rearrangement (strictly non-blocking, like a crossbar switch).
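The two thresholds can be written down as a small classifier. This is a minimal sketch that assumes the usual 3-stage notation, which the slide does not spell out: n is the number of inputs per ingress-stage switch and k is the number of middle-stage switches. The function name is hypothetical.

```python
def clos_blocking_class(n: int, k: int) -> str:
    """Classify a symmetric 3-stage Clos network.

    Assumed notation (not defined on the slide):
    n = inputs per ingress-stage switch, k = number of middle-stage switches.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    if k >= 2 * n - 1:
        # Clos theorem: a free middle switch always exists for a new connection.
        return "strictly non-blocking"
    if k >= n:
        # Connections always fit, but existing ones may need rerouting.
        return "rearrangeably non-blocking"
    return "blocking"


if __name__ == "__main__":
    for k in (3, 4, 6, 7):
        print(f"n=4, k={k}: {clos_blocking_class(4, k)}")
```

For n = 4, the boundary cases are k = 4 (rearrangeable) and k = 7 (strict).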
The Monsoon Architecture
Server-to-Server Forwarding
Valiant Load Balancing
• Every flow “bounced” off a random intermediate switch
• Provably hotspot free for any admissible traffic matrix
• Servers could randomize flow-lets if needed
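A rough sketch of the "bounce" step follows. It assumes the intermediate switch is chosen by hashing a flow identifier, so that all packets of one flow take the same path while different flows spread out; the slides only say the choice is random. The function names, switch labels, and five-tuple layout are illustrative.

```python
import hashlib


def pick_intermediate(flow: tuple, intermediates: list) -> str:
    """Map a flow to one intermediate switch, pseudo-randomly but repeatably."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]


def vlb_path(flow: tuple, src_tor: str, dst_tor: str, intermediates: list) -> list:
    """Two-phase route: source TOR -> chosen intermediate -> destination TOR."""
    return [src_tor, pick_intermediate(flow, intermediates), dst_tor]


if __name__ == "__main__":
    switches = [f"I{i}" for i in range(72)]  # hypothetical switch names
    # Hypothetical five-tuple: (src IP, dst IP, protocol, src port, dst port)
    flow = ("10.0.0.1", "10.0.1.1", 6, 40000, 80)
    print(vlb_path(flow, "TOR-a", "TOR-b", switches))
```

Hashing per flow rather than per packet avoids reordering within a flow; splitting a flow into flow-lets, as the last bullet suggests, would amount to adding a changing field to the hashed key.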
The Monsoon Architecture
Server-to-Server Forwarding
- Encapsulation used to transfer complexity to servers
• Commodity switches have simple forwarding primitives
• Complexity moved to computing the headers
- Encapsulation available
• IEEE 802.1ah defines MAC-in-MAC encapsulation
Frame processing when packets go from one server to another in the same data center.
The Monsoon Architecture
Server-to-Server Forwarding
- Data center OSes already heavily modified for VMs, storage, etc.
• A thin shim for network support is no big deal
- Applications work with Application Addresses
• AAs are flat names; infrastructure addresses are invisible to apps
- No change to applications or clients outside DC
The networking stack of a host.
The Monsoon Agent looks up remote IPs in the central directory.
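The agent's role can be sketched as a lookup followed by encapsulation. Everything concrete below is an assumption made for illustration: the directory is modelled as an in-memory table, the agent caches answers, and the outer headers are listed as intermediate switch, destination TOR, destination server. The class and field names are hypothetical and are not the Monsoon implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    tor_mac: str     # MAC of the destination's TOR switch
    server_mac: str  # MAC of the destination server


class Directory:
    """Stand-in for the central directory (assumed to be a simple table)."""

    def __init__(self, table: dict):
        self._table = dict(table)
        self.lookups = 0

    def resolve(self, ip: str) -> Location:
        self.lookups += 1
        return self._table[ip]


class Agent:
    """Host-side shim: resolves a remote IP, then builds the outer headers."""

    def __init__(self, directory: Directory):
        self._directory = directory
        self._cache = {}  # caching is an assumption, not stated on the slide

    def locate(self, ip: str) -> Location:
        if ip not in self._cache:
            self._cache[ip] = self._directory.resolve(ip)
        return self._cache[ip]

    def encapsulate(self, dst_ip: str, payload: bytes, intermediate_mac: str) -> dict:
        loc = self.locate(dst_ip)
        # Outermost header first: intermediate switch, then TOR, then server.
        return {"outer": [intermediate_mac, loc.tor_mac, loc.server_mac],
                "payload": payload}


if __name__ == "__main__":
    directory = Directory({"10.0.1.5": Location("tor-07", "srv-42")})
    agent = Agent(directory)
    print(agent.encapsulate("10.0.1.5", b"hello", "inter-03"))
```

The switches only ever look at the outermost header, which is the sense in which complexity moves from the network to the servers.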
The Monsoon Architecture
External Connection & Full Topology (Example)
- Routers do not support the Monsoon functions
- An Ingress Server is paired with each Access Router (AR)
• Implements the Monsoon functionality and acts as a GW to the DC.
• Two interfaces: one to the AR and one to a TOR switch
• Default GW
(Figure: example topology with an Ingress Server attached to each AR.)
The VL2 Architecture
VL2 uses
- flat addressing to allow service instances to be placed anywhere in the network
- Valiant Load Balancing to spread traffic uniformly across network paths
- end system-based address resolution to scale to large server pools without introducing
complexity to the network control plane.
Objectives
- Uniform high capacity
- Performance isolation
- Layer-2 semantics
Topology
- Low-cost switches arranged into a Clos topology.
Traffic Engineering
- Valiant Load Balancing
The VL2 Architecture
Building on proven networking technology
- Link-state routing
• To maintain the Switch-level topology
• Not end hosts’ information
- ECMP to enable VLB
Separating names from locators
- Hosting any service on any server.
- Addressing scheme
• AAs (Application-specific Addresses) & LAs (Location-specific Addresses)
• Directory system: maps names (AAs) to locators (LAs).
• VL2 agent (in the host): a layer-2.5 shim that invokes the directory system’s resolution service.
Embracing end-system
- VL2 agent in host
The VL2 Architecture
Potential issue for both ECMP and VLB
- Transient congestion on some links.
- To cope, the agent can change the hash used to create the source address periodically,
or whenever TCP detects a severe congestion event (e.g., a full window loss) or an
Explicit Congestion Notification.
- Switches today only support up to 16-way ECMP, with 256-way ECMP being
released by some vendors this year.
- Some inexpensive switches cannot correctly retrieve the five-tuple values when
a packet is encapsulated with multiple IP headers. Thus, the agent at the source
computes a hash of the five-tuple values and writes that value into the source
IP address field, which all switches do use in making ECMP forwarding
decisions.
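The source-address trick can be sketched as follows. This is only an illustration: the hash function (SHA-256), the address block `10.0.0.0/8`, and the `salt` parameter used to model re-hashing are all assumptions, not details given on the slide.

```python
import hashlib
import ipaddress


def hashed_source_ip(five_tuple: tuple, salt: int = 0,
                     prefix: str = "10.0.0.0/8") -> ipaddress.IPv4Address:
    """Fold a flow's five-tuple into an address inside `prefix`.

    `salt` models re-hashing: bumping it moves the flow to a (very likely)
    different address, and therefore a different ECMP path.
    `prefix` is a hypothetical address block reserved for this purpose.
    """
    net = ipaddress.ip_network(prefix)
    key = repr((five_tuple, salt)).encode()
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    host_bits = net.max_prefixlen - net.prefixlen
    return net.network_address + (value % (1 << host_bits))


if __name__ == "__main__":
    # Hypothetical five-tuple: (src IP, dst IP, protocol, src port, dst port)
    flow = ("20.0.0.7", "20.0.3.9", 6, 51515, 443)
    print(hashed_source_ip(flow))
    # After a severe congestion event the agent could bump the salt.
    print(hashed_source_ip(flow, salt=1))
```

Because switches hash on the outer source address, a switch that cannot parse the inner headers still spreads flows across its equal-cost paths.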
The VL2 Architecture
Discussion
- Cost & Scale
• The VL2 topology can scale to create networks with no oversubscription.
• Switches with 144 ports (D = 144) are available today for $150K.
• Switches with 24 ports (D = 24) are available today for $8K.
• Building a conventional network with no oversubscription would cost roughly 14× the
cost of an equivalent VL2 network with no oversubscription.
The SEATTLE Architecture
Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises.
- In SIGCOMM, 2008.
Flat addressing of end-hosts
- Switches use hosts’ MAC addresses for routing
- Ensures zero-configuration and backwards-compatibility
Automated host discovery at the edge
- Switches detect the arrival/departure of hosts
- Obviates flooding and ensures scalability
Hash-based on-demand resolution
- Hash deterministically maps a host to a switch
- Switches resolve end-hosts’ location and address via hashing
- Ensures scalability
Shortest-path forwarding between switches
- Switches run link-state routing to maintain only switch-level topology (i.e., do
not disseminate end-host information)
- Ensures data-plane efficiency
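The hash-based resolution step can be illustrated with a small consistent-hashing ring. The ring, the use of SHA-256, and all class and method names are assumptions chosen for the sketch; the slide only states that a hash deterministically maps a host to a switch.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Position of a key on the hash ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ResolverRing:
    """Maps a host MAC to the switch that stores its location."""

    def __init__(self, switch_ids):
        self._ring = sorted((_point(s), s) for s in switch_ids)
        self._points = [p for p, _ in self._ring]

    def resolver_for(self, host_mac: str) -> str:
        i = bisect.bisect_right(self._points, _point(host_mac)) % len(self._ring)
        return self._ring[i][1]


class Fabric:
    """Toy network: each switch keeps the entries it is the resolver for."""

    def __init__(self, switch_ids):
        self.ring = ResolverRing(switch_ids)
        self.store = {s: {} for s in switch_ids}

    def host_arrived(self, host_mac: str, edge_switch: str) -> None:
        # The edge switch detects the host and publishes its location.
        self.store[self.ring.resolver_for(host_mac)][host_mac] = edge_switch

    def locate(self, host_mac: str):
        # Any switch can compute the same resolver, so no flooding is needed.
        return self.store[self.ring.resolver_for(host_mac)].get(host_mac)


if __name__ == "__main__":
    fabric = Fabric(["s1", "s2", "s3", "s4"])
    fabric.host_arrived("00:11:22:33:44:55", edge_switch="s3")
    print(fabric.locate("00:11:22:33:44:55"))
```

Every switch computes the same resolver for a given MAC, so a lookup is a single unicast query to that resolver instead of a broadcast.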
The PortLand Architecture
(Figures: adding a new host; transferring a packet.)
Key features
- Layer 2 protocol based on a tree topology
- The PMAC (Pseudo MAC) encodes the host’s position information
- Data forwarding proceeds based on the PMAC
- The edge switch is responsible for mapping between
the PMAC and the AMAC (Actual MAC) by rewriting
- The fabric manager is responsible for address resolution
- The edge switch makes the PMAC invisible to the end host
- Each switch can identify its own position by itself
- The fabric manager keeps information about the overall topology.
When a fault occurs, it notifies the affected nodes.
- PMAC (48 bits): pod(16).position(8).port(8).vmid(16)
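The PMAC layout in the last bullet packs into six bytes as follows. The function names are illustrative; only the field order and widths come from the slide.

```python
def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack pod(16).position(8).port(8).vmid(16) into a 48-bit MAC string."""
    for name, value, bits in (("pod", pod, 16), ("position", position, 8),
                              ("port", port, 8), ("vmid", vmid, 16)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(6, "big"))


def decode_pmac(pmac: str) -> tuple:
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)


if __name__ == "__main__":
    pmac = encode_pmac(pod=2, position=5, port=1, vmid=7)
    print(pmac)               # 00:02:05:01:00:07
    print(decode_pmac(pmac))  # (2, 5, 1, 7)
```

Because the pod and position occupy the high-order bytes, a switch can forward on a prefix of the PMAC without knowing individual hosts.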
The TRILL
TRILL: Transparent Interconnection of Lots of Links
- TRILL is a new standard protocol to perform Layer 2 bridging with IS-IS link state routing
technology.
A simple idea
- Encapsulate native frames in a transport header providing a hop count.
- Route the encapsulated frames using IS-IS.
- Decapsulate the native frame before delivery.
Definitions
- RBridge - Routing Bridge
• A device which implements TRILL
- RBridge Campus
• A network of RBridges, links, and any intervening bridges, bounded by end stations and
Layer 3 routers.
The TRILL
Encapsulation & Header
TRILL Header – 64 bits
Nicknames = auto-configured 16-bit campus-local names for RBridges
V = Version (2 bits)
R = Reserved (2 bits)
M = Multi-Destination (1 bit)
OpLng = Length of TRILL Options (5 bits)
Hop = Hop Limit (6 bits)
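The field widths above can be exercised with a short pack/unpack sketch. Two details are taken from RFC 6325 rather than from the slide: the TRILL Ethertype `0x22F3`, which together with the fields listed accounts for the 64 bits, and the order of the two nicknames (egress first, then ingress). The function names are illustrative, and options are not handled.

```python
import struct

TRILL_ETHERTYPE = 0x22F3  # from RFC 6325, not from the slide


def pack_trill(egress: int, ingress: int, hop: int, multi_dest: bool = False,
               version: int = 0, op_len: int = 0) -> bytes:
    """Build the 64-bit header: Ethertype, flags word, egress and ingress nicknames."""
    for name, value, bits in (("version", version, 2), ("op_len", op_len, 5),
                              ("hop", hop, 6), ("egress", egress, 16),
                              ("ingress", ingress, 16)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    # V(2) | R(2, left as zero) | M(1) | OpLng(5) | Hop(6)
    flags = (version << 14) | (int(multi_dest) << 11) | (op_len << 6) | hop
    return struct.pack("!HHHH", TRILL_ETHERTYPE, flags, egress, ingress)


def unpack_trill(data: bytes) -> dict:
    """Parse the first 8 bytes of a TRILL-encapsulated frame's header."""
    ethertype, flags, egress, ingress = struct.unpack("!HHHH", data[:8])
    if ethertype != TRILL_ETHERTYPE:
        raise ValueError("not a TRILL header")
    return {
        "version": flags >> 14,
        "multi_dest": bool((flags >> 11) & 0x1),
        "op_len": (flags >> 6) & 0x1F,
        "hop": flags & 0x3F,
        "egress": egress,
        "ingress": ingress,
    }


if __name__ == "__main__":
    header = pack_trill(egress=0x0001, ingress=0x0002, hop=63)
    print(header.hex())
    print(unpack_trill(header))
```

Each RBridge that forwards the frame would decrement `hop` and drop the frame when it reaches zero, which is what protects the campus against loops.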
Related Works
OpenFlow
- Shares idea of simple switches controlled by external SW
- Monsoon & VL2 are a philosophy for how to use the switches
Brocade: Brocade One (TRILL, Clos Net, DCB)
Cisco: FabricPath (TRILL)
Juniper: QFabric (HW & FC)
Summary
Comparison of Data Center Network Architectures
Architecture   Org.                             Publishing               Authors
Monsoon        MS Research                      SIGCOMM 2008             Albert Greenberg…
VL2            MS Research                      SIGCOMM 2009             Albert Greenberg, Changhoon Kim…
SEATTLE        Univ. of Princeton               SIGCOMM 2008             Changhoon Kim…
FAT-TREE       Univ. of California San Diego    SIGCOMM 2008             M. Al-Fares…
PortLand       Univ. of California San Diego    SIGCOMM 2009             R.N. Mysore…
SPAIN          HP                               NSDI 2010                J. Mudigonda, M. Al-Fares…
MOOSE          Univ. of Cambridge               DC CAVES Workshop 2009   M. Scott…
TRILL          -                                RFC 5556, 2009           Radia Perlman
DCell          MS Research Asia                 SIGCOMM 2008             C. Guo…
BCube          MS Research Asia                 SIGCOMM 2009             C. Guo…
MDCube         MS Research Asia                 CoNEXT 2009              H. Wu, C. Guo…

Architecture             Topology         Packetizing                Load Spreading   ARP
Monsoon                  Clos Network     MAC-in-MAC (802.1ah PBB)   MAC-Rotation     Directory Server
VL2                      Clos Network     IP-in-IP                   ECMP             Directory Server
SEATTLE                  N/A              IP-in-IP(?)                -                DHT on the switches
FAT-TREE                 Fat-Tree         IP rewriting               ECMP             -
PortLand                 Fat-Tree         MAC rewriting (PMAC)       ECMP             Fabric Manager
SPAIN                    N/A              -                          -                -
MOOSE                    N/A              MAC rewriting              -                -
TRILL                    N/A              TRILL Hdr                  ECMP             ESADI
DCell / BCube / MDCube   BCube Topology   -                          -                -

Architecture             Multi-path   Mod. of End-Host?   Mod. of switches?
Monsoon                  O            O                   O
VL2                      O            O                   X
SEATTLE                  X            X                   O
FAT-TREE                 O            X                   O (Special HW)
PortLand                 O            X                   O (Special HW)
SPAIN                    O            O                   X
MOOSE                    X            X                   -
TRILL                    O            X                   O (RBridge)
DCell / BCube / MDCube   -            O                   △

O = yes, X = no, △ = partial, - = not given