This talk/tutorial was one that I delivered to multiple organizations -- ranging from semiconductor houses, to start-up system vendors, to research and academic institutions -- back in the 2002 time frame. As the abstract below illustrates, it captures the key principles behind the router designs of two of the most popular and landmark switch/routers in our industry -- the Cisco...
One may divide the evolution of switch architectures into roughly four generations. The first generation consisted of the simplest bus-based shared-memory switches. The second generation involved slightly more advanced techniques, distributing the processing and memory to the line cards. The third generation replaced the bus as the means of connecting inputs to outputs with a variety of interconnection fabrics, such as a crossbar. The fourth generation took two forms: the interconnection of smaller ASIC-based components in a regular fashion, or the interconnection of distributed line cards via a high-performance centralized core.
Here each packet is stored in the centralized shared memory and is examined by the CPU. In the absence of DMA, each packet crosses the backplane 4 times! The three bottlenecks here are: the CPU, the memory, and the backplane. The architecture is blocking if the bus bandwidth or the CPU processing capacity is less than 4·N·R, for N ports running at line rate R. The delay incurred by the packets/cells is a function of the CPU speed and memory I/O. Assuming memory is adequate, the throughput is upper-bounded by the CPU power or the bus speed.
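To make the 4·N·R condition concrete, here is a minimal back-of-the-envelope sketch; the port count, line rate, bus bandwidth, and CPU forwarding rate below are illustrative assumptions, not figures for any particular product:

```python
# Back-of-the-envelope check for a first-generation (bus + shared memory + CPU) switch.
# All figures are illustrative assumptions, not vendor specifications.

N = 16                 # number of ports (assumed)
R = 100e6              # line rate per port, bits/sec (assumed: 100 Mb/s)
bus_bw = 2e9           # backplane/bus bandwidth, bits/sec (assumed: 2 Gb/s)
cpu_pps = 500e3        # CPU forwarding capacity, packets/sec (assumed)
pkt_bits = 64 * 8      # worst case: 64-byte packets

# Without DMA each packet crosses the bus 4 times (line card -> memory, memory -> CPU,
# CPU -> memory, memory -> line card), so the bus must carry 4*N*R to stay non-blocking.
required_bus_bw = 4 * N * R
required_pps = N * R / pkt_bits          # aggregate packet arrival rate at full load

print("bus non-blocking:", bus_bw >= required_bus_bw)
print("CPU keeps up:    ", cpu_pps >= required_pps)
```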
This is still the architecture of many smaller commercial Ethernet switching platforms. Typical backplane speeds are in the neighborhood of 2 Gb/s. Very sophisticated techniques can yield up to about 20 Gb/s.
An example is the Cisco Catalyst 2820 series of Ethernet switches. With 24 10BaseT and 2 100BaseT ports, the minimum bus throughput needed is 880 Mb/s, and the device has a 1 Gb/s bus, so the switch is not bottlenecked by the bus. However, the 10 Mb/s ports require a forwarding rate of 20 Kpps for 64B packets, whereas only 14.8 Kpps is supportable. This implies that the performance is CPU-limited.
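The arithmetic behind these numbers, as I read it: the factor of two for bus crossings is my assumption about how the 880 Mb/s figure was derived, and 14.88 Kpps is the standard 64B line rate of a 10 Mb/s Ethernet port:

```python
# Catalyst 2820 back-of-the-envelope (my reconstruction, not vendor data).
# Assumption: each packet crosses the shared bus twice (once in, once out), which is
# how I read the 880 Mb/s minimum bus throughput figure.
aggregate_line_rate = 24 * 10e6 + 2 * 100e6        # 440 Mb/s of offered traffic
min_bus_throughput = 2 * aggregate_line_rate       # 880 Mb/s, against a 1 Gb/s bus

# 64B frame on the wire: 64B frame + 8B preamble + 12B inter-frame gap = 84B
frame_bits = (64 + 8 + 12) * 8
pps_per_10M_port = 10e6 / frame_bits               # ~14,880 pps per 10BaseT port
print(min_bus_throughput / 1e6, "Mb/s;", round(pps_per_10M_port), "pps per 10 Mb/s port")
```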
Here some amount of routing functionality, in the form of a small cache of recently seen addresses, is distributed onto the line cards, so the line cards can have dedicated packet-forwarding engines for the routing/lookup function. This allows line-rate processing even for small packets. Packets whose destination address is found in the cache go through the fast path, crossing the backplane only once. Packets whose addresses are not found in the cache must go through the CPU, i.e. the slow path. The delay and blocking characteristics are the same as before, except that throughput can also be limited by the bus bandwidth or the performance of the lookup engines.
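A minimal sketch of this fast-path/slow-path split, assuming an exact-match destination cache on the line card (real lookups are longest-prefix match; all names here are illustrative, not taken from any product):

```python
# Second-generation line card with a route cache (illustrative, exact-match for simplicity).
class LineCard:
    def __init__(self, cpu_lookup):
        self.route_cache = {}          # recently seen destination -> output port
        self.cpu_lookup = cpu_lookup   # slow path: full routing-table lookup on the central CPU

    def forward(self, dest, packet):
        port = self.route_cache.get(dest)
        if port is None:                          # cache miss: punt to the CPU (slow path)
            port = self.cpu_lookup(dest)
            self.route_cache[dest] = port         # fill the cache for subsequent packets
        return self.send_to_fabric(port, packet)  # forward toward the output line card

    def send_to_fabric(self, port, packet):
        return (port, packet)                     # stand-in for the transfer across the backplane

routing_table = {"10.1.2.3": 1, "192.168.7.9": 2}
card = LineCard(lambda dest: routing_table.get(dest, 0))
card.forward("10.1.2.3", b"first packet")          # miss: slow path, then cached
print(card.forward("10.1.2.3", b"second packet"))  # hit: fast path
```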
The 3COM CoreBuilder is an example. This is a larger Ethernet platform, with up to 17 slots of 24 10BaseT ports each. This gives a minimum required bus bandwidth of about 4 Gb/s, whereas the platform only has about 2 Gb/s (which, as I pointed out earlier, is the economic/technical limit for cheaper switches). The platform does have ASIC-based switching capable of handling up to 650 Kpps, which exceeds the required performance for Ethernet slots but is a little below that for Fast Ethernet slots. So here we have a platform whose performance is bus-bandwidth limited.
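The same kind of arithmetic for the CoreBuilder; the per-slot packet rate is my own illustration and assumes 64B frames with standard Ethernet framing overhead:

```python
# CoreBuilder back-of-the-envelope (my reconstruction).
frame_bits = (64 + 8 + 12) * 8                      # 64B frame + preamble + inter-frame gap
slots, ports_per_slot, port_rate = 17, 24, 10e6
min_bus_bw = slots * ports_per_slot * port_rate     # ~4.08 Gb/s needed vs. ~2 Gb/s available
pps_per_10bt_slot = ports_per_slot * port_rate / frame_bits   # ~357 Kpps, under the 650 Kpps ASIC
print(min_bus_bw / 1e9, "Gb/s;", round(pps_per_10bt_slot / 1e3), "Kpps per 10BaseT slot")
```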
Third-generation architectures replaced the bus with a switched interconnect that can handle multiple transactions in parallel. One could have a backplane or mid-plane design, with multiple switch interconnects. The point-to-point links can also be faster than the bus, reaching speeds of multiple gigabits per second. The interconnect is usually non-blocking for unicast traffic. The delays in such a system are in the tens of microseconds for an unloaded system. Theoretically, full line-rate throughput is possible if the scheduling across the interconnect and the queueing on the line cards can keep up, which isn’t often the case! Today, this is the state of the art for many switches/routers, including the Cisco GSR family.
The multi-gigabit and multi-terabit architectures involve interconnecting what are essentially smaller switches in some regular topology. Each node is an ASIC-based switch. Note that most of these architectures distribute the forwarding or data plane, while keeping the routing or control plane centralized.
Another technique adopted in fourth-generation architectures is to place a dense switch core away from the line cards, with the forwarding distributed on the line cards. The core and the line cards may be interconnected in a regular topology, combining this approach with the previous one. Today’s high-performance systems are in the 20-100 Gb/s range, with between 8 and 32 ports at OC-48 or OC-192 speeds.
We’ll look now at data flow through an IP router. Our focus will be on the line cards, since much of the packet processing happens there; I will cover the scheduling of packets through the fabric in detail in Lecture 5 tomorrow. So here we will look at the packet flow through an incoming line card, through the fabric, and back through an outgoing line card.

On the ingress side, the physical-layer interface converts the optical signal into an electronic bit stream. The input framer delineates this stream to extract packet data, which is passed to the packet-processing section of the line card. This section consists of a forwarding or lookup engine, which is responsible for basic IP address lookup and forwarding, but also performs classification/marking, shaping, and policing, and applies filters and ACLs. The traffic manager or scheduler is responsible for ensuring QoS via shaping and virtual output queueing. The fabric interface fragments the packets into cells and prepares them for transmission through the fabric.

In the output direction, the fabric interface reassembles the incoming cells and stores the packets in the buffer memory, where buffer-acceptance policies such as RED or WRED may be applied to them. The packets are queued based on class, flow, priority, etc. Finally, the link scheduler schedules packets for transmission using one of several scheduling strategies such as RR, WRR, DRR, SCFQ, or fair queueing, and the outgoing data is framed and transmitted on the output links.
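A minimal software sketch of that line-card pipeline, with each stage reduced to a function; every name here is mine rather than a vendor's, and each function stands in for what is really a framer, forwarding engine/NP, traffic manager, or fabric-interface ASIC:

```python
# Illustrative model of the line-card data path described above (not any real product).

CELL = 64                                  # fabric cell size in bytes (assumed)
FIB = {"10.1.2.3": 3}                      # destination -> output port (exact-match stand-in for LPM)

def ingress(packet, voqs):
    """Lookup/classify, segment into cells, place on a virtual output queue toward the fabric."""
    port = FIB.get(packet["dest"], 0)      # forwarding engine (ACLs, policing, marking omitted)
    payload = packet["payload"]
    cells = [payload[i:i + CELL] for i in range(0, len(payload), CELL)]  # fabric interface: packet -> cells
    voqs.setdefault(port, []).extend(cells)                              # traffic manager: per-output VOQ
    return port

def egress(cells, queues):
    """Reassemble cells, enqueue by class, and schedule onto the output link."""
    pkt = {"payload": b"".join(cells), "class": 0}    # fabric interface: cells -> packet
    queues.setdefault(pkt["class"], []).append(pkt)   # buffer acceptance (RED/WRED) omitted for brevity
    for cls in sorted(queues):                        # trivial priority pick standing in for RR/WRR/DRR
        if queues[cls]:
            return queues[cls].pop(0)                 # handed to the output framer for transmission

voqs, queues = {}, {}
port = ingress({"dest": "10.1.2.3", "payload": b"x" * 200}, voqs)
print(port, egress(voqs[port], queues))
```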
Let’s now see how the functional map is realized in chips. The input and output framers are in one chip; framers include the Agere TADM, AMCC Ganges, Vitesse 9184, and the Cypress POSIC. The forwarding engine can be broken out: it might be a single chip, for example NPs such as the Agere NP10 or IBM Ranier or Sanford, or it may be a chip set with an NP, a co-processor, and an NSE, which together perform the forwarding and lookup function. The TM is usually an ASIC chipset with separate chips for the ingress and egress directions, such as the QX1 from EZChip, the TMC10 from Internet Machines, or the TM10 from Agere.
Now let’s dive into two contemporary examples. I’ll start with the Juniper M Series routers. This is a basic fact sheet. The important point to note here is that the M40 and the M160 have a throughput half of what their numbers suggest. Also, their oft-advertised packet-processing figure of 40 Mpps is for 64B packets, which is significantly below the roughly 62 Mpps of aggregate performance that you’d need for 40B packets.
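As a quick sanity check on the 62 Mpps figure, assuming the M40's usable throughput is 20 Gb/s (i.e. half of the advertised 40 Gb/s, per the point above):

```python
# Minimum-size (40B) IP packets at 20 Gb/s of usable throughput (assumption per the text).
usable_throughput = 20e9            # bits/sec
min_packet_bits = 40 * 8
print(usable_throughput / min_packet_bits / 1e6, "Mpps")   # -> 62.5 Mpps
```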
The system architecture is cleanly separated into a CPU-based routing engine and an ASIC-based forwarding engine. The routing engine is responsible for running the routing process and other management software, all within the Juniper JUNOS operating system. The forwarding engine is built around a computer-scale, ASIC-based packet processor, the Internet Processor family of ASICs. The Internet Processor had 1M gates (6.5M transistors, compared to 7.5M for the Pentium II and 55M for the Pentium IV), while the Internet Processor II has over 2.5M gates. The architecture is unique in that it is similar to the centralized shared-memory CPU-based first-generation architecture I spoke about in Lecture 2, except that it replaces the central CPU with a very high-performance ASIC. Also, as we’ll see in a minute, the centralized shared memory is actually implemented as a distributed, pooled memory spread over all of the line cards. So although one might expect a high-end core router to push all packet processing to the line cards, as the 3rd and 4th generation switch architectures do, the M Series is unique in concentrating all packet processing in a single high-performance centralized ASIC.