2. WHITE PAPER: ICOS OpEN API —TCN SmartFlow Application
ICOS-WP100-R 2
Introduction
In recent years, datacenters have experienced an explosion in the need to handle ‘data-and-compute’ intensive applications,
pushing the limits of traditional datacenter IP network design. The increase in east-west traffic flows (i.e., between servers
within a datacenter) has motivated the industry to transition from three-tier topologies to Clos topologies. Despite innovation
at the network topology level, datacenter networks still suffer from limitations in the IP routing protocols. Since the early days
of the IETF, shortest-path routing, exemplified by Dijkstra's shortest-path-first (SPF) algorithm, has been the basis for the
standard IP routing protocols. The weakness of SPF is that traffic is forwarded on the shortest path even when that path is
congested and a less congested alternate path exists.
This white paper suggests an alternative-route approach that addresses the limitations of SPF and describes an elegant
solution that can be implemented on programmable switches such as StrataXGS® switches running ICOS. This solution is
demonstrated via a simple application using the APIs exposed by programmable switches.
This paper discusses L3 networks as well as layer2-over-layer3 (L2oL3) networks that have recently emerged to replace
traditional datacenter network topologies, with particular emphasis on the Clos network topology. It also addresses questions
such as:
• Is it possible to enhance traditional routers so that they can use alternate paths to better utilize bandwidth in the
datacenter?
• Will this enhancement allow IP networks to more effectively utilize network resources to meet an application's
performance service-level agreement (SLA), while supporting large bandwidths to carry large volumes of data?
• What would be the performance improvement in a Clos topology if alternate paths could be utilized?
• Is it possible that a simple modification to the Clos network topology to enhance IP shortest-path routing can yield
significantly better results, both in application response time and traffic throughput?
These questions are answered in this paper via an in-depth discussion of the following topics:
• Network Overview
• Traditional IP Routing—Shortest Path
• ICOS OpEN API, including the advantages in the programmability of ICOS switches
• TCN SmartFlow*, including a case study of a real-world network load to analyze its effectiveness
Network Overview
IP networks have continued to grow in size, complexity, and customer reach over the last two decades. IP has gained a firm
foothold as the networking paradigm of choice in both enterprise and wide-area networks. IP networking is, generally, the
preferred underlay network infrastructure for overlay networks such as those based on VxLAN (L2oL3). More recently, the
advent of cloud computing and the introduction of ‘big-data’ analytics have made IP networking a critical component of
datacenter network operations. Today's web-scale datacenters have hundreds of thousands of servers hosting a multitude
of web-based applications running as virtual machines (VMs) that can reside on any server in any rack.
The deployment of web applications and database server applications running as VMs, coupled with the
increased use of clustered applications (such as Hadoop) in modern datacenters, has resulted in an increase in east-west
traffic patterns within datacenter networks. East-west traffic includes server-to-server, server-to-storage, and server-rack-to-
server-rack traffic. This trend is changing the design of datacenter network topologies from oversubscribed, tiered, L2
networks to fast, fat, and flat L2oL3 networks. The Clos network topology has become the topology of choice for today's L3
and L2oL3 datacenter networks due to its nonblocking architecture and its support for equal-cost multipath (ECMP) routing.
FIGURE 1: Clos Topology in a Datacenter Using a Pure Layer-3 Network or an L2oL3 Network
Figure 1 shows a Clos network with four top-of-rack (ToR) switches connected as leaf nodes to each of the four spine nodes.
This configuration allows four ECMP paths between any two ToR switches. The availability of multiple ECMP paths and other
smart features of StrataXGS processors are exploited to distribute traffic intelligently and optimally over the four physically
diverse paths. The Clos network topology goes a long way in addressing the demand for high-quality network performance
that meets stringent SLA criteria, but can lead to a focused overload situation, as is discussed in the next section.
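The path count quoted for Figure 1 can be checked with a short sketch. The code below is illustrative only: it builds a leaf-spine fabric with four ToR switches and four spine nodes and enumerates the minimum-hop paths between two ToRs. The node names (`ToR1`, `Spine1`, and so on) and both helper functions are assumptions introduced for this example; they are not part of ICOS or StrataXGS.

```python
from collections import deque

def build_leaf_spine(num_leaves, num_spines):
    """Return an adjacency map for a two-tier Clos (leaf-spine) fabric."""
    leaves = [f"ToR{i}" for i in range(1, num_leaves + 1)]
    spines = [f"Spine{i}" for i in range(1, num_spines + 1)]
    adj = {node: set() for node in leaves + spines}
    for leaf in leaves:
        for spine in spines:  # every leaf connects to every spine
            adj[leaf].add(spine)
            adj[spine].add(leaf)
    return adj

def equal_cost_paths(adj, src, dst):
    """Enumerate all minimum-hop paths from src to dst."""
    # Breadth-first search from the destination gives hop counts to dst.
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    paths = []

    def walk(node, path):
        if node == dst:
            paths.append(path)
            return
        # Only step to neighbors that are exactly one hop closer to dst.
        for nbr in sorted(adj[node]):
            if dist[nbr] == dist[node] - 1:
                walk(nbr, path + [nbr])

    walk(src, [src])
    return paths

fabric = build_leaf_spine(num_leaves=4, num_spines=4)
paths = equal_cost_paths(fabric, "ToR1", "ToR4")
for p in paths:
    print(" -> ".join(p))
```

Each of the four printed paths crosses a different spine node, which is why the number of ECMP paths between two ToRs equals the number of spine nodes in this topology.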
Traditional IP Routing—Shortest Path
Shortest-path (SP) routing is the default mechanism employed by the routing protocols standardized by the IETF.
Why SP?
IP employs distributed, decentralized processing, wherein each router independently determines how to forward a packet.
SP routing emerged as a natural choice in this framework to make optimal use of network resources while avoiding network
loops; if each router forwards the packet on the shortest possible path to the destination, then a routing loop is guaranteed
not to occur.
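The loop-freedom argument can be made concrete with a minimal sketch. It assumes symmetric links with positive costs and uses a small hypothetical four-router graph (routers `A` through `D`, with costs chosen only for illustration). Each router independently picks a neighbor that lies on a shortest path; because the remaining distance to the destination strictly decreases at every hop, no router can be visited twice.

```python
import heapq

def shortest_distances(graph, dst):
    """Dijkstra's algorithm: cost from every node to dst (symmetric links)."""
    dist = {dst: 0}
    heap = [(0, dst)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def sp_next_hops(graph, node, dst):
    """Neighbors of node that lie on a shortest path to dst."""
    dist = shortest_distances(graph, dst)
    return sorted(n for n, c in graph[node].items() if c + dist[n] == dist[node])

def forward(graph, src, dst):
    """Hop-by-hop forwarding: each router picks its own SP next hop."""
    path = [src]
    while path[-1] != dst:
        path.append(sp_next_hops(graph, path[-1], dst)[0])
    return path

# Hypothetical topology; keys are routers, values map neighbor -> link cost.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
route = forward(graph, "A", "D")
remaining = [shortest_distances(graph, "D")[hop] for hop in route]
print(route, remaining)
```

The `remaining` list is strictly decreasing along the route, which is the property that rules out loops. It also shows the limitation discussed in the following section: a neighbor that is not strictly closer to the destination is never used, no matter how lightly loaded it is.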
Can We Do Better than SPF?
Big-data applications such as Hadoop require meeting stringent SLA performance demands, primarily focused on latency
but also concerned with packet loss and retransmission. Adding to the challenge, the volume of application data has
increased by several orders of magnitude, so that today, petabytes of data are routinely processed within a few hours. The
dual requirements of handling large bandwidth and meeting stringent SLA performance criteria are significantly stressing IP
network capabilities. One of the main reasons that IP networks have difficulty meeting these requirements is the lack of
flexibility afforded by the shortest-path routing mechanism employed within IP networks.
FIGURE 2: Focused Overload in a Clos Network
Figure 2 shows the Clos network topology from Figure 1 with a focus on a single spine node, Node 3. The spine node has
four bidirectional links, one from each leaf node. We refer to the leaf-node-to-spine-node direction as the uplink and the
spine-node-to-leaf-node direction as the downlink. Under large east-west traffic volumes, for short periods lasting from a few
hundred milliseconds to a few seconds, several uplinks can feed traffic to a single downlink, causing severe congestion on
that downlink. Figure 2 shows the congested downlink from Node 3 to ToR 4 in red. Despite the Node 3 – ToR 4 congestion,
under traditional SP routing, Node 3 will continue to forward all traffic to ToR 4 on the congested link, since that is the only
allowed next hop. This restriction causes significant performance degradation for all applications sending traffic on that
link.
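A small worked example shows how quickly a focused overload can build up. All numbers below are assumptions chosen for illustration (the text does not give link speeds or buffer sizes for Figure 2): three uplinks carry 600, 500, and 400 Mbps toward the same 1000 Mbps downlink, and the egress port is assumed to have a 4 MB buffer.

```python
def downlink_overload(uplink_mbps, downlink_capacity_mbps):
    """Return (offered load, load-to-capacity ratio, excess load) in Mbps."""
    offered = sum(uplink_mbps)
    excess = max(0, offered - downlink_capacity_mbps)
    return offered, offered / downlink_capacity_mbps, excess

def buffer_fill_time_ms(buffer_bytes, excess_mbps):
    """Milliseconds until a buffer overflows at the given excess rate."""
    if excess_mbps <= 0:
        return None  # no overload, so the buffer never fills
    bits_per_ms = excess_mbps * 1000  # 1 Mbps equals 1000 bits per millisecond
    return buffer_bytes * 8 / bits_per_ms

# Assumed traffic toward ToR 4 arriving on three uplinks of one spine node.
offered, ratio, excess = downlink_overload([600, 500, 400], 1000)
fill_ms = buffer_fill_time_ms(4_000_000, excess)
print(offered, ratio, excess, fill_ms)
```

Under these assumed figures the downlink is offered 1.5 times its capacity, and the buffer absorbs the excess for only tens of milliseconds before packets are dropped. A burst lasting a few hundred milliseconds is therefore long enough to cause loss and retransmission.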
If other paths were available from Node 3 to ToR 4, even if longer than the shortest path, and a distributed routing mechanism
were used that can exploit the longer paths while avoiding loops, then this performance degradation could be significantly
alleviated. This type of mechanism would increase network link utilization, significantly improve end-to-end application
performance, and lead to substantial CapEx/OpEx savings. This is the capability TCN SmartFlow affords.
ICOS OpEN API
ICOS Introduction
ICOS is a fully hardened networking OS specifically designed to run on Broadcom® StrataXGS switching silicon for
datacenter applications, including spine/leaf deployments. ICOS supports traditional management options including an
industry-standard command line interface, SNMP, and Linux server ecosystem tools such as Chef, Puppet, and Linux shell.
ICOS runs on several control plane processors such as MIPS, PowerPC, and x86, including Open Compute Project (OCP)
platforms. ICOS is designed to be a network OS running as a service on stock server-class Linux operating systems such
as Ubuntu. ICOS is a full-featured networking OS for datacenters, and includes advanced layer-2 features such as MLAG,
layer-3 features such as VRF-Lite, and SDN features such as VxLAN, OpenFlow, and OpEN API.
Extending ICOS Functionality Using the OpEN API
ICOS OpEN API provides programmable APIs that enable independent software vendors to extend the functionality of ICOS.
OpEN API is also available via RESTful APIs to be called from a remote system. The OpEN API feature provides an interface
for third-party applications running in a separate Linux process on the same CPU, or on a remote CPU, to access ICOS
control and status information. OpEN API includes APIs to set and get switch user configuration, monitor and change the
switch operational state, and receive notifications of events generated by ICOS. The third-party applications can be
implemented as Python/Ruby scripts or as C applications.
While the OpEN API interfaces are used by some applications in ICOS, they are primarily intended for third-party partners such as
TransCloud Networks.
To facilitate application development, OpEN API provides the following:
• APIs: Programming interfaces to configure and manage ICOS components and receive event notifications.
• Application Development Kit (ADK): Toolkit to allow developers to build their applications using OpEN APIs.
• CLI: Commands to download, install, enable, and monitor applications.
Figure 3 shows the architectural blocks and interfaces that make up the ADK.
FIGURE 3: ICOS, OpEN API, and Third-Party Application Ecosystem
TCN SmartFlow*
TCN SmartFlow solves the downlink congestion problem by modifying the Clos network topology to have interconnecting
spine nodes, so that several paths (that are longer than the shortest path) are available to reach ToRs from the spine nodes.
TCN SmartFlow also provides a method of extending the IP shortest-path routing capability so that the longer paths can be
used without the risk of looping. The TCN SmartFlow solution is completely standards-compatible and is backward-
compatible with all versions of IP and all IP routing protocols (OSPF, IS-IS, RIP, etc.).
Modifying Clos for Better Link Utilization
To run SmartFlow on Clos network topologies, a modification is necessary to allow longer paths to be created from spine
nodes to leaf nodes. This is achieved by interconnecting spine nodes. SmartFlow establishes pre-computed longer paths
that can be activated on-demand to alleviate network congestion caused by the heavy data loads. The longer uncongested
path from Node 3 to ToR 4 is shown in yellow in Figure 4. Making these additional paths available improves network
utilization to more effectively meet an application's stringent performance requirements, while supporting large bandwidths
for carrying large volumes of data.
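This paper does not spell out how SmartFlow selects its pre-computed longer paths, so the sketch below substitutes a well-known stand-in: the loop-free alternate (LFA) inequality from IP fast reroute (RFC 5286). A neighbor N of router S is a safe alternate toward destination D if dist(N, D) < dist(N, S) + dist(S, D), meaning N's own shortest path to D does not lead back through S. The ring-style spine interconnect, the node names, and the helper functions are all assumptions made for this example.

```python
from collections import deque

def hop_distances(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def modified_clos(num_leaves=4, num_spines=4):
    """Leaf-spine fabric plus an assumed ring interconnecting the spines."""
    leaves = [f"ToR{i}" for i in range(1, num_leaves + 1)]
    spines = [f"Spine{i}" for i in range(1, num_spines + 1)]
    adj = {node: set() for node in leaves + spines}
    for leaf in leaves:
        for spine in spines:
            adj[leaf].add(spine)
            adj[spine].add(leaf)
    for i in range(num_spines):  # assumption: spines are joined in a ring
        a, b = spines[i], spines[(i + 1) % num_spines]
        adj[a].add(b)
        adj[b].add(a)
    return adj

def loop_free_alternates(adj, router, dst):
    """Non-primary neighbors that satisfy the RFC 5286 LFA inequality."""
    dist = {node: hop_distances(adj, node) for node in adj}
    best = dist[router][dst]
    primary = {n for n in adj[router] if 1 + dist[n][dst] == best}
    return sorted(
        n for n in adj[router]
        if n not in primary and dist[n][dst] < dist[n][router] + best
    )

fabric = modified_clos()
alternates = loop_free_alternates(fabric, "Spine3", "ToR4")
print(alternates)  # ['Spine2', 'Spine4']
```

In this assumed topology the neighboring spines qualify as alternates because each has its own direct link to ToR 4. The other ToR switches do not qualify: their shortest paths to ToR 4 may pass back through Spine 3, which would risk a loop.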
[Figure 3 labels: Linux User Space; Linux Kernel Space; ICOS; CLI; Third-Party Apps (Python/Ruby Scripts, C Applications); Application Development Kit; Driver (SDK); Route Table; Switch; Application RPC; Hardware; Switching Silicon Tables]
* SmartFlow is a registered trademark of TransCloud Networks.
FIGURE 4: The Modified Clos to Showcase the SmartFlow Application
TCN SmartFlow/ICOS OpEN API Schematic
TCN SmartFlow runs as an application on ICOS and performs real-time rerouting of flows in response to link congestion.
StrataXGS provides excellent visibility into congestion, enabling:
• Fine-grained link congestion monitoring at intervals on the order of a few milliseconds.
• Real-time flow monitoring to detect flow start, flow end, and flow expiry conditions.
• Real-time rerouting of all new flows (or a specified subset thereof) from a congested link onto a pre-computed alternate
link that is guaranteed not to cause loops.
To compute the alternate path, SmartFlow obtains the necessary topology and SP routing information from the IP control
plane. Figure 5 provides a schematic depiction of how the SmartFlow application (TCNSF) interacts with ICOS and the
switch silicon.
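The monitoring and rerouting behavior described above can be sketched as a small state machine. This is a conceptual illustration only: the class name, the 90% utilization threshold, the flow identifiers, and the decision to leave already-established flows on their original link are assumptions, and the code makes no OpEN API calls.

```python
from dataclasses import dataclass, field

@dataclass
class FlowSteering:
    """Steer new flows to a pre-computed alternate while the primary is congested."""
    primary: str                 # shortest-path next hop
    alternate: str               # pre-computed loop-free alternate next hop
    threshold: float = 0.9       # assumed utilization that counts as congestion
    congested: bool = False
    flows: dict = field(default_factory=dict)  # flow id -> assigned next hop

    def on_link_sample(self, utilization):
        """Called on each periodic utilization sample of the primary link."""
        self.congested = utilization >= self.threshold

    def next_hop(self, flow_id):
        """Assign a next hop when a flow starts; keep it for the flow's lifetime."""
        if flow_id not in self.flows:
            self.flows[flow_id] = self.alternate if self.congested else self.primary
        return self.flows[flow_id]

    def on_flow_end(self, flow_id):
        """Forget a flow on flow end or expiry."""
        self.flows.pop(flow_id, None)

steer = FlowSteering(primary="to ToR4 (direct)", alternate="to ToR4 via Spine2")
first = steer.next_hop("flow-1")          # primary link is idle
steer.on_link_sample(0.97)                # congestion detected
second = steer.next_hop("flow-2")         # new flow takes the alternate
first_again = steer.next_hop("flow-1")    # existing flow is not moved
steer.on_link_sample(0.30)                # congestion clears
third = steer.next_hop("flow-3")          # new flows return to the primary
print(first, second, first_again, third)
```

Keeping each flow on a single next hop avoids packet reordering within the flow, which is one plausible reason to reroute only new flows rather than traffic already in flight.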
FIGURE 5: Schematic Depiction of SmartFlow Running on ICOS
[Figure 4 labels: L3 ECMP-based meshed/Clos physical network; Congested Link (Last Hop); SmartFlow Rerouting Around Congested Link]
[Figure 5 labels: TCN SmartFlow Prototype Software Architecture Schematic; Software: TCNSF, ADK, OpEN API, ICOS, Linux; Switch Platform: Quanta LB9A, BCM56530]
TCN SmartFlow POC Results
Figure 6 shows the six-node Clos network used to conduct a proof-of-concept study of SmartFlow. All links are 1 Gbps links,
and the network is ECMP enabled. This network allows a realistic emulation of congestion scenarios that occur in very large
web-scale datacenter networks. The network supports a Hadoop cluster that runs a data ingestion job, which transfers a
1 GB file from each client node (C1–C7) to its data node (D1–D7). There are three ECMP paths available to carry the traffic
for the C1–D1 and C2–D2 pairs. Only a single shortest-path link, the ICOS2–ICOS4 link, is available for traffic between C3–D3,
C4–D4, C5–D5, C6–D6, and C7–D7; hence, under traditional SP routing, this link will experience significant congestion,
causing degradation in application performance and traffic throughput. With SmartFlow running on ICOS2, however, traffic
can be rerouted onto the ICOS2–SSE link whenever the primary SP link ICOS2–ICOS4 is congested, leading to significant
improvement in application performance and throughput.
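A back-of-the-envelope calculation illustrates why the single ICOS2–ICOS4 link becomes the bottleneck. The sketch assumes ideal fair sharing of a 1 Gbps link, no protocol overhead, 1 GB taken as 8 gigabits, and an alternate path that can carry a full 1 Gbps end to end. It is an idealized illustration and is not intended to reproduce the measured results.

```python
def transfer_time_s(file_gbytes, flows_on_link, link_gbps):
    """Seconds to move one file when flows share a link equally."""
    # Each flow receives link_gbps / flows_on_link; a file is file_gbytes * 8 gigabits.
    return file_gbytes * 8 * flows_on_link / link_gbps

# Five client/data-node pairs (C3-D3 through C7-D7) share one shortest-path link.
single_link = transfer_time_s(1, 5, 1)

# Assumed split when some flows are rerouted: three stay, two use the alternate link.
stay = transfer_time_s(1, 3, 1)
moved = transfer_time_s(1, 2, 1)
job_time = max(stay, moved)  # the job finishes when the slowest flow finishes

print(single_link, stay, moved, job_time)  # 40.0 24.0 16.0 24.0
```

Under these idealized assumptions, spreading five flows over two links shortens the slowest transfer from 40 s to 24 s. Real gains depend on when congestion is detected, which flows are new at that moment, and the load on the rest of the alternate path.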
FIGURE 6: Proof of Concept Network Diagram
In Figure 7, the Hadoop response time is plotted as a function of the number of clients actively downloading data on the
congested link. It can be seen that SmartFlow:
• Affords approximately 30% improvement in completion time for the Hadoop data ingestion job.
• Allows, for the same response time, carrying about 40% more traffic.
Thus, the SmartFlow application running on ICOS OpEN API represents an effective, novel method for addressing internet
performance bottlenecks caused by IP shortest-path routing.
[Figure 6 labels: L3 ECMP-based meshed/Clos physical network; switches ICOS1, ICOS2, ICOS3, ICOS4, ICOS5, and SSE; client nodes C1–C7; data nodes D1–D7]
FIGURE 7: Proof-of-Concept Results
Summary
SmartFlow is an elegant solution that is implemented on programmable switches such as StrataXGS. It is a control plane
application that runs on ICOS OpEN API and extends the functionality of ICOS. SmartFlow demonstrates the usefulness of
the OpEN API and how it is enabling innovations in datacenter network traffic handling.
SmartFlow is a fundamental enhancement to IP shortest-path routing that allows new IP flows to be rerouted in real time
when the first packet of a flow encounters a congested link. Greater optimization is achieved by rerouting traffic away from
a congested link onto a pre-computed alternate routing link. Sophisticated feedback controls prevent congestion from
spreading. The pre-computation is performed in a distributed loop-free manner and only requires information provided by
any conventional IP routing protocol such as OSPF, IS-IS, or RIP. ICOS and StrataXGS silicon-based IP networks can use
SmartFlow to more effectively utilize network resources to meet an application's performance SLA while supporting large
bandwidths to carry large volumes of data.
A proof-of-concept study performed using a six-node ECMP-enabled Clos network (to emulate a Web 2.0 datacenter
network) shows that SmartFlow i) affords an approximately 30% improvement in completion time for a Hadoop data ingestion
job, and ii) allows about 40% more traffic to be carried for the same response time. This is expected to translate into
significant CapEx and OpEx savings. SmartFlow is backward-compatible with all versions of IP and works with all IP routing
protocols; it does not require a fork-lift upgrade and can be deployed incrementally in a datacenter to reap the benefits of
reduced CapEx and OpEx.
[Figure 7 axes: average response time (ms), scale 0 to 60, versus number of Hadoop clients (1 to 5); two series: without rerouting and with rerouting]