© 2010 Voltaire Inc.
November 19, 2010
Unified Fabric Manager Overview
Ghislain de Jacquelot
Voltaire Software Portfolio
Robust RDMA Drivers
Fabric provisioning and
performance monitoring
Robust Drivers
MPI Acceleration
Multicast Acceleration
Storage Access Acceleration
Collective communication
offload
Multicast and TCP transport utilizing
Kernel bypass technology
RDMA based storage iSCSI target
Unified Fabric Manager
“So far, I haven't seen any other solutions claiming to be a
"fabric manager" offer the sophisticated insight, resource
management, performance trending, and core fabric function
extension that UFM can … it fully illustrates what a well
architected fabric should be capable of.”
Jeff Boles, Taneja Group, June 2009
Traditional InfiniBand Management
An InfiniBand Fabric is not a black box (1/2)
► Requires hardware management
• Detect failures, communication problems
◦ Inside the InfiniBand fabric
- Port counters
- Port status (QDR, DDR, SDR – 4X, 2X, 1X)
- Firmware upgrades (switch and HCA ASICs)
◦ Outside the InfiniBand fabric
- Chassis
- Power supplies
- Fans
- Temperature
- Chassis software updates (switch management)
An InfiniBand Fabric is not a black box (2/2)
► What about performance?
► Some embarrassing questions…
• Blocking vs. non-blocking fabrics?
• Influence of routing algorithms?
• Congestion?
• Mixing different protocols on the same fabric?
• Running multiple jobs on the same fabric?
• Performance monitoring tools?
UFM Central Management Platform
► In-depth visibility into
fabric health and traffic
• Central Dashboard, Unique
Congestion Map
• Advanced monitoring engine,
threshold based alerts
► Optimize application
performance
• Quality of Service
• Traffic Aware Routing Algorithm
► Efficient operations of
thousands of fabric
components
• Automated configuration of hosts
and switches, group tasks
• Seamless change management
Unified
Fabric
Manager
Introducing UFM
[Architecture diagram]
Components:
• UFM Server
• CLI
• GUI (Java)
• Web Services
• IB-SM (OpenSM)
• Perf Mng Providers
• Device Mng Providers
• SQL DB
• HA Daemon
• Access Control
• Voltaire Plug-ins
Callouts:
• Central administration of multiple switches (or hosts)
• Hierarchical performance monitoring, variety of sources
• Leverage open source SM engine
• Transparent fail-over
• Fast retrieval, historical data
• Manage complex relations and workflows
• User and application interfaces
Advanced Monitoring and Analysis
► Monitor & analyze fabric performance
• Bandwidth utilization
• Unique congestion monitoring
• Dashboard for aggregated fabric view
► Real-time fabric-wide health monitoring
• Monitor events and errors throughout the fabric
• Threshold based alarms
• Granular monitoring of host and switch parameters
► Innovative congestion mapping
• One view for fabric-wide congestion and traffic patterns
• Enables root cause analysis for routing, job placement or resource allocation
inefficiencies
► All is managed at the application/aggregation level
• Event effects are clearly visible
• Pro-active measures can be taken
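A threshold-based alarm of the kind listed above can be sketched in a few lines. This is only an illustration of the idea, not UFM's monitoring engine: the counter names, the threshold values and the `check_thresholds` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PortSample:
    """One reading of a port's counters (field names are illustrative)."""
    port: str                # e.g. "switch1/8"
    symbol_errors: int       # errors counted since the previous sample
    utilization_pct: float   # link bandwidth utilization, 0-100


# Hypothetical thresholds: counter name -> (warning level, critical level).
THRESHOLDS = {
    "symbol_errors": (10, 100),
    "utilization_pct": (80.0, 95.0),
}


def check_thresholds(sample: PortSample) -> List[Tuple[str, str, str]]:
    """Return (port, counter, severity) for each counter at or above a threshold."""
    alerts = []
    for counter, (warning, critical) in THRESHOLDS.items():
        value = getattr(sample, counter)
        if value >= critical:
            alerts.append((sample.port, counter, "critical"))
        elif value >= warning:
            alerts.append((sample.port, counter, "warning"))
    return alerts


samples = [
    PortSample("switch1/7", symbol_errors=0, utilization_pct=42.0),
    PortSample("switch1/8", symbol_errors=25, utilization_pct=97.5),
]
alerts = [alert for sample in samples for alert in check_thresholds(sample)]
for port, counter, severity in alerts:
    print(f"{severity.upper()}: {port} {counter}")
```

Keeping the thresholds in a table rather than in code mirrors the "configurable thresholds" idea: the severity levels can change without touching the check itself.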
Central Dashboard
Resource Utilization
& Status
Congestion Map
Top 10 alerted nodes
Event Pane
Top 10’s
B/W, Congestion
B/W Consumers
Advanced Monitoring Engine
Multiple sessions
On demand
Sessions per Logical
Groups – no need to
know physical nodes
Aggregation per
Multiple devices
Various graphs (linear,
bar, histogram, pie…)
Correlate switch and
host information
Formulas (AVG, Max,
Min, Sum)
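Aggregating a counter over a logical group with one of the listed formulas might look like the sketch below. The group names, port names and readings are invented for illustration, and `aggregate` is a hypothetical helper rather than a UFM call.

```python
from statistics import mean

# Hypothetical membership: logical group -> ports that belong to it.
GROUPS = {
    "latency_job": ["node01/1", "node02/1"],
    "bandwidth_job": ["node03/1", "node04/1", "node05/1"],
}

# Hypothetical per-port bandwidth readings in MB/s.
READINGS = {
    "node01/1": 120.0,
    "node02/1": 80.0,
    "node03/1": 900.0,
    "node04/1": 1100.0,
    "node05/1": 1000.0,
}

# The four formulas named on the slide.
FORMULAS = {"AVG": mean, "MAX": max, "MIN": min, "SUM": sum}


def aggregate(group: str, formula: str) -> float:
    """Apply a formula to the readings of every port in a logical group."""
    values = [READINGS[port] for port in GROUPS[group]]
    return FORMULAS[formula](values)


for name in FORMULAS:
    print(name, aggregate("bandwidth_job", name))
```

The caller names only the logical group; the lookup from group to physical ports happens inside the helper, which is the point of sessions per logical group.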
Performance Optimization Cycle with UFM
Characterize
traffic pattern and priorities
Unique logical fabric model
QoS to prioritize critical apps.
Optimize routing with Voltaire’s
Traffic Optimized Routing (TOR)
Show traffic and congestion
information
Unique Congestion Map
Feedback and Analysis
Optional Orchestrators
& Schedulers
Application Requirements
UFM Optimization
UFM Monitoring
Advanced Performance Optimization
Mechanisms
► Fabric virtualization and Quality of Service (QoS)
• Run multiple clusters or multiple jobs on the same infrastructure
• Assure critical applications get priority through QoS policy
• Provide the required isolation for different departments or jobs
► Traffic Aware Routing Algorithm (TARA)
• Voltaire’s major shift from static to traffic aware routing
• Routing enhancements are built on top of OpenSM in a modular plug-in architecture
• Takes into consideration traffic patterns and loads
• Traffic model can be derived automatically from fabric model
or via API with 3rd party schedulers
Applicable to both DDR and QDR Environments
Congestion Example
► Degradation due to node oversubscription
• Destination port in saturation (multiple sources)
• Congestion spread across the fabric
• ALL other flows drop to 20% of capacity
• Takes time to recover
• Common with storage traffic
drop
recovery
Quality of Service Optimization
UFM Enables QoS Optimization
Test Environment
► 2 nodes running
a latency critical
job
► 12 nodes
running a
bandwidth
consuming job
► Goal: achieve
best
performance
with Latency
critical tasks
Without Partitioning: Latency Rises to ~215% of Baseline
Latency job running alone
(Latency = ~2.1 us)
Bandwidth job added on
same partition
(Latency = ~4.5 us)
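The two readings on this slide can be compared with a short calculation. The variable names are illustrative; the figures are the ones quoted above. Expressed as a percentage of the baseline, the shared-partition latency is roughly 214%, which appears to be where the "~215%" figure comes from; as a relative increase it is about 114%.

```python
baseline_us = 2.1   # latency job running alone (from the slide)
shared_us = 4.5     # bandwidth job added on the same partition (from the slide)

ratio = shared_us / baseline_us          # shared latency as a multiple of the baseline
increase_pct = (ratio - 1.0) * 100.0     # relative increase over the baseline

print(f"latency is {ratio:.2f}x the baseline, an increase of {increase_pct:.0f}%")
```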
UFM Logical Model Creates Partition and Sets
QoS
► 2 Logical Groups
• Latency job
• B/W oriented job
► QoS settings
► UFM creates virtual
NICs, partitions and
assigns Service Levels
on the fabric
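The logical model above can be pictured as a small data structure that maps each logical group to a partition and a Service Level. The sketch below is an assumption-laden illustration, not UFM's object model: the class, the `build_plan` helper, the node names and the partition key values are all hypothetical. The only InfiniBand facts it relies on are that Service Levels range from 0 to 15 and that a partition key has a 15-bit base value. For simplicity it lets a node join a single group, which a real fabric does not require.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogicalGroup:
    name: str
    nodes: List[str]
    pkey: int            # partition key base (values below are made up)
    service_level: int   # InfiniBand Service Level, 0-15

    def __post_init__(self) -> None:
        if not 0 <= self.service_level <= 15:
            raise ValueError("InfiniBand Service Levels range from 0 to 15")
        if not 0 < self.pkey <= 0x7FFF:
            raise ValueError("expected a 15-bit, non-zero partition key")


def build_plan(groups: List[LogicalGroup]) -> Dict[str, Tuple[int, int]]:
    """Map each node to (pkey, service_level); one group per node in this sketch."""
    plan: Dict[str, Tuple[int, int]] = {}
    for group in groups:
        for node in group.nodes:
            if node in plan:
                raise ValueError(f"{node} is already assigned to a group")
            plan[node] = (group.pkey, group.service_level)
    return plan


# The test environment from the earlier slide: 2 latency nodes, 12 bandwidth nodes.
latency_job = LogicalGroup(
    "latency job", [f"node{i:02d}" for i in range(1, 3)], pkey=0x0001, service_level=1
)
bandwidth_job = LogicalGroup(
    "B/W oriented job", [f"node{i:02d}" for i in range(3, 15)], pkey=0x0002, service_level=2
)
plan = build_plan([latency_job, bandwidth_job])
print(len(plan), "nodes assigned")
```

Which Service Level actually receives priority depends on how the fabric maps Service Levels to virtual lanes, so the numbers here only keep the two jobs apart.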
With UFM QoS
Cross Application Interference fixed
Single job in cluster
(Latency = 2.1us)
2 jobs, UFM optimization
(Latency = 2.2us)
2nd job added
(Latency = 4.5us)
100% Better Performance Through QoS Implementation
Optimize performance #2: routing
► Existing routing algorithms
• Are not aware of application communication flows
• Distribute paths evenly across the fabric links
► In real life, fabrics have non-uniform usage
• Some endpoints “talk” a lot, some don’t “talk” at all
• Many-to-many (cluster) and any-to-many (storage) topologies
► Result
• Unbalanced fabric
• Congestion is created leading to slower performance and high latency
Congestion = Latency
TARA Optimization
► TARA provides the following benefits:
• Reduces competition between fabric resources, thus decreasing congestion
• Increases available bandwidth, resulting in improved fabric utilization
• Delivers lower latency and shorter application runtime
► How?
• Uses knowledge of cluster usage: logical servers, networks.
• Balances routes depending on usage
• Not based on real-time analysis of bandwidth / congestion
Routing?
► InfiniBand packets are ‘destination routed’ based on the
Destination Local Identifier (DLID) field in the header
► In IB: DLID = route (not only the remote address)
► DLIDs are 16 bits
• 48K values are used for unicast
• 16K values are used for multicast
► At each switch ASIC, the incoming unicast DLID
is used as an index into a Linear Forwarding
Table (LFT) that returns the outgoing switch
port number
• E.g. the InfiniScale III ASIC supports all 48K possible LFT entries
[Figure: Linear Forwarding Table with columns "DLID" and "Out Port #"]
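The lookup described on this slide can be sketched as follows. The address ranges are the standard InfiniBand ones (unicast 0x0001 to 0xBFFF, multicast 0xC000 to 0xFFFE). The `Switch` class, the `lid_kind` helper and the table entries are hypothetical, and a dictionary stands in for the flat array a real switch ASIC would use.

```python
from typing import Dict

UNICAST_MIN, UNICAST_MAX = 0x0001, 0xBFFF       # roughly 48K unicast LIDs
MULTICAST_MIN, MULTICAST_MAX = 0xC000, 0xFFFE   # roughly 16K multicast LIDs


def lid_kind(dlid: int) -> str:
    """Classify a 16-bit DLID by address range."""
    if not 0 <= dlid <= 0xFFFF:
        raise ValueError("a DLID is a 16-bit value")
    if UNICAST_MIN <= dlid <= UNICAST_MAX:
        return "unicast"
    if MULTICAST_MIN <= dlid <= MULTICAST_MAX:
        return "multicast"
    return "reserved"   # 0x0000 and 0xFFFF


class Switch:
    """Toy switch: a dict stands in for the flat LFT array of a real ASIC."""

    def __init__(self, lft: Dict[int, int]) -> None:
        self.lft = dict(lft)

    def out_port(self, dlid: int) -> int:
        """Return the outgoing port for a unicast DLID."""
        if lid_kind(dlid) != "unicast":
            raise ValueError("the LFT is indexed by unicast DLIDs only")
        return self.lft[dlid]


switch = Switch({1: 4, 2: 4, 3: 7})   # made-up entries: DLID -> out port
print(switch.out_port(3))
```

Because each switch only maps a DLID to a port, the path a packet takes is fixed entirely by the tables the subnet manager programs, which is why changing routes means rewriting LFT entries.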
The real wording should be “rearrangeably non-blocking”
[Figure: six 36-port switches. Four of them host nodes 1-18, 19-36, 37-54 and 55-72; the other two interconnect them. Each link represents 9 cables. Labels: 18 uplinks, 54 nodes.]
At boot time, 3 routes are assigned to each uplink. Let's assume:
19-37-55 on port #1
20-38-56 on port #2, etc.
What happens if you have a job running on nodes 1-2-3-19-37-55?
Unbalanced communication, congestion…
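The imbalance in this example can be counted directly. The sketch below assumes the boot-time assignment given on the slide (destinations 19, 37, 55 on port #1, destinations 20, 38, 56 on port #2, and so on) and one flow from every local job node to every remote job node. The function and variable names are illustrative.

```python
from collections import Counter
from itertools import product

UPLINKS = 18                 # uplinks on the switch that hosts nodes 1-18
LOCAL_NODES = range(1, 19)   # nodes attached to that switch


def static_uplink(dest: int) -> int:
    """Boot-time assignment: 19, 37, 55 -> port 1; 20, 38, 56 -> port 2; and so on."""
    if not 19 <= dest <= 72:
        raise ValueError("only remote nodes 19-72 are routed through an uplink")
    return (dest - 19) % UPLINKS + 1


job = [1, 2, 3, 19, 37, 55]
senders = [n for n in job if n in LOCAL_NODES]
remote = [n for n in job if n not in LOCAL_NODES]

# One flow from every local job node to every remote job node.
load = Counter(static_uplink(dst) for _src, dst in product(senders, remote))
print(dict(load))
```

All nine flows share uplink #1 while the other 17 uplinks carry none of the job's traffic, even though the fabric as a whole has enough capacity.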
With TARA
[Figure: the same six 36-port switches, hosting nodes 1-18, 19-36, 37-54 and 55-72. Each link represents 9 cables. Labels: 18 uplinks, 3 nodes.]
Job running on nodes 1-2-3-19-37-55
At job launch time, routes to nodes used by the job are balanced over
all uplinks:
19 on port #1
37 on port #2
55 on port #3
Others are unchanged
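The rebalancing shown on this slide can be imitated with a simple override table. The sketch below uses a plain round-robin over the job's remote destinations and leaves every other destination on its boot-time uplink. It illustrates the effect only and makes no claim about how TARA actually chooses routes; all names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

UPLINKS = 18                 # uplinks on the switch that hosts nodes 1-18
LOCAL_NODES = range(1, 19)   # nodes attached to that switch


def static_uplink(dest: int) -> int:
    """Boot-time assignment: 19, 37, 55 -> port 1; 20, 38, 56 -> port 2; and so on."""
    return (dest - 19) % UPLINKS + 1


def job_overrides(remote_dests: Iterable[int]) -> Dict[int, int]:
    """Spread the job's remote destinations over the uplinks (illustrative policy)."""
    return {dst: i % UPLINKS + 1 for i, dst in enumerate(sorted(remote_dests))}


def uplink(dest: int, overrides: Dict[int, int]) -> int:
    """Use the job-specific route if there is one, else the boot-time route."""
    return overrides.get(dest, static_uplink(dest))


job = [1, 2, 3, 19, 37, 55]
senders = [n for n in job if n in LOCAL_NODES]
remote = [n for n in job if n not in LOCAL_NODES]

overrides = job_overrides(remote)
load = Counter(uplink(dst, overrides) for _src, dst in product(senders, remote))
print(overrides)
print(dict(load))
```

With the overrides in place the nine flows split three ways across uplinks #1 to #3, and a destination outside the job, such as node 20, still uses its original uplink.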
[Figure: two bar charts comparing OpenSM routing with Traffic Optimized Routing. x-axis: switch.port for the internal ports on the line cards (1.18 to 17.30); y-axis: port weight (0 to 2000). Other labels: traffic, bandwidth.]
Example: TARA with a 324-node cluster
• 300 servers
• 24 storage nodes
Logical Topology
• Job 1: 47 nodes
• Job 2: 41 nodes
• Job 3: 71 nodes
• Job 4: 25 nodes
• Job 5: 63 nodes
• Job 6: 46 nodes
• Storage: 24 nodes
Traffic
• Traffic to/from storage: average of 200 MB/s per node
• Internal traffic inside each job: 1000 MB/s from each node
[Figure: Physical Topology]
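The traffic figures in this example can be turned into rough totals. The calculation below is a back-of-the-envelope sketch under two assumptions that the slide does not state: every node that belongs to a job exchanges storage traffic at the quoted average, and that traffic is spread evenly over the storage nodes. The job sizes and rates are the ones shown above; the variable names are illustrative.

```python
JOB_SIZES = {
    "Job 1": 47, "Job 2": 41, "Job 3": 71,
    "Job 4": 25, "Job 5": 63, "Job 6": 46,
}
STORAGE_NODES = 24
STORAGE_MB_S_PER_NODE = 200     # average traffic to/from storage per node
INTERNAL_MB_S_PER_NODE = 1000   # traffic each node sends inside its own job

nodes_in_jobs = sum(JOB_SIZES.values())

# Assumption: every job node talks to storage at the quoted average rate.
storage_total_mb_s = nodes_in_jobs * STORAGE_MB_S_PER_NODE

# Assumption: storage traffic is spread evenly over the storage nodes.
per_storage_node_mb_s = storage_total_mb_s / STORAGE_NODES

# Traffic generated inside each job, in MB/s.
internal_mb_s = {job: n * INTERNAL_MB_S_PER_NODE for job, n in JOB_SIZES.items()}

print(nodes_in_jobs, "nodes are assigned to jobs")
print(f"{storage_total_mb_s} MB/s of storage traffic in total")
print(f"{per_storage_node_mb_s:.0f} MB/s per storage node")
```

The six jobs cover 293 of the 300 servers, and under these assumptions each storage node handles a load equivalent to that of about twelve compute nodes, which is the kind of non-uniform pattern that even path distribution does not account for.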
Scale-out and Maintain Control on Fabric
► Dozens of switches and 1000s of
nodes become a massive
operational burden
► UFM automates I/O and switch
configuration enabling isolation
and QoS
► Central Device Management for
switches and hosts
► High-availability and seamless
failover of SM and UFM
► Advanced API for seamless
integration in existing
environments
Automatic, seamless operations save hours of configuration and set-up work
Efficient Troubleshooting
► Dozens of traffic and health
events
• Easy central drill-down to counters, alerts
and events to the port level
► Configurable thresholds
and criticality levels
► GUI and log level alarms
► Alerts correlated to the
application level
► Alerts correlated to the DC
rack level
Open system
► Extensible architecture based
on Web-services
► Open API for users or 3rd party extensions
► Expose entire fabric and datacenter object
model
► Allow simple reporting, provisioning,
monitoring, and task automation
► Tools already benefiting from UFM API
• Scheduler integration (e.g. Moab)
• UFM Support tool kit
• Various command line tools/extensions to UFM
• Web fabric portal
* Provided in UFM Advanced packages
UFM Adaptive Suite
- Separate UFM offering integrated with Platform LSF
• Intelligent & automatic resource allocation
• Optimize fabric performance
• Maintain connectivity upon changes
• Central monitoring
This is the first integrated solution that correlates network fabric
management and workload management for dynamic data centers
Platform LSF
Service Policy
UFM
Fabric Provisioning
Control & Optimization
Integration with Platform LSF
- how does it work?
Automation and Optimization
UFM Benefits
Fabrics w/o UFM
► Little Fabric Visibility
• Unnoticed performance degradation
• Difficult to assess impact
► Low Performing, Underutilized Fabrics
• Arbitrary routing algorithms, QoS seldom implemented
• Congested fabrics, latency affected
► Complex and Manual Processes
• Needs admin skills
• Many options left unused
► Ineffective Troubleshooting
• Long troubleshooting time
• Performance issues take days to analyze
UFM Customers
► In-Depth Visibility and Control
• Clear health and performance visualization
• Business oriented impact and root cause analysis
► Increased Performance
• Reduce congestion, lower latency
• Quicker application runtime
► Simple and Automated
• Lowers administration task time from days to minutes
► Quick Issue Resolution
• Dashboard, Alarms, Congestion Map
• Reduces downtime, high fabric utilization

More Related Content

What's hot

Ch 05 --- nfv basics
Ch 05 --- nfv basicsCh 05 --- nfv basics
Ch 05 --- nfv basicsYoram Orzach
 
Quality of Service at the Internet Engineering Task Force
Quality of Service at the Internet Engineering Task ForceQuality of Service at the Internet Engineering Task Force
Quality of Service at the Internet Engineering Task ForceJohn Loughney
 
Enabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault ToleranceEnabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault ToleranceDai Yang
 
Pristine rina-tnc-2016
Pristine rina-tnc-2016Pristine rina-tnc-2016
Pristine rina-tnc-2016ICT PRISTINE
 
Topic : X.25, Frame relay and ATM
Topic :  X.25, Frame relay and ATMTopic :  X.25, Frame relay and ATM
Topic : X.25, Frame relay and ATMDr Rajiv Srivastava
 
PERFORMANCE STUDIES ON THE VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
PERFORMANCE STUDIES ON THE  VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKSPERFORMANCE STUDIES ON THE  VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
PERFORMANCE STUDIES ON THE VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKSJYoTHiSH o.s
 
Open Source Carrier Networking
Open Source Carrier NetworkingOpen Source Carrier Networking
Open Source Carrier NetworkingDirk Kutscher
 
RINA Introduction, part I
RINA Introduction, part IRINA Introduction, part I
RINA Introduction, part IICT PRISTINE
 
Unifying WiFi and VLANs with the RINA model
Unifying WiFi and VLANs with the RINA modelUnifying WiFi and VLANs with the RINA model
Unifying WiFi and VLANs with the RINA modelARCFIRE ICT
 
The hague rina-workshop-intro-eduard
The hague rina-workshop-intro-eduardThe hague rina-workshop-intro-eduard
The hague rina-workshop-intro-eduardICT PRISTINE
 
RINA motivation, introduction and IRATI goals. IEEE ANTS 2012
RINA motivation, introduction and IRATI goals. IEEE ANTS 2012RINA motivation, introduction and IRATI goals. IEEE ANTS 2012
RINA motivation, introduction and IRATI goals. IEEE ANTS 2012Eleni Trouva
 
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aqPLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aqPROIDEA
 
Eucnc rina-tutorial
Eucnc rina-tutorialEucnc rina-tutorial
Eucnc rina-tutorialICT PRISTINE
 
The hageu rina-workshop-security-peter
The hageu rina-workshop-security-peterThe hageu rina-workshop-security-peter
The hageu rina-workshop-security-peterICT PRISTINE
 
Network Function Virtualization : Infrastructure Overview
Network Function Virtualization : Infrastructure OverviewNetwork Function Virtualization : Infrastructure Overview
Network Function Virtualization : Infrastructure Overviewsidneel
 
The hague rina-workshop-mobility-eduard
The hague rina-workshop-mobility-eduardThe hague rina-workshop-mobility-eduard
The hague rina-workshop-mobility-eduardICT PRISTINE
 
EU-Taiwan Workshop on 5G Research, PRISTINE introduction
EU-Taiwan Workshop on 5G Research, PRISTINE introductionEU-Taiwan Workshop on 5G Research, PRISTINE introduction
EU-Taiwan Workshop on 5G Research, PRISTINE introductionICT PRISTINE
 

What's hot (20)

Ch 05 --- nfv basics
Ch 05 --- nfv basicsCh 05 --- nfv basics
Ch 05 --- nfv basics
 
Quality of Service at the Internet Engineering Task Force
Quality of Service at the Internet Engineering Task ForceQuality of Service at the Internet Engineering Task Force
Quality of Service at the Internet Engineering Task Force
 
Enabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault ToleranceEnabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault Tolerance
 
Pristine rina-tnc-2016
Pristine rina-tnc-2016Pristine rina-tnc-2016
Pristine rina-tnc-2016
 
Topic : X.25, Frame relay and ATM
Topic :  X.25, Frame relay and ATMTopic :  X.25, Frame relay and ATM
Topic : X.25, Frame relay and ATM
 
PERFORMANCE STUDIES ON THE VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
PERFORMANCE STUDIES ON THE  VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKSPERFORMANCE STUDIES ON THE  VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
PERFORMANCE STUDIES ON THE VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
 
HIGH SPEED NETWORKS
HIGH SPEED NETWORKSHIGH SPEED NETWORKS
HIGH SPEED NETWORKS
 
Open Source Carrier Networking
Open Source Carrier NetworkingOpen Source Carrier Networking
Open Source Carrier Networking
 
RINA Introduction, part I
RINA Introduction, part IRINA Introduction, part I
RINA Introduction, part I
 
Unifying WiFi and VLANs with the RINA model
Unifying WiFi and VLANs with the RINA modelUnifying WiFi and VLANs with the RINA model
Unifying WiFi and VLANs with the RINA model
 
The hague rina-workshop-intro-eduard
The hague rina-workshop-intro-eduardThe hague rina-workshop-intro-eduard
The hague rina-workshop-intro-eduard
 
RINA motivation, introduction and IRATI goals. IEEE ANTS 2012
RINA motivation, introduction and IRATI goals. IEEE ANTS 2012RINA motivation, introduction and IRATI goals. IEEE ANTS 2012
RINA motivation, introduction and IRATI goals. IEEE ANTS 2012
 
Chapter13
Chapter13Chapter13
Chapter13
 
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aqPLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
 
Eucnc rina-tutorial
Eucnc rina-tutorialEucnc rina-tutorial
Eucnc rina-tutorial
 
The hageu rina-workshop-security-peter
The hageu rina-workshop-security-peterThe hageu rina-workshop-security-peter
The hageu rina-workshop-security-peter
 
Network Function Virtualization : Infrastructure Overview
Network Function Virtualization : Infrastructure OverviewNetwork Function Virtualization : Infrastructure Overview
Network Function Virtualization : Infrastructure Overview
 
The hague rina-workshop-mobility-eduard
The hague rina-workshop-mobility-eduardThe hague rina-workshop-mobility-eduard
The hague rina-workshop-mobility-eduard
 
Frame Relay
Frame RelayFrame Relay
Frame Relay
 
EU-Taiwan Workshop on 5G Research, PRISTINE introduction
EU-Taiwan Workshop on 5G Research, PRISTINE introductionEU-Taiwan Workshop on 5G Research, PRISTINE introduction
EU-Taiwan Workshop on 5G Research, PRISTINE introduction
 

Similar to Voltaire ufm en_nov10

Voltaire fca en_nov10
Voltaire fca en_nov10Voltaire fca en_nov10
Voltaire fca en_nov10sciecomp
 
Disaggregation, automation and autonomy in optical networking
Disaggregation, automation and autonomy in optical networkingDisaggregation, automation and autonomy in optical networking
Disaggregation, automation and autonomy in optical networkingADVA
 
5G in Brownfield how SDN makes 5G Deployments Work
5G in Brownfield how SDN makes 5G Deployments Work5G in Brownfield how SDN makes 5G Deployments Work
5G in Brownfield how SDN makes 5G Deployments WorkLumina Networks
 
CN Unit 4 - cs8591.pptx
CN Unit 4 - cs8591.pptxCN Unit 4 - cs8591.pptx
CN Unit 4 - cs8591.pptxshamkevin
 
IOT and System Platform From Concepts to Code
IOT and System Platform From Concepts to CodeIOT and System Platform From Concepts to Code
IOT and System Platform From Concepts to CodeAndy Robinson
 
PacNOG 31: Internet Exchange Points
PacNOG 31: Internet Exchange PointsPacNOG 31: Internet Exchange Points
PacNOG 31: Internet Exchange PointsAPNIC
 
PITA 27th AGM & Business Forum Expo 23: Internet Exchange Points
PITA 27th AGM & Business Forum Expo 23: Internet Exchange PointsPITA 27th AGM & Business Forum Expo 23: Internet Exchange Points
PITA 27th AGM & Business Forum Expo 23: Internet Exchange PointsAPNIC
 
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...Tal Lavian Ph.D.
 
ETE405-lec9.pdf
ETE405-lec9.pdfETE405-lec9.pdf
ETE405-lec9.pdfmashiur
 
Software Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur ChannegowdaSoftware Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur ChannegowdaCPqD
 
Software Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur ChannegowdaSoftware Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur ChannegowdaCPqD
 
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...Tal Lavian Ph.D.
 
Peering 101 - ABQNOG1 - May2023
Peering 101 - ABQNOG1 - May2023Peering 101 - ABQNOG1 - May2023
Peering 101 - ABQNOG1 - May2023Chris Grundemann
 
Lets talk about QoS by Megis.pdf
Lets talk about QoS by Megis.pdfLets talk about QoS by Megis.pdf
Lets talk about QoS by Megis.pdfssusere31f1c
 
Three years of OFELIA - taking stock
Three years of OFELIA - taking stockThree years of OFELIA - taking stock
Three years of OFELIA - taking stockFIBRE Testbed
 

Similar to Voltaire ufm en_nov10 (20)

The new imperative in the data center with workload centric networking
The new imperative in the data center with workload centric networkingThe new imperative in the data center with workload centric networking
The new imperative in the data center with workload centric networking
 
Voltaire fca en_nov10
Voltaire fca en_nov10Voltaire fca en_nov10
Voltaire fca en_nov10
 
Disaggregation, automation and autonomy in optical networking
Disaggregation, automation and autonomy in optical networkingDisaggregation, automation and autonomy in optical networking
Disaggregation, automation and autonomy in optical networking
 
5G in Brownfield how SDN makes 5G Deployments Work
5G in Brownfield how SDN makes 5G Deployments Work5G in Brownfield how SDN makes 5G Deployments Work
5G in Brownfield how SDN makes 5G Deployments Work
 
CN Unit 4 - cs8591.pptx
CN Unit 4 - cs8591.pptxCN Unit 4 - cs8591.pptx
CN Unit 4 - cs8591.pptx
 
IOT and System Platform From Concepts to Code
IOT and System Platform From Concepts to CodeIOT and System Platform From Concepts to Code
IOT and System Platform From Concepts to Code
 
PacNOG 31: Internet Exchange Points
PacNOG 31: Internet Exchange PointsPacNOG 31: Internet Exchange Points
PacNOG 31: Internet Exchange Points
 
PITA 27th AGM & Business Forum Expo 23: Internet Exchange Points
PITA 27th AGM & Business Forum Expo 23: Internet Exchange PointsPITA 27th AGM & Business Forum Expo 23: Internet Exchange Points
PITA 27th AGM & Business Forum Expo 23: Internet Exchange Points
 
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
 
Data center network reference architecture with hpe flex fabric
Data center network reference architecture with hpe flex fabricData center network reference architecture with hpe flex fabric
Data center network reference architecture with hpe flex fabric
 
ETE405-lec9.pdf
ETE405-lec9.pdfETE405-lec9.pdf
ETE405-lec9.pdf
 
Software Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur ChannegowdaSoftware Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur Channegowda
 
Software Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur ChannegowdaSoftware Defined Optical Networks - Mayur Channegowda
Software Defined Optical Networks - Mayur Channegowda
 
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
 
Peering 101 - ABQNOG1 - May2023
Peering 101 - ABQNOG1 - May2023Peering 101 - ABQNOG1 - May2023
Peering 101 - ABQNOG1 - May2023
 
Saidul
SaidulSaidul
Saidul
 
Java One 2001
Java One 2001Java One 2001
Java One 2001
 
Lets talk about QoS by Megis.pdf
Lets talk about QoS by Megis.pdfLets talk about QoS by Megis.pdf
Lets talk about QoS by Megis.pdf
 
Three years of OFELIA - taking stock
Three years of OFELIA - taking stockThree years of OFELIA - taking stock
Three years of OFELIA - taking stock
 
Netw204 Quiz Answers Essay
Netw204 Quiz Answers EssayNetw204 Quiz Answers Essay
Netw204 Quiz Answers Essay
 

Recently uploaded

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 

Voltaire ufm en_nov10

  • 1. © 2010 Voltaire Inc. November 19, 2010 Unified Fabric Manager Overview Ghislain de Jacquelot
  • 2. Voltaire Software Portfolio: Robust RDMA Drivers; Fabric provisioning and performance monitoring; Robust Drivers; MPI Acceleration; Multicast Acceleration; Storage Access Acceleration; Collective communication offload; Multicast and TCP transport utilizing kernel bypass technology; RDMA-based storage iSCSI target
  • 3. Unified Fabric Manager “So far, I haven't seen any other solutions claiming to be a "fabric manager" offer the sophisticated insight, resource management, performance trending, and core fabric function extension that UFM can … it fully illustrates what a well architected fabric should be capable of.” Jeff Boles, Taneja Group, June 2009
  • 4. InfiniBand Traditional Management
  • 5. An InfiniBand Fabric is not a black box (1/2) ► Requires hardware management • Detect failures, communication problems ▪ Inside the InfiniBand fabric - Port counters - Port status (QDR, DDR, SDR – 4X, 2X, 1X) - Firmware upgrades (switch and HCA ASICs) ▪ Outside the InfiniBand fabric - Chassis - Power supplies - Fans - Temperature - Chassis software updates (switch management)
  • 6. An InfiniBand Fabric is not a black box (2/2) ► What about performance? ► Some embarrassing questions… • Blocking vs non-blocking fabrics? • Influence of routing algorithms? • Congestion? • Mixing different protocols on the same fabric? • Running multiple jobs on the same fabric? • Performance monitoring tools?
  • 7. UFM Central Management Platform ► In-depth visibility into fabric health and traffic • Central Dashboard, Unique Congestion Map • Advanced monitoring engine, threshold-based alerts ► Optimize application performance • Quality of Service • Traffic Aware Routing Algorithm ► Efficient operations of thousands of fabric components • Automated configuration of hosts and switches, group tasks • Seamless change management
  • 8. Introducing UFM [Architecture diagram] Components: UFM Server; CLI; GUI (Java); Web Services; IB-SM (OpenSM); Perf Mng Providers; Device Mng Providers; SQL DB; HA Daemon; Access Control. Callouts: Central administration of multiple switches (or hosts); Hierarchical performance monitoring, variety of sources; Leverage open source SM engine; Transparent fail-over; Fast retrieval, historical data; Manage complex relations and workflows; Voltaire plug-ins; User and application interfaces
  • 9. Advanced Monitoring and Analysis ► Monitor & analyze fabric performance • Bandwidth utilization • Unique congestion monitoring • Dashboard for aggregated fabric view ► Real-time fabric-wide health monitoring • Monitor events and errors throughout the fabric • Threshold-based alarms • Granular monitoring of host and switch parameters ► Innovative congestion mapping • One view for fabric-wide congestion and traffic patterns • Enables root cause analysis for routing, job placement or resource allocation inefficiencies ► Everything is managed at the application/aggregation level • Event effects are clearly visible • Proactive measures can be taken
  • 10. Central Dashboard [Screenshot panels]: Resource Utilization & Status; Congestion Map; Top 10 alerted nodes; Event Pane; Top 10s: B/W, Congestion; B/W Consumers
  • 11. Advanced Monitoring Engine: Multiple sessions, on demand; Sessions per logical group (no need to know physical nodes); Aggregation across multiple devices; Various graphs (linear, bar, histogram, pie…); Correlate switch and host information; Formulas (AVG, Max, Min, Sum)
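The per-group aggregation and threshold alerts listed on this slide can be illustrated with a minimal sketch. It is not UFM's API: the sample values, the group names, and the `aggregate` and `alerts` helpers are all hypothetical, chosen only to show how the AVG/Max/Min/Sum formulas apply to a logical group rather than to individual physical nodes.

```python
from statistics import mean

# Hypothetical per-node bandwidth samples in MB/s (illustrative values only).
samples = {
    "node01": 950.0,
    "node02": 1010.0,
    "node03": 120.0,
    "node04": 180.0,
}

# Hypothetical logical groups: a job name mapped to its member nodes.
groups = {
    "latency_job": ["node03", "node04"],
    "bandwidth_job": ["node01", "node02"],
}

# The four formulas named on the slide.
FORMULAS = {"AVG": mean, "MAX": max, "MIN": min, "SUM": sum}


def aggregate(group, formula):
    """Apply one formula to the samples of every node in a logical group."""
    values = [samples[node] for node in groups[group]]
    return FORMULAS[formula](values)


def alerts(threshold, formula="MAX"):
    """Return the groups whose aggregated value exceeds the threshold."""
    return [g for g in sorted(groups) if aggregate(g, formula) > threshold]


print(aggregate("bandwidth_job", "SUM"))
print(alerts(1000.0))
```

The caller only names a group and a formula, which mirrors the slide's point that a monitoring session does not need to know the physical nodes.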
  • 12. Performance Optimization Cycle with UFM [Cycle diagram]: Characterize traffic pattern and priorities; Unique logical fabric model; QoS to prioritize critical apps; Optimize routing with Voltaire’s Traffic Optimized Routing (TOR); Show traffic and congestion information; Unique Congestion Map; Feedback and Analysis; Optional Orchestrators & Schedulers; Application Requirements; UFM Optimization; UFM Monitoring
  • 13. Advanced Performance Optimization Mechanisms ► Fabric virtualization and Quality of Service (QoS) • Run multiple clusters or multiple jobs on the same infrastructure • Assure critical applications get priority through QoS policy • Provide the required isolation for different departments or jobs ► Traffic Aware Routing Algorithm (TARA) • Voltaire’s major shift from static to traffic aware routing • Routing enhancements are built on top of OpenSM in a modular plug-in architecture • Takes into consideration traffic patterns and loads • Traffic model can be derived automatically from fabric model or via API with 3rd party schedulers. Applicable to both DDR and QDR environments
  • 14. Congestion Example ► Degradation due to node oversubscription • Destination port in saturation (multiple sources) • Congestion spreads across the fabric • ALL other flows drop to 20% of capacity • Takes time to recover • Common with storage traffic drop recovery
  • 15. Quality of Service Optimization: UFM Enables QoS Optimization
  • 16. Test Environment ► 2 nodes running a latency-critical job ► 12 nodes running a bandwidth-consuming job ► Goal: achieve the best performance for latency-critical tasks
  • 17. W/O Partitioning: Latency degradation of ~215%. Latency job running alone (Latency = ~2.1 us); Bandwidth job added on same partition (Latency = ~4.5 us)
  • 18. UFM Logical Model Creates Partition and Sets QoS ► 2 Logical Groups • Latency job • B/W oriented job ► QoS settings ► UFM creates virtual NICs, partitions and assigns Service Levels on the fabric
  • 19. With UFM QoS: Cross-Application Interference fixed. Single job in cluster (Latency = 2.1 us); 2nd job added (Latency = 4.5 us); 2 jobs, UFM optimization (Latency = 2.2 us). 100% Better Performance Through QoS Implementation
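The latency figures quoted in the test slides (2.1 us alone, 4.5 us with the bandwidth job on the same partition, 2.2 us with UFM QoS) can be checked with a few lines of arithmetic. This is only a sketch over the numbers in the slides; the helper name `relative_increase` is introduced here for illustration. It suggests that the "~215%" figure corresponds to the ratio of loaded to baseline latency, while the increase over baseline is closer to 114%.

```python
def relative_increase(baseline_us, measured_us):
    """Percentage increase of measured latency over the baseline."""
    return (measured_us - baseline_us) / baseline_us * 100.0


baseline = 2.1     # latency job alone, microseconds (from the slides)
without_qos = 4.5  # bandwidth job added on the same partition
with_qos = 2.2     # two jobs with UFM partitioning and QoS

# Loaded latency expressed as a multiple of the baseline.
ratio = without_qos / baseline

print(f"ratio without QoS:    {ratio:.2f}x")
print(f"increase without QoS: {relative_increase(baseline, without_qos):.1f}%")
print(f"increase with QoS:    {relative_increase(baseline, with_qos):.1f}%")
```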
  • 20. Optimize performance #2: routing ► Existing routing algorithms • Are not aware of application communication flows • Distribute paths evenly across the fabric links ► In real life, fabrics have non-uniform usage • Some endpoints “talk” a lot, some don’t “talk” at all • Many-to-many (cluster) and any-to-many (storage) topologies ► Result • Unbalanced fabric • Congestion is created, leading to slower performance and high latency. Congestion = Latency
  • 21. TARA Optimization ► TARA provides the following benefits: • Reduces competition between fabric resources, thus decreasing congestion • Increases available bandwidth, resulting in improved fabric utilization • Delivers lower latency and shorter application runtime ► How? • Uses knowledge of cluster usage: logical servers, networks • Balances routes depending on usage • Not based on real-time analysis of bandwidth / congestion
  • 22. Routing? ► InfiniBand packets are ‘destination routed’ based on the Destination Local Identifier (DLID) field in the header ► In IB: DLID = route (not only remote address) ► DLIDs are 16 bits • 48K values are used for unicast • 16K values are used for multicast ► At each switch ASIC, the incoming unicast DLID is used as an index into a Linear Forwarding Table (LFT) that returns the outgoing switch port number • E.g. the InfiniScale III ASIC supports all 48K possible LFT entries [Figure: LFT table mapping DLID to Out Port #]
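The LFT lookup described on this slide amounts to indexing an array by DLID. The sketch below is a toy model, not switch firmware: the `Switch` class and its methods are hypothetical, and the LID ranges (unicast 0x0001 to 0xBFFF, multicast 0xC000 to 0xFFFE) are stated from the InfiniBand architecture specification as an assumption of this example, matching the slide's 48K/16K split.

```python
# LID ranges assumed from the InfiniBand specification.
UNICAST_MIN, UNICAST_MAX = 0x0001, 0xBFFF      # roughly 48K values
MULTICAST_MIN, MULTICAST_MAX = 0xC000, 0xFFFE  # roughly 16K values


def lid_kind(dlid):
    """Classify a 16-bit DLID as unicast, multicast, or reserved."""
    if UNICAST_MIN <= dlid <= UNICAST_MAX:
        return "unicast"
    if MULTICAST_MIN <= dlid <= MULTICAST_MAX:
        return "multicast"
    return "reserved"


class Switch:
    """Toy switch holding a Linear Forwarding Table (index = DLID)."""

    def __init__(self):
        # One slot per possible unicast DLID; None means "no route".
        self.lft = [None] * (UNICAST_MAX + 1)

    def set_route(self, dlid, out_port):
        """Program one LFT entry, as a subnet manager would."""
        if lid_kind(dlid) != "unicast":
            raise ValueError("LFT entries exist only for unicast DLIDs")
        self.lft[dlid] = out_port

    def forward(self, dlid):
        """Return the output port for an incoming unicast DLID."""
        if lid_kind(dlid) != "unicast":
            raise ValueError("not a unicast DLID")
        port = self.lft[dlid]
        if port is None:
            raise LookupError(f"no route for DLID {dlid:#06x}")
        return port


sw = Switch()
sw.set_route(0x0013, 1)   # DLID 19 leaves through port 1
print(sw.forward(0x0013))
```

Because the table is indexed by destination only, changing one entry reroutes all traffic toward that DLID through the switch, which is why the slide says that in IB the DLID is the route.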
  • 23. The real wording should be « rearrangeably non-blocking » [Diagram: four 36-port leaf switches (nodes 1-18, 19-36, 37-54, 55-72) connected to two 36-port switches; each link represents 9 cables; 18 uplinks, 54 nodes] At boot time, 3 routes are assigned to each uplink; let's assume: 19-37-55 on port #1, 20-38-56 on port #2, etc. What happens if you have a job running on nodes 1-2-3-19-37-55? Unbalanced communication, congestion…
  • 24. TARA Optimization ► TARA provides the following benefits: • Reduces competition between fabric resources, thus decreasing congestion • Increases available bandwidth, resulting in improved fabric utilization • Delivers lower latency and shorter application runtime ► How? • Uses knowledge of cluster usage: logical servers, networks • Balances routes depending on usage • Not based on real-time analysis of bandwidth / congestion
  • 25. With TARA [Diagram: four 36-port leaf switches (nodes 1-18, 19-36, 37-54, 55-72) connected to two 36-port switches; each link represents 9 cables; 18 uplinks, 3 nodes] Job running on nodes 1-2-3-19-37-55. At job launch time, routes to nodes used by the job are balanced over all uplinks: 19 on port #1, 37 on port #2, 55 on port #3. Others are unchanged
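The two routing slides can be compared with a small sketch. It models only the first leaf switch (nodes 1-18) and its 18 uplinks; the static rule reproduces the slide's assumption (19-37-55 on port #1, 20-38-56 on port #2, and so on), and `balanced_routes` is a simplified stand-in for a traffic-aware assignment, not Voltaire's actual TARA implementation.

```python
from collections import Counter

UPLINKS = 18  # uplinks on the leaf switch holding nodes 1-18


def static_uplink(dest_node):
    """Static boot-time rule from the slide: 19, 37, 55 share port #1, etc."""
    return (dest_node - 19) % UPLINKS + 1


def balanced_routes(job_dests, uplinks=UPLINKS):
    """Simplified traffic-aware rule: spread the job's destinations
    round-robin over the uplinks (a stand-in for TARA, not its algorithm)."""
    return {dest: i % uplinks + 1 for i, dest in enumerate(sorted(job_dests))}


# The job runs on nodes 1-2-3-19-37-55; from the first leaf switch,
# the remote destinations are 19, 37 and 55.
job_remote = [19, 37, 55]

static_load = Counter(static_uplink(d) for d in job_remote)
balanced = balanced_routes(job_remote)
balanced_load = Counter(balanced.values())

print("static:  ", dict(static_load))    # all three routes on one uplink
print("balanced:", dict(balanced_load))  # one route per uplink
```

With the static rule all three destinations land on the same uplink, while the balanced assignment gives one destination per uplink, which is the effect the second diagram describes.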
  • 26. Example: TARA with 324 nodes cluster (300 servers, 24 storage nodes). Logical topology: Job 1 (47 nodes), Job 2 (41 nodes), Job 3 (71 nodes), Job 4 (25 nodes), Job 5 (63 nodes), Job 6 (46 nodes), storage nodes (24). Internal traffic inside each job: 1000 MB/s from each node. Traffic to/from storage: average of 200 MB/s per node. [Charts: port weight (0 to 2000) per switch.port for the internal ports on the line cards, OpenSM vs. Traffic Optimized Routing; physical topology diagram]
  • 27. Scale-out and Maintain Control on Fabric ► Dozens of switches and 1000s of nodes become a massive operational burden ► UFM automates I/O and switch configuration enabling isolation and QoS ► Central Device Management for switches and hosts ► High-availability and seamless failover of SM and UFM ► Advanced API for seamless integration in existing environments. Automatic, seamless operations save hours of configuration and set-up work
  • 28. Efficient Troubleshooting ► Dozens of traffic and health events • Easy central drill-down to counters, alerts and events to the port level ► Configurable thresholds and criticality levels ► GUI and log level alarms ► Alerts correlated to the application level ► Alerts correlated to the DC rack level
  • 29. Open system ► Extensible architecture based on Web services ► Open API for users or 3rd party extensions ► Exposes entire fabric and datacenter object model ► Allows simple reporting, provisioning, monitoring, and task automation ► Tools already benefiting from UFM API ▪ Scheduler integration (e.g. Moab) ▪ UFM support tool kit ▪ Various command line tools/extensions to UFM ▪ Web fabric portal ▪ * Provided in UFM Advanced packages
  • 30. UFM Adaptive Suite: separate UFM offering integrated with Platform LSF ▪ Intelligent & automatic resource allocation ▪ Optimize fabric performance ▪ Maintain connectivity upon changes ▪ Central monitoring. This is the first integrated solution that correlates network fabric management and workload management for dynamic data centers. [Diagram: Platform LSF (Service Policy); UFM (Fabric Provisioning, Control & Optimization)]
  • 31. Integration with Platform LSF: how does it work? Automation and Optimization
  • 32. UFM Benefits. Fabrics w/o UFM: Little Fabric Visibility (unnoticed performance degradation; difficult to assess impact); Low-Performing, Underutilized Fabrics (arbitrary routing algorithms, QoS seldom implemented; congested fabrics, latency affected); Complex and Manual Processes (needs admin skills; many options left unused); Ineffective Troubleshooting (long troubleshooting time; performance issues take days to analyze). UFM Customers: In-Depth Visibility and Control (clear health and performance visualization; business oriented impact and root analysis); Increased Performance (reduced congestion, lower latency; quicker application runtime); Simple and Automated (lowers administration task time from days to minutes); Quick Issue Resolution (Dashboard, Alarms, Congestion Map; reduces downtime, high fabric utilization)