Today's data center managers are burdened by a lack of aligned information across multiple layers. Workflow events such as 'job starts', aligned with performance metrics and events extracted from log facilities, are low-hanging fruit that is on the verge of becoming usable thanks to open-source software such as Graphite, StatsD, logstash and the like.
This talk aims to show the benefits of merging multiple layers of information within an InfiniBand cluster, using use cases for level 1/2/3 personnel.
15. Cluster?
„A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.“ - wikipedia.org
User
41. Little Data w/o Connection
❖ Multiple data sources
❖ No way of connecting them
❖ Connecting is manual labour
❖ Experience driven
❖ Niche solutions misleading
47. Modular Switch
❖ Looks like one „switch“
❖ Composed of a network itself
❖ Which route is taken is transparent to application
❖ LB1<>FB1<>LB4
❖ LB1<>FB2<>LB4
❖ LB1 ->FB1 ->LB4 / LB1 <-FB2 <-LB4
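The routes listed on this slide can be enumerated with a small model. This is a minimal sketch, assuming a hypothetical chassis in which four leaf boards (LB1 to LB4) are each wired to two fabric boards (FB1, FB2). The board counts and the helpers `routes` and `round_trips` are illustrative only and not taken from a real switch.

```python
from itertools import product

# Hypothetical chassis layout: every leaf board connects to every fabric board.
LEAF_BOARDS = ["LB1", "LB2", "LB3", "LB4"]
FABRIC_BOARDS = ["FB1", "FB2"]


def routes(src, dst):
    """Return all leaf -> fabric -> leaf paths between two leaf boards."""
    if src not in LEAF_BOARDS or dst not in LEAF_BOARDS:
        raise ValueError("unknown leaf board")
    return [(src, fb, dst) for fb in FABRIC_BOARDS]


def round_trips(src, dst):
    """Pair every forward path with every return path (pairs may be asymmetric)."""
    return list(product(routes(src, dst), routes(dst, src)))


if __name__ == "__main__":
    for forward, backward in round_trips("LB1", "LB4"):
        print(" -> ".join(forward), "/", " -> ".join(backward))
```

Even in this tiny model a single node pair has four possible round trips, two of them asymmetric, which is why a single bad internal link is hard to pin down from the application side.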
52. Debug-Nightmare
❖ Job seems to fail due to bad internal link
❖ 96 port switch
❖ multiple autonomous job-cells
❖ Relevant information
  ❖ Job status (Resource Scheduler)
  ❖ Routes (IB Subnet Manager)
  ❖ IB Counter (Command Line)
❖ changing one plug, recomputes routes :)
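Once the three sources of relevant information are available in one place, connecting them is a simple join. The sketch below assumes they have already been collected into plain dictionaries. All job IDs, node names, link names and counter values are made up for illustration, and the helper names are hypothetical.

```python
# Job status, as a resource scheduler might report it: job id -> (state, nodes)
JOBS = {
    "job-1": ("FAILED", ["node01", "node13"]),
    "job-2": ("COMPLETED", ["node02", "node03"]),
}

# Routes, as a subnet manager might report them:
# (source node, destination node) -> internal links traversed
ROUTES = {
    ("node01", "node13"): ["LB1-FB1", "FB1-LB4"],
    ("node13", "node01"): ["LB4-FB2", "FB2-LB1"],
    ("node02", "node03"): [],  # same leaf board, no internal link used
    ("node03", "node02"): [],
}

# Error counters read per internal link: link -> error count
ERROR_COUNTERS = {"FB1-LB4": 1742}


def suspicious_links(job_id, threshold=0):
    """Return {link: errors} for links the job's traffic crosses above threshold."""
    _state, nodes = JOBS[job_id]
    hits = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            for link in ROUTES.get((src, dst), []):
                errors = ERROR_COUNTERS.get(link, 0)
                if errors > threshold:
                    hits[link] = errors
    return hits


def failed_jobs_with_bad_links():
    """Map every failed job to the error-prone links on its routes."""
    result = {}
    for job_id, (state, _nodes) in JOBS.items():
        links = suspicious_links(job_id)
        if state == "FAILED" and links:
            result[job_id] = links
    return result


if __name__ == "__main__":
    print(failed_jobs_with_bad_links())
```

Because changing one plug recomputes the routes, the route table in such a join would have to be refreshed, or stored with timestamps, whenever the fabric changes.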
53. IBPM: An Open-Source-Based Framework for InfiniBand Performance Monitoring
Michael Hoefling¹, Michael Menth¹, Christian Kniep², Marcus Camen²
Communication Networks
Background: InfiniBand (IB)
❖ State-of-the-art communication technology for interconnection in high-performance computing data centers
❖ Point-to-point bidirectional links
❖ High throughput (40 Gbit/s with QDR)
❖ Low latency
❖ Dynamic on-line network reconfiguration
Rate Measurement in IB Networks
❖ Idea
  ❖ Extract raw network information from IB network
  ❖ Analyze output
  ❖ Derive statistics about performance of the network
❖ Topology Extraction
  ❖ Subnet discovery using ibnetdiscover
  ❖ Produces human readable file of network topology
  ❖ Process output to produce graphical representation of the network
❖ Remote Counter Readout
  ❖ Each port has its own set of performance counters
  ❖ Counters measure, e.g., transferred data, congestion, errors, link state changes
IBPM: Demo Overview
❖ ibsim-Based Network Simulation
  ❖ ibsim simulates an IB network
  ❖ Simple topology changes possible (GUI)
  ❖ ibsim limitations
    ❖ No performance simulation possible
    ❖ No data rate changes possible
❖ Real IB Network
  ❖ Physical network
  ❖ Allows performance measurements
  ❖ GUI controlled traffic scenarios
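Deriving a data rate from the per-port counters comes down to differencing two readouts. The sketch below is not IBPM's implementation. It assumes a transmit-data counter that counts 4-octet words (as the InfiniBand PortXmitData counter does) and a 32-bit counter that sticks at its maximum instead of wrapping. `Sample` and `data_rate_gbit` are hypothetical names.

```python
from dataclasses import dataclass

WORD_BYTES = 4             # assumption: the counter counts 4-octet words
COUNTER_MAX = 2**32 - 1    # assumption: 32-bit counter that saturates at its maximum


@dataclass
class Sample:
    timestamp: float   # readout time in seconds
    xmit_data: int     # raw transmit-data counter value


def data_rate_gbit(prev, curr):
    """Average transmit rate in Gbit/s between two readouts, or None if unusable."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        return None
    if curr.xmit_data >= COUNTER_MAX or curr.xmit_data < prev.xmit_data:
        # Counter saturated or was reset between the readouts.
        return None
    words = curr.xmit_data - prev.xmit_data
    return words * WORD_BYTES * 8 / dt / 1e9


if __name__ == "__main__":
    first = Sample(timestamp=0.0, xmit_data=0)
    second = Sample(timestamp=1.0, xmit_data=125_000_000)
    print(data_rate_gbit(first, second))
```

Returning `None` for saturated or reset counters keeps bogus spikes out of the derived statistics. A real collector would also reset or extend the counter before it saturates.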
73. Cluster Stack Mock-Up
❖ IB events and metrics are not enough
❖ How to get real-world behavior?
❖ Wanted:
  ❖ Slurm (Resource Scheduler)
  ❖ MPI enabled compute nodes
  ❖ As much additional cluster stack as possible (Graphite, elasticsearch/logstash/kibana, Icinga, Cluster-FS, …)
74. Classical Virtualization
❖ Big overhead for simple node
❖ Resources provisioned in advance
❖ Host resources allocated
83. Master Node
❖ takes care of inventory (etcd)
❖ provides DNS (+PTR)
❖ Integrate Rudder, ansible, chef, …?
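Forward and reverse (PTR) records can both be derived from the same inventory. This is a minimal sketch, assuming the inventory is a plain hostname-to-IP mapping such as one might keep in etcd. The host names, addresses, the domain `cluster.example` and the helper `dns_records` are all hypothetical.

```python
import ipaddress

# Hypothetical inventory: hostname -> IPv4 address
INVENTORY = {
    "compute0": "172.17.0.10",
    "compute1": "172.17.0.11",
}


def dns_records(inventory, domain="cluster.example"):
    """Return (name, type, value) tuples with an A and a PTR record per host."""
    records = []
    for host, ip in sorted(inventory.items()):
        fqdn = f"{host}.{domain}."
        # reverse_pointer yields e.g. '10.0.17.172.in-addr.arpa'
        ptr_name = ipaddress.ip_address(ip).reverse_pointer + "."
        records.append((fqdn, "A", ip))
        records.append((ptr_name, "PTR", fqdn))
    return records


if __name__ == "__main__":
    for name, rtype, value in dns_records(INVENTORY):
        print(f"{name}\t{rtype}\t{value}")
```

Generating both record types from one source keeps forward and reverse lookups consistent as nodes come and go.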
84. Non-Master Nodes (in general)
❖ are started with master as DNS
❖ mounting /scratch, /chome (sits on SSDs)
❖ supervisord kicks in and starts services and setup-scripts
❖ sending metrics to graphite
❖ logs to logstash
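Sending a metric to Graphite needs very little code, because Carbon's plaintext protocol takes one `path value timestamp` line per metric. This is a minimal sketch, assuming a Carbon listener reachable under the hypothetical host name `graphite` on the default plaintext port 2003. The metric path and the helper names are illustrative.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of the Graphite plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path, value, host="graphite", port=2003, timestamp=None):
    """Send a single metric to a Carbon plaintext listener (host name is an assumption)."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Only formats the line; call send_metric(...) when a Carbon listener is reachable.
    print(format_metric("cluster.compute0.load", 0.42, timestamp=1400000000), end="")
```

Inside a container, a loop like this could run as one of the services started by supervisord.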
108. docker-icinga
❖ Icinga to provide
❖ state-of-the-cluster overview
❖ bundle with graphite/elk
❖ no big deal…
❖ Is this going to scale?
122. pipework / mininet
❖ Currently all containers are bound to docker0 bridge
❖ Creating topology with virtual/real switches would be nice
❖ First iteration might use pipework
❖ More complete one should use vSwitches (mininet?)