Slide 1/15
Raise your Uptime
How to monitor heterogeneous server environments with Linux

LPI Forum Warsaw, 28th September 2012




                                                 Slide 2/15
Agenda

1) Introduction
2) Why monitoring?
3) Icinga Setup and Usage
4) IPMI
5) Conclusions




                            Slide 3/15
1) Introduction


●   Who I am:
    ●   Werner Fischer
    ●   Linux user since 2001
    ●   Teamlead R&D at [company logo]

●   Who I'm not:
    ●   Kernel or H/W dev.




                                            Slide 4/15
2) Why monitoring?


●   You'll get alerts in realtime

●   It tells you the “SOMETHING”

●   It'll save you a lot of time!

                                      Slide 5/15
2) Why monitoring?

●   So why do monitoring?
    ●   Check Availability
        → send real-time alerts

    ●   Check Performance
        → discover trends

    ●   Collect SLA Data
        → prove uptimes


                                 Slide 6/15
2) What can I monitor?

●   Hardware
    ●   Server (IPMI)
    ●   Storage Systems
    ●   Environment

●   Operating Systems
    ●   CPU, Memory, Disk
    ●   Processes
    ●   Log files
    ●   ...

●   Services
    ●   e.g. DNS, FTP, HTTP
    ●   SSH, SMTP, …
    ●   TCP & UDP ports

●   Applications
    ●   SAP
    ●   all Databases
    ●   Directory services
    ●   ...
                                                         Slide 7/15
3) Icinga Setup

●   To set up your monitoring environment:
    ●   Install Ubuntu 12.04

    ●   sudo apt-get install icinga

●   To get nice diagrams:
    ●   sudo apt-get install pnp4nagios



                                            Slide 8/15
3) Use Icinga

●   Icinga Classic web interface




                                   Slide 9/15
4) IPMI Introduction

●   IPMI = Intelligent Platform Management Interface
    ●   Developed 1998 by Intel, HP, NEC, Dell
    ●   Current IPMI v2.0 since 2004
●   Purpose:
    ●   Monitoring (temp, fans, ...)
    ●   Logging (system event log)
    ●   Recovery Control (power on/off/reset)
    ●   Inventory (FRU information)
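
The four purposes above can each be tried from the command line. The following is a minimal sketch, assuming `ipmitool` is installed and the host has a BMC; the helper name `ipmi_cmd_for` and the purpose keywords are hypothetical and only serve this illustration. It prints one common `ipmitool` subcommand per purpose rather than running it, so it is safe to try on any machine.

```shell
#!/bin/sh
# Hypothetical helper: map each IPMI purpose from the slide to one
# common ipmitool subcommand. Nothing is executed against a BMC here.
ipmi_cmd_for() {
  case "$1" in
    monitoring) echo "ipmitool sdr list" ;;              # sensor readings (temp, fans, ...)
    logging)    echo "ipmitool sel list" ;;              # system event log
    recovery)   echo "ipmitool chassis power status" ;;  # power state; on/off/reset are sibling subcommands
    inventory)  echo "ipmitool fru print" ;;             # FRU information
    *)          echo "unknown purpose: $1" >&2; return 1 ;;
  esac
}

# Print the purpose-to-command table.
for purpose in monitoring logging recovery inventory; do
  printf '%-11s %s\n' "$purpose" "$(ipmi_cmd_for "$purpose")"
done
```

To actually query the hardware, run the printed commands on the monitored server itself; local access to the BMC typically needs root privileges.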


                                                       Slide 10/15
4) IPMI Introduction

[Figure: IPMI block diagram. Components shown on the motherboard:
 Baseboard Management Controller (BMC); NVS Storage (SDR, SEL, FRU);
 Sensors & Controls (fan sensor, temp. sensor, power control, reset
 control, …); Network (LAN) Controller with LAN interface and LAN
 connector; serial connector, serial port sharing, serial/modem
 interface, BMC serial controller and M/B serial controller; system
 interface and system bus; PCI mgmt. bus to a Remote Mgmt. Card (KVM
 over IP, ...); IPMB with auxiliary IPMB connector, ICMB bridge and
 chassis mgmt. (satellite controller); private mgmt. busses to FRUs on
 the memory board, processor board, redundant power board and chassis
 board. Annotations: "access req. username & password" and
 "access req. root privileges".]
                                                                                                                             Slide 11/15
4) IPMI Sensor Classes

●   No need to configure threshold values
    Discrete sensors

    [root@test ~]# ipmitool sdr get "PS2 Status"
    Sensor ID              : PS2 Status (0x71)
     Entity ID             : 10.2 (Power Supply)
     Sensor Type (Discrete): Power Supply
     States Asserted       : Power Supply
                             [Presence detected]
                             [Power Supply AC lost]
     Assertion Events      : Power Supply
                             [Presence detected]
                             [Power Supply AC lost]
     Assertions Enabled    : Power Supply
                             [Presence detected]
                             [Failure detected]
                             [Predictive failure]
                             [Power Supply AC lost]
    [...]
     Deassertions Enabled  : Power Supply
    [...]

    Threshold sensors

    [root@test ~]# ipmitool sdr get "Fan 1"
    Sensor ID              : Fan 1 (0x50)
     Entity ID             : 29.1 (Fan Device)
     Sensor Type (Analog)  : Fan
     Sensor Reading        : 5719 (+/- 0) RPM
     Status                : ok
     Nominal Reading       : 6708.000
     Normal Minimum        : 2451.000
     Normal Maximum        : 10965.000
     Lower critical        : 1720.000
     Lower non-critical    : 1978.000
     Positive Hysteresis   : 86.000
     Negative Hysteresis   : 86.000
     Minimum sensor range  : Unspecified
     Maximum sensor range  : Unspecified
     Event Message Control : Per-threshold
     Readable Thresholds   : lcr lnc
     Settable Thresholds   : lcr lnc
     Threshold Read Mask   : lcr lnc
     Assertion Events      :
     Assertions Enabled    : lnc- lcr-
     Deassertions Enabled  : lnc- lcr-
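
The point that no threshold values need to be configured can be illustrated with a short sketch: the lower critical and lower non-critical limits are already part of the `ipmitool sdr get` output, so a check can read them from there. The function name `fan_state` is hypothetical, the mapping to Nagios-style states and exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is an assumption made for this illustration, and this is not how the IPMI plugin itself is implemented. Only lower thresholds are handled, matching the fan example above.

```shell
#!/bin/sh
# Hypothetical sketch: derive a Nagios-style state for a threshold sensor
# from the limits the BMC reports. Reads `ipmitool sdr get "<sensor>"`
# output on stdin; no thresholds are passed in by hand.
fan_state() {
  awk -F':' '
    /Sensor Reading/     { split($2, a, " "); reading = a[1] }  # e.g. "5719"
    /Lower critical/     { lcr = $2 + 0 }                       # e.g. 1720
    /Lower non-critical/ { lnc = $2 + 0 }                       # e.g. 1978
    END {
      if (reading !~ /^[0-9.]+$/) { print "UNKNOWN";  exit 3 }
      if (reading + 0 <= lcr)     { print "CRITICAL"; exit 2 }
      if (reading + 0 <= lnc)     { print "WARNING";  exit 1 }
      print "OK"; exit 0
    }'
}

# Usage with the values from the slide (no BMC needed for this demo):
fan_state <<'EOF'
 Sensor Reading        : 5719 (+/- 0) RPM
 Lower critical        : 1720.000
 Lower non-critical    : 1978.000
EOF
```

On a real server the same function could be fed directly, for example `ipmitool sdr get "Fan 1" | fan_state`, where the sensor name is whatever the BMC of that machine exposes.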




                                                                                                     Slide 12/15
4) IPMI Plugin

●   Developed by Thomas Krenn

●   Open Source (GPL v3)

●   www.thomas-krenn.com/en/oss



                         Slide 13/15
4) IPMI Service Check

●   IPMI service check shows hardware issues:




                                                Slide 14/15
5) Conclusions




●   Monitor hardware with Icinga & IPMI

●   Problems? They will tell you!

●   It'll save you time & money


                          Slide 15/15
