SlideShare una empresa de Scribd logo
1 de 22
Beyond Nagios


      NYC DevOps 2011/07/21
Alexis Lê-Quôc - alq@datadoghq.com
Beyond Nagios


      NYC DevOps 2011/07/21
Alexis Lê-Quôc - alq@datadoghq.com
What I’m Going To Talk About

    • Super-quick   Nagios summary

    • Monitoring/Alerting   Pathologies

    • How   to fix it
What Is

• “Industry   Standard in IT Infrastructure Monitoring”

  • For   once it’s true...

• Scheduler    & Notification server
(+) Robust, Mature code-base

(-) Configuration can be daunting

(-) Not human-friendly
“OVERWHELMING”
A “NORMAL” HOUR
THE “OTHER” NAGIOS UI
Process alerts
                  & Fix things




Receive alerts                    Add more checks




     THE HAPPY START
Missed alerts




Ignore Alerts                   Add more checks




 THE SPIRAL OF DEATH
Quality
      of life


Few checks
Few alerts




                 More checks
                 Too many alerts

                                   # of alerts
             FIGHT OR FLIGHT
Effective                                    Checks n^2
 Coverage                                     Fault-tolerant
                                              Less urgency

Few checks
Few alerts
Every host counts




                    More checks
                    Too many alerts
                    Every host still counts             Scale
                                                    Complexity

    THE TROUGH OF DESPAIR
Effective
Coverage




                           Scale
    IF ONLY I ADDED MORE
           CHECKS...
Reset!
Way Out
‣Breathe!
‣Measure
‣Look for Patterns
‣Put Alerts in Context
‣Focus on the Business
Turn Nagios logs into structured data




                            Analyze


              day     | success_pct | warning_pct | error_pct | events
---------------------+-------------+-------------+-----------+--------
           2011-07-12 00:00:00 |       89 |       0|       2 | 9628
           2011-07-13 00:00:00 |       90 |       0|       2 | 9210
           2011-07-14 00:00:00 |       90 |       0|       2 | 9735
           2011-07-15 00:00:00 |       89 |       0|       2 | 9531




                    MEASURE
day     | success_pct | warning_pct | error_pct | events
---------------------+-------------+-------------+-----------+--------
           2011-07-12 00:00:00 |       89 |       0|       2 | 9628
           2011-07-13 00:00:00 |       90 |       0|       2 | 9210
           2011-07-14 00:00:00 |       90 |       0|       2 | 9735
           2011-07-15 00:00:00 |       89 |       0|       2 | 9531




VISUALIZATION MATTERS
In Time




      Flapping




LOOK FOR PATTERNS
PUT ALERTS IN CONTEXT
    https://app.datad0g.com/dash/dash/1000#/date_range/1310682467000.0-1310684267000.0
Ultimate (hard) question
‣Does this alert impact the business?
 ‣If so by how much?
 ‣Assumes that you track business metrics...
 ‣And they can be accessed programatically



FOCUS ON THE BUSINESS
What applies to Nagios...
Applies to other sources too




                       etc...
Thanks


http://datadoghq.com

Más contenido relacionado

Similar a Beyond Nagios

Securing Systems - Still Crazy After All These Years
Securing Systems - Still Crazy After All These YearsSecuring Systems - Still Crazy After All These Years
Securing Systems - Still Crazy After All These YearsAdrian Sanabria
 
ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015Shannon Lietz
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck OverviewRundeck
 
How to use Istio/Anthos to build Enterprise SRE
How to use Istio/Anthos to build Enterprise SREHow to use Istio/Anthos to build Enterprise SRE
How to use Istio/Anthos to build Enterprise SRETzung-Hsien (Shawn) Ho
 
Business Case Calculator for DevOps Initiatives - Leading credit card service...
Business Case Calculator for DevOps Initiatives - Leading credit card service...Business Case Calculator for DevOps Initiatives - Leading credit card service...
Business Case Calculator for DevOps Initiatives - Leading credit card service...Capgemini
 
Modern Monitoring [ with Prometheus ]
Modern Monitoring [ with Prometheus ]Modern Monitoring [ with Prometheus ]
Modern Monitoring [ with Prometheus ]Haggai Philip Zagury
 
An Introduction to ORYX Software
An Introduction to ORYX SoftwareAn Introduction to ORYX Software
An Introduction to ORYX SoftwareAccountagility
 
DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015Shannon Lietz
 
Information Security in the Gaming World
Information Security in the Gaming WorldInformation Security in the Gaming World
Information Security in the Gaming WorldDimitrios Stergiou
 
Quick wins in the NetOps Journey by Vincent Boon, Opengear
Quick wins in the NetOps Journey by Vincent Boon, OpengearQuick wins in the NetOps Journey by Vincent Boon, Opengear
Quick wins in the NetOps Journey by Vincent Boon, OpengearMyNOG
 
Ploigos - How It Works, and Why.pdf
Ploigos - How It Works, and Why.pdfPloigos - How It Works, and Why.pdf
Ploigos - How It Works, and Why.pdfBill Bensing
 
Achieving Compliance Through Security
Achieving Compliance Through SecurityAchieving Compliance Through Security
Achieving Compliance Through SecurityEnergySec
 
What does performance mean in the cloud
What does performance mean in the cloudWhat does performance mean in the cloud
What does performance mean in the cloudMichael Kopp
 
OSDC 2014: Fernando Hönig - New Data Center Service Model: Cloud + DevOps
OSDC 2014:  Fernando Hönig - New Data Center Service Model: Cloud + DevOpsOSDC 2014:  Fernando Hönig - New Data Center Service Model: Cloud + DevOps
OSDC 2014: Fernando Hönig - New Data Center Service Model: Cloud + DevOpsNETWAYS
 
45 Minutes to PCI Compliance in the Cloud
45 Minutes to PCI Compliance in the Cloud45 Minutes to PCI Compliance in the Cloud
45 Minutes to PCI Compliance in the CloudCloudPassage
 
Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?Splunk
 
Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...
Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...
Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...Nagios
 

Similar a Beyond Nagios (20)

Securing Systems - Still Crazy After All These Years
Securing Systems - Still Crazy After All These YearsSecuring Systems - Still Crazy After All These Years
Securing Systems - Still Crazy After All These Years
 
ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck Overview
 
How to use Istio/Anthos to build Enterprise SRE
How to use Istio/Anthos to build Enterprise SREHow to use Istio/Anthos to build Enterprise SRE
How to use Istio/Anthos to build Enterprise SRE
 
Business Case Calculator for DevOps Initiatives - Leading credit card service...
Business Case Calculator for DevOps Initiatives - Leading credit card service...Business Case Calculator for DevOps Initiatives - Leading credit card service...
Business Case Calculator for DevOps Initiatives - Leading credit card service...
 
Modern Monitoring [ with Prometheus ]
Modern Monitoring [ with Prometheus ]Modern Monitoring [ with Prometheus ]
Modern Monitoring [ with Prometheus ]
 
An Introduction to ORYX Software
An Introduction to ORYX SoftwareAn Introduction to ORYX Software
An Introduction to ORYX Software
 
DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015
 
DevSecCon Keynote
DevSecCon KeynoteDevSecCon Keynote
DevSecCon Keynote
 
Information Security in the Gaming World
Information Security in the Gaming WorldInformation Security in the Gaming World
Information Security in the Gaming World
 
Q insure
Q insure Q insure
Q insure
 
Quick wins in the NetOps Journey by Vincent Boon, Opengear
Quick wins in the NetOps Journey by Vincent Boon, OpengearQuick wins in the NetOps Journey by Vincent Boon, Opengear
Quick wins in the NetOps Journey by Vincent Boon, Opengear
 
Ploigos - How It Works, and Why.pdf
Ploigos - How It Works, and Why.pdfPloigos - How It Works, and Why.pdf
Ploigos - How It Works, and Why.pdf
 
EN - Workload Module
EN - 	Workload ModuleEN - 	Workload Module
EN - Workload Module
 
Achieving Compliance Through Security
Achieving Compliance Through SecurityAchieving Compliance Through Security
Achieving Compliance Through Security
 
What does performance mean in the cloud
What does performance mean in the cloudWhat does performance mean in the cloud
What does performance mean in the cloud
 
OSDC 2014: Fernando Hönig - New Data Center Service Model: Cloud + DevOps
OSDC 2014:  Fernando Hönig - New Data Center Service Model: Cloud + DevOpsOSDC 2014:  Fernando Hönig - New Data Center Service Model: Cloud + DevOps
OSDC 2014: Fernando Hönig - New Data Center Service Model: Cloud + DevOps
 
45 Minutes to PCI Compliance in the Cloud
45 Minutes to PCI Compliance in the Cloud45 Minutes to PCI Compliance in the Cloud
45 Minutes to PCI Compliance in the Cloud
 
Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?
 
Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...
Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...
Nagios Conference 2012 - Kishore Jalleda - Nagios in the Agile DevOps Continu...
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Beyond Nagios

  • 1. Beyond Nagios NYC DevOps 2011/07/21 Alexis Lê-Quôc - alq@datadoghq.com
  • 2. Beyond Nagios NYC DevOps 2011/07/21 Alexis Lê-Quôc - alq@datadoghq.com
  • 3. What I’m Going To Talk About • Super-quick Nagios summary • Monitoring/Alerting Pathologies • How to fix it
  • 4. What Is • “Industry Standard in IT Infrastructure Monitoring” • For once it’s true... • Scheduler & Notification server
  • 5. (+) Robust, Mature code-base (-) Configuration can be daunting (-) Not human-friendly
  • 9. Process alerts & Fix things Receive alerts Add more checks THE HAPPY START
  • 10. Missed alerts Ignore Alerts Add more checks THE SPIRAL OF DEATH
  • 11. Quality of life Few checks Few alerts More checks Too many alerts # of alerts FIGHT OR FLIGHT
  • 12. Effective Checks n^2 Coverage Fault-tolerant Less urgency Few checks Few alerts Every host counts More checks Too many alerts Every host still counts Scale Complexity THE TROUGH OF DESPAIR
  • 13. Effective Coverage Scale IF ONLY I ADDED MORE CHECKS...
  • 15. Way Out ‣Breathe! ‣Measure ‣Look for Patterns ‣Put Alerts in Context ‣Focus on the Business
  • 16. Turn Nagios logs into structured data Analyze day | success_pct | warning_pct | error_pct | events ---------------------+-------------+-------------+-----------+-------- 2011-07-12 00:00:00 | 89 | 0| 2 | 9628 2011-07-13 00:00:00 | 90 | 0| 2 | 9210 2011-07-14 00:00:00 | 90 | 0| 2 | 9735 2011-07-15 00:00:00 | 89 | 0| 2 | 9531 MEASURE
  • 17. day | success_pct | warning_pct | error_pct | events ---------------------+-------------+-------------+-----------+-------- 2011-07-12 00:00:00 | 89 | 0| 2 | 9628 2011-07-13 00:00:00 | 90 | 0| 2 | 9210 2011-07-14 00:00:00 | 90 | 0| 2 | 9735 2011-07-15 00:00:00 | 89 | 0| 2 | 9531 VISUALIZATION MATTERS
  • 18. In Time Flapping LOOK FOR PATTERNS
  • 19. PUT ALERTS IN CONTEXT https://app.datad0g.com/dash/dash/1000#/date_range/1310682467000.0-1310684267000.0
  • 20. Ultimate (hard) question ‣Does this alert impact the business? ‣If so by how much? ‣Assumes that you track business metrics... ‣And they can be accessed programatically FOCUS ON THE BUSINESS
  • 21. What applies to Nagios... Applies to other sources too etc...

Notas del editor

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n