Monitoring Is Never Done

Our monitoring team works in a cycle of four phases: Definition, Collection, Visualization, and Action. We've found that being clear about which phase we are in helps us communicate both our needs and our progress. This was presented as a lightning talk at Monitorama 2015 by Melanie Cey.

Published in: Technology

  1. Monitoring is Never “Done” @melaniemj
  2. Responsibilities @ Yardi: Implementation and administration of monitoring, alerting, and log aggregation/analysis tools.
     o 15,000+ Devices
     o 9 Datacenters
     o 5000+ Customer Installations
     o We monitor Windows envs with Linux envs
  3. This was me in 2008 @ Point2
  4. How code is delivered
  5. How code operates in production
  6. A good problem to have: Everyone wants “the monitoring” so they can say “it’s monitored”
  7. Communicating Work
     o Classify
     o Quantify
     o Qualify
  8. Words....
     o Logging
     o Alerting
     o Dashboards
     o Reports
     o 4-9s
     o 24x7x365 this shit can’t go down
  9. Can it be this simple? Let’s talk about “the monitoring” for X. Be awesome. X is monitored.
  10. DCVA (OODA)
  11. 1. Definition: “I can hit this one page so it’s up, right?” No thanks, let’s redefine status
  12. 1. Definition
     o What questions are you trying to answer?
     o What information do you need when a failure occurs?
     o What are the most common failures?
     o Who is the audience for the information?
  13. 2. Checks & Collections
     o Environment & Code
     o Data points
     o Detailed logs
     o Current state
  14. 3. Visualization
     o Analysis
     o Dashboards
     o Correlations
  15. 4. Action
     o Fault detection
     o Alerting
     o RCA
  16. Cycle: (What to collect) (Inform on failure) (How to collect) (Make collections pretty)
  17. Team Time Distribution
  18. Time Distribution (Desired)
  19. Is “X” monitored? When “X” goes into some degraded state:
     o The right people know.
     o They have enough information to find the problem, recover, and later to do RCA.
     o If they don’t, they will revisit definition.
  20. How does your team
     o Classify
     o Quantify
     o Qualify
  21. Monitoring is Never “Done”. Melanie Cey, @melaniemj, Senior Systems Analyst, Systems Reliability Engineering @ Yardi
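
The four-phase cycle and the “Is X monitored?” test from the slides can be sketched in a few lines of Python. This is only an illustration of the idea, not anything shown in the talk: the `Phase` enum, the `MonitoringStatus` class, and its field names are all hypothetical, chosen to mirror the three bullets on the “Is X monitored?” slide.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """The four phases named in the talk, in cycle order."""
    DEFINITION = "Definition"
    COLLECTION = "Collection"
    VISUALIZATION = "Visualization"
    ACTION = "Action"


@dataclass
class MonitoringStatus:
    """Hypothetical record of how a degraded state of service X was handled."""
    service: str
    right_people_knew: bool        # "The right people know."
    enough_info_to_recover: bool   # enough information to find the problem and recover
    enough_info_for_rca: bool      # enough information to do RCA later

    def is_monitored(self) -> bool:
        # X counts as monitored only when every criterion holds.
        return (
            self.right_people_knew
            and self.enough_info_to_recover
            and self.enough_info_for_rca
        )

    def next_phase(self, current: Phase) -> Phase:
        # If any criterion fails, the team revisits Definition.
        if not self.is_monitored():
            return Phase.DEFINITION
        # Otherwise move on; Action wraps back to Definition,
        # which is why monitoring is never "done".
        order = list(Phase)
        return order[(order.index(current) + 1) % len(order)]


if __name__ == "__main__":
    status = MonitoringStatus(
        service="X",
        right_people_knew=True,
        enough_info_to_recover=True,
        enough_info_for_rca=False,
    )
    print(status.is_monitored())                         # False
    print(status.next_phase(Phase.VISUALIZATION).value)  # Definition
```

Modelling the check as three explicit booleans keeps the conversation about “the monitoring” concrete: a team can point at the one criterion that failed instead of debating whether X is monitored at all.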