SlideShare una empresa de Scribd logo
1 de 61
Descargar para leer sin conexión
Winning the metrics battle (finally)
Winning the metrics battle
         (finally)
       Simon Hildrew           Nick Satterly
  Infrastructure Developer   Monitoring Engineer
        The Guardian           The Guardian
The metrics battlefield
Total metrics


                                180,000




                       50,000

1,400   2,800
http://www.flickr.com/photos/ghostsigns/6676069121



                                              5 minutes


                                                every 15
                                                 seconds

                                                           http://www.flickr.com/photos/millynet/134071210
developer dashboards
Physical screens   Screensaver hacks
20


15


10


 5


 0
dev


hack
business dashboards
metrics + dashboards = culture change
http://www.flickr.com/photos/chrisjames_taylor/5454315456
our approach
         Side project    ➡   Prioritise
Incremental upgrade      ➡   Understand the real problem
Use off the shelf tool   ➡   Question the tools
  Pragmatic solution     ➡   Be ambitious
      Done in a year     ➡   Keep learning
Prioritise
drowning in work




http://www.flickr.com/photos/iampeas/246738971
a dedicated monitoring and
     metrics engineer
Understand the
 real problem
Urgent issue -
current tool end of life
The story so far...
metrics were not helping us
 solve production outages
ballooning number of
     applications
but... difficult to instrument applications
T.T. Detect
                      +
T.T. Fix   =   T.T. Diagnose
                      +
                T.T. Resolve
inaccessible tools




             http://www.flickr.com/photos/kdashy/2678539087
inconsistent data



http://www.flickr.com/photos/sybrenstuvel/2468506922
hypothesising & arguing
 easier than measuring


               http://www.flickr.com/photos/nouqraz/200049988
The ‘right’ thing
• measure everything
• measure frequently
• measure each data point once
• input and output must be open
Question the tools
Brute force?




http://www.flickr.com/photos/epublicist/3546059144
The safe option?




http://www.flickr.com/photos/alicebartlett/2361209195
Unintuitive?




http://www.flickr.com/photos/merlijnhoek/2841785343
Imposing a flawed model?
http://www.flickr.com/photos/evansville/8953838/
Too difficult / no progress?
http://www.flickr.com/photos/ginja_andy/4165849136/
Nagios


•   the “IBM” of monitoring tools

•   compromise over quantity and frequency of checks

•   < insert your criticism of nagios here >
Zabbix


•   metric collection tightly coupled to monitoring tool

•   confusing UI with poor visualisation

•   needed brute force to make limited API work
The ‘right’ thing
• measure everything
• measure frequently
• measure each data point once
• input and output must be open
don’t compromise
Be ambitious
http://www.flickr.com/photos/mugley/2961131550




                                 Throw work away
Draw your dream
http://www.flickr.com/photos/sk8geek/7358702704




                             Get as far as you can
screens           users
                                            db?             alerting?


 Etsy dashboard
                                                         message queue




              graphite                                   SNMP?           syslog?



 FITB                     ganglia                 api?



network      hosts           applications
Develop missing pieces




              http://www.flickr.com/photos/kalexanderson/5969012589
screens           users
                                            mongodb                   alerta       elastic
                                                                                   search


 Etsy dashboard
                                                                 message queue



                                                                          syslog     SNMP
              graphite                          ganglia alerts
                                                                          alerts     alerts




 FITB                     ganglia                ganglia-api




network      hosts           applications
Guardian Management
https://github.com/guardian/guardian-management
Ganglia API
https://github.com/guardian/ganglia-api
rescale image???




                       Alerta
https://github.com/guardian/alerta
Current stack
• Ganglia             • Guardian management
                        https://github.com/guardian/guardian-management


• FITB                • Guardian ganglia-api
                        https://github.com/guardian/ganglia-api
• Graphite
                      • Guardian alerta
• Etsy dashboards       https://github.com/guardian/alerta
Keep learning
we are not there yet
Watch the cultural changes
detecting
diagnosis
diagnosis
performance testing
confirmation
#monitoringsucks
➡ Prioritise
➡ Understand the real problem
➡ Question the tools
➡ Be ambitious
➡ Keep learning
tools can change culture
Thank you
               http://github.com/guardian
                 http://gu.com/p/3ap5f
       Simon Hildrew                    Nick Satterly
            @sihil                      @nicksatterly
simon.hildrew@guardian.co.uk    nick.satterly@guardian.co.uk

Más contenido relacionado

Similar a Winning the metrics battle

Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
Crossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at ScaleCrossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at Scalejgoulah
 
Google Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended HanoiGoogle Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended HanoiGCPUserGroupVietnam
 
Move out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternativesMove out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternativestzang ms
 
Google Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for ResearchGoogle Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for ResearchCameron Neylon
 
Honeypots for Active Defense
Honeypots for Active DefenseHoneypots for Active Defense
Honeypots for Active DefenseGreg Foss
 
Hyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain WeekHyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain WeekHorea Porutiu
 
Performance - a challenging craft
Performance  - a challenging craftPerformance  - a challenging craft
Performance - a challenging craftFabian Lange
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Chun-Yu Tseng
 
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...PROIDEA
 
Blue team reboot - HackFest
Blue team reboot - HackFest Blue team reboot - HackFest
Blue team reboot - HackFest Haydn Johnson
 
How to fully automate a store.pptx
How to fully automate a store.pptxHow to fully automate a store.pptx
How to fully automate a store.pptxIgor Moiseev
 
Introduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agileIntroduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agileStefano Valle
 
Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)Christian Heilmann
 
Using Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain TransparencyUsing Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain TransparencyHorea Porutiu
 
Adoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for EveryoneAdoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for EveryoneKan Ouivirach, Ph.D.
 
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018Andy Davies
 

Similar a Winning the metrics battle (20)

Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
Crossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at ScaleCrossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at Scale
 
Google Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended HanoiGoogle Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended Hanoi
 
Move out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternativesMove out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternatives
 
Google Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for ResearchGoogle Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for Research
 
Honeypots for Active Defense
Honeypots for Active DefenseHoneypots for Active Defense
Honeypots for Active Defense
 
Hyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain WeekHyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain Week
 
Performance - a challenging craft
Performance  - a challenging craftPerformance  - a challenging craft
Performance - a challenging craft
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
 
Ultimate Git Workflow - Seoul 2015
Ultimate Git Workflow - Seoul 2015Ultimate Git Workflow - Seoul 2015
Ultimate Git Workflow - Seoul 2015
 
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
 
Blue team reboot - HackFest
Blue team reboot - HackFest Blue team reboot - HackFest
Blue team reboot - HackFest
 
How to fully automate a store.pptx
How to fully automate a store.pptxHow to fully automate a store.pptx
How to fully automate a store.pptx
 
Introduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agileIntroduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agile
 
Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)
 
Using Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain TransparencyUsing Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain Transparency
 
Adoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for EveryoneAdoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for Everyone
 
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
 
Micro Frontends
Micro FrontendsMicro Frontends
Micro Frontends
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Winning the metrics battle