Please, no More Minutes, Milliseconds,
Monoliths... or Monitoring Tools!
Adrian Cockcroft @adrianco #Monitorama May 2014
2 | Battery Ventures
3 | Battery Ventures
Enterprise IT Adoption of Cloud
By Simon Wardley http://enterpriseitadoption.com/
You Are
Here
4 | Battery Ventures
Why am I at Monitorama?
5 | Battery Ventures
Twenty Years of Free and Open Source Monitoring
● 1994 The “SE Toolkit” and virtual_adrian.se
● 1998 Sun Performance Tuning, Java & The Internet Book
● 1999 Resource Management Sun Blueprint Book
● 2000 Capacity Planning for Web Services Sun Blueprint Book
● 2007 A. A. Michelson Award for Outstanding Contribution to Computer Metrics, by the Computer Measurement Group
● 2004-2008 Capacity Planning with Free Tools Workshop at CMG
● 2014 Monitorama!
6 | Battery Ventures
State of the Art for Free Tools in 2008
http://www.slideshare.net/adrianco/capacity-planning-with-free-tools
7 | Battery Ventures
History Lesson
http://sourceforge.net/projects/setoolkit/
SE is a C interpreter with built-in access to all Solaris metric data sources
8 | Battery Ventures
Topics for Today
Minutes
Monoliths
Milliseconds
Monitoring tools
Challenges for monitoring
Continuous delivery & microservices
Analysis and closed loop control systems
Tools for developers who operate code in production
Challenges of dynamic, ephemeral, distributed cloud applications
9 | Battery Ventures
No more monitoring tools?
10 | Battery Ventures
We have too many of them already…
What’s needed is more analysis tools.
11 | Battery Ventures
#Analysorama?
12 | Battery Ventures
Rule #1: Spend more time working on code that analyzes the meaning of metrics than on code that collects, moves, stores and displays metrics.
13 | Battery Ventures
What’s wrong with minutes?
14 | Battery Ventures
What’s wrong with minutes?
Takes too long to see a problem
[Chart: metric value (0 to 5) against time (Minute 1 to Minute 7), showing the metric and its threshold]
Something broke at 2m20
40s of failure didn’t trigger
1st high metric seen at agent on instance
1st high metric arrives at monitoring system
1st high metric processed (maybe)
1st high metric seen on graph
Three datapoints on user graph so looks bad at 8m00.
15 | Battery Ventures
Whoops! I didn’t mean that! Reverting…
Not cool if it takes 5 minutes to see it failed and 5 more to see a fix
No-one notices if it only takes 5 seconds to detect and 5 to see a fix
16 | Battery Ventures
Try that again by the second
More confidence more quickly
[Chart: metric value (0 to 4) against time (Minute 1 to Minute 7), showing the metric and its threshold]
Something broke at 2m20
Measurable in 1s
1st high metric seen at agent on instance
1st high metric arrives at monitoring system
1st high metric processed
1st high metric seen on graph
Three datapoints on user graph so looks bad at 2m25.
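The two timelines can be compared with a small model. This is a minimal sketch, not something from the slides: the function names are hypothetical, and the pipeline delay (agent to monitoring system to graph) is assumed to be two sampling intervals, a figure chosen only because it reproduces the 8m00 and 2m25 end points shown on the charts. It also assumes, as in the first chart, that the partial interval containing the break does not cross the threshold.

```python
def seconds_until_visible(break_at_s, interval_s, pipeline_delay_s, points_needed):
    """Seconds from time zero until enough high datapoints are on the graph."""
    # Start of the first sampling interval that lies entirely after the break
    # (integer ceiling division, so no floating point rounding is involved).
    first_full_start = -(-break_at_s // interval_s) * interval_s
    # A datapoint only exists once its interval has closed.
    first_high_closes = first_full_start + interval_s
    # The user wants several consecutive high datapoints before believing it.
    last_needed_closes = first_high_closes + (points_needed - 1) * interval_s
    # Assumed delay to move, process and draw the datapoint.
    return last_needed_closes + pipeline_delay_s


def as_mmss(seconds):
    """Format a number of seconds in the slides' 2m20 style."""
    return f"{seconds // 60}m{seconds % 60:02d}"


BREAK_AT = 2 * 60 + 20  # "Something broke at 2m20"

# One-minute metrics, assumed pipeline delay of two intervals (120 s).
per_minute = seconds_until_visible(BREAK_AT, 60, 120, 3)
# One-second metrics, assumed pipeline delay of two intervals (2 s).
per_second = seconds_until_visible(BREAK_AT, 1, 2, 3)

print(as_mmss(per_minute), as_mmss(per_second))
```

Under these assumptions the sampling interval dominates everything else: the same break takes minutes to believe with one-minute metrics and seconds with one-second metrics.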
17 | Battery Ventures
Continuous Delivery and DevOps Implications
● Changes are smaller but more frequent
● Individual changes more likely to be broken
● Changes likely to be deployed by developers
● Instant detection and rollback matter much more
18 | Battery Ventures
SaaS Based Products Show What Can Be Done
www.vividcortex.com and www.boundary.com
Seeing Problems In Seconds
19 | Battery Ventures
NetflixOSS Hystrix / Turbine Circuit Breaker Monitoring
http://techblog.netflix.com/2012/12/hystrix-dashboard-and-turbine.html
Streaming metrics directly from front end services to a web browser
20 | Battery Ventures
Rule #2: Metric-to-display latency needs to be less than the human attention span (~10s)
21 | Battery Ventures
What’s Wrong With Milliseconds?
22 | Battery Ventures
A Millisecond is a Very Long Time!
● Some JVM based tools measure response times in ms
Network round trip within a datacenter/zone is less than 1ms
SSD access latency is usually less than 1ms
Cassandra (a Java app) response times can be less than 1ms
● Rounding Errors
Quantization loses too much information
Automated threshold warning “One is infinitely larger than zero”!
JVM does have nanosecond resolution times available
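The rounding problem can be shown in a few lines. This is an illustrative sketch with made-up latency samples; the helper names are hypothetical. It uses Python's `time.perf_counter_ns()` as a nanosecond-resolution clock, which plays the same role here that `System.nanoTime()` plays on the JVM.

```python
import time


def to_whole_ms(latency_ns):
    """Truncate a nanosecond latency to whole milliseconds, as a ms-only tool would."""
    return latency_ns // 1_000_000


def relative_change(before, after):
    """Relative increase from before to after; infinite when starting from zero."""
    if before == 0:
        return float("inf") if after > 0 else 0.0
    return (after - before) / before


# Hypothetical response times for a sub-millisecond service, in nanoseconds.
samples_ns = [200_000, 450_000, 800_000, 990_000, 1_100_000]
quantized_ms = [to_whole_ms(ns) for ns in samples_ns]

# At full resolution the last two samples differ by about 11%.
true_change = relative_change(samples_ns[3], samples_ns[4])
# After truncation they are 0 ms and 1 ms: "one is infinitely larger than zero".
apparent_change = relative_change(quantized_ms[3], quantized_ms[4])

# Measuring with a nanosecond clock avoids the problem at the source.
start_ns = time.perf_counter_ns()
sum(range(1000))
elapsed_ns = time.perf_counter_ns() - start_ns
```

Four of the five samples collapse to zero after truncation, so a threshold rule that compares ratios sees an unbounded jump where the real change is small.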
23 | Battery Ventures
Rule #3: Validate that your measurement
system has enough accuracy and precision.
Gauge Repeatability and Reproducibility matters, see
http://en.wikipedia.org/wiki/ANOVA_gauge_R%26R
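A full ANOVA gauge R&R study is more involved than can be shown here. As a much simpler stand-in, the sketch below separates the two properties the rule names: accuracy (bias against a known reference) and precision (spread of repeated readings). The reference value and readings are hypothetical.

```python
import statistics

# Hypothetical: the same operation, known to take 500 microseconds,
# measured five times by the monitoring system under test.
REFERENCE_US = 500
readings_us = [520, 515, 525, 518, 522]

# Accuracy: how far the average reading is from the true value.
bias_us = statistics.mean(readings_us) - REFERENCE_US

# Precision (repeatability): how much repeated readings scatter.
spread_us = statistics.stdev(readings_us)

print(f"bias {bias_us} us, spread {spread_us:.2f} us")
```

A system can be precise but inaccurate, as in this made-up data, where readings cluster tightly around a value that is consistently too high.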
24 | Battery Ventures
Monolithic Monitoring Systems
Simple to build and install, but problematic…
[Diagram labels, monolithic: Services Being Monitored; Monolithic Monitoring System]
[Diagram labels, distributed: Services Being Monitored; Distributed Collection Systems; Analysis / Display Aggregators]
25 | Battery Ventures
Monolithic Monitoring Issues
● Scalability
Problems scaling data collection, analysis and reporting throughput
Limitations on number of distinct metrics that can be collected
Traffic storms can overload the system and take it down
● Availability
Monitoring system needs to stay up when everything else dies!
Downtime for upgrades is always inconvenient
Gaps in the metric history can trigger alarms and lose confidence
26 | Battery Ventures
In-Band, Out-of-Band, or Both?
In-band means deployed using same tools and infrastructure as your services
Dependencies lead to common mode failures that can leave you blind
Best option is both in-house in-band, and external SaaS
[Diagram labels: Services; Monitoring System (In-Band Monitoring); Monitoring System (SaaS Based Monitoring)]
Very unlikely to have both fail at the same time
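The "very unlikely" claim is simple arithmetic if the two monitoring systems fail independently, which is exactly what avoiding shared dependencies is meant to achieve. A minimal sketch, with hypothetical availability figures and a hypothetical function name:

```python
def fraction_of_time_blind(in_band_availability, saas_availability):
    """Fraction of time both monitoring systems are down at once.

    Assumes the two systems fail independently. A common mode failure
    (shared network, shared credentials, shared deploy tooling) breaks
    this assumption and makes the real figure worse.
    """
    return (1.0 - in_band_availability) * (1.0 - saas_availability)


# Hypothetical figures: each system is up 99.9% of the time.
both_down = fraction_of_time_blind(0.999, 0.999)

SECONDS_PER_YEAR = 365 * 24 * 3600
print(f"blind for roughly {both_down * SECONDS_PER_YEAR:.0f} seconds per year")
```

Either system alone at 99.9% would be blind for hours per year; the pair, if truly independent, is blind for about a millionth of the time.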
27 | Battery Ventures
Rule #4: Monitoring systems need to be
more available and scalable than the
systems being monitored.
28 | Battery Ventures
Continuous Delivery
29 | Battery Ventures
Issues with Continuous Delivery and Microservices
● High rate of change
Code pushes can cause floods of new instances and metrics
Short baseline for alert threshold analysis – everything looks unusual
● Ephemeral Configurations
Short lifetimes make it hard to aggregate historical views
Hand tweaked monitoring tools take too much work to keep running
● Microservices with complex calling patterns
End-to-end request flow measurements are very important
Request flow visualizations get overwhelmed
30 | Battery Ventures
Microservice Based Architectures
See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
From a Gilt Groupe Presentation
31 | Battery Ventures
“Death Star” Architecture Diagrams
As visualized by Appdynamics, Boundary.com and Twitter internal tools
Panels: Netflix; Gilt Groupe (12 of 450); Twitter
32 | Battery Ventures
Closed Loop Control Systems
33 | Battery Ventures
Autoscaled Ephemeral Instances at Netflix (the old way)
● Largest services use autoscaled red/black code pushes
● Average lifetime of an instance is 36 hours
[Chart labels: Push; Autoscale Up; Autoscale Down]
34 | Battery Ventures
Scryer - Predictive Auto-scaling at Netflix
See http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html
and http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html
[Chart: 24 Hours predicted traffic vs. actual]
More morning load
Sat/Sun high traffic
Lower load on Weds
FFT based prediction driving AWS Autoscaler to plan minimum capacity
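The idea of frequency-based prediction can be sketched in plain Python. This is an illustration only and not Netflix's Scryer implementation: it uses a naive discrete Fourier transform, keeps the few strongest frequency components of past traffic, and extends them forward in time. The function names, the synthetic traffic and the per-instance capacity are all hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform; fine for a short history."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def predict(samples, horizon, keep=3):
    """Extend the `keep` strongest frequency components `horizon` steps ahead."""
    n = len(samples)
    spectrum = dft(samples)
    strongest = sorted(range(n), key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    forecast = []
    for t in range(n, n + horizon):
        value = sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in strongest) / n
        # Traffic is real valued, so only the real part is kept.
        forecast.append(value.real)
    return forecast


def minimum_instances(predicted_load, load_per_instance):
    """Capacity floor to hand to an autoscaler (hypothetical sizing rule)."""
    return max(1, math.ceil(predicted_load / load_per_instance))


# Synthetic history: one week of hourly traffic with a clean daily cycle.
history = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(7 * 24)]

forecast = predict(history, horizon=24)
plan = [minimum_instances(load, load_per_instance=40) for load in forecast]
```

Because the forecast sets only a minimum, reactive autoscaling can still add capacity above it when real traffic departs from the pattern.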
35 | Battery Ventures
Netflix Automatic Code Deployment Canary - Bad Signature
36 | Battery Ventures
Happy Canary Signature
37 | Battery Ventures
Monitoring Tools for Developers
● Most monitoring tools are built to be used by operations people
Focus on individual systems rather than applications
Focus on utilization rather than throughput and response time
Fiefdoms of sysadmin, network admin, storage admin, database admin…
Hard to integrate and extend
● Developer oriented monitoring tools
Application Performance Measurement (APM) and Analysis
Business transactions, response time, JVM internal metrics
Logging business metrics directly (NetflixOSS Servo, Yammer Metrics)
APIs for integration, data extraction, deep linking and embedding
http://techblog.netflix.com/2012/02/announcing-servo.html and http://metrics.codahale.com/
38 | Battery Ventures
Challenges of Dynamic, Ephemeral,
Distributed Cloud Applications
39 | Battery Ventures
Dynamic and Ephemeral Challenges
● Datacenter Assets
Arrive infrequently, disappear infrequently
Stick around for three years or so before they get retired
Have unique IP and Mac addresses
● Cloud Assets
Arrive in bursts – a Netflix code push creates over a hundred per minute
Stick around for a few hours before they get retired
Often re-use the IP and Mac address that was just vacated!
Use NetflixOSS Edda to record a full history of your configuration
http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
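IP address re-use means an address alone cannot identify which instance produced a metric; a timestamp is needed too. The sketch below shows the kind of lookup a configuration history makes possible. It is not Edda's API: the `Lease` record, the `owner_at` function and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    """One period during which an instance held an IP address."""
    ip: str
    instance_id: str
    start: float            # epoch seconds
    end: Optional[float]    # None while the instance is still running


def owner_at(history: List[Lease], ip: str, when: float) -> Optional[str]:
    """Return the instance that held `ip` at time `when`, if any."""
    for lease in history:
        still_held = lease.end is None or when < lease.end
        if lease.ip == ip and lease.start <= when and still_held:
            return lease.instance_id
    return None


# The same address is vacated by one instance and re-used a minute later.
history = [
    Lease("10.0.0.5", "i-aaa111", start=0, end=3600),
    Lease("10.0.0.5", "i-bbb222", start=3660, end=None),
]

before_push = owner_at(history, "10.0.0.5", 1800)
after_push = owner_at(history, "10.0.0.5", 4000)
in_the_gap = owner_at(history, "10.0.0.5", 3630)
```

Keying metrics by instance ID rather than by address avoids stitching two unrelated instances into one time series.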
40 | Battery Ventures
Cloud Native Architectures
41 | Battery Ventures
Traditional vs. Cloud Native Storage Architectures
[Diagram labels, traditional: Business Logic; Database Master; Fabric; Storage Arrays; Database Slave; Fabric; Storage Arrays]
[Diagram labels, cloud native: Business Logic; Cassandra Zone A nodes; Cassandra Zone B nodes; Cassandra Zone C nodes; Cloud Object Store Backups]
42 | Battery Ventures
Distributed Cloud Applications Challenges
● Cloud provider data stores don’t have the usual monitoring hooks
e.g. no way to install an agent on AWS RDS MySQL, AWS DynamoDB
● Dependency on web services as well as code on instances
Integration of data sources like CloudWatch, measure use of S3 etc.
● Cloud applications span zones and regions
Monitoring tools need to span and aggregate zones and regions too!
● NoSQL data stores introduce new protocols and metrics
e.g. cross zone and cross regions replication traffic for Cassandra
43 | Battery Ventures
Monitoring “New Rules” by @adrianco
1. Spend more time on analysis than data collection and display
2. Reduce key business metric latency to less than 10s
3. Validate your measurement system precision and accuracy
4. Be more available and scalable than the services being monitored
5. Optimize for distributed, ephemeral cloud native applications
44 | Battery Ventures
Any Questions?
● Battery Ventures http://www.battery.com
● Adrian’s Blog http://perfcap.blogspot.com
● Slideshare http://slideshare.com/adriancockcroft
Appearances by @adrianco
● Migrating to Microservices – Qcon London - March 6th, 2014
● Monitorama Opening Keynote Portland OR - May 7th, 2014
● GOTO Chicago Opening Keynote May 20th, 2014
● DevOps Summit at Cloud Expo New York – June 10th, 2014
● Qcon New York – June 11th, 2014
● GOTO Copenhagen/Aarhus – Denmark – Oct 25th, 2014
Find me on LinkedIn or Twitter @adrianco

Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring Tools

  • 1. Please, no More Minutes, Milliseconds, Monoliths... or Monitoring Tools! Adrian Cockcroft @adrianco #Monitorama May 2014
  • 2. 2 | Battery Ventures
  • 3. Enterprise IT Adoption of Cloud, by Simon Wardley http://enterpriseitadoption.com/
      [Figure label: You Are Here]
  • 4. Why am I at Monitorama?
  • 5. Twenty Years of Free and Open Source Monitoring
      ● 1994 The “SE Toolkit” and virtual_adrian.se
      ● 1998 Sun Performance Tuning, Java & The Internet Book
      ● 1999 Resource Management Sun Blueprint Book
      ● 2000 Capacity Planning for Web Services Sun Blueprint Book
      ● 2007 A. A. Michelson Award for Outstanding Contribution to Computer Metrics, by the Computer Measurement Group
      ● 2004-2008 Capacity Planning with Free Tools Workshop at CMG
      ● 2014 Monitorama!
  • 6. State of the Art for Free Tools in 2008
      http://www.slideshare.net/adrianco/capacity-planning-with-free-tools
  • 7. History Lesson
      http://sourceforge.net/projects/setoolkit/
      SE is a C interpreter with built-in access to all Solaris metric data sources
  • 8. Topics for Today
      Minutes
      Monoliths
      Milliseconds
      Monitoring tools
      Challenges for monitoring
      Continuous delivery & microservices
      Analysis and closed loop control systems
      Tools for developers who operate code in production
      Challenges of dynamic, ephemeral, distributed cloud applications
  • 9. No more monitoring tools?
  • 10. We have too many of them already… What’s needed is more analysis tools.
  • 11. #Analysorama?
  • 12. Rule #1: Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics.
  • 13. What’s wrong with minutes?
  • 14. What’s wrong with minutes? Takes too long to see a problem
      [Chart: Metric vs. Threshold, Minute 1 to Minute 7]
      Something broke at 2m20
      40s of failure didn’t trigger
      1st high metric seen at agent on instance
      1st high metric arrives at monitoring system
      1st high metric processed (maybe)
      1st high metric seen on graph
      Three datapoints on user graph so looks bad at 8m00.
  • 15. Whoops! I didn’t mean that! Reverting…
      Not cool if it takes 5 minutes to see it failed and 5 more to see a fix
      No-one notices if it only takes 5 seconds to detect and 5 to see a fix
  • 16. Try that again by the second: More confidence more quickly
      [Chart: Metric vs. Threshold, Minute 1 to Minute 7]
      Something broke at 2m20
      Measurable in 1s
      1st high metric seen at agent on instance
      1st high metric arrives at monitoring system
      1st high metric processed
      1st high metric seen on graph
      Three datapoints on user graph so looks bad at 2m25.
  • 17. Continuous Delivery and DevOps Implications
      ● Changes are smaller but more frequent
      ● Individual changes more likely to be broken
      ● Changes likely to be deployed by developers
      ● Instant detection and rollback matters much more
  • 18. SaaS Based Products Show What Can Be Done
      www.vividcortex.com and www.boundary.com
      Seeing Problems In Seconds
  • 19. NetflixOSS Hystrix / Turbine Circuit Breaker Monitoring
      http://techblog.netflix.com/2012/12/hystrix-dashboard-and-turbine.html
      Streaming metrics directly from front end services to a web browser
  • 20. Rule #2: Metric to display latency needs to be less than human attention span (~10s)
  • 21. What’s Wrong With Milliseconds?
  • 22. A Millisecond is a Very Long Time!
      ● Some JVM based tools measure response times in ms
          Network round trip within a datacenter/zone is less than 1ms
          SSD access latency is usually less than 1ms
          Cassandra (a Java app) response times can be less than 1ms
      ● Rounding Errors
          Quantization loses too much information
          Automated threshold warning “One is infinitely larger than zero”!
          JVM does have nanosecond resolution times available
  • 23. Rule #3: Validate that your measurement system has enough accuracy and precision.
      Gauge Repeatability and Reproducibility matters, see http://en.wikipedia.org/wiki/ANOVA_gauge_R%26R
  • 24. Monolithic Monitoring Systems
      Simple to build and install, but problematic…
      [Diagram 1 labels: Services Being Monitored, Monolithic Monitoring System]
      [Diagram 2 labels: Services Being Monitored, Distributed Collection Systems, Analysis / Display, Aggregators]
  • 25. Monolithic Monitoring Issues
      ● Scalability
          Problems scaling data collection, analysis and reporting throughput
          Limitations on number of distinct metrics that can be collected
          Traffic storms can overload the system and take it down
      ● Availability
          Monitoring system needs to stay up when everything else dies!
          Downtime for upgrades is always inconvenient
          Gaps in the metric history can trigger alarms and lose confidence
  • 26. In-Band, Out-of-Band, or Both?
      In-band means deployed using same tools and infrastructure as your services
      Dependencies lead to common mode failures that can leave you blind
      Best option is both in-house in-band, and external SaaS
      [Diagram labels: Services, Monitoring System (shown twice), SaaS Based Monitoring, In-Band Monitoring]
      Very unlikely to have both fail at the same time
  • 27. Rule #4: Monitoring systems need to be more available and scalable than the systems being monitored.
  • 28. Continuous Delivery
  • 29. Issues with Continuous Delivery and Microservices
      ● High rate of change
          Code pushes can cause floods of new instances and metrics
          Short baseline for alert threshold analysis – everything looks unusual
      ● Ephemeral Configurations
          Short lifetimes make it hard to aggregate historical views
          Hand tweaked monitoring tools take too much work to keep running
      ● Microservices with complex calling patterns
          End-to-end request flow measurements are very important
          Request flow visualizations get overwhelmed
  • 30. Microservice Based Architectures
      See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
      From a Gilt Groupe Presentation
  • 31. “Death Star” Architecture Diagrams
      As visualized by Appdynamics, Boundary.com and Twitter internal tools
      [Diagram labels: Netflix, Gilt Groupe (12 of 450), Twitter]
  • 32. Closed Loop Control Systems
  • 33. Autoscaled Ephemeral Instances at Netflix (the old way)
      ● Largest services use autoscaled red/black code pushes
      ● Average lifetime of an instance is 36 hours
      [Chart labels: Push, Autoscale Up, Autoscale Down]
  • 34. Scryer - Predictive Auto-scaling at Netflix
      See http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html and http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html
      [Chart: 24 Hours predicted traffic vs. actual; annotations: More morning load, Sat/Sun high traffic, Lower load on Weds]
      FFT based prediction driving AWS Autoscaler to plan minimum capacity
  • 35. Netflix Automatic Code Deployment Canary - Bad Signature
  • 36. Happy Canary Signature
  • 37. Monitoring Tools for Developers
      ● Most monitoring tools are built to be used by operations people
          Focus on individual systems rather than applications
          Focus on utilization rather than throughput and response time
          Fiefdoms of sysadmin, network admin, storage admin, database admin…
          Hard to integrate and extend
      ● Developer oriented monitoring tools
          Application Performance Measurement (APM) and Analysis
          Business transactions, response time, JVM internal metrics
          Logging business metrics directly (NetflixOSS Servo, Yammer Metrics)
          APIs for integration, data extraction, deep linking and embedding
      http://techblog.netflix.com/2012/02/announcing-servo.html and http://metrics.codahale.com/
  • 38. Challenges of Dynamic, Ephemeral, Distributed Cloud Applications
  • 39. Dynamic and Ephemeral Challenges
      ● Datacenter Assets
          Arrive infrequently, disappear infrequently
          Stick around for three years or so before they get retired
          Have unique IP and MAC addresses
      ● Cloud Assets
          Arrive in bursts – a Netflix code push creates over a hundred per minute
          Stick around for a few hours before they get retired
          Often re-use the IP and MAC address that was just vacated!
      Use NetflixOSS Edda to record a full history of your configuration
      http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
  • 40. Cloud Native Architectures
  • 41. Traditional vs. Cloud Native Storage Architectures
      [Diagram labels, traditional: Business Logic; Database Master, Fabric, Storage Arrays; Database Slave, Fabric, Storage Arrays]
      [Diagram labels, cloud native: Business Logic; Cassandra Zone A nodes, Cassandra Zone B nodes, Cassandra Zone C nodes; Cloud Object Store Backups]
  • 42. Distributed Cloud Applications Challenges
      ● Cloud provider data stores don’t have the usual monitoring hooks
          e.g. no way to install an agent on AWS RDS MySQL, AWS DynamoDB
      ● Dependency on web services as well as code on instances
          Integration of data sources like CloudWatch, measure use of S3 etc.
      ● Cloud applications span zones and regions
          Monitoring tools need to span and aggregate zones and regions too!
      ● NoSQL data stores introduce new protocols and metrics
          e.g. cross zone and cross regions replication traffic for Cassandra
  • 43. Monitoring “New Rules” by @adrianco
      1. Spend more time on analysis than data collection and display
      2. Reduce key business metric latency to less than 10s
      3. Validate your measurement system precision and accuracy
      4. Be more available and scalable than the services being monitored
      5. Optimize for distributed, ephemeral cloud native applications
  • 44. Any Questions?
      ● Battery Ventures http://www.battery.com
      ● Adrian’s Blog http://perfcap.blogspot.com
      ● Slideshare http://slideshare.com/adriancockcroft
      Appearances by @adrianco
      ● Migrating to Microservices – Qcon London - March 6th, 2014
      ● Monitorama Opening Keynote Portland OR - May 7th, 2014
      ● GOTO Chicago Opening Keynote May 20th, 2014
      ● DevOps Summit at Cloud Expo New York – June 10th, 2014
      ● Qcon New York – June 11th, 2014
      ● GOTO Copenhagen/Aarhus – Denmark – Oct 25th, 2014
      Find me on LinkedIn or Twitter @adrianco
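
Worked example for slides 14 and 16. The slides state that a failure at 2m20 looks bad on the graph at 8m00 with per-minute metrics and at 2m25 with per-second metrics, but they do not spell out the arithmetic. The Python sketch below is one simple model that reproduces both numbers. Its assumptions are ours, not the author's: an interval only reads high if it lies entirely after the failure, each datapoint is reported at the end of its interval, and collection, processing and display together add two more intervals of delay. The function and parameter names are hypothetical.

```python
def time_graph_looks_bad(fail_at_s, interval_s, pipeline_hops=2, datapoints_needed=3):
    """Seconds from time zero until enough high datapoints are on the graph.

    Assumptions of this sketch (not statements from the slides):
    - an interval only reads "high" if it lies entirely after the failure;
    - each datapoint is reported at the end of its interval;
    - collection, processing and display add `pipeline_hops` more intervals.
    """
    # Start of the first interval lying entirely after the failure (integer ceiling).
    first_full_start = -(-fail_at_s // interval_s) * interval_s
    first_high_report = first_full_start + interval_s
    last_needed_report = first_high_report + (datapoints_needed - 1) * interval_s
    return last_needed_report + pipeline_hops * interval_s


def mmss(seconds):
    """Format a number of seconds in the slides' style, e.g. 8m00."""
    return f"{seconds // 60}m{seconds % 60:02d}"


FAILURE_AT_S = 2 * 60 + 20  # "Something broke at 2m20"

print(mmss(time_graph_looks_bad(FAILURE_AT_S, interval_s=60)))  # 8m00
print(mmss(time_graph_looks_bad(FAILURE_AT_S, interval_s=1)))   # 2m25
```

Under this model the delay is dominated by the reporting interval itself, so shrinking the interval from 60s to 1s moves the result from several minutes to a few seconds after the failure.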
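
Worked example for slide 22. The rounding problem can be shown with a few made-up numbers. The Python sketch below truncates hypothetical sub-millisecond response times to whole milliseconds and compares the relative change seen by a threshold check before and after truncation. The sample values and helper names are illustrative assumptions. `time.perf_counter_ns` is Python's nanosecond-unit timer, used here as a rough analogue of the nanosecond times the slide says the JVM offers; its actual resolution depends on the platform.

```python
import math
import time

NS_PER_MS = 1_000_000

# Hypothetical sub-millisecond response times, in nanoseconds.
samples_ns = [250_000, 400_000, 800_000, 950_000, 1_100_000]

# Truncating to whole milliseconds collapses the first four samples to 0.
samples_ms = [ns // NS_PER_MS for ns in samples_ns]


def relative_increase(before, after):
    """Fractional increase from before to after; infinite when before is 0."""
    if before == 0:
        return math.inf if after > 0 else 0.0
    return (after - before) / before


def timed_ns(fn):
    """Run fn once and return the elapsed time in nanoseconds."""
    start = time.perf_counter_ns()
    fn()
    return time.perf_counter_ns() - start


print(samples_ms)
print(relative_increase(samples_ms[3], samples_ms[4]))  # 0 ms -> 1 ms
print(relative_increase(samples_ns[3], samples_ns[4]))  # 950 µs -> 1100 µs
print(timed_ns(lambda: sum(range(1000))))
```

In nanoseconds the last two samples differ by about 16%. After truncation the same pair reads as a jump from 0 to 1, which a ratio-based alert treats as an unbounded increase.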
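
Illustration for slide 39. Because cloud instances often re-use an IP address that was just vacated, an address alone does not identify an asset. Answering "who had this address?" needs a time as well. The Python sketch below shows that idea with a small in-memory history keyed by instance ID. It is a hypothetical illustration only and does not represent the Edda API or data model; the class, method names, instance IDs and addresses are all made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    instance_id: str
    ip: str
    start: int                 # launch time, in seconds
    end: Optional[int] = None  # retirement time; None while still running


class AssetHistory:
    """Hypothetical in-memory record of which instance held which IP, and when."""

    def __init__(self) -> None:
        self._leases: List[Lease] = []

    def launched(self, instance_id: str, ip: str, at: int) -> None:
        self._leases.append(Lease(instance_id, ip, at))

    def retired(self, instance_id: str, at: int) -> None:
        # Close the open lease for this instance, if there is one.
        for lease in self._leases:
            if lease.instance_id == instance_id and lease.end is None:
                lease.end = at

    def owner_of(self, ip: str, at: int) -> Optional[str]:
        """Return the instance that held `ip` at time `at`, or None."""
        for lease in self._leases:
            still_held = lease.end is None or at < lease.end
            if lease.ip == ip and lease.start <= at and still_held:
                return lease.instance_id
        return None


history = AssetHistory()
history.launched("i-aaa", "10.0.0.5", at=0)
history.retired("i-aaa", at=100)
history.launched("i-bbb", "10.0.0.5", at=105)  # same address, new instance

print(history.owner_of("10.0.0.5", at=50))
print(history.owner_of("10.0.0.5", at=110))
```

Metrics tagged only with an IP address would merge the two instances into one series. Keeping the launch and retirement times lets a query at a given moment resolve to the right instance.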