Monitoring Microservices &
Containers: A Challenge
Adrian Cockcroft @adrianco
Technology Fellow - Battery Ventures
May 2015
Monitoring
Update of my monitoring rules from Monitorama 2014
Rule #1: Spend more time working on code
that analyzes the meaning of metrics, than
code that collects, moves, stores and
displays metrics.
Rule #2: Metric to display latency needs to
be less than human attention span (~10s)
Rule #3: Validate that your measurement
system has enough accuracy and precision.
Collect histograms of response time.
Rule #4: Monitoring systems need to be
more available and scalable than the
systems being monitored.
Rule #5: Optimize for distributed,
ephemeral, cloud native, containerized
microservices.
Rule #6: Fit metrics to models to understand
relationships. (New rule)
Model Infrastructure as a Containment Hierarchy
[Diagram labels: Region, Zone/DC, Machine, Instance, Container, Microservice]
e.g. Machine failure affects all instances and containers inside it
Many tools use a naming scheme to imply this model, but most can’t reason about the relationships
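A minimal sketch in Go of reasoning about containment instead of only implying it through names. It assumes a six-field dotted naming scheme (architecture.region.zone.service.package.instance) like the instance names in the Spigo log output shown later in this deck. The `Node` type and the `ParseName` and `AffectedBy` functions are hypothetical and not taken from any tool.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical record of where one instance sits in the
// containment hierarchy.
type Node struct {
	Arch, Region, Zone, Service, Package, Instance string
}

// ParseName splits a dotted name such as
// "netflixoss.us-east-1.zoneA.eureka.eureka.eureka0" into its levels.
// The six-field layout is an assumption made for this sketch.
func ParseName(name string) (Node, error) {
	f := strings.Split(name, ".")
	if len(f) != 6 {
		return Node{}, fmt.Errorf("want 6 dotted fields, got %d in %q", len(f), name)
	}
	return Node{
		Arch: f[0], Region: f[1], Zone: f[2],
		Service: f[3], Package: f[4], Instance: f[5],
	}, nil
}

// AffectedBy returns the instances contained in the given region and zone,
// that is, everything a failure of that zone would take down.
func AffectedBy(nodes []Node, region, zone string) []string {
	var hit []string
	for _, n := range nodes {
		if n.Region == region && n.Zone == zone {
			hit = append(hit, n.Instance)
		}
	}
	return hit
}

func main() {
	names := []string{
		"netflixoss.us-east-1.zoneA.eureka.eureka.eureka0",
		"netflixoss.us-east-1.zoneB.eureka.eureka.eureka1",
		"netflixoss.us-east-1.zoneC.eureka.eureka.eureka2",
		"netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14",
	}
	var nodes []Node
	for _, s := range names {
		n, err := ParseName(s)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		nodes = append(nodes, n)
	}
	// Lists the instances that live inside zoneC.
	fmt.Println(AffectedBy(nodes, "us-east-1", "zoneC"))
}
```

Once the name is parsed into explicit levels, "what is inside this zone?" becomes a query and not a string match.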
Model Applications and Networks as a Dataflow Graph
[Diagram labels: Request, Microservice, Zone/DC, Region]
APM Tools often model these as business transactions
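A dataflow graph can be sketched as an adjacency list. The Go example below is an illustration only: the `Graph` type and `Downstream` method are hypothetical, and the tier names are borrowed from the NetflixOSS style example later in this deck.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps each service to the services it calls.
type Graph map[string][]string

// Downstream returns every service reachable from start, sorted by name.
// It walks breadth-first and tolerates cycles.
func (g Graph) Downstream(start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range g[cur] {
			if !seen[next] {
				seen[next] = true
				out = append(out, next)
				queue = append(queue, next)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := Graph{
		"elb":       {"zuul"},
		"zuul":      {"karyon"},
		"karyon":    {"staash"},
		"staash":    {"cassandra"},
		"cassandra": {"cassandra"}, // cluster nodes talk to each other
	}
	// Everything a request entering at zuul can end up touching.
	fmt.Println(g.Downstream("zuul"))
}
```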
Model Deployment Ownership and Support
[Diagram labels: Developers, Microservices, Monitoring Tools, Site Reliability, Availability Metrics, Managers, VP Engineering]
99.95% customer success rate
Infrastructure, flow and ownership models
are orthogonal and need to be linked to
make sense of the metrics
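One way to picture linking the models is a join between them. The Go sketch below links only two of the three (infrastructure and ownership) to answer "who needs to hear about a zone failure?". All types, service names and team names are hypothetical, and the flow model is left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance ties one running copy of a service to its place in the
// infrastructure model.
type Instance struct {
	Service, Zone string
}

// OwnersToNotify joins the infrastructure model (which instances are in the
// failed zone) with the ownership model (who supports each service).
// The result is de-duplicated and sorted.
func OwnersToNotify(instances []Instance, owner map[string]string, failedZone string) []string {
	set := map[string]bool{}
	for _, in := range instances {
		if in.Zone != failedZone {
			continue
		}
		if o, ok := owner[in.Service]; ok {
			set[o] = true
		}
	}
	out := make([]string, 0, len(set))
	for o := range set {
		out = append(out, o)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical inventory and team names, for illustration only.
	instances := []Instance{
		{Service: "zuul", Zone: "zoneA"},
		{Service: "karyon", Zone: "zoneC"},
		{Service: "staash", Zone: "zoneC"},
		{Service: "cassandra", Zone: "zoneB"},
	}
	owner := map[string]string{
		"zuul":      "team-edge",
		"karyon":    "team-web",
		"staash":    "team-data",
		"cassandra": "team-data",
	}
	fmt.Println(OwnersToNotify(instances, owner, "zoneC"))
}
```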
Monitoring Rules by @adrianco
1. Spend more time on analysis than data collection and display
2. Reduce key business metric latency to less than 10s
3. Validate your measurement system, use histograms
4. Be more available and scalable than the services being monitored
5. Optimize for distributed, ephemeral cloud native applications
6. Fit metrics to models to understand relationships
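Rule 3 asks for histograms of response time. A minimal Go sketch, assuming power-of-two millisecond buckets (the bucket layout and the `Histogram` type are choices made for this example, not something the deck prescribes):

```go
package main

import (
	"fmt"
	"math/bits"
	"time"
)

// Histogram counts response times in power-of-two millisecond buckets:
// bucket 0 holds durations under 1ms, bucket i holds [2^(i-1), 2^i) ms.
type Histogram struct {
	Buckets [32]uint64
}

// Observe records one response time. Negative durations are treated as zero
// and very large ones are clamped into the last bucket.
func (h *Histogram) Observe(d time.Duration) {
	if d < 0 {
		d = 0
	}
	ms := uint64(d / time.Millisecond)
	i := bits.Len64(ms) // 0 for 0ms, 1 for 1ms, 2 for 2-3ms, 3 for 4-7ms, ...
	if i >= len(h.Buckets) {
		i = len(h.Buckets) - 1
	}
	h.Buckets[i]++
}

func main() {
	var h Histogram
	samples := []time.Duration{
		500 * time.Microsecond,
		1 * time.Millisecond,
		3 * time.Millisecond,
		3 * time.Millisecond,
		100 * time.Millisecond,
	}
	for _, d := range samples {
		h.Observe(d)
	}
	for i, n := range h.Buckets {
		if n > 0 {
			fmt.Printf("bucket %d: %d\n", i, n)
		}
	}
}
```

Unlike a single average, the bucket counts keep the slow tail visible, here the one 100ms response.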
Microservices
@ideavist
A Microservice Definition
Loosely coupled service oriented
architecture with bounded contexts
If every service has to be
updated at the same time
it’s not loosely coupled
If you have to know too much about surrounding
services you don’t have a bounded context. See the
Domain Driven Design book by Eric Evans.
Complexity
Monolithic apps have unlimited invisible internal dependencies

Vastly more complex than explicit visible microservice dependencies
Speed
Speeding Up Deployments
Measuring CPU usage once a minute makes no sense for containers…
Coping with rate of change is a big challenge for monitoring tools.
Datacenter Snowflakes
• Deploy in months
• Live for years
Virtualized and Cloud
• Deploy in minutes
• Live for weeks
Container Deployments
• Deploy in seconds
• Live for minutes/hours
AWS Lambda Events
• Respond in milliseconds
• Live for seconds
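The once-a-minute sampling point can be checked with a little arithmetic. The Go sketch below assumes a poller that fires at fixed multiples of its interval. `SamplesSeen` and the 20 second container lifetime are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// SamplesSeen counts how many polls land inside a container's lifetime when
// the poller fires at t = 0, interval, 2*interval, ... and the container
// runs over the half-open window [start, start+lifetime).
func SamplesSeen(start, lifetime, interval time.Duration) int {
	if interval <= 0 || lifetime <= 0 || start < 0 {
		return 0
	}
	end := start + lifetime
	// First poll at or after start (round start up to a multiple of interval).
	first := ((start + interval - 1) / interval) * interval
	if first >= end {
		return 0
	}
	return int((end-1-first)/interval) + 1
}

func main() {
	start := 5 * time.Second
	life := 20 * time.Second
	for _, iv := range []time.Duration{time.Minute, 10 * time.Second, time.Second} {
		fmt.Printf("interval %v: %d samples\n", iv, SamplesSeen(start, life, iv))
	}
}
```

With a one minute interval, this container starts and stops between two polls and is never observed at all.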
Scale
A Possible Hierarchy    How Many?
Continents              3 to 5
Regions                 2-4 per Continent
Zones                   1-5 per Region
Services                100's per Zone
Versions                Many per Service
Containers              1000's per Version
Instances               10,000's
It’s much more challenging
than just a large number of
machines
Flow
Some tools can show
the request flow
across a few services
But interesting
architectures have a
lot of microservices!
Flow visualization is
a challenge.
See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
Failures
Simple NetflixOSS style microservices architecture on three AWS Availability Zones
Tiers: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore
Zone partition/failure
What should you do?
What should monitors show?
By design, everything works with 2 of 3 zones running.
This is not an outage, inform but don’t touch anything!
Halt deployments perhaps?
Challenge: understand and communicate common microservice failure patterns.
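The "2 of 3 zones" reasoning can be written down as a rule a monitor might apply. This Go sketch is one possible encoding, not a recommendation from the deck: the `Status` values and the `Classify` and `HaltDeployments` functions are hypothetical.

```go
package main

import "fmt"

// Status is what a monitor would show for the service as a whole.
type Status int

const (
	OK     Status = iota // all zones healthy
	Inform               // degraded but within design tolerance: tell people, change nothing
	Outage               // fewer healthy zones than the design needs
)

func (s Status) String() string {
	return [...]string{"ok", "inform", "outage"}[s]
}

// Classify compares the healthy zone count with the number of zones the
// architecture is designed to need (2 of 3 in the example above).
func Classify(healthy, total, needed int) Status {
	switch {
	case healthy >= total:
		return OK
	case healthy >= needed:
		return Inform
	default:
		return Outage
	}
}

// HaltDeployments encodes the tentative "halt deployments perhaps?" idea:
// pause pushes whenever redundancy is reduced.
func HaltDeployments(s Status) bool {
	return s != OK
}

func main() {
	for healthy := 3; healthy >= 1; healthy-- {
		s := Classify(healthy, 3, 2)
		fmt.Printf("%d of 3 zones: %v, halt deployments: %v\n", healthy, s, HaltDeployments(s))
	}
}
```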
Testing
Testing monitoring tools at scale
gets expensive quickly…
Simulation
Simulated Microservices
Model and visualize microservices
Simulate interesting architectures
Generate large scale configurations
Eventually stress test real tools
See github.com/adrianco/spigo
Simulate Protocol Interactions in Go
Visualize with D3
netflixoss.go architecture
Tiers across Three Availability Zones: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore
asgard.Create(cname, asgard.PriamCassandraPkg, regions, priamCassandracount, "eureka", cname)
asgard.Create(tname, asgard.StaashPkg, regions, staashcount, cname)
asgard.Create(jname, asgard.KaryonPkg, regions, javacount, tname)
asgard.Create(nname, asgard.KaryonPkg, regions, nodecount, jname)
asgard.Create(zuname, asgard.ZuulPkg, regions, zuulcount, nname)
asgard.Create(elbname, asgard.ElbPkg, regions, 0, zuname)
asgard.Run(asgard.Create(dns, asgard.DenominatorPkg, 0, 0, elbname), jname) // victimize a javaweb
Tooling
Annotations on the asgard.Create arguments, in order:
• New tier name
• Tier package
• Region count: 1
• Node count
• List of tier dependencies
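The five annotated arguments can be pictured as one record per tier. The Go sketch below is self-contained and deliberately does not use Spigo's packages: the `Tier` struct and `MissingDeps` function are hypothetical, and the tier names and node counts are made up for illustration. It shows one check such a declaration makes possible, namely that every dependency points at a declared tier or a known external service.

```go
package main

import "fmt"

// Tier is a hypothetical stand-in for one tier declaration. It is not
// Spigo's API, only the five fields annotated on the slide.
type Tier struct {
	Name         string   // new tier name
	Package      string   // tier package
	Regions      int      // region count
	Count        int      // node count
	Dependencies []string // tiers this tier calls
}

// MissingDeps lists "tier -> dependency" pairs whose dependency is neither a
// declared tier nor one of the named external services.
func MissingDeps(tiers []Tier, external ...string) []string {
	known := map[string]bool{}
	for _, e := range external {
		known[e] = true
	}
	for _, t := range tiers {
		known[t.Name] = true
	}
	var missing []string
	for _, t := range tiers {
		for _, d := range t.Dependencies {
			if !known[d] {
				missing = append(missing, t.Name+" -> "+d)
			}
		}
	}
	return missing
}

func main() {
	// Names and counts are illustrative, not taken from netflixoss.go.
	tiers := []Tier{
		{Name: "cassandra", Package: "priamCassandra", Regions: 1, Count: 6, Dependencies: []string{"eureka", "cassandra"}},
		{Name: "staash", Package: "staash", Regions: 1, Count: 3, Dependencies: []string{"cassandra"}},
		{Name: "javaweb", Package: "karyon", Regions: 1, Count: 18, Dependencies: []string{"staash"}},
		{Name: "zuul", Package: "zuul", Regions: 1, Count: 9, Dependencies: []string{"javaweb"}},
		{Name: "elb", Package: "elb", Regions: 1, Count: 0, Dependencies: []string{"zuul"}},
	}
	fmt.Println(MissingDeps(tiers, "eureka")) // eureka treated as an external registry
	fmt.Println(MissingDeps(tiers))           // without it, the cassandra tier has a dangling dependency
}
```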
Run and log results to json
$ spigo -a netflixoss -d 10 -j
2015/05/21 00:05:32 netflixoss: scaling to 100%
2015/05/21 00:05:32 netflixoss.edda: starting
2015/05/21 00:05:32 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: starting
2015/05/21 00:05:32 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: starting
2015/05/21 00:05:32 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: starting
2015/05/21 00:05:32 netflixoss.*.*.www.denominator.www0 activity rate 10ms
2015/05/21 00:05:37 chaosmonkey delete: netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14
2015/05/21 00:05:42 asgard: Shutdown
2015/05/21 00:05:42 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: closing
2015/05/21 00:05:42 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: closing
2015/05/21 00:05:42 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: closing
2015/05/21 00:05:42 spigo: complete
2015/05/21 00:05:42 netflixoss.edda: closing
Annotations on the log:
• 10 sec run time
• edda.go logs config to json
• eureka.go service registry per zone
• Chaos monkey victim!
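The log above is regular enough to analyze with a few lines of code. This Go sketch assumes only what the sample output shows: a "YYYY/MM/DD HH:MM:SS" prefix on each line and a "chaosmonkey delete: " marker before the victim's name. `Victim` and `Elapsed` are hypothetical helpers, not part of Spigo.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// stamp is the timestamp layout seen at the start of each log line.
const stamp = "2006/01/02 15:04:05"

// Victim extracts the instance name from a "chaosmonkey delete:" line.
func Victim(line string) (string, bool) {
	const marker = "chaosmonkey delete: "
	i := strings.Index(line, marker)
	if i < 0 {
		return "", false
	}
	return strings.TrimSpace(line[i+len(marker):]), true
}

// Elapsed returns the time between the timestamps of two log lines.
func Elapsed(first, last string) (time.Duration, error) {
	if len(first) < len(stamp) || len(last) < len(stamp) {
		return 0, fmt.Errorf("line too short to hold a timestamp")
	}
	a, err := time.Parse(stamp, first[:len(stamp)])
	if err != nil {
		return 0, err
	}
	b, err := time.Parse(stamp, last[:len(stamp)])
	if err != nil {
		return 0, err
	}
	return b.Sub(a), nil
}

func main() {
	lines := []string{
		"2015/05/21 00:05:32 netflixoss: scaling to 100%",
		"2015/05/21 00:05:37 chaosmonkey delete: netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14",
		"2015/05/21 00:05:42 spigo: complete",
	}
	for _, l := range lines {
		if v, ok := Victim(l); ok {
			fmt.Println("victim:", v)
		}
	}
	if d, err := Elapsed(lines[0], lines[len(lines)-1]); err == nil {
		fmt.Println("run time:", d)
	}
}
```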
Simianviz from json logs
http://simianviz.divshot.io/netflixoss/1
Annotations on the visualization:
• ELB splits traffic over zones in single region
• microservices
• Cassandra Cluster
• Six regions
Big thanks to @kurtiskemple
Why Build Spigo?
Generate test microservice configurations at scale
Stress monitoring tools and simulated game day training
Eventually (i.e. not implemented yet)
Dynamically vary configuration: autoscale, code push
Chaos gorilla for zone, region failures and partitions
Websocket connection between spigo and simianviz display
My challenge to you:
Build your architecture in Spigo.
Stress monitoring tools with it.
Help fix monitoring for microservices!
@mgroeniger
Questions?
Disclosure: some of the companies mentioned may be Battery Ventures Portfolio Companies
See www.battery.com for a list of portfolio investments
● Microservices Challenges
● Speed and Scale
● Flow and Failures
● Testing and Simulation
● Battery Ventures http://www.battery.com
● Adrian’s Tweets @adrianco and Blog http://perfcap.blogspot.com
● Slideshare http://slideshare.com/adriancockcroft
● Github http://github.com/adrianco/spigo
What does @adrianco do?
• Technology Due Diligence on Deals
• Presentations at Conferences
• Presentations at Companies
• Technical Advice for Portfolio Companies
• Program Committee for Conferences
• Networking with Interesting People
• Tinkering with Technologies
• Maintain Deep Relationship with Cloud Vendors
Battery Ventures Portfolio Companies for Enterprise IT
Categories: Security, Enterprise IT Operations & Management, Big Data, Compute, Networking, Storage
Companies named: Palo Alto Networks
Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested.

Gluecon Monitoring Microservices and Containers: A Challenge

  • 1. Monitoring Microservices & Containers: A Challenge Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures May 2015
  • 2. Monitoring ! Update of my monitoring rules from Monitorama 2014
  • 3. Rule #1: Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics.
  • 4. Rule #2: Metric-to-display latency needs to be less than the human attention span (~10s)
  • 5. Rule #3: Validate that your measurement system has enough accuracy and precision. Collect histograms of response time.
  • 6. Rule #4: Monitoring systems need to be more available and scalable than the systems being monitored.
  • 7. Rule #5: Optimize for distributed, ephemeral, cloud native, containerized microservices.
  • 8. Rule #6: Fit metrics to models to understand relationships. (New rule)
  • 9.
  • 10. Model Infrastructure as a Containment Hierarchy. Diagram labels: Region, Zone/DC, Machine, Instance, Container, Microservice. e.g. Machine failure affects all instances and containers inside it. Many tools use a naming scheme to imply this model, but most can’t reason about the relationships
  • 11.
  • 12. Model Applications and Networks as a Dataflow Graph. Diagram labels: Request, Microservice, Zone/DC, Region. APM tools often model these as business transactions
  • 13. Model Deployment Ownership and Support. Diagram labels: Developer (x4)
  • 14. Model Deployment Ownership and Support. Diagram labels: Developer (x4), Microservice (x7)
  • 15. Model Deployment Ownership and Support. Diagram labels: Developer (x4), Microservice (x7), Monitoring Tools
  • 16. Model Deployment Ownership and Support. Diagram labels: Developer (x5), Microservice (x7), Monitoring Tools
  • 17. Model Deployment Ownership and Support. Diagram labels: Developer (x5), Microservice (x7), Site Reliability, Monitoring Tools, Availability Metrics (99.95% customer success rate)
  • 18. Model Deployment Ownership and Support. Diagram labels: Developer (x5), Microservice (x7), Manager (x2), Site Reliability, Monitoring Tools, Availability Metrics (99.95% customer success rate)
  • 19. Model Deployment Ownership and Support. Diagram labels: Developer (x5), Microservice (x7), Manager (x2), VP Engineering, Site Reliability, Monitoring Tools, Availability Metrics (99.95% customer success rate)
  • 20. Infrastructure, flow and ownership models are orthogonal and need to be linked to make sense of the metrics
  • 21. Monitoring Rules by @adrianco: 1. Spend more time on analysis than data collection and display; 2. Reduce key business metric latency to less than 10s; 3. Validate your measurement system, use histograms; 4. Be more available and scalable than the services being monitored; 5. Optimize for distributed, ephemeral cloud native applications; 6. Fit metrics to models to understand relationships
  • 24. A Microservice Definition: Loosely coupled service oriented architecture with bounded contexts
  • 25. A Microservice Definition: Loosely coupled service oriented architecture with bounded contexts. If every service has to be updated at the same time it’s not loosely coupled
  • 26. A Microservice Definition: Loosely coupled service oriented architecture with bounded contexts. If every service has to be updated at the same time it’s not loosely coupled. If you have to know too much about surrounding services you don’t have a bounded context. See the Domain Driven Design book by Eric Evans.
  • 28. Monolithic apps have unlimited invisible internal dependencies. Vastly more complex than explicit visible microservice dependencies
  • 29. Speed
  • 30. Speeding Up Deployments. Datacenter Snowflakes: deploy in months, live for years.
  • 31. Speeding Up Deployments. Datacenter Snowflakes: deploy in months, live for years. Virtualized and Cloud: deploy in minutes, live for weeks.
  • 32. Speeding Up Deployments. Datacenter Snowflakes: deploy in months, live for years. Virtualized and Cloud: deploy in minutes, live for weeks. Container Deployments: deploy in seconds, live for minutes/hours.
  • 33. Speeding Up Deployments. Datacenter Snowflakes: deploy in months, live for years. Virtualized and Cloud: deploy in minutes, live for weeks. Container Deployments: deploy in seconds, live for minutes/hours. AWS Lambda Events: respond in milliseconds, live for seconds.
  • 34. Speeding Up Deployments. Datacenter Snowflakes: deploy in months, live for years. Virtualized and Cloud: deploy in minutes, live for weeks. Container Deployments: deploy in seconds, live for minutes/hours. AWS Lambda Events: respond in milliseconds, live for seconds. Measuring CPU usage once a minute makes no sense for containers… Coping with rate of change is a big challenge for monitoring tools.
  • 35. Scale
  • 36. A Possible Hierarchy (level: how many?). Continents: 3 to 5. Regions: 2-4 per Continent. Zones: 1-5 per Region. Services: 100’s per Zone. Versions: Many per Service. Containers: 1000’s per Version. Instances: 10,000’s. It’s much more challenging than just a large number of machines
  • 37. Flow
  • 38. Some tools can show the request flow across a few services
  • 39. But interesting architectures have a lot of microservices! Flow visualization is a challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  • 41. Simple NetflixOSS style microservices architecture on three AWS Availability Zones. Tiers: ELB (Load Balancer), Zuul (API Proxy), Karyon (Business Logic), Staash (Data Access Layer), Priam (Cassandra Datastore)
  • 42. Simple NetflixOSS style microservices architecture on three AWS Availability Zones. Tiers: ELB (Load Balancer), Zuul (API Proxy), Karyon (Business Logic), Staash (Data Access Layer), Priam (Cassandra Datastore)
  • 43. Simple NetflixOSS style microservices architecture on three AWS Availability Zones. Tiers: ELB (Load Balancer), Zuul (API Proxy), Karyon (Business Logic), Staash (Data Access Layer), Priam (Cassandra Datastore). Zone partition/failure: What should you do? What should monitors show?
  • 44. Simple NetflixOSS style microservices architecture on three AWS Availability Zones. Tiers: ELB (Load Balancer), Zuul (API Proxy), Karyon (Business Logic), Staash (Data Access Layer), Priam (Cassandra Datastore). Zone partition/failure: What should you do? What should monitors show? By design, everything works with 2 of 3 zones running. This is not an outage, inform but don’t touch anything! Halt deployments perhaps?
  • 45. Simple NetflixOSS style microservices architecture on three AWS Availability Zones. Tiers: ELB (Load Balancer), Zuul (API Proxy), Karyon (Business Logic), Staash (Data Access Layer), Priam (Cassandra Datastore). Zone partition/failure: What should you do? What should monitors show? By design, everything works with 2 of 3 zones running. This is not an outage, inform but don’t touch anything! Halt deployments perhaps? Challenge: understand and communicate common microservice failure patterns.
  • 47. Testing monitoring tools at scale gets expensive quickly…
  • 49. Simulated Microservices: Model and visualize microservices. Simulate interesting architectures. Generate large scale configurations. Eventually stress test real tools. See github.com/adrianco/spigo. Simulate Protocol Interactions in Go. Visualize with D3. Diagram: ELB (Load Balancer), Zuul (API Proxy), Karyon (Business Logic), Staash (Data Access Layer), Priam (Cassandra Datastore), Three Availability Zones
  • 50. netflixoss.go architecture
        asgard.Create(cname, asgard.PriamCassandraPkg, regions, priamCassandracount, "eureka", cname)
        asgard.Create(tname, asgard.StaashPkg, regions, staashcount, cname)
        asgard.Create(jname, asgard.KaryonPkg, regions, javacount, tname)
        asgard.Create(nname, asgard.KaryonPkg, regions, nodecount, jname)
        asgard.Create(zuname, asgard.ZuulPkg, regions, zuulcount, nname)
        asgard.Create(elbname, asgard.ElbPkg, regions, 0, zuname)
        asgard.Run(asgard.Create(dns, asgard.DenominatorPkg, 0, 0, elbname), jname) // victimize a javaweb
        Callouts: Tooling; New tier name; Tier package; Region count: 1; Node count; List of tier dependencies
  • 51. Run and log results to json
        $ spigo -a netflixoss -d 10 -j
        2015/05/21 00:05:32 netflixoss: scaling to 100%
        2015/05/21 00:05:32 netflixoss.edda: starting
        2015/05/21 00:05:32 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: starting
        2015/05/21 00:05:32 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: starting
        2015/05/21 00:05:32 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: starting
        2015/05/21 00:05:32 netflixoss.*.*.www.denominator.www0 activity rate 10ms
        2015/05/21 00:05:37 chaosmonkey delete: netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14
        2015/05/21 00:05:42 asgard: Shutdown
        2015/05/21 00:05:42 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: closing
        2015/05/21 00:05:42 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: closing
        2015/05/21 00:05:42 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: closing
        2015/05/21 00:05:42 spigo: complete
        2015/05/21 00:05:42 netflixoss.edda: closing
        Callouts: 10 sec run time; edda.go logs config to json; eureka.go service registry per zone; Chaos monkey victim!
  • 52. Simianviz from json logs: http://simianviz.divshot.io/netflixoss/1. Diagram labels: ELB splits traffic over zones in single region; microservices; Cassandra Cluster; Six regions. Big thanks to @kurtiskemple
  • 53. Why Build Spigo? Generate test microservice configurations at scale. Stress monitoring tools and simulated game day training. Eventually (i.e. not implemented yet): Dynamically vary configuration: autoscale, code push. Chaos gorilla for zone, region failures and partitions. Websocket connection between spigo and simianviz display.
  • 54. My challenge to you: Build your architecture in Spigo. Stress monitoring tools with it. Help fix monitoring for microservices! @mgroeniger
  • 55. Questions? Disclosure: some of the companies mentioned may be Battery Ventures Portfolio Companies. See www.battery.com for a list of portfolio investments. Topics: Microservices Challenges; Speed and Scale; Flow and Failures; Testing and Simulation. Links: Battery Ventures http://www.battery.com; Adrian’s Tweets @adrianco and Blog http://perfcap.blogspot.com; Slideshare http://slideshare.com/adriancockcroft; Github http://github.com/adrianco/spigo
  • 56. What does @adrianco do? Technology Due Diligence on Deals; Presentations at Conferences; Presentations at Companies; Technical Advice for Portfolio Companies; Program Committee for Conferences; Networking with Interesting People; Tinkering with Technologies; Maintain Deep Relationship with Cloud Vendors
  • 57. Battery Ventures Portfolio Companies for Enterprise IT. Categories: Security; Enterprise IT Operations & Management; Big Data; Compute; Networking; Storage. Named company: Palo Alto Networks. Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested.
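
Slide 10 notes that many tools use a naming scheme to imply the containment hierarchy but cannot reason about it, and the slide 51 log shows such names (for example netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14). A minimal Go sketch of reasoning over those names might look like the following. It assumes the six dot-separated fields mean architecture, region, zone, service, package and instance; that layout is inferred from the log lines, not taken from spigo's source, and `Node`, `ParseName` and `AffectedByZone` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Node holds the fields of one simulated instance name.
// ASSUMPTION: the six-field layout is inferred from the slide 51 log lines.
type Node struct {
	Arch, Region, Zone, Service, Package, Instance string
}

// ParseName splits a dotted name such as
// "netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14" into a Node.
func ParseName(name string) (Node, error) {
	p := strings.Split(name, ".")
	if len(p) != 6 {
		return Node{}, fmt.Errorf("want 6 dot-separated fields, got %d in %q", len(p), name)
	}
	return Node{p[0], p[1], p[2], p[3], p[4], p[5]}, nil
}

// AffectedByZone returns the instances contained in the given region and
// zone, i.e. everything a failure of that zone would take down.
func AffectedByZone(nodes []Node, region, zone string) []string {
	var hit []string
	for _, n := range nodes {
		if n.Region == region && n.Zone == zone {
			hit = append(hit, n.Instance)
		}
	}
	return hit
}

func main() {
	// Names copied from the slide 51 log.
	names := []string{
		"netflixoss.us-east-1.zoneA.eureka.eureka.eureka0",
		"netflixoss.us-east-1.zoneB.eureka.eureka.eureka1",
		"netflixoss.us-east-1.zoneC.eureka.eureka.eureka2",
		"netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14",
	}
	var nodes []Node
	for _, s := range names {
		n, err := ParseName(s)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		nodes = append(nodes, n)
	}
	// Which instances would a zoneC failure affect?
	fmt.Println(AffectedByZone(nodes, "us-east-1", "zoneC"))
}
```

Keeping the parsed fields in a struct rather than matching on name prefixes is what lets the same data answer containment questions at any level (region, zone or service).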
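
Rule #3 (slide 5) asks for histograms of response time. As a rough illustration only, the Go sketch below counts response times in fixed buckets and reads a quantile back as a bucket upper bound. The bucket bounds, the millisecond unit and the names `Histogram`, `NewHistogram`, `Record` and `Quantile` are all assumptions made for this example; the deck does not prescribe an implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Histogram counts response times (in milliseconds) in fixed buckets.
type Histogram struct {
	bounds []int64  // ascending bucket upper bounds, in ms
	counts []uint64 // one count per bound, plus a final overflow bucket
	total  uint64
}

// NewHistogram builds a histogram from bucket upper bounds in ms.
func NewHistogram(boundsMs ...int64) *Histogram {
	b := append([]int64(nil), boundsMs...)
	sort.Slice(b, func(i, j int) bool { return b[i] < b[j] })
	return &Histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// Record adds one observation to the first bucket whose bound is >= ms;
// values above every bound land in the overflow bucket.
func (h *Histogram) Record(ms int64) {
	i := sort.Search(len(h.bounds), func(i int) bool { return h.bounds[i] >= ms })
	h.counts[i]++
	h.total++
}

// Quantile returns the upper bound of the bucket holding the q-quantile
// (0 < q <= 1). ok is false when the histogram is empty or the quantile
// falls in the overflow bucket, where no upper bound is known.
func (h *Histogram) Quantile(q float64) (ms int64, ok bool) {
	if h.total == 0 {
		return 0, false
	}
	rank := uint64(math.Ceil(q * float64(h.total)))
	if rank < 1 {
		rank = 1
	}
	var seen uint64
	for i, c := range h.counts {
		seen += c
		if seen >= rank {
			if i == len(h.bounds) {
				return 0, false
			}
			return h.bounds[i], true
		}
	}
	return 0, false
}

func main() {
	h := NewHistogram(10, 100, 1000)
	// Three fast responses and one slow one: the mean hides the slow tail.
	for _, ms := range []int64{5, 8, 12, 900} {
		h.Record(ms)
	}
	p50, _ := h.Quantile(0.5)
	p99, _ := h.Quantile(0.99)
	fmt.Printf("p50 <= %dms, p99 <= %dms\n", p50, p99)
}
```

The precision of any quantile read this way is limited by the bucket widths, which is one reason the rule pairs histograms with validating the measurement system.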
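
Slide 44 argues that losing one of three zones is, by design, not an outage: monitors should inform rather than alarm. One way a monitor could encode that rule is sketched below in Go. The three-level `Status` type, the `Classify` function and its `required` parameter are hypothetical, introduced only to illustrate the idea.

```go
package main

import "fmt"

// Status is what a monitor would show for a zone-redundant service.
type Status int

const (
	OK     Status = iota // every zone healthy
	Inform               // degraded but within design limits: tell people, touch nothing
	Outage               // fewer healthy zones than the design needs
)

func (s Status) String() string {
	return [...]string{"ok", "inform", "outage"}[s]
}

// Classify compares healthy zones against the number the architecture is
// designed to run on (2 of 3 in the slide 44 example).
func Classify(healthy, total, required int) Status {
	switch {
	case healthy >= total:
		return OK
	case healthy >= required:
		return Inform
	default:
		return Outage
	}
}

func main() {
	for healthy := 3; healthy >= 1; healthy-- {
		fmt.Printf("%d of 3 zones healthy: %s\n", healthy, Classify(healthy, 3, 2))
	}
}
```

A real monitor would also need the follow-up actions the slide hints at, such as halting deployments while in the inform state.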