SlideShare una empresa de Scribd logo
1 de 33
Descargar para leer sin conexión
The RED Method
Patterns for instrumentation & monitoring.
Tom Wilkie
tom@grafana.com @tom_wilkie
github.com/tomwilkie
Introduction
Why does this matter?

The USE Method
Utilisation, Saturation, Errors

The RED Method
Requests Rate, Errors, Duration..

The Four Golden Signals
RED + Saturation
Introduction
The USE Method
For every resource, monitor:

• Utilisation: % time that the resource was busy

• Saturation: amount of work resource has to do, often queue length

• Errors: the count of error events
http://www.brendangregg.com/usemethod.html
Utilisation Saturation Errors
CPU ✔ ✔ ✔
Memory ✔ ✔ ✔
Disk ✔ ✔ ✔
Network ✔ ✔ ✖
CPU USE in Prometheus
CPU Utilisation:
1 - avg(rate(node_cpu{job="default/node-exporter",mode="idle"}[1m]))
CPU Saturation:
sum(node_load1{job="default/node-exporter"})
/
sum(node:node_num_cpu:sum)
Memory USE in Prometheus
Memory Utilisation:
1 - sum(
node_memory_MemFree{job=“…”} +
node_memory_Cached{job=“…”} +
node_memory_Buffers{job=“…”}
)
/ sum(node_memory_MemTotal{job=“…”})
Memory Saturation:
1e3 * sum(
rate(node_vmstat_pgpgin{job=“…”}[1m]) +
rate(node_vmstat_pgpgout{job=“…”}[1m]))
)
Interesting / Hard Cases
• CPU Errors, Memory Errors

• Hard Disk Errors!

• Disk Capacity vs Disk IO

• Network Utilisation

• Interconnects
Demo
Time
More Details
• “The USE Method” - Brendan Gregg

• Kubernetes Mixin - https://github.com/grafana/jsonnet-libs
The RED Method
For every service, monitor request:

• Rate - number of requests per second

• Errors - the number of those requests that are failing

• Duration - the amount of time those requests take
The RED Method
Prometheus Implementation
import (
“github.com/prometheus/client_golang/prometheus"
)
var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
Name: "request_duration_seconds",
Help: "Time (in seconds) spent serving HTTP requests.",
Buckets: prometheus.DefBuckets,
}, []string{"method", "route", "status_code"})
func init() {
prometheus.MustRegister(requestDuration)
}
func wrap(h http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
m := httpsnoop.CaptureMetrics(h, w, r)
requestDuration.WithLabelValues(r.Method, r.URL.Path,
strconv.Itoa(m.Code)).Observe(m.Duration.Seconds())
})
}
func server(addr string) {
http.Handle("/metrics", prometheus.Handler())
http.Handle("/greeter", wrap(http.HandlerFunc(func(w
http.ResponseWriter, r *http.Request) {
...
}))
}
Prometheus Implementation
Easy to query
Rate:
sum(rate(request_duration_seconds_count{job=“…”}[1m]))
Errors:
sum(rate(request_duration_seconds_count{job=“…”, 
status_code!~”2..”}[1m]))
Duration:
histogram_quantile(0.99,
 sum(rate(request_duration_seconds_bucket{job=“…}[1m])) by (le))
Demo
Time
DAG of Services
Latencies & Averages
More Details
• “Monitoring Microservices” - Weaveworks (slides)

• "The RED Method: key metrics for microservices architecture” - Weaveworks

• “Monitoring and Observability with USE and RED” - VividCortex

• “RED Method for Prometheus – 3 Key Metrics for Monitoring” - Rancher Labs

• “Logs and Metrics” - Cindy Sridharan

• "Logging v. instrumentation”, “Go best practices, six years in” - Peter Bourgon
The Four
Golden Signals
The Four Golden Signals
For each service, monitor:

• Latency - time taken to serve a request
• Traffic - how much demand is places on your system
• Errors - rate or requests that are failing
• Saturation - how “full” your services is
Demo
Time
More Details
• “The Four Golden Signals” - The Google SRE Book

• “How to Monitor the SRE Golden Signals” - Steve Mushero
Summary
Thanks!
Questions?
Tom Wilkie VP Product, Grafana Labs
Previously: Kausal, Weaveworks, Google, Acunu, Xensource
Twitter: @tom_wilkie Email: tom@grafana.com
+
Grafana Cloud is a hosted and fully managed SaaS metrics
platform that helps Ops and Dev teams using Grafana
to understand the behavior of their applications and
infrastructure
Grafana Cloud allows users to provision and manage
the best open source observability tools - Grafana and
Prometheus - all through a simple UI and single API.
What is Grafana Cloud?
Store, visualize and alert without the headache of scaling or managing
your own monitoring stack.
Your complete, fully managed, hosted metrics platform.
Grafana Cloud:

Más contenido relacionado

La actualidad más candente

Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...DevOpsDays Tel Aviv
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability EngineeringMark Underwood
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introductionRico Chen
 
Building Immutable Machine Images with Packer and Ansible
Building Immutable Machine Images with Packer and AnsibleBuilding Immutable Machine Images with Packer and Ansible
Building Immutable Machine Images with Packer and AnsibleJason Harley
 
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...Tori Wieldt
 
Infographic: Importance of Performance Testing
Infographic: Importance of Performance TestingInfographic: Importance of Performance Testing
Infographic: Importance of Performance TestingKiwiQA
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)Lucas Jellema
 
Site reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkSite reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkMichae Blakeney
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Abeer R
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)Siglos
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2Chris Huang
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 
Getting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeGetting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeWinWire Technologies Inc
 

La actualidad más candente (20)

Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability Engineering
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Building Immutable Machine Images with Packer and Ansible
Building Immutable Machine Images with Packer and AnsibleBuilding Immutable Machine Images with Packer and Ansible
Building Immutable Machine Images with Packer and Ansible
 
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Infographic: Importance of Performance Testing
Infographic: Importance of Performance TestingInfographic: Importance of Performance Testing
Infographic: Importance of Performance Testing
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
Site reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkSite reliability engineering - Lightning Talk
Site reliability engineering - Lightning Talk
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 
Getting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeGetting Started with Infrastructure as Code
Getting Started with Infrastructure as Code
 

Similar a The RED Method: How to monitoring your microservices.

The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesKausal
 
The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesKausal
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongFastly
 
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESTHE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESInfluxData
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup) Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup) Arseny Chernov
 
A Modern Approach to Performance Monitoring
A Modern Approach to Performance MonitoringA Modern Approach to Performance Monitoring
A Modern Approach to Performance MonitoringCliff Crocker
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-youmkherlakian
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...rschuppe
 
Edge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance MonitoringEdge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance MonitoringAkamai Technologies
 
Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019Bleemeo
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NETDavid Giard
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsNGINX, Inc.
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
Monitor some of the things
Monitor some of the thingsMonitor some of the things
Monitor some of the thingsandertech
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profitRodrigo Campos
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance ManagementNoriaki Tatsumi
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongFastly
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE MethodBrendan Gregg
 
Eric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler
 

Similar a The RED Method: How to monitoring your microservices. (20)

The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your Services
 
The RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your ServicesThe RED Method: How To Instrument Your Services
The RED Method: How To Instrument Your Services
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESTHE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup) Introduction to Prometheus Monitoring (Singapore Meetup)
Introduction to Prometheus Monitoring (Singapore Meetup)
 
A Modern Approach to Performance Monitoring
A Modern Approach to Performance MonitoringA Modern Approach to Performance Monitoring
A Modern Approach to Performance Monitoring
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Edge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance MonitoringEdge 2014: A Modern Approach to Performance Monitoring
Edge 2014: A Modern Approach to Performance Monitoring
 
Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019Add observability to your django application - PyCon FR 2019
Add observability to your django application - PyCon FR 2019
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NET
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and Results
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Monitor some of the things
Monitor some of the thingsMonitor some of the things
Monitor some of the things
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profit
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE Method
 
Eric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New Contexts
 

Más de Grafana Labs

Cortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusCortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusGrafana Labs
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusGrafana Labs
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Grafana Labs
 
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...Grafana Labs
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Prometheus Monitoring Mixins
Prometheus Monitoring MixinsPrometheus Monitoring Mixins
Prometheus Monitoring MixinsGrafana Labs
 

Más de Grafana Labs (6)

Cortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusCortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available Prometheus
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with Prometheus
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
 
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
[PromCon2018] Prometheus Monitoring Mixins: Using Jsonnet to Package Together...
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Prometheus Monitoring Mixins
Prometheus Monitoring MixinsPrometheus Monitoring Mixins
Prometheus Monitoring Mixins
 

Último

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

The RED Method: How to monitoring your microservices.

  • 1. The RED Method Patterns for instrumentation & monitoring. Tom Wilkie tom@grafana.com @tom_wilkie github.com/tomwilkie
  • 2.
  • 3. Introduction Why does this matter? The USE Method Utilisation, Saturation, Errors The RED Method Requests Rate, Errors, Duration.. The Four Golden Signals RED + Saturation
  • 6. For every resource, monitor: • Utilisation: % time that the resource was busy • Saturation: amount of work resource has to do, often queue length • Errors: the count of error events http://www.brendangregg.com/usemethod.html
  • 7. Utilisation Saturation Errors CPU ✔ ✔ ✔ Memory ✔ ✔ ✔ Disk ✔ ✔ ✔ Network ✔ ✔ ✖
  • 8. CPU USE in Prometheus CPU Utilisation: 1 - avg(rate(node_cpu{job="default/node-exporter",mode="idle"}[1m])) CPU Saturation: sum(node_load1{job="default/node-exporter"}) / sum(node:node_num_cpu:sum)
  • 9. Memory USE in Prometheus Memory Utilisation: 1 - sum( node_memory_MemFree{job=“…”} + node_memory_Cached{job=“…”} + node_memory_Buffers{job=“…”} ) / sum(node_memory_MemTotal{job=“…”}) Memory Saturation: 1e3 * sum( rate(node_vmstat_pgpgin{job=“…”}[1m]) + rate(node_vmstat_pgpgout{job=“…”}[1m])) )
  • 10. Interesting / Hard Cases • CPU Errors, Memory Errors • Hard Disk Errors! • Disk Capacity vs Disk IO • Network Utilisation • Interconnects
  • 12. More Details • “The USE Method” - Brendan Gregg • Kubernetes Mixin - https://github.com/grafana/jsonnet-libs
  • 14. For every service, monitor request: • Rate - number of requests per second • Errors - the number of those requests that are failing • Duration - the amount of time those requests take The RED Method
  • 15.
  • 16. Prometheus Implementation import ( “github.com/prometheus/client_golang/prometheus" ) var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{ Name: "request_duration_seconds", Help: "Time (in seconds) spent serving HTTP requests.", Buckets: prometheus.DefBuckets, }, []string{"method", "route", "status_code"}) func init() { prometheus.MustRegister(requestDuration) }
  • 17. func wrap(h http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { m := httpsnoop.CaptureMetrics(h, w, r) requestDuration.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(m.Code)).Observe(m.Duration.Seconds()) }) } func server(addr string) { http.Handle("/metrics", prometheus.Handler()) http.Handle("/greeter", wrap(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { ... })) } Prometheus Implementation
  • 21.
  • 23. More Details • “Monitoring Microservices” - Weaveworks (slides) • "The RED Method: key metrics for microservices architecture” - Weaveworks • “Monitoring and Observability with USE and RED” - VividCortex • “RED Method for Prometheus – 3 Key Metrics for Monitoring” - Rancher Labs • “Logs and Metrics” - Cindy Sridharan • "Logging v. instrumentation”, “Go best practices, six years in” - Peter Bourgon
  • 25.
  • 26. The Four Golden Signals For each service, monitor: • Latency - time taken to serve a request • Traffic - how much demand is places on your system • Errors - rate or requests that are failing • Saturation - how “full” your services is
  • 27.
  • 29. More Details • “The Four Golden Signals” - The Google SRE Book • “How to Monitor the SRE Golden Signals” - Steve Mushero
  • 32. Tom Wilkie VP Product, Grafana Labs Previously: Kausal, Weaveworks, Google, Acunu, Xensource Twitter: @tom_wilkie Email: tom@grafana.com
  • 33. + Grafana Cloud is a hosted and fully managed SaaS metrics platform that helps Ops and Dev teams using Grafana to understand the behavior of their applications and infrastructure Grafana Cloud allows users to provision and manage the best open source observability tools - Grafana and Prometheus - all through a simple UI and single API. What is Grafana Cloud? Store, visualize and alert without the headache of scaling or managing your own monitoring stack. Your complete, fully managed, hosted metrics platform. Grafana Cloud: