SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
@PierreVincent
How to build observable
distributed systems
March 8th, 2018 – QCon London
@PierreVincent pvincent.io
@PierreVincent
@PierreVincent
Pierre Vincent
SRE Manager at Poppulo
@PierreVincent
pvincent.io
@PierreVincent
Reaching production is
only the beginning
@PierreVincent
No system is immune to failure
Be ready to recover
@PierreVincent
When distributing a system,
we’re also distributing the places
where things might go wrong
@PierreVincent
Monitoring only applies to
known failure modes.
What about everything else?
@PierreVincent
Monitoring tells you whether the
system works. Observability lets
you ask why it's not working.
“
”– Baron Schwarz
@PierreVincent
Healthchecks Metrics
Logging Tracing
@PierreVincent
Healthchecks
@PierreVincent
Is it running?
Can it perform
its task?
Can it accept
more work?
Healthchecks
@PierreVincent
Broadcast Register Expose
Healthchecks
@PierreVincent
Healthchecks
/health
{
"service": "registration-service",
"healthy": true,
"workload": { "healthy": true },
"dependencies": [
{ "name": "cassandra", "healthy": true },
{ "name": "billing-svc", "healthy": true
},
]
}
GET http://1.2.3.4:8080/health
200 OK
@PierreVincent
Source: HTTP Healthchecks for a Resilient Platform - Chris O’Dell
skeltonthatcher.com/blog/http-healthchecks-for-a-resilient-platform
Overzealous Healthchecks can be
counter-productive
@PierreVincent
Metrics
@PierreVincent
System
metrics
Application
metrics
Business
metrics
CPU usage Error rates Customer conversions
Metrics
@PierreVincent
Servers / VMs
Appliances/Infra
Services
Metrics
collector
Metrics
query
engine
Dashboards
Alerts
Metrics
@PierreVincent
Servers / VMs
Appliances/Infra
Services
/metrics
/metrics
/metrics
Prometheus
Metrics
@PierreVincent
Watch out for over-reliance on metrics
Limitations at high-cardinality
Not every metric
deserves an alert
Real-time querying means
some trade-offs on retention
Limit alerting to
user-impacting symptoms
Poor fine-grained debugging
e.g. CustomerId
Not suitable for long-term
trend analysis
@PierreVincent
Logging
@PierreVincent
Searchable Correlated
Logging
Making sense of (a lot of) logs
Centralised
@PierreVincent
A
F
H
D
J
B
E
C
G
a1b2c3
a1b2c3
a1b2c3
ERROR [svc=H][trace=a1b2c3] Failed to save order
Cause: Cassandra timeout exception
ERROR [svc=F][trace=a1b2c3] Failed to complete order
Cause: Shipping service responded with 500
ERROR [svc=A][trace=a1b2c3] Failed to process order
Cause: Order process manager responded with 500
a1b2c3
INFO [svc=G][trace=a1b2c3] Items verified in stock
Log Correlation
@PierreVincent
JSON
2018-02-20T16:38:23+00:00 ERROR Read timed out
timestamp 2018-02-20T16:38:23+00:00
requestID ec667cb45
level ERROR
team eventsservice registration-service
commit 542a8b8e build 542a8b8e.7
node node_e79f3e52
log Read timed out
region europe-west2
runtime java-1.8.0_161
When did it happen?
Can I trace it?
What is it?
What is it running?
Where is it?
What is the message?
customerID 55123Who caused it? userID 458
... ...Any other info?
Hmmm thanks… ?
@PierreVincent
Structured logs unleash high-cardinality exploration
Error rate spike isolated
by build version
Activity spike isolated
for single customer
@PierreVincent
Tracing
@PierreVincent
Tracing
get_confirmed_attendees
get_attendees
get_confirm_status
cassandra/select
mysql/select
event-mgt-api
attendees-service
registration-service
Trace
Spans
0ms 50ms 100ms 150ms 172ms
172ms
73ms
54ms
78ms
41ms
@PierreVincent
@PierreVincent
Unknown
Unknowns
Known
Unknowns
Healthchecks
Metrics
(Alerting)
Logs
Metrics
(Queries)
Tracing
Events
Monitoring & Resiliency Debugging & Exploration
@PierreVincent
Cheap to
instrument
Easy to
explore
Reliable &
Trustworthy
Usability of tooling is key to adoption
@PierreVincent
Visibility builds trust
but requires safety
@PierreVincent
Visibility helps justify decisions
@PierreVincent
Visibility enables operability
@PierreVincent
Test (a little bit*) lessTest (a little bit*) lessDon’t spend all your time testing
...keep some for instrumenting
Thank you!
@PierreVincent
pvincent.io
@PierreVincent
Thank you!
Pierre Vincent
SRE Manager at Poppulo
@PierreVincent
pvincent.io

Más contenido relacionado

Similar a QCon London - How to build observable distributed systems

Plantsight presentation
Plantsight presentationPlantsight presentation
Plantsight presentation
rcontrols
 
Project_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docx
Project_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docxProject_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docx
Project_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docx
woodruffeloisa
 

Similar a QCon London - How to build observable distributed systems (20)

SucculentPi [AWS Basel Meetup - Oct 2022].pdf
SucculentPi [AWS Basel Meetup - Oct 2022].pdfSucculentPi [AWS Basel Meetup - Oct 2022].pdf
SucculentPi [AWS Basel Meetup - Oct 2022].pdf
 
Product Keynote: Jira Service Desk, Opsgenie, Statuspage
Product Keynote: Jira Service Desk, Opsgenie, StatuspageProduct Keynote: Jira Service Desk, Opsgenie, Statuspage
Product Keynote: Jira Service Desk, Opsgenie, Statuspage
 
How to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushHow to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & Push
 
Finding attacks with these 6 events
Finding attacks with these 6 eventsFinding attacks with these 6 events
Finding attacks with these 6 events
 
Sense Your Smart City: Connect Environmental Sensors to SensorThings API
Sense Your Smart City: Connect Environmental Sensors to SensorThings APISense Your Smart City: Connect Environmental Sensors to SensorThings API
Sense Your Smart City: Connect Environmental Sensors to SensorThings API
 
WTF is Sensu and Monitoring
WTF is Sensu and MonitoringWTF is Sensu and Monitoring
WTF is Sensu and Monitoring
 
How to investigate and recover from a security breach in WordPress
How to investigate and recover from a security breach in WordPressHow to investigate and recover from a security breach in WordPress
How to investigate and recover from a security breach in WordPress
 
Sarah Wells - Alert overload: How to adopt a microservices architecture witho...
Sarah Wells - Alert overload: How to adopt a microservices architecture witho...Sarah Wells - Alert overload: How to adopt a microservices architecture witho...
Sarah Wells - Alert overload: How to adopt a microservices architecture witho...
 
Codemotion Milan 2015 Alerts Overload
Codemotion Milan 2015 Alerts OverloadCodemotion Milan 2015 Alerts Overload
Codemotion Milan 2015 Alerts Overload
 
Monitoring and Logging in Wonderland
Monitoring and Logging in WonderlandMonitoring and Logging in Wonderland
Monitoring and Logging in Wonderland
 
Multiple awr reports_parser
Multiple awr reports_parserMultiple awr reports_parser
Multiple awr reports_parser
 
De Arquitectura Hexagonal a Event Sourcing
De Arquitectura Hexagonal a Event SourcingDe Arquitectura Hexagonal a Event Sourcing
De Arquitectura Hexagonal a Event Sourcing
 
Velocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overloadVelocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overload
 
Plantsight presentation
Plantsight presentationPlantsight presentation
Plantsight presentation
 
Real-time QC for Factories Whitepaper
Real-time QC for Factories WhitepaperReal-time QC for Factories Whitepaper
Real-time QC for Factories Whitepaper
 
Matt Turner: Istio, The Packet's-Eye View (DevSecOps - London Gathering, Janu...
Matt Turner: Istio, The Packet's-Eye View (DevSecOps - London Gathering, Janu...Matt Turner: Istio, The Packet's-Eye View (DevSecOps - London Gathering, Janu...
Matt Turner: Istio, The Packet's-Eye View (DevSecOps - London Gathering, Janu...
 
Supporting Enterprise System Rollouts with Splunk
Supporting Enterprise System Rollouts with SplunkSupporting Enterprise System Rollouts with Splunk
Supporting Enterprise System Rollouts with Splunk
 
Project_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docx
Project_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docxProject_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docx
Project_CharterProject TitleGO HOME!!Black BeltProject ChampionEx.docx
 
Integrating Docker EE into Société Générale's Existing Enterprise IT Systems
Integrating Docker EE into Société Générale's Existing Enterprise IT SystemsIntegrating Docker EE into Société Générale's Existing Enterprise IT Systems
Integrating Docker EE into Société Générale's Existing Enterprise IT Systems
 
Team Tour Seoul: Bringing Agile to IT
Team Tour Seoul: Bringing Agile to ITTeam Tour Seoul: Bringing Agile to IT
Team Tour Seoul: Bringing Agile to IT
 

Más de Pierre Vincent

Más de Pierre Vincent (10)

[Test bash NL] Contract testing in practice with Pact
[Test bash NL] Contract testing in practice with Pact[Test bash NL] Contract testing in practice with Pact
[Test bash NL] Contract testing in practice with Pact
 
DevOpsDays Galway 2019 - Zero-downtime deployments
DevOpsDays Galway 2019 - Zero-downtime deploymentsDevOpsDays Galway 2019 - Zero-downtime deployments
DevOpsDays Galway 2019 - Zero-downtime deployments
 
[Test Bash Manchester] Observability and Testing
[Test Bash Manchester] Observability and Testing[Test Bash Manchester] Observability and Testing
[Test Bash Manchester] Observability and Testing
 
[Test bash manchester] contract testing in practice
[Test bash manchester] contract testing in practice[Test bash manchester] contract testing in practice
[Test bash manchester] contract testing in practice
 
[RebelCon] Increasing visibility of distributed systems in production
[RebelCon] Increasing visibility of distributed systems in production[RebelCon] Increasing visibility of distributed systems in production
[RebelCon] Increasing visibility of distributed systems in production
 
Deploying microservices in a fast-paced customer-centric environment: How and...
Deploying microservices in a fast-paced customer-centric environment: How and...Deploying microservices in a fast-paced customer-centric environment: How and...
Deploying microservices in a fast-paced customer-centric environment: How and...
 
Improve collaboration and confidence with Consumer-driven contracts
Improve collaboration and confidence with Consumer-driven contractsImprove collaboration and confidence with Consumer-driven contracts
Improve collaboration and confidence with Consumer-driven contracts
 
Consumer-driven contracts: avoid microservices integration hell! (MuCon Londo...
Consumer-driven contracts: avoid microservices integration hell! (MuCon Londo...Consumer-driven contracts: avoid microservices integration hell! (MuCon Londo...
Consumer-driven contracts: avoid microservices integration hell! (MuCon Londo...
 
Consumer-driven contracts: avoid microservices integration hell! (LondonCD - ...
Consumer-driven contracts: avoid microservices integration hell! (LondonCD - ...Consumer-driven contracts: avoid microservices integration hell! (LondonCD - ...
Consumer-driven contracts: avoid microservices integration hell! (LondonCD - ...
 
Agile at Newsweaver (Agile Cork March 2016)
Agile at Newsweaver (Agile Cork March 2016)Agile at Newsweaver (Agile Cork March 2016)
Agile at Newsweaver (Agile Cork March 2016)
 

Último

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Último (20)

Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

QCon London - How to build observable distributed systems