Reducing MTTR and False Escalations:
Event Correlation at LinkedIn
Michael Kehoe
Staff Site Reliability Engineer
LinkedIn
Have you ever?
2
False Escalations
• Been woken because a dependency made your service unhealthy?
• Been woken because someone believes your service is responsible?
• Spent hours trying to work out why your service is broken?
3
Agenda
• Project Problem Statement
• Project Goals
• Architecture Considerations
• Correlation Engine Overview
• Results & Takeaways
• Questions
$ whoami
4
Michael Kehoe
• Staff Site Reliability Engineer (SRE) @ LinkedIn
• Production-SRE team
• Funny accent = Australian + 3 years American
$ whatis PROD-SRE
5
Michael Kehoe
• Production-SRE
• Develop applications to improve MTTD and MTTR
• Build tools for efficient site issue troubleshooting, issue detection & correlation
• Provide direction on site monitoring
• Assist in restoring stability to services during site critical issues
6
Problem Statement
Service Complexity
Learning Curve MTTR
Reliability
Project Technical Goal
7
Problem Statement
Find the problem with a service within a given time period (or while it is ongoing) using:
• Unified API
• Web Frontend
Project Success Criteria
8
Problem Statement
• Reduce MTTR on incidents
• Reduce false/needless escalations
Expected Use-Cases
9
Problem Statement
Applicable use-cases:
• A service has high latency or error rates
• Find the problematic service(s)
Non-applicable use-cases:
• External monitoring services show slow page-load times
10
Architecture Considerations
• Real-Time metrics analytics (stream processing)
• Ad-Hoc metrics analytics
• Alert Correlation
Evaluation
11
Architecture Considerations
• Real-Time metrics analytics (stream processing)
• Pros
• Fast response time
• Ability to do advanced analytics in real-time
• Cons
• Resource intensive (especially at LinkedIn scale)
Evaluation
12
Architecture Considerations
• Ad-Hoc metric analytics
• Pros
• Smaller resource footprint
• Cons
• Analysis time is slow
Evaluation
13
Architecture Considerations
• Alert Correlation
• Pros
• Leverage already existing alerts
• Strong signal-to-noise ratio
• Cons
• Analysis constrained to alerts only (boolean state)
Evaluation
14
Architecture Considerations
• Real-time analytics is expensive, but useful
• Ad-Hoc metric analytics is slower, but cheaper
• Alert Correlation gives us strong signal
15
Correlation Engine Overview
At LinkedIn, we had two smaller projects that we could leverage
Drilldown + Site-Stabilizer
Near-Time metric analytics & event correlation
inVisualize
Alert Correlation
Existing knowledge available
Where to get started
16
Correlation Engine Overview
The ability to correlate is great!
But you need to understand dependencies
Build a callgraph!
Callgraph
17
Correlation Engine Overview
LinkedIn applications emit metrics on a per-API and per-dependency basis
Map metrics to understand dependencies
Simple to build callgraph platform!
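A minimal sketch of how per-dependency metrics could be mapped to a callgraph. The metric naming scheme (`<caller>.dep.<callee>.<measurement>`), the sample values and the function names are assumptions made for illustration; the slides do not show LinkedIn's actual metric format.

```python
from collections import defaultdict

# Hypothetical metric samples: (metric_name, value). The naming scheme
# "<caller>.dep.<callee>.<measurement>" is an assumption for illustration.
SAMPLES = [
    ("my-frontend.dep.service-a.call_count", 1200),
    ("my-frontend.dep.service-a.latency_ms", 35.0),
    ("service-a.dep.service-c.call_count", 900),
    ("service-a.dep.service-c.latency_ms", 80.0),
    ("my-frontend.dep.service-b.call_count", 300),
]


def build_callgraph(samples):
    """Return {caller: {callee: {measurement: value}}} from metric samples."""
    graph = defaultdict(lambda: defaultdict(dict))
    for name, value in samples:
        caller, marker, callee, measurement = name.split(".")
        if marker != "dep":
            continue  # not a per-dependency metric
        graph[caller][callee][measurement] = value
    return {caller: dict(callees) for caller, callees in graph.items()}


def downstream(graph, service):
    """All services reachable from `service` by following call edges."""
    seen, stack = set(), [service]
    while stack:
        for callee in graph.get(stack.pop(), {}):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


callgraph = build_callgraph(SAMPLES)
print(sorted(downstream(callgraph, "my-frontend")))
```

Keeping call count and latency on each edge means the same structure can later be used to rank which dependencies matter most for a given service.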
Callgraph
18
Correlation Engine Overview
Callgraph-be
Voldemort
(RO Datastore)
Espresso
(RW Datastore)
Collect:
● Call count
● Latency
drilldown (Near-Time analytics)
19
Correlation Engine Overview
Uses callgraph to identify high-value dependencies (and the associated metrics)
Analyses high-value metrics in 5-minute chunks
Uses a k-means unsupervised algorithm to find similar trends between service metrics
Queryable API
Outputs correlation confidence scores, normalised between 0 and 100
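The idea of clustering metric trends can be sketched as follows. This is not drilldown's implementation: the sample window, the choice of k=2, the deterministic seeding and the use of Pearson correlation as the 0-100 confidence score are all assumptions made to keep the example small.

```python
import math
import statistics

# Hypothetical one-minute samples for a single 5-minute window.
WINDOW = {
    "my-frontend.latency": [100, 102, 150, 210, 205],
    "service-b.latency": [20, 21, 19, 20, 22],
    "service-c.latency": [40, 41, 70, 95, 93],
    "service-d.latency": [33, 30, 31, 29, 32],
}


def zscore(series):
    """Scale a series to zero mean and unit variance so shapes are comparable."""
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series) or 1.0
    return [(x - mean) / stdev for x in series]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def confidence(a, b):
    """Pearson correlation of two z-scored series, clamped to 0-100 (assumed score)."""
    r = sum(x * y for x, y in zip(a, b)) / len(a)
    return round(max(0.0, r) * 100)


def correlated_metrics(window, target, k=2):
    """Metrics that land in the same cluster as `target`, with a 0-100 score."""
    names = list(window)
    scaled = [zscore(window[name]) for name in names]
    labels = kmeans(scaled, k)
    t = names.index(target)
    return {
        name: confidence(scaled[t], scaled[i])
        for i, name in enumerate(names)
        if i != t and labels[i] == labels[t]
    }


result = correlated_metrics(WINDOW, "my-frontend.latency")
print(result)
```

Z-scoring before clustering makes the comparison about the shape of the trend rather than absolute latency, so a dependency that rises in step with the frontend groups with it even though its values are much smaller.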
inVisualize (Alert Correlation)
20
Correlation Engine Overview
inVisualize analyses alerts from each service in real time
Uses callgraph to determine the unhealthy service and the services it affects
Queryable API
Results normalised between 0 and 100
Visualizes impact
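One way to picture alert correlation over a callgraph is sketched below. The callgraph, the set of alerting services and the scoring rule (share of firing alerts explained by the candidate) are assumptions for illustration, not inVisualize's actual logic.

```python
# Hypothetical callgraph: caller -> list of callees.
CALLGRAPH = {
    "my-frontend": ["service-a", "service-b"],
    "service-a": ["service-c"],
    "service-b": [],
    "service-c": [],
}
# Services with a firing alert right now (boolean state per service).
ALERTING = {"my-frontend", "service-a", "service-c"}


def root_causes(callgraph, alerting, start):
    """Follow alerting dependencies from an alerting `start` service.

    Returns (roots, affected): roots are alerting services with no alerting
    dependencies of their own; affected are the alerting callers above them.
    """
    roots, seen, stack = set(), set(), [start]
    while stack:
        service = stack.pop()
        if service in seen:
            continue
        seen.add(service)
        bad_deps = [d for d in callgraph.get(service, []) if d in alerting]
        if bad_deps:
            stack.extend(bad_deps)
        else:
            roots.add(service)
    return roots, seen - roots


def score(alerting, affected):
    """Assumed scoring: share of firing alerts explained by the candidate, 0-100."""
    explained = len(affected) + 1  # affected services plus the candidate itself
    return round(100 * explained / len(alerting))


roots, affected = root_causes(CALLGRAPH, ALERTING, "my-frontend")
for service in sorted(roots):
    print(service, score(ALERTING, affected), sorted(affected))
```

Because alerts are boolean, the walk can only say which services are unhealthy, not how unhealthy; that is the constraint noted earlier for the alert correlation approach.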
inVisualize
21
Correlation Engine Overview
Site-Stabilizer
22
Correlation Engine Overview
Backend service
Collates recommendations from Drilldown & inVisualize
Decorates recommendations with:
Scheduled changes
Deployment events
A/B experiment changes
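The decoration step can be sketched as a join between recommendations and recent change events. The data classes, field names, one-hour lookback window and sample events are hypothetical; the slides only state which kinds of events are attached.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    service: str
    kind: str  # e.g. "deployment", "scheduled_change", "ab_experiment"
    at: datetime


@dataclass
class Recommendation:
    service: str
    confidence: int  # 0-100, as reported by the correlation backends
    events: list = field(default_factory=list)


def decorate(recommendations, events, incident_start, lookback=timedelta(hours=1)):
    """Attach each service's recent change events to its recommendation."""
    earliest = incident_start - lookback
    for rec in recommendations:
        rec.events = sorted(
            (e for e in events
             if e.service == rec.service and earliest <= e.at <= incident_start),
            key=lambda e: e.at,
        )
    # Highest-confidence recommendation first.
    return sorted(recommendations, key=lambda r: r.confidence, reverse=True)


incident_start = datetime(2017, 8, 30, 12, 0)
events = [
    Event("Service-C", "deployment", datetime(2017, 8, 30, 11, 40)),
    Event("Service-C", "ab_experiment", datetime(2017, 8, 30, 9, 0)),
    Event("Service-A", "scheduled_change", datetime(2017, 8, 30, 11, 55)),
]
recs = decorate(
    [Recommendation("Service-A", 40), Recommendation("Service-C", 91)],
    events,
    incident_start,
)
for rec in recs:
    print(rec.service, rec.confidence, [e.kind for e in rec.events])
```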
Architecture
23
Correlation Engine Overview
Callgraph-api
Callgraph-be
drilldown
invisualize
site-stabilizer
Correlate-fe
24
Correlation Engine Overview
API for automation
Auto-remediation
Alert suppressing
UI for manual introspection
Correlate-fe
25
Correlation Engine Overview
The user interface gives:
Responsible service
Correlation Confidence
Root cause
SRE team
Analysis
Architecture
26
Correlation Engine Overview
Callgraph-api
Callgraph-be
correlate-fe
drilldown
invisualize
site-stabilizer
Latency Alert
NURSE
Nurse Plan arguments
• service-name: my-frontend
• req_confidence = 85
• escalate = True
Escalate to correct SRE
Find what’s wrong with ‘my-frontend’ in DatacenterB
Iris
Alert Correlation API
Service: Service-C
Confidence: 91%
Reason: ‘Service-C’ has high latency after a deploy
Service Owner: SRE
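The flow on this slide can be sketched as a small decision function. The function name, the dictionary shape and the fallback actions are hypothetical; only the plan arguments (`req_confidence`, `escalate`) and the example result values come from the slide.

```python
def nurse_plan(correlation, req_confidence=85, escalate=True):
    """Decide what to do with a correlation result for an alerting service."""
    if correlation["confidence"] < req_confidence:
        # Not confident enough to act automatically (assumed fallback).
        return {"action": "manual_triage", "target": None}
    if escalate:
        return {
            "action": "escalate",
            "target": correlation["owner"],
            "reason": correlation["reason"],
        }
    # Confident, but the plan did not ask for escalation (assumed fallback).
    return {"action": "annotate_alert", "target": None}


# Result as shown on the slide for a latency alert on 'my-frontend'.
correlation = {
    "service": "Service-C",
    "confidence": 91,
    "reason": "'Service-C' has high latency after a deploy",
    "owner": "SRE",
}
decision = nurse_plan(correlation, req_confidence=85, escalate=True)
print(decision)
```

With a confidence of 91 against a required 85, the sketch escalates to the owner of Service-C rather than to the owner of the frontend that raised the alert.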
28
Early Results
Siteops (NOC) has greater visibility on the site
Reducing MTTR
Reducing false escalations
29
Conclusion
Understand what correlation approach makes sense for you
Understand your dependencies
Build, Integrate and benefit!
30
Team
Govindaluri Kishore
Renjith Rajan
Reynold Perumpilly
Rusty Wickell
Michael Kehoe
31
Questions?
©2014 LinkedIn Corporation. All Rights Reserved.
Callgraph
33
Correlation Engine Overview
Callgraph-be
RestLi
(Internal APIs)
Voldemort
(RO Datastore)
Espresso
(RW Datastore)
Call count
Latency
Architecture
34
Correlation Engine Overview
Callgraph-api
Callgraph-be
correlate-fe
drilldown
invisualize
site-stabilizer

Más contenido relacionado

La actualidad más candente

Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Christian Horsdal
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Tyler Nguyen
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7Site24x7
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EnginePraveen Thirukonda
 
Gwava redline3.5
Gwava   redline3.5Gwava   redline3.5
Gwava redline3.5GWAVA
 
End user-experience monitoring
End user-experience monitoring End user-experience monitoring
End user-experience monitoring Site24x7
 
Microsoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringMicrosoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringSite24x7
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max
 
VMware Monitoring-Discover And Monitor Your Virtual Environment
VMware Monitoring-Discover And Monitor Your Virtual EnvironmentVMware Monitoring-Discover And Monitor Your Virtual Environment
VMware Monitoring-Discover And Monitor Your Virtual EnvironmentSite24x7
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the CloudSite24x7
 
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandJIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandAtlassian
 
Micro Services Architecture
Micro Services ArchitectureMicro Services Architecture
Micro Services ArchitectureRanjan Baisak
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian PerformanceAtlassian
 
Enabling DevOps to optimize application and server performance
Enabling DevOps to optimize application and server performanceEnabling DevOps to optimize application and server performance
Enabling DevOps to optimize application and server performanceManageEngine, Zoho Corporation
 
Getting into the flow building applications with reactive streams
Getting into the flow building applications with reactive streamsGetting into the flow building applications with reactive streams
Getting into the flow building applications with reactive streamsTim van Eijndhoven
 
Database ingest with Apache NiFi and MiNiFi
Database ingest with Apache NiFi and MiNiFiDatabase ingest with Apache NiFi and MiNiFi
Database ingest with Apache NiFi and MiNiFiLucian Neghina
 
Application Performance Monitoring (APM)
Application Performance Monitoring (APM)Application Performance Monitoring (APM)
Application Performance Monitoring (APM)Site24x7
 
Site24x7 Plugins - Monitor your entire server stack
Site24x7 Plugins - Monitor your entire server stackSite24x7 Plugins - Monitor your entire server stack
Site24x7 Plugins - Monitor your entire server stackSite24x7
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringManageEngine, Zoho Corporation
 

La actualidad más candente (20)

Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Gwava redline3.5
Gwava   redline3.5Gwava   redline3.5
Gwava redline3.5
 
End user-experience monitoring
End user-experience monitoring End user-experience monitoring
End user-experience monitoring
 
Microsoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringMicrosoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoring
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to Espresso
 
VMware Monitoring-Discover And Monitor Your Virtual Environment
VMware Monitoring-Discover And Monitor Your Virtual EnvironmentVMware Monitoring-Discover And Monitor Your Virtual Environment
VMware Monitoring-Discover And Monitor Your Virtual Environment
 
Intro to.net core 20170111
Intro to.net core   20170111Intro to.net core   20170111
Intro to.net core 20170111
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the Cloud
 
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandJIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
 
Micro Services Architecture
Micro Services ArchitectureMicro Services Architecture
Micro Services Architecture
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance
 
Enabling DevOps to optimize application and server performance
Enabling DevOps to optimize application and server performanceEnabling DevOps to optimize application and server performance
Enabling DevOps to optimize application and server performance
 
Getting into the flow building applications with reactive streams
Getting into the flow building applications with reactive streamsGetting into the flow building applications with reactive streams
Getting into the flow building applications with reactive streams
 
Database ingest with Apache NiFi and MiNiFi
Database ingest with Apache NiFi and MiNiFiDatabase ingest with Apache NiFi and MiNiFi
Database ingest with Apache NiFi and MiNiFi
 
Application Performance Monitoring (APM)
Application Performance Monitoring (APM)Application Performance Monitoring (APM)
Application Performance Monitoring (APM)
 
Site24x7 Plugins - Monitor your entire server stack
Site24x7 Plugins - Monitor your entire server stackSite24x7 Plugins - Monitor your entire server stack
Site24x7 Plugins - Monitor your entire server stack
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoring
 

Destacado

Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialPooja Tangi
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 
The servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONThe servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONDaniel ( Danny ) ☃ Lawrence
 
Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the dayPooja Tangi
 
HBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHortonworks
 
How to Reduce your MTTI/MTTR with a Single Click
How to Reduce your MTTI/MTTR with a Single ClickHow to Reduce your MTTI/MTTR with a Single Click
How to Reduce your MTTI/MTTR with a Single ClickSumo Logic
 
MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012Energized Work
 
White Belt DMAIC Project Line G MTTR
White Belt DMAIC Project Line G  MTTRWhite Belt DMAIC Project Line G  MTTR
White Belt DMAIC Project Line G MTTRIrfan Rasheed Rana
 
Introducing libpd -Pdをアプリのサウンドエンジンに-
Introducing libpd -Pdをアプリのサウンドエンジンに-Introducing libpd -Pdをアプリのサウンドエンジンに-
Introducing libpd -Pdをアプリのサウンドエンジンに-Yoichi Hirata
 
ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!Nagato Kasaki
 
Reliability Centered Maintenance Made Simple
Reliability Centered Maintenance Made SimpleReliability Centered Maintenance Made Simple
Reliability Centered Maintenance Made SimpleRicky Smith CMRP, CMRT
 
Similan dive center diving liveaboards
Similan dive center diving liveaboardsSimilan dive center diving liveaboards
Similan dive center diving liveaboardsSimilan Diving
 
conservation and rewarding biodiversity conservation Trondheim 05-10-gupta-...
conservation and rewarding biodiversity conservation Trondheim   05-10-gupta-...conservation and rewarding biodiversity conservation Trondheim   05-10-gupta-...
conservation and rewarding biodiversity conservation Trondheim 05-10-gupta-...Dr Anil Gupta
 
Grâce aux tags Varnish, j'ai switché ma prod sur Raspberry Pi
Grâce aux tags Varnish, j'ai switché ma prod sur Raspberry PiGrâce aux tags Varnish, j'ai switché ma prod sur Raspberry Pi
Grâce aux tags Varnish, j'ai switché ma prod sur Raspberry PiJérémy Derussé
 

Destacado (20)

Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 
The servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONThe servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECON
 
Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
 
HBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minute
 
How to Reduce your MTTI/MTTR with a Single Click
How to Reduce your MTTI/MTTR with a Single ClickHow to Reduce your MTTI/MTTR with a Single Click
How to Reduce your MTTI/MTTR with a Single Click
 
MTTR
MTTRMTTR
MTTR
 
MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012
 
White Belt DMAIC Project Line G MTTR
White Belt DMAIC Project Line G  MTTRWhite Belt DMAIC Project Line G  MTTR
White Belt DMAIC Project Line G MTTR
 
Introducing libpd -Pdをアプリのサウンドエンジンに-
Introducing libpd -Pdをアプリのサウンドエンジンに-Introducing libpd -Pdをアプリのサウンドエンジンに-
Introducing libpd -Pdをアプリのサウンドエンジンに-
 
ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!
 
Reliability Centered Maintenance Made Simple
Reliability Centered Maintenance Made SimpleReliability Centered Maintenance Made Simple
Reliability Centered Maintenance Made Simple
 
Reliability centered maintenance
Reliability centered maintenanceReliability centered maintenance
Reliability centered maintenance
 
Similan dive center diving liveaboards
Similan dive center diving liveaboardsSimilan dive center diving liveaboards
Similan dive center diving liveaboards
 
conservation and rewarding biodiversity conservation Trondheim 05-10-gupta-...
conservation and rewarding biodiversity conservation Trondheim   05-10-gupta-...conservation and rewarding biodiversity conservation Trondheim   05-10-gupta-...
conservation and rewarding biodiversity conservation Trondheim 05-10-gupta-...
 
Grâce aux tags Varnish, j'ai switché ma prod sur Raspberry Pi
Grâce aux tags Varnish, j'ai switché ma prod sur Raspberry PiGrâce aux tags Varnish, j'ai switché ma prod sur Raspberry Pi
Grâce aux tags Varnish, j'ai switché ma prod sur Raspberry Pi
 
Presentación - Estudio Anual Comercio Electrónico 2016
Presentación - Estudio Anual Comercio Electrónico 2016 Presentación - Estudio Anual Comercio Electrónico 2016
Presentación - Estudio Anual Comercio Electrónico 2016
 

Similar a Reduce MTTR and False Alerts with Event Correlation

SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Mike Villiger
 
A DevOps Playbook at DraftKings Built with New Relic and AWS
 A DevOps Playbook at DraftKings Built with New Relic and AWS A DevOps Playbook at DraftKings Built with New Relic and AWS
A DevOps Playbook at DraftKings Built with New Relic and AWSAmazon Web Services
 
Refining Your API Design - Architecture and Modeling Learning Event
Refining Your API Design - Architecture and Modeling Learning EventRefining Your API Design - Architecture and Modeling Learning Event
Refining Your API Design - Architecture and Modeling Learning EventLaunchAny
 
Unlock your core business assets for the hybrid cloud with addi webinar dec...
Unlock your core business assets for the hybrid cloud with addi   webinar dec...Unlock your core business assets for the hybrid cloud with addi   webinar dec...
Unlock your core business assets for the hybrid cloud with addi webinar dec...Sherri Hanna
 
Technical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your MigrationTechnical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your MigrationNew Relic
 
APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...
APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...
APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...apidays
 
Introduction to Event-Driven Architecture
Introduction to Event-Driven Architecture Introduction to Event-Driven Architecture
Introduction to Event-Driven Architecture Solace
 
Get the Message Across: Seamlessly Transport Data to Apps, Anywhere
Get the Message Across: Seamlessly Transport Data to Apps, AnywhereGet the Message Across: Seamlessly Transport Data to Apps, Anywhere
Get the Message Across: Seamlessly Transport Data to Apps, AnywhereVMware Tanzu
 
Achieve Full API Lifecycle Management Using NGINX Controller
Achieve Full API Lifecycle Management Using NGINX ControllerAchieve Full API Lifecycle Management Using NGINX Controller
Achieve Full API Lifecycle Management Using NGINX ControllerNGINX, Inc.
 
The Need for Speed
The Need for SpeedThe Need for Speed
The Need for SpeedCapgemini
 
Connect Ops and Security with Flexible Web App and API Protection
Connect Ops and Security with Flexible Web App and API ProtectionConnect Ops and Security with Flexible Web App and API Protection
Connect Ops and Security with Flexible Web App and API ProtectionDevOps.com
 
Microservices: Organizing Large Teams for Rapid Delivery
Microservices: Organizing Large Teams for Rapid DeliveryMicroservices: Organizing Large Teams for Rapid Delivery
Microservices: Organizing Large Teams for Rapid DeliveryVMware Tanzu
 
Kochi mulesoft meetup 02
Kochi mulesoft meetup 02Kochi mulesoft meetup 02
Kochi mulesoft meetup 02sumitahuja94
 
Emvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Technologies
 
DevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft AzureDevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft Azuregjuljo
 
Improve_Application_Availability_and_Performance_Sales_Crib_Sheet.pdf
Improve_Application_Availability_and_Performance_Sales_Crib_Sheet.pdfImprove_Application_Availability_and_Performance_Sales_Crib_Sheet.pdf
Improve_Application_Availability_and_Performance_Sales_Crib_Sheet.pdfمنیزہ ہاشمی
 
11 Ways Microservices & Dynamic Clouds Break Your Monitoring
11 Ways Microservices & Dynamic Clouds Break Your Monitoring11 Ways Microservices & Dynamic Clouds Break Your Monitoring
11 Ways Microservices & Dynamic Clouds Break Your MonitoringAbner Germanow
 
5 Key Metrics to Release Better Software Faster
5 Key Metrics to Release Better Software Faster5 Key Metrics to Release Better Software Faster
5 Key Metrics to Release Better Software FasterDynatrace
 
5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIs5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIsWSO2
 

Similar a Reduce MTTR and False Alerts with Event Correlation (20)

SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
 
A DevOps Playbook at DraftKings Built with New Relic and AWS
 A DevOps Playbook at DraftKings Built with New Relic and AWS A DevOps Playbook at DraftKings Built with New Relic and AWS
A DevOps Playbook at DraftKings Built with New Relic and AWS
 
Refining Your API Design - Architecture and Modeling Learning Event
Refining Your API Design - Architecture and Modeling Learning EventRefining Your API Design - Architecture and Modeling Learning Event
Refining Your API Design - Architecture and Modeling Learning Event
 
Unlock your core business assets for the hybrid cloud with addi webinar dec...
Unlock your core business assets for the hybrid cloud with addi   webinar dec...Unlock your core business assets for the hybrid cloud with addi   webinar dec...
Unlock your core business assets for the hybrid cloud with addi webinar dec...
 
Technical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your MigrationTechnical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your Migration
 
APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...
APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...
APIdays Singapore 2019 - Business of APIs: From Integration to Monetisation, ...
 
Introduction to Event-Driven Architecture
Introduction to Event-Driven Architecture Introduction to Event-Driven Architecture
Introduction to Event-Driven Architecture
 
Get the Message Across: Seamlessly Transport Data to Apps, Anywhere
Get the Message Across: Seamlessly Transport Data to Apps, AnywhereGet the Message Across: Seamlessly Transport Data to Apps, Anywhere
Get the Message Across: Seamlessly Transport Data to Apps, Anywhere
 
Achieve Full API Lifecycle Management Using NGINX Controller
Achieve Full API Lifecycle Management Using NGINX ControllerAchieve Full API Lifecycle Management Using NGINX Controller
Achieve Full API Lifecycle Management Using NGINX Controller
 
The Need for Speed
The Need for SpeedThe Need for Speed
The Need for Speed
 
Connect Ops and Security with Flexible Web App and API Protection
Connect Ops and Security with Flexible Web App and API ProtectionConnect Ops and Security with Flexible Web App and API Protection
Connect Ops and Security with Flexible Web App and API Protection
 
Microservices: Organizing Large Teams for Rapid Delivery
Microservices: Organizing Large Teams for Rapid DeliveryMicroservices: Organizing Large Teams for Rapid Delivery
Microservices: Organizing Large Teams for Rapid Delivery
 
Kochi mulesoft meetup 02
Kochi mulesoft meetup 02Kochi mulesoft meetup 02
Kochi mulesoft meetup 02
 
Emvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce Deck
 
DevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft AzureDevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft Azure
 
Improve_Application_Availability_and_Performance_Sales_Crib_Sheet.pdf
Improve_Application_Availability_and_Performance_Sales_Crib_Sheet.pdfImprove_Application_Availability_and_Performance_Sales_Crib_Sheet.pdf
Improve_Application_Availability_and_Performance_Sales_Crib_Sheet.pdf
 
11 Ways Microservices & Dynamic Clouds Break Your Monitoring
11 Ways Microservices & Dynamic Clouds Break Your Monitoring11 Ways Microservices & Dynamic Clouds Break Your Monitoring
11 Ways Microservices & Dynamic Clouds Break Your Monitoring
 
5 Key Metrics to Release Better Software Faster
5 Key Metrics to Release Better Software Faster5 Key Metrics to Release Better Software Faster
5 Key Metrics to Release Better Software Faster
 
5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIs5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIs
 

More from Michael Kehoe (16)

• eBPF Workshop
• eBPF Basics
• Code Yellow: Helping operations top-heavy teams the smart way
• QConSF 2018: Building Production-Ready Applications
• Helping operations top-heavy teams the smart way
• AllDayDevops: What the NTSB teaches us about incident management & postmortems
• Linux Container Basics
• Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
• What the NTSB teaches us about incident management & postmortems
• PyBay 2018: Production-Ready Python Applications
• Helping operations top-heavy teams the smart way
• The Next Wave of Reliability Engineering
• Building Production-Ready Microservices: DevopsExchangeSF
• SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
• SRECon-Europe-2017: Networks for SREs
• Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale

Reduce MTTR and False Alerts with Event Correlation