The move from monolithic architectures to microservices has resulted in a monumental increase in the number of distinct pieces of software that engineering teams own. It’s getting harder (some would say impossible) for engineers to keep the architecture of the entire system in their heads. And this is to say nothing of understanding service interdependencies and the resulting risk profile of code or architectural changes. If we don’t find solutions to these problems, we risk not only large-scale service disruption but also longer times to diagnose and resolve incidents, because nobody understands the system as a whole.
In this talk from Sensu Summit 2019, Julian Dunn, Sr. Manager of Product at PagerDuty, shares insights into how the most innovative companies on the Internet today combat these issues with service maturity modelling: how they define maturity, how they measure it both before and after a service change is introduced to a system, and how they map out the potential impact of component changes on the whole environment. This talk is also a clarion call for a new way of keeping track of all the “stuff” that we’re building, because existing approaches like CMDBs and wikis cannot keep up with the scale of what’s being built today.
4. @julian_dunn
It’s getting harder to communicate with each other about things that matter.
This situation is totally unacceptable.
Well, that just about wraps it up!
12. @julian_dunn
Main Problems with CMDBs
1. Designed to be updated manually by
“asset managers”
2. Relationships between objects are
impossible to track
3. Turned out to be most useful for people
other than operations
Result: Few people doing operations find
the CMDB usable for their purposes.
13. @julian_dunn
Service Inventory Systems...
1. Track customer-facing software components and not hard assets
2. Primarily serve operations people
3. Are derived implicitly, not updated explicitly
The maintenance of a service inventory must align with
actions an operations team is already taking in their
day-to-day work.
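One way to picture an inventory that is "derived implicitly, not updated explicitly" is as a fold over the events an operations team already generates. The sketch below is only an illustration under stated assumptions: the event shape (`service`, `kind`, `ts`, `team`, `user`) and the `ServiceRecord` fields are hypothetical and do not correspond to any real PagerDuty or CMDB schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceRecord:
    """Hypothetical inventory entry; the fields are illustrative only."""
    name: str
    team: Optional[str] = None
    on_call: Optional[str] = None
    alert_count: int = 0
    last_seen: int = 0


def derive_inventory(events):
    """Build the inventory purely from operational events.

    Nobody edits a record by hand: a service exists in the inventory
    because something operational happened to it.
    """
    inventory = {}
    for event in events:
        name = event["service"]
        record = inventory.setdefault(name, ServiceRecord(name))
        record.last_seen = max(record.last_seen, event["ts"])
        kind = event["kind"]
        if kind == "team_assigned":
            record.team = event["team"]
        elif kind == "on_call_changed":
            record.on_call = event["user"]
        elif kind == "alert":
            record.alert_count += 1
    return inventory


# Hypothetical events, as might be emitted by day-to-day operational work.
sample_events = [
    {"service": "checkout", "kind": "team_assigned", "team": "payments", "ts": 1},
    {"service": "checkout", "kind": "on_call_changed", "user": "dana", "ts": 2},
    {"service": "checkout", "kind": "alert", "ts": 5},
    {"service": "search", "kind": "alert", "ts": 3},
    {"service": "checkout", "kind": "alert", "ts": 4},
]

inventory = derive_inventory(sample_events)
```

In this sketch the `search` service appears in the inventory even though nobody registered it, simply because it raised an alert. That is the property the slide argues for, and it also surfaces the gap (no team, no on-call) that a manually maintained record would hide.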
14. @julian_dunn
Things You’re Already Doing in PagerDuty
● Setting up technical services
● Associating them with business services
● Assigning teams to services
● Putting teams on-call
● Getting alerted on service failure
15. PagerDuty Service Directory
● Dynamic, searchable list of
services in your
organization, with
associated operational
metadata
● Updated by operations
teams as part of service
lifecycle
● Additional metadata,
service relationships, and
business impact coming
soon
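To make "searchable list of services with associated operational metadata" concrete, here is a minimal sketch of a case-insensitive search over such a list. It is not the PagerDuty Service Directory or its API; the `directory` structure and the metadata keys (`team`, `on_call`) are assumptions chosen for illustration.

```python
def search_services(directory, query):
    """Return sorted names of services whose name or metadata contains `query`.

    `directory` maps a service name to a dict of metadata values
    (a hypothetical shape, not a real product schema).
    """
    needle = query.lower()
    matches = []
    for name, metadata in directory.items():
        haystack = [name] + [str(value) for value in metadata.values()]
        if any(needle in text.lower() for text in haystack):
            matches.append(name)
    return sorted(matches)


# Hypothetical directory contents.
directory = {
    "checkout-api": {"team": "payments", "on_call": "dana"},
    "search-api": {"team": "discovery", "on_call": "lee"},
    "payments-db": {"team": "payments", "on_call": "dana"},
}

payments_services = search_services(directory, "payments")
```

Searching on metadata as well as names matters during an incident: a responder often knows the owning team or the person on call before they know what the service is called.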
16. @julian_dunn
Wrapping Up
● Architectures are too big for one person’s head.
● Autonomous teams need to find out information about
relationships in the heat of the moment.
● Existing solutions for keeping track of stuff we’re building are
insufficient.
● We need some sort of cloud-native, dynamic inventory that is
updated automatically, not manually.
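The point about finding relationships "in the heat of the moment" can be illustrated with a small dependency-graph query: given which services depend on which, list everything that could be affected when one component changes or fails. This is a sketch under assumptions, not a method described in the talk; the `depends_on` mapping and the service names are hypothetical.

```python
from collections import deque


def impacted_services(depends_on, changed):
    """Return every service that directly or transitively depends on `changed`.

    `depends_on` maps a service name to the services it calls
    (a hypothetical structure for illustration).
    """
    # Invert the graph: for each service, who depends on it?
    dependents = {}
    for service, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(service)

    # Breadth-first walk upstream from the changed service.
    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != changed and dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


# Hypothetical service relationships.
depends_on = {
    "web": ["checkout", "search"],
    "checkout": ["payments-db"],
    "search": [],
    "payments-db": [],
}

blast_radius = impacted_services(depends_on, "payments-db")
```

The query is only as good as the relationship data behind it, which is why the inventory has to be kept current by the work teams already do rather than by manual upkeep.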
17. PDS19FF350 - “Friends and Family”
Westin St. Francis, San Francisco, CA
September 23-25, 2019
https://summit.pagerduty.com/
The Road to Real Time