SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
How Bad is Your Toil?
Measuring the Human Impact of Process
Kurt Andersen
LinkedIn Product Site Reliability
@drkurta
What is this
“toil”
of which you speak?
@drkurta
• Manual
• Repetitive
• Automatable
• Tactical
• Without enduring value
• O(n) with service growth
(or worse!)
Toil is:
@drkurta
Mean Time To Sleep (MTTS)
• Ryan Frantz & Laurie Denness
from Etsy (at the time)
• Velocity 2014
• Looked at the impact of oncall
vs. life
• https://github.com/etsy/opsweekly
@drkurta
Round 1 —
Time Usage Survey
@drkurta
General Time Categories
Writing/maintaining
code and associated
design & docs
S/W Eng
Making lasting
changes to entire
systems;
consulting on design
& architecture
System Eng /
Architecture
Taking vacation
should not count
against other
baselines
Vacation
(DTO)
Team meetings,
processes, training,
admin tasks, hiring
Overhead
Manual, repetitive,
lacks enduring value
Toil / Reactive
@drkurta
SRE-Time Survey
• Run monthly from June-
November 2018
• Average time to complete: 3m
• Some of the comments were
the most instructive
@drkurta
SRE-Time
Demographics
@drkurta
SRE-Time
Definitions
Reactive Work
Reactive work “is the kind of work tied to running a production service that tends
to be manual, repetitive, automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.”
Oncall time (either primary or secondary) is a base level of reactivity.
“[Google] SREs report that their top source of reactive work is interrupts (that is,
non-urgent service-related messages and emails). The next leading source is
on-call (urgent) response, followed by releases and pushes.”
@drkurta
SRE-Time Matrix
@drkurta
SRE-Time Results
reactive
SW Eng
Overhead
Systems Eng
20%
22%
37%
21%
@drkurta
SRE-Time Comments
• [Tool X] needs to go, or be supplemented… Its limitations flow into
every corner of how we do our job…
• I find that interviewing can be a huge off-the-top hit against potentially
productive time…
• Reactive work for my team is mostly confined to oncall, but our oncall
week is almost 100% reactive for 12+ hours a day.
• Most of my toil is interrupt driven work, working around broken
tooling, horizontal initiatives, and on call tasks. I also go to meetings.
@drkurta
Inferring interrupts from agile velocity
I nt erl ude
• Manual
• Repetitive
• Automatable
• Tactical
• Without enduring value
• O(n) with service growth
@drkurta
Round 2 —
Oncall Pain Working Group
@drkurta
Oncall Pain Survey
• Identify the problem areas that
make oncall painful
• Develop tools and strategies to
address those areas
• Survey run twice:
Spring and Fall 2017
@drkurta
Oncall Pain
“Scoring”
• 1 – Not a pain point
through
• 5 – Urgently needs improvement
• or N/A if it does not apply to you/your team
@drkurta
Oncall Pain
“General Issues”
• Duration of on-call shifts
• Frequency of on-call shifts
• Sleep-interrupting alerts & issues
• Work/Life balance interruption from weekend
or Holiday on-call
• Inability to progress on other projects while
on-call
• Difficulties contacting or receiving assistance
from product POCs (teammates) or
developers
• Difficulties contacting or receiving assistance
from other teams within LinkedIn
@drkurta
Oncall Pain
“Alerts”
• Spurious alerts requiring no action
• Challenges finding alert-specific severity
information
• Challenges finding useful alert-specific
resolution documentation
• Alerts due to problems with a downstream
service or resource
• Following & cleaning up email-only alerts
(low-level)
• Tracking alerts which need to be tuned
• Tuning and updating alerts
@drkurta
Oncall Pain
“Daily Activities”
• Mandatory meetings attended by on-call
• Deployments handled by SREs
• Interruptions for deployment/rollback
assistance
• LiX / RB approvals
• Interruptions for “urgent” LiX / RB approvals
• Other dev support-related interruptions
(detail below)
• Managing incoming team email
• Triaging new tickets
@drkurta
Oncall Pain
“Additional
Details”
• Are there other aspects of oncall not listed
above which you would rate 4-5 (towards the
“urgent/highly stressful” end of the scale)?
• For any items you ranked with 4 or above,
please provide any details you'd like to add
about how these aspects of on-call adversely
impact you, and any thoughts you have about
how to improve the situation.
• Have there been any changes to how your
team handles oncall within the last ~6
months which you feel have had a lasting
positive or negative impact in your oncall
experience? If so, please elaborate on them
here. @drkurta
Downstream Issues
SP’17
Issue Average Median # of 4+
# of
N/A
Alerts due to problems with a downstream service or resource 3.8 4 32 1
Inability to progress on other projects while on-call 3.6 4 30 1
Spurious alerts requiring no action 3.2 3 23 0
Challenges finding useful alert-specific resolution documentation 3.1 3 21 0
Sleep-interrupting alerts & issues 3.1 3 20 2
Tuning and updating alerts 2.9 3 16 2
Challenges finding alert-specific severity information 2.8 3 19 0
Interruptions for deployment/rollback assistance 2.8 3 16 1
Following & cleaning up email-only alerts (low-level) 2.8 3 14 1
Work/Life balance interruption from weekend or Holiday on-call 2.7 3 12 1
Other dev support-related interruptions (detail below) 2.7 3 12 2
Interruptions for "urgent" LiX / RB approvals 2.7 3 14 0
Spring’17
@drkurta
Alert Management
SP’17
Issue Average Median # of 4+
# of
N/A
Alerts due to problems with a downstream service or resource 3.8 4 32 1
Inability to progress on other projects while on-call 3.6 4 30 1
Spurious alerts requiring no action 3.2 3 23 0
Challenges finding useful alert-specific resolution documentation 3.1 3 21 0
Sleep-interrupting alerts & issues 3.1 3 20 2
Tuning and updating alerts 2.9 3 16 2
Challenges finding alert-specific severity information 2.8 3 19 0
Interruptions for deployment/rollback assistance 2.8 3 16 1
Following & cleaning up email-only alerts (low-level) 2.8 3 14 1
Work/Life balance interruption from weekend or Holiday on-call 2.7 3 12 1
Other dev support-related interruptions (detail below) 2.7 3 12 2
Interruptions for "urgent" LiX / RB approvals 2.7 3 14 0
Spring’17
@drkurta
Interruptions
SP’17
Issue Average Median # of 4+
# of
N/A
Alerts due to problems with a downstream service or resource 3.8 4 32 1
Inability to progress on other projects while on-call 3.6 4 30 1
Spurious alerts requiring no action 3.2 3 23 0
Challenges finding useful alert-specific resolution documentation 3.1 3 21 0
Sleep-interrupting alerts & issues 3.1 3 20 2
Tuning and updating alerts 2.9 3 16 2
Challenges finding alert-specific severity information 2.8 3 19 0
Interruptions for deployment/rollback assistance 2.8 3 16 1
Following & cleaning up email-only alerts (low-level) 2.8 3 14 1
Work/Life balance interruption from weekend or Holiday on-call 2.7 3 12 1
Other dev support-related interruptions (detail below) 2.7 3 12 2
Interruptions for "urgent" LiX / RB approvals 2.7 3 14 0
Spring’17
@drkurta
Response Patterns by Team
SP’17
Average 1 2 3 4 5 6 7 8 9 A B C D
3.8
2.0
Spring’17
@drkurta
Methods and Tools
Downstream Alerts Interrupts
Linkedout
Alert Correlation
Oncall Retrospective
Improve alert notes
Oncall Scheduling
Dev Oncall
@drkurta
Oncall Pain
“Fall Follow-up”
Part 1
• Leverage the oncall retro tool to review
alerts / identify those needing improvement
• Change in on-call scheduling (4/3, etc)
• Nurse plan with alert correlation to
prevent/delay unneeded alert
• Alert note improvements
@drkurta
Oncall Pain
“Fall Follow-up”
Part 2
• Have there been any changes (including
those above) to how your team handles on-
call within the time since the last survey
(April 2017) which you feel have had a
lasting positive or negative impact in your
on-call experience? If so, please elaborate
on them here
@drkurta
Improvement Results
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Leverage the oncall retro
tool to review alerts /
identify those needing
improvement
Change in on-call scheduling
(4/3, etc)
Nurse plan with alert
correlation to prevent/delay
unneeded alert
Alert note improvements
Much Better
Slightly Better
Unchanged
Slightly Worse
Much Worse
@drkurta
Round 3 —
Tools
@drkurta
INsomnia
• Goal: create a weekly oncall
dashboard for the shift handover so
that we can have some data to
[pursue longer term improvement].
• After loading the data into
elasticsearch, the visualization
provided by Kibana was better than
expected, so most of the time went
into improving the dashboard.
@drkurta
Insomnia
Example
@drkurta
Oncall Retro Input
@drkurta
Oncall Retro Results
@drkurta
Oncall Retro Results
@drkurta
• Manual
• Repetitive
• Automatable
• Tactical
• Without enduring value
• O(n) with service growth
(or worse!)
Toil is:
@drkurta
•Will you join the ATL?
@drkurta
Questions?
@drkurta
How bad is your toil? Measuring the Human Impact of Process

Más contenido relacionado

La actualidad más candente

Preliminary Examination Proposal Slides
Preliminary Examination Proposal SlidesPreliminary Examination Proposal Slides
Preliminary Examination Proposal SlidesManas Tungare
 
VSLive Orlando 2019 - When "We are down" is not good enough. SRE on Azure
VSLive Orlando 2019 - When "We are down" is not good enough. SRE on AzureVSLive Orlando 2019 - When "We are down" is not good enough. SRE on Azure
VSLive Orlando 2019 - When "We are down" is not good enough. SRE on AzureRene Van Osnabrugge
 
Patch Management - 2013
Patch Management - 2013Patch Management - 2013
Patch Management - 2013Vicky Ames
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Abeer R
 

La actualidad más candente (7)

Dit yvol3iss45
Dit yvol3iss45Dit yvol3iss45
Dit yvol3iss45
 
Root Cause Analysis Presentation
Root Cause Analysis PresentationRoot Cause Analysis Presentation
Root Cause Analysis Presentation
 
Root cause analysis training
Root cause analysis trainingRoot cause analysis training
Root cause analysis training
 
Preliminary Examination Proposal Slides
Preliminary Examination Proposal SlidesPreliminary Examination Proposal Slides
Preliminary Examination Proposal Slides
 
VSLive Orlando 2019 - When "We are down" is not good enough. SRE on Azure
VSLive Orlando 2019 - When "We are down" is not good enough. SRE on AzureVSLive Orlando 2019 - When "We are down" is not good enough. SRE on Azure
VSLive Orlando 2019 - When "We are down" is not good enough. SRE on Azure
 
Patch Management - 2013
Patch Management - 2013Patch Management - 2013
Patch Management - 2013
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
 

Similar a How bad is your toil? Measuring the Human Impact of Process

#speakgeek - Support Processes for iconnect360
#speakgeek - Support Processes for iconnect360#speakgeek - Support Processes for iconnect360
#speakgeek - Support Processes for iconnect360Derek Chan
 
Bally chohan support
Bally chohan supportBally chohan support
Bally chohan supportdubai
 
Bally chohan support
Bally chohan supportBally chohan support
Bally chohan supportdubai
 
Reducing service interruption duration
Reducing service interruption durationReducing service interruption duration
Reducing service interruption durationErik Giles
 
Becoma an Ace in Analytics
Becoma an Ace in AnalyticsBecoma an Ace in Analytics
Becoma an Ace in AnalyticsKen Goossens
 
Robert Mc Geachy Common Pitfalls Agile
Robert Mc Geachy Common Pitfalls AgileRobert Mc Geachy Common Pitfalls Agile
Robert Mc Geachy Common Pitfalls AgileRobert McGeachy
 
#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)
#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)
#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)Steve Heye
 
Visualisation&agile practices ai2014
Visualisation&agile practices ai2014Visualisation&agile practices ai2014
Visualisation&agile practices ai2014Balaji Muniraja
 
Zetta_Consultng_Brief.pptx
Zetta_Consultng_Brief.pptxZetta_Consultng_Brief.pptx
Zetta_Consultng_Brief.pptxYasir Habibullah
 
Andrew Shepherd - Rethink the service desk role to change its image forever
Andrew Shepherd - Rethink the service desk role to change its image foreverAndrew Shepherd - Rethink the service desk role to change its image forever
Andrew Shepherd - Rethink the service desk role to change its image foreveritSMF UK
 
2011 06 15 velocity conf from visible ops to dev ops final
2011 06 15 velocity conf   from visible ops to dev ops final2011 06 15 velocity conf   from visible ops to dev ops final
2011 06 15 velocity conf from visible ops to dev ops finalGene Kim
 
Stefanini Tech Team - Help Desk to Service Desk
Stefanini Tech Team - Help Desk to Service DeskStefanini Tech Team - Help Desk to Service Desk
Stefanini Tech Team - Help Desk to Service DeskIT Service and Support
 
· Choose an information system for an individual project.  During .docx
· Choose an information system for an individual project.  During .docx· Choose an information system for an individual project.  During .docx
· Choose an information system for an individual project.  During .docxLynellBull52
 
Culture First 2019: Day 2, Making remote work work
Culture First 2019: Day 2, Making remote work work Culture First 2019: Day 2, Making remote work work
Culture First 2019: Day 2, Making remote work work Culture Amp
 
Managing a Crisis, Women in Tech Summit Philly, 2017
Managing a Crisis, Women in Tech Summit Philly, 2017Managing a Crisis, Women in Tech Summit Philly, 2017
Managing a Crisis, Women in Tech Summit Philly, 2017Sarah Toms
 
S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsRicardo Amaro
 

Similar a How bad is your toil? Measuring the Human Impact of Process (20)

#speakgeek - Support Processes for iconnect360
#speakgeek - Support Processes for iconnect360#speakgeek - Support Processes for iconnect360
#speakgeek - Support Processes for iconnect360
 
Bally chohan support
Bally chohan supportBally chohan support
Bally chohan support
 
Bally chohan support
Bally chohan supportBally chohan support
Bally chohan support
 
Reducing service interruption duration
Reducing service interruption durationReducing service interruption duration
Reducing service interruption duration
 
Becoma an Ace in Analytics
Becoma an Ace in AnalyticsBecoma an Ace in Analytics
Becoma an Ace in Analytics
 
Robert Mc Geachy Common Pitfalls Agile
Robert Mc Geachy Common Pitfalls AgileRobert Mc Geachy Common Pitfalls Agile
Robert Mc Geachy Common Pitfalls Agile
 
#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)
#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)
#15NTC NTEN Help Desk or Service Desk? (Align Nonprofit Technology)
 
Visualisation&agile practices ai2014
Visualisation&agile practices ai2014Visualisation&agile practices ai2014
Visualisation&agile practices ai2014
 
Working Effectively with PeopleSoft Support
Working Effectively with PeopleSoft SupportWorking Effectively with PeopleSoft Support
Working Effectively with PeopleSoft Support
 
Zetta_Consultng_Brief.pptx
Zetta_Consultng_Brief.pptxZetta_Consultng_Brief.pptx
Zetta_Consultng_Brief.pptx
 
met_ins_exerpts
met_ins_exerptsmet_ins_exerpts
met_ins_exerpts
 
Andrew Shepherd - Rethink the service desk role to change its image forever
Andrew Shepherd - Rethink the service desk role to change its image foreverAndrew Shepherd - Rethink the service desk role to change its image forever
Andrew Shepherd - Rethink the service desk role to change its image forever
 
Redmine
RedmineRedmine
Redmine
 
2011 06 15 velocity conf from visible ops to dev ops final
2011 06 15 velocity conf   from visible ops to dev ops final2011 06 15 velocity conf   from visible ops to dev ops final
2011 06 15 velocity conf from visible ops to dev ops final
 
Stefanini Tech Team - Help Desk to Service Desk
Stefanini Tech Team - Help Desk to Service DeskStefanini Tech Team - Help Desk to Service Desk
Stefanini Tech Team - Help Desk to Service Desk
 
· Choose an information system for an individual project.  During .docx
· Choose an information system for an individual project.  During .docx· Choose an information system for an individual project.  During .docx
· Choose an information system for an individual project.  During .docx
 
Culture First 2019: Day 2, Making remote work work
Culture First 2019: Day 2, Making remote work work Culture First 2019: Day 2, Making remote work work
Culture First 2019: Day 2, Making remote work work
 
Managing a Crisis, Women in Tech Summit Philly, 2017
Managing a Crisis, Women in Tech Summit Philly, 2017Managing a Crisis, Women in Tech Summit Philly, 2017
Managing a Crisis, Women in Tech Summit Philly, 2017
 
Scrum à la Pablo (English)
Scrum à la Pablo (English)Scrum à la Pablo (English)
Scrum à la Pablo (English)
 
S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systems
 

Más de Kurt Andersen

Collective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision MakingCollective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision MakingKurt Andersen
 
Assessing stages of practice
Assessing stages of practiceAssessing stages of practice
Assessing stages of practiceKurt Andersen
 
Facilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital EnvironmentFacilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital EnvironmentKurt Andersen
 
The NeverEnding Story: Site Reliability
The NeverEnding Story: Site ReliabilityThe NeverEnding Story: Site Reliability
The NeverEnding Story: Site ReliabilityKurt Andersen
 
Lessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE TeamsLessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE TeamsKurt Andersen
 
What You Need to Know About Email Authentication
What You Need to Know About Email AuthenticationWhat You Need to Know About Email Authentication
What You Need to Know About Email AuthenticationKurt Andersen
 
Weeping Angels of Site Reliability
Weeping Angels of Site ReliabilityWeeping Angels of Site Reliability
Weeping Angels of Site ReliabilityKurt Andersen
 
Join us at #SREcon15
Join us at #SREcon15Join us at #SREcon15
Join us at #SREcon15Kurt Andersen
 
Fighting Email Abuse with DMARC
Fighting Email Abuse with DMARCFighting Email Abuse with DMARC
Fighting Email Abuse with DMARCKurt Andersen
 
Operational Costs of Technical Debt
Operational Costs of Technical DebtOperational Costs of Technical Debt
Operational Costs of Technical DebtKurt Andersen
 

Más de Kurt Andersen (10)

Collective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision MakingCollective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision Making
 
Assessing stages of practice
Assessing stages of practiceAssessing stages of practice
Assessing stages of practice
 
Facilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital EnvironmentFacilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital Environment
 
The NeverEnding Story: Site Reliability
The NeverEnding Story: Site ReliabilityThe NeverEnding Story: Site Reliability
The NeverEnding Story: Site Reliability
 
Lessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE TeamsLessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE Teams
 
What You Need to Know About Email Authentication
What You Need to Know About Email AuthenticationWhat You Need to Know About Email Authentication
What You Need to Know About Email Authentication
 
Weeping Angels of Site Reliability
Weeping Angels of Site ReliabilityWeeping Angels of Site Reliability
Weeping Angels of Site Reliability
 
Join us at #SREcon15
Join us at #SREcon15Join us at #SREcon15
Join us at #SREcon15
 
Fighting Email Abuse with DMARC
Fighting Email Abuse with DMARCFighting Email Abuse with DMARC
Fighting Email Abuse with DMARC
 
Operational Costs of Technical Debt
Operational Costs of Technical DebtOperational Costs of Technical Debt
Operational Costs of Technical Debt
 

Último

𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607dollysharma2066
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girladitipandeya
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.soniya singh
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...SofiyaSharma5
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Último (20)

𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Call Girls In Noida 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In Noida 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In Noida 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In Noida 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 

How bad is your toil? Measuring the Human Impact of Process

  • 1. How Bad is Your Toil? Measuring the Human Impact of Process Kurt Andersen LinkedIn Product Site Reliability @drkurta
  • 2. What is this “toil” of which you speak? @drkurta
  • 3. • Manual • Repetitive • Automatable • Tactical • Without enduring value • O(n) with service growth (or worse!) Toil is: @drkurta
  • 4. Mean Time To Sleep (MTTS) • Ryan Frantz & Laurie Denness from Etsy (at the time) • Velocity 2014 • Looked at the impact of oncall vs. life • https://github.com/etsy/opsweekly @drkurta
  • 5. Round 1 — Time Usage Survey @drkurta
  • 6. General Time Categories Writing/maintaining code and associated design & docs S/W Eng Making lasting changes to entire systems; consulting on design & architecture System Eng / Architecture Taking vacation should not count against other baselines Vacation (DTO) Team meetings, processes, training, admin tasks, hiring Overhead Manual, repetitive, lacks enduring value Toil / Reactive @drkurta
  • 7. SRE-Time Survey • Run monthly from June- November 2018 • Average time to complete: 3m • Some of the comments were the most instructive @drkurta
  • 9. SRE-Time Definitions Reactive Work Reactive work “is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” Oncall time (either primary or secondary) is a base level of reactivity. “[Google] SREs report that their top source of reactive work is interrupts (that is, non-urgent service-related messages and emails). The next leading source is on-call (urgent) response, followed by releases and pushes.” @drkurta
  • 12. SRE-Time Comments • [Tool X] needs to go, or be supplemented… Its limitations flow into every corner of how we do our job… • I find that interviewing can be a huge off-the-top hit against potentially productive time… • Reactive work for my team is mostly confined to oncall, but our oncall week is almost 100% reactive for 12+ hours a day. • Most of my toil is interrupt driven work, working around broken tooling, horizontal initiatives, and on call tasks. I also go to meetings. @drkurta
  • 13. Inferring interrupts from agile velocity I nt erl ude • Manual • Repetitive • Automatable • Tactical • Without enduring value • O(n) with service growth @drkurta
  • 14. Round 2 — Oncall Pain Working Group @drkurta
  • 15. Oncall Pain Survey • Identify the problem areas that make oncall painful • Develop tools and strategies to address those areas • Survey run twice: Spring and Fall 2017 @drkurta
  • 16. Oncall Pain “Scoring” • 1 – Not a pain point through • 5 – Urgently needs improvement • or N/A if it does not apply to you/your team @drkurta
  • 17. Oncall Pain “General Issues” • Duration of on-call shifts • Frequency of on-call shifts • Sleep-interrupting alerts & issues • Work/Life balance interruption from weekend or Holiday on-call • Inability to progress on other projects while on-call • Difficulties contacting or receiving assistance from product POCs (teammates) or developers • Difficulties contacting or receiving assistance from other teams within LinkedIn @drkurta
  • 18. Oncall Pain “Alerts” • Spurious alerts requiring no action • Challenges finding alert-specific severity information • Challenges finding useful alert-specific resolution documentation • Alerts due to problems with a downstream service or resource • Following & cleaning up email-only alerts (low-level) • Tracking alerts which need to be tuned • Tuning and updating alerts @drkurta
  • 19. Oncall Pain “Daily Activities” • Mandatory meetings attended by on-call • Deployments handled by SREs • Interruptions for deployment/rollback assistance • LiX / RB approvals • Interruptions for “urgent” LiX / RB approvals • Other dev support-related interruptions (detail below) • Managing incoming team email • Triaging new tickets @drkurta
  • 20. Oncall Pain “Additional Details” • Are there other aspects of oncall not listed above which you would rate 4-5 (towards the “urgent/highly stressful” end of the scale)? • For any items you ranked with 4 or above, please provide any details you'd like to add about how these aspects of on-call adversely impact you, and any thoughts you have about how to improve the situation. • Have there been any changes to how your team handles oncall within the last ~6 months which you feel have had a lasting positive or negative impact in your oncall experience? If so, please elaborate on them here. @drkurta
  • 21. Downstream Issues SP’17 Issue Average Median # of 4+ # of N/A Alerts due to problems with a downstream service or resource 3.8 4 32 1 Inability to progress on other projects while on-call 3.6 4 30 1 Spurious alerts requiring no action 3.2 3 23 0 Challenges finding useful alert-specific resolution documentation 3.1 3 21 0 Sleep-interrupting alerts & issues 3.1 3 20 2 Tuning and updating alerts 2.9 3 16 2 Challenges finding alert-specific severity information 2.8 3 19 0 Interruptions for deployment/rollback assistance 2.8 3 16 1 Following & cleaning up email-only alerts (low-level) 2.8 3 14 1 Work/Life balance interruption from weekend or Holiday on-call 2.7 3 12 1 Other dev support-related interruptions (detail below) 2.7 3 12 2 Interruptions for "urgent" LiX / RB approvals 2.7 3 14 0 Spring’17 @drkurta
  • 22. Alert Management SP’17 Issue Average Median # of 4+ # of N/A Alerts due to problems with a downstream service or resource 3.8 4 32 1 Inability to progress on other projects while on-call 3.6 4 30 1 Spurious alerts requiring no action 3.2 3 23 0 Challenges finding useful alert-specific resolution documentation 3.1 3 21 0 Sleep-interrupting alerts & issues 3.1 3 20 2 Tuning and updating alerts 2.9 3 16 2 Challenges finding alert-specific severity information 2.8 3 19 0 Interruptions for deployment/rollback assistance 2.8 3 16 1 Following & cleaning up email-only alerts (low-level) 2.8 3 14 1 Work/Life balance interruption from weekend or Holiday on-call 2.7 3 12 1 Other dev support-related interruptions (detail below) 2.7 3 12 2 Interruptions for "urgent" LiX / RB approvals 2.7 3 14 0 Spring’17 @drkurta
  • 23. Interruptions SP’17 Issue Average Median # of 4+ # of N/A Alerts due to problems with a downstream service or resource 3.8 4 32 1 Inability to progress on other projects while on-call 3.6 4 30 1 Spurious alerts requiring no action 3.2 3 23 0 Challenges finding useful alert-specific resolution documentation 3.1 3 21 0 Sleep-interrupting alerts & issues 3.1 3 20 2 Tuning and updating alerts 2.9 3 16 2 Challenges finding alert-specific severity information 2.8 3 19 0 Interruptions for deployment/rollback assistance 2.8 3 16 1 Following & cleaning up email-only alerts (low-level) 2.8 3 14 1 Work/Life balance interruption from weekend or Holiday on-call 2.7 3 12 1 Other dev support-related interruptions (detail below) 2.7 3 12 2 Interruptions for "urgent" LiX / RB approvals 2.7 3 14 0 Spring’17 @drkurta
  • 24. Response Patterns by Team SP’17 Average 1 2 3 4 5 6 7 8 9 A B C D 3.8 2.0 Spring’17 @drkurta
  • 25. Methods and Tools Downstream Alerts Interrupts Linkedout Alert Correlation Oncall Retrospective Improve alert notes Oncall Scheduling Dev Oncall @drkurta
  • 26. Oncall Pain “Fall Follow-up” Part 1 • Leverage the oncall retro tool to review alerts / identify those needing improvement • Change in on-call scheduling (4/3, etc) • Nurse plan with alert correlation to prevent/delay unneeded alert • Alert note improvements @drkurta
  • 27. Oncall Pain “Fall Follow-up” Part 2 • Have there been any changes (including those above) to how your team handles on- call within the time since the last survey (April 2017) which you feel have had a lasting positive or negative impact in your on-call experience? If so, please elaborate on them here @drkurta
  • 28. Improvement Results 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Leverage the oncall retro tool to review alerts / identify those needing improvement Change in on-call scheduling (4/3, etc) Nurse plan with alert correlation to prevent/delay unneeded alert Alert note improvements Much Better Slightly Better Unchanged Slightly Worse Much Worse @drkurta
  • 30. INsomnia • Goal: create a weekly oncall dashboard for the shift handover so that we can have some data to [pursue longer term improvement]. • After loading the data into elasticsearch, the visualization provided by Kibana was better than expected, so most of the time went into improving the dashboard. @drkurta
  • 35. • Manual • Repetitive • Automatable • Tactical • Without enduring value • O(n) with service growth (or worse!) Toil is: @drkurta
  • 36. •Will you join the ATL? @drkurta