2. I break systems… a LOT
● Auth
● Syslog
● Chef
● Ambassadors
● Prod Frontends
3. Sometimes I ‘break’ systems on purpose...
● Service discovery by chef
● 90% code in prod
● No shared storage for CloudStack
Sometimes you just need to do things.
4. Higher standards
And yet, I still hold others to a higher standard...
● Servers still on public internet???
● Created a flat VLAN when we did move to private IPs???
● No centralized management of virtualization infrastructure???
● The only 'shared storage' is via DRBD and ha.d???
5. Technical debtor’s prison
We’re obsessed with technical debt
Qualifying it:
● Application Debt
● Infrastructure Debt
● Architecture Debt
Quantifying it:
● size of code base
● code coverage
● coupling and cohesion reports
● cyclomatic complexity
● Halstead complexity measures
6. The myth of technical debt
Peter Norvig, “All code is liability”
Not actually technical debt:
● Maintenance
● Changes in understanding
● Operational inertia
● Poor code choices
● Dependency liabilities
7. So what is technical debt?
Technical debt is the set of choices we intentionally make to speed up the development
or implementation of systems, and which we acknowledge will need to be
changed later.
Technical debt is the result of an Efficiency-Thoroughness Trade-Off at an
individual level.
Technical debt is the output of a project constraint model at an organizational
level.
8. The blame game
Shouldn't we stop blaming people for making the trade-offs they're forced to
make?
9. Being Blameless
● If we remove fear, we will have a more honest conversation about trade-offs
● If we're honest about those trade-offs, a crisis might be averted altogether
● If we understand our history, we won't be destined to repeat it
10. What is blameless system design?
Assuming goodwill
Blameless post-mortems
Empathy
Experimentation
Honesty
Communication
11. Assume Goodwill
Your co-worker probably doesn’t come into work every day with
the intent of harming you or the organization.
12. Blameless Post-mortems
“We must strive to understand that accidents don’t
happen because people gamble and lose.
Accidents happen because the person believes that:
…what is about to happen is not possible,
…or what is about to happen has no connection to
what they are doing,
…or that the possibility of getting the intended
outcome is well worth whatever risk there is.”
- Erik Hollnagel
14. Experimentation
The Engineering Design Process
● Define the Problem
● Do Background Research
● Specify Requirements
● Brainstorm Solutions
● Choose the Best Solution
● Do Development Work
● Build a Prototype
● Test and Redesign
15. Honesty
● Publish ALL your results
● Document ALL your decisions
● Be honest about trade-offs
● Track mitigations
17. Did someone say devops?
● Culture
● Measurement
● Sharing
● Feedback loops
18. The bad
It’s hard to change culture and move away from a retribution
culture and the RCA (root cause analysis) mentality.
It’s hard to get over hindsight bias.
It’s a lot of work to encourage openness and honesty, and
define what that looks like.
It’s hard to get past impostor syndrome and/or contempt
cultures.
19. The good
● Remove fear
● Encourage ‘risk’
● Create feedback
● Reduce redundant learning
● Improve the working environment and trust
20. Douglas Land - Director of operations, Vast.com, Inc.
doug@webuilddevops.com | @webuilddevops
Some References:
http://www.datical.com/blog/technical-debt-devops/
http://laughingmeme.org/2016/01/10/towards-an-understanding-of-technical-debt/
http://blog.aurynn.com/86/contempt-culture
http://erikhollnagel.com/ideas/etto-principle/index.html
http://indecorous.com/fallible_humans/
https://hbr.org/2003/05/it-doesnt-matter/ar/pr
https://codeascraft.com/2014/07/18/just-culture-resources/
http://sidneydekker.com/just-culture/