Jesse Robbins
Cofounder, Opscode

@jesserobbins
jesse@opscode.com




Join Us!!!
“You don’t choose the moment,
  the moment chooses you.

You only choose how prepared
    you are when it does.”
             -Fire Chief Mike Burtch




Operations is Work that Matters




GameDay



define:
 GameDay
   An exercise designed to increase
   Resilience through large-scale fault
   injection across critical systems.

   Part of a larger discipline called
   Resilience Engineering.

   Not new, just new to us ;-)
define:
 Resilience
   Resilience is the ability
   of a System to adapt to
   changes, failures, &
   disturbances.
define:
 System
   People
   Culture
   Processes
   Applications & Services
   Infrastructure
   Software
   Hardware
This will be on the test:
Resilience is a product of
    People & Culture


Copyright © 2010 Opscode, Inc - All Rights Reserved
This will be on the test:
FAILURE HAPPENS!
“multiple & unexpected interactions of
    failures are inevitable”
                       -Charles Perrow
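Catastrophic Potential
[Figure: 2x2 chart. Horizontal axis: Complexity, from Simple to Complex. Vertical axis: Coupling, from Loose to Tight. The Tight/Complex quadrant is marked "KEEP OUT!!!"]
Created by Jesse Robbins
"Catastrophic Potential" adapted from Normal Accidents by Charles Perrow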
define:
 The Nines (roughly, downtime per year)
   99%         5256 min (3.65 days)
   99.9%       526 min (8.8 hours)
   99.99%      53 min
   99.999%     5 min
   99.9999%    30 seconds
   99.99999%   3 seconds
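The figures in this table can be reproduced with a few lines of arithmetic. The sketch below is not from the talk; it assumes a 365-day year (525,600 minutes), and the function name `downtime_minutes_per_year` is made up for illustration.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}%: {minutes:.2f} min ({minutes * 60:.0f} s)")
```

Each extra nine divides the downtime budget by ten, which is why the last rows are measured in seconds rather than minutes.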
99.9% * 99.9% * 99.9% = 99.7% (oops!)
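The multiplication on this slide can be checked directly. This is a minimal sketch, not from the talk, and it assumes the three components sit in series (a request needs all of them) and fail independently; `serial_availability` is a hypothetical helper name.

```python
from math import prod


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(availabilities)


combined = serial_availability(0.999, 0.999, 0.999)
print(f"combined availability: {combined:.4%}")
```

Three "three nines" services chained together no longer deliver three nines, and every additional dependency lowers the product further.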
MTTR > MTBF
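One way to read this slide is that time to repair deserves at least as much attention as time between failures. The sketch below is not from the talk. It uses the standard steady-state relation availability = MTBF / (MTBF + MTTR) with made-up numbers to show that halving MTTR buys the same availability as doubling MTBF.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures
    and mean time to repair (both in the same unit)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: one failure per 1000 hours, one hour to recover.
baseline = availability(mtbf_hours=1000, mttr_hours=1)
double_mtbf = availability(mtbf_hours=2000, mttr_hours=1)
halve_mttr = availability(mtbf_hours=1000, mttr_hours=0.5)

print(f"baseline:    {baseline:.6f}")
print(f"double MTBF: {double_mtbf:.6f}")
print(f"halve MTTR:  {halve_mttr:.6f}")
```

Under this model only the ratio of the two values matters, so the practical question is which one is cheaper to improve. Recovery time is often the one a team can practice directly.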


I ❤ MTTR
GameDay


Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
http://www.flickr.com/photos/dnorman/2678090600
Useful Ops Personality Defects
   25%  Pyromaniac
   75%  Paranoid
setting a good example
GameDay increases Resilience in 3 ways
 Preparation
  ‣ Identification and mitigation of risks and impact from
    failure
  ‣ Reduces frequency of failure (raises MTBF)
  ‣ Reduces duration of recovery (lowers MTTR)
 Participation
  ‣ Builds confidence & competence responding to failure
    and under stress.
  ‣ Strengthens individual and cultural ability to anticipate,
    mitigate, respond to, and recover from failures of all
    types.
 Exercises
  ‣ Trigger and expose “latent defects”
  ‣ Choose to discover them, instead of letting that be
    determined by the next real disaster.
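A GameDay is an organizational, large-scale exercise, but the idea of triggering failures on purpose can be illustrated at the smallest scale. The sketch below is not from the talk: every name in it (`InjectedFault`, `with_fault_injection`, `fetch_profile`, `fetch_profile_with_fallback`) is hypothetical, and the 50% failure rate is arbitrary. It wraps a dependency call so that some calls fail deliberately, then checks that the caller degrades instead of crashing.

```python
import random


class InjectedFault(Exception):
    """Raised by the harness to simulate a dependency failure."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that a fraction of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_profile(user_id):
    # Hypothetical dependency; stands in for a real service call.
    return {"id": user_id, "name": "example"}


def fetch_profile_with_fallback(fetch, user_id):
    """Caller under test: returns a degraded result on failure."""
    try:
        return fetch(user_id)
    except InjectedFault:
        return {"id": user_id, "name": None, "degraded": True}


rng = random.Random(42)  # fixed seed so the exercise is repeatable
flaky_fetch = with_fault_injection(fetch_profile, failure_rate=0.5, rng=rng)
results = [fetch_profile_with_fallback(flaky_fetch, i) for i in range(100)]
degraded = sum(1 for r in results if r.get("degraded"))
print(f"{degraded} of 100 calls hit an injected fault and fell back")
```

If the fallback path had a latent defect, this run would surface it on the team's schedule rather than during a real outage.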
start small...
http://www.flickr.com/photos/oakleyoriginals/5674150237
increase awareness
http://www.flickr.com/photos/maunzy/5099921731
build confidence
http://www.flickr.com/photos/skevbo/4864249944
full scale, live fire exercises
http://tacomafiredepartment.blogspot.com/2010/05/west-slope-training-burn.html
safety standards & “building codes”
http://www.flickr.com/photos/peregrinari/3801964067
no substitutes for experience...
 Failure-free operations require
 experience with failure.
Ana Grillo © Ana Grillo Photography
The “OODA” Loop
Observe, Orient, Decide, Act



OODA: Observe, Orient, Decide, Act




             http://en.wikipedia.org/wiki/OODA_loop




“You don’t choose the moment,
  the moment chooses you.

You only choose how prepared
    you are when it does.”
             -Fire Chief Mike Burtch




Jesse Robbins
Cofounder, Opscode

@jesserobbins
jesse@opscode.com




Please See:

 John Allspaw
  ‣ Resilience Engineering:
    http://www.kitchensoap.com/2011/04/07/resilience-engineering-part-i/

  ‣ Advanced Post Mortem Fu:
    http://www.slideshare.net/jallspaw/advanced-postmortem-fu-and-human-error-101-velocity-2011




 Dr. Richard Cook
  ‣ How Complex Systems Fail
    http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf





GameDay: Creating Resiliency Through Destruction - LISA11