SlideShare una empresa de Scribd logo
1 de 12
NETFLIX’S
CHAOS
MONKEY
Michael Whitehead
“EVERYTHING FAILS ALL THE TIME”
- WERNER VOGELS
CHAOS MONKEY
A service that causes failure and
wreaks havoc on instances in Auto
Scaling Groups
A member of the Simian Army
developed by Netflix
WHY WOULD WE INTENTIONALLY
CAUSE FAILURE?!?
 It is inevitable
 Infrastructure is Complex
 Forcing failure puts you in control
 Identify faults in your architecture
• Does you load balancers reroute traffic correctly?
• Do your instances function correctly when they come back up?
• Are you monitoring tools alerting you on important events?
GETTING STARTED WITH CHAOS MONKEY
 Amazon Web Services
 Must be using Auto Scaling Groups
 Uses Amazon SimpleDB for event storage
 Simple Email Service setup (optional for notifications)
 Can be used with Netflix’s Asgard (optional)
 Java 7 JDK or newer
WOW!
EXAMPLE WITH CLOUDFORMATION
NEAT!
AWESOME!
COOL!
NO WAY!
BUILDING & CONFIGURATION
 Clone SimianArmy repo from Github
 Builds using Gradle
 Runs 6 times a day during business hours- 9am to 3pm
 Does not run on holidays or weekends
 Timeframes and frequency of runs can be configured
IMPORTANT PROPERTIES
 Enabling Chaos Monkey
 Set simianarmy.chaos.enabled = true
 Set simianarmy.chaos.leashed=false
 Probability of 1 instance being terminated per day per ASG
 simianarmy.chaos.ASG.probability = 1.0
 Opt-in or Opt-out model
OPT-IN / OPT-OUT MODEL
 Set to False = Opt-in Set to True = Opt-out
 simianarmy.chaos.ASG.enabled = false
 When Opt-In (false) you must enable each auto scaling group you
want to run Chaos Monkey in
 simianarmy.chaos.<<auto scaling group name>>.enabled = true
 When Opt-Out (true) you must disable each auto scaling group
you do not want it to run in
 simianarmy.chaos.<<auto scaling group name>>.enabled = false
EMAIL NOTIFICATIONS
ARE TERMINATIONS ALL IT CAN DO?
 Block all network traffic
 Burn CPU
 Burn IO
 Fill Disk
 Kill Processes
 Network Loss
 Null-Route
• All EC2 <-> EC2 traffic
SSH REQUIRED
 Detach all EBS volumes
 Fail DNS
 Fail EC2 API
 Fail S3 API
 Fail DynamoDB API
 Network Corruption
 Network Latency
LINKS
 CloudFormation Template:
https://github.com/joehack3r/aws/blob/master/cloudformation/te
mplates/chaosMonkey.json
 Chaos Monkey Announcement:
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-
wild.html
 Simian Army Quick Start Guide:
https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide
 Chaos Monkey Configuration:
https://github.com/Netflix/SimianArmy/wiki/Chaos-Settings
 Chaos Monkey Army:
https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army

Más contenido relacionado

La actualidad más candente

Selenium interview questions and answers
Selenium interview questions and answersSelenium interview questions and answers
Selenium interview questions and answerskavinilavuG
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllThomas Wuerthinger
 
Exception handling in java
Exception handling in javaException handling in java
Exception handling in javapooja kumari
 
Introduction to MuleSoft Anytime Platform
Introduction to MuleSoft Anytime PlatformIntroduction to MuleSoft Anytime Platform
Introduction to MuleSoft Anytime PlatformSalesforce Developers
 
MuleSoft Sizing Guidelines - VirtualMuleys
MuleSoft Sizing Guidelines - VirtualMuleysMuleSoft Sizing Guidelines - VirtualMuleys
MuleSoft Sizing Guidelines - VirtualMuleysAngel Alberici
 
Automate All The Things with Flow
Automate All The Things with FlowAutomate All The Things with Flow
Automate All The Things with FlowSalesforce Admins
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreVikalp Bhalia
 
Marketing Cloud integration with MuleSoft
Marketing Cloud integration with MuleSoftMarketing Cloud integration with MuleSoft
Marketing Cloud integration with MuleSoftPatryk Bandurski
 
Infrastructure as Microservices - OReillySACon London 2016
Infrastructure as Microservices - OReillySACon London 2016Infrastructure as Microservices - OReillySACon London 2016
Infrastructure as Microservices - OReillySACon London 2016Kief Morris
 
Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security
Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security
Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security Stephanie Lawrence
 
Demystifying the use of circuit breakers with MuleSoft
Demystifying the use of circuit breakers with MuleSoftDemystifying the use of circuit breakers with MuleSoft
Demystifying the use of circuit breakers with MuleSoftSandeep Deshmukh
 
MuleSoft Architecture Presentation
MuleSoft Architecture PresentationMuleSoft Architecture Presentation
MuleSoft Architecture PresentationRupesh Sinha
 
Logging best practice in mule using logger component
Logging best practice in mule using logger componentLogging best practice in mule using logger component
Logging best practice in mule using logger componentGovind Mulinti
 
The Ideal Salesforce Development Lifecycle
The Ideal Salesforce Development LifecycleThe Ideal Salesforce Development Lifecycle
The Ideal Salesforce Development LifecycleJoshua Hoskins
 
Salesforce Developer Console ppt
Salesforce Developer Console  pptSalesforce Developer Console  ppt
Salesforce Developer Console pptKuhinoor Alom
 

La actualidad más candente (20)

Selenium interview questions and answers
Selenium interview questions and answersSelenium interview questions and answers
Selenium interview questions and answers
 
.Net
.Net.Net
.Net
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them All
 
Exception handling in java
Exception handling in javaException handling in java
Exception handling in java
 
Introduction to MuleSoft Anytime Platform
Introduction to MuleSoft Anytime PlatformIntroduction to MuleSoft Anytime Platform
Introduction to MuleSoft Anytime Platform
 
MuleSoft Sizing Guidelines - VirtualMuleys
MuleSoft Sizing Guidelines - VirtualMuleysMuleSoft Sizing Guidelines - VirtualMuleys
MuleSoft Sizing Guidelines - VirtualMuleys
 
Exception Handling in Java
Exception Handling in JavaException Handling in Java
Exception Handling in Java
 
Automate All The Things with Flow
Automate All The Things with FlowAutomate All The Things with Flow
Automate All The Things with Flow
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStore
 
Marketing Cloud integration with MuleSoft
Marketing Cloud integration with MuleSoftMarketing Cloud integration with MuleSoft
Marketing Cloud integration with MuleSoft
 
Infrastructure as Microservices - OReillySACon London 2016
Infrastructure as Microservices - OReillySACon London 2016Infrastructure as Microservices - OReillySACon London 2016
Infrastructure as Microservices - OReillySACon London 2016
 
Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security
Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security
Denver MuleSoft Meetup: Deep Dive into Anypoint Runtime Fabric Security
 
Demystifying the use of circuit breakers with MuleSoft
Demystifying the use of circuit breakers with MuleSoftDemystifying the use of circuit breakers with MuleSoft
Demystifying the use of circuit breakers with MuleSoft
 
MuleSoft Architecture Presentation
MuleSoft Architecture PresentationMuleSoft Architecture Presentation
MuleSoft Architecture Presentation
 
Logging best practice in mule using logger component
Logging best practice in mule using logger componentLogging best practice in mule using logger component
Logging best practice in mule using logger component
 
Jdbc Ppt
Jdbc PptJdbc Ppt
Jdbc Ppt
 
JVM++: The Graal VM
JVM++: The Graal VMJVM++: The Graal VM
JVM++: The Graal VM
 
The Ideal Salesforce Development Lifecycle
The Ideal Salesforce Development LifecycleThe Ideal Salesforce Development Lifecycle
The Ideal Salesforce Development Lifecycle
 
Salesforce Developer Console ppt
Salesforce Developer Console  pptSalesforce Developer Console  ppt
Salesforce Developer Console ppt
 
The Java memory model made easy
The Java memory model made easyThe Java memory model made easy
The Java memory model made easy
 

Destacado

ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012Amazon Web Services
 
Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overviewRyan Hodgin
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Jeremy Edberg
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at NetflixGareth Bowles
 
Architecture for the cloud deployment case study future
Architecture for the cloud deployment case study futureArchitecture for the cloud deployment case study future
Architecture for the cloud deployment case study futureLen Bass
 
Jfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App DeploymentJfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App DeploymentAxel Fontaine
 
Dev ops and safety critical systems
Dev ops and safety critical systemsDev ops and safety critical systems
Dev ops and safety critical systemsLen Bass
 
#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security Services#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security ServicesAlert Logic
 
Elements of User Experience for Mobile Apps
Elements of User Experience for Mobile AppsElements of User Experience for Mobile Apps
Elements of User Experience for Mobile AppsPek Pongpaet
 
From Sketch Mockup → WatchKit App
From Sketch Mockup → WatchKit AppFrom Sketch Mockup → WatchKit App
From Sketch Mockup → WatchKit AppPek Pongpaet
 
Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016João Cravo
 
Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013Jay Zarfoss
 
Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security AutomationJason Chan
 
#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the Cloud#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the CloudAlert Logic
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgNils Meder
 
From Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixFrom Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixDianne Marsh
 
How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...
How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...
How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...Amazon Web Services
 
Full Stack Automation with Katello & The Foreman
Full Stack Automation with Katello & The ForemanFull Stack Automation with Katello & The Foreman
Full Stack Automation with Katello & The ForemanWeston Bassler
 

Destacado (20)

ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
 
Mini-Training: Netflix Simian Army
Mini-Training: Netflix Simian ArmyMini-Training: Netflix Simian Army
Mini-Training: Netflix Simian Army
 
Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overview
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at Netflix
 
Architecture for the cloud deployment case study future
Architecture for the cloud deployment case study futureArchitecture for the cloud deployment case study future
Architecture for the cloud deployment case study future
 
Jfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App DeploymentJfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App Deployment
 
Dev ops and safety critical systems
Dev ops and safety critical systemsDev ops and safety critical systems
Dev ops and safety critical systems
 
#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security Services#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security Services
 
Elements of User Experience for Mobile Apps
Elements of User Experience for Mobile AppsElements of User Experience for Mobile Apps
Elements of User Experience for Mobile Apps
 
From Sketch Mockup → WatchKit App
From Sketch Mockup → WatchKit AppFrom Sketch Mockup → WatchKit App
From Sketch Mockup → WatchKit App
 
Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016
 
presentation-chaos-monkey
presentation-chaos-monkeypresentation-chaos-monkey
presentation-chaos-monkey
 
Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013
 
Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security Automation
 
#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the Cloud#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the Cloud
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
 
From Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixFrom Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at Netflix
 
How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...
How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...
How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Inven...
 
Full Stack Automation with Katello & The Foreman
Full Stack Automation with Katello & The ForemanFull Stack Automation with Katello & The Foreman
Full Stack Automation with Katello & The Foreman
 

Similar a Intro to Netflix's Chaos Monkey

Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh VariaCloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh VariaAmazon Web Services
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13Dave Gardner
 
Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...
Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...
Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...Puppet
 
Chasing AMI - Building Amazon machine images with Puppet, Packer and Jenkins
Chasing AMI - Building Amazon machine images with Puppet, Packer and JenkinsChasing AMI - Building Amazon machine images with Puppet, Packer and Jenkins
Chasing AMI - Building Amazon machine images with Puppet, Packer and JenkinsTomas Doran
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationChris Dagdigian
 
Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)Yan Cui
 
Salt conf 2014 - Using SaltStack in high availability environments
Salt conf 2014 - Using SaltStack in high availability environmentsSalt conf 2014 - Using SaltStack in high availability environments
Salt conf 2014 - Using SaltStack in high availability environmentsBenjamin Cane
 
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...Claus Ibsen
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
Automating Perl deployments with Hudson
Automating Perl deployments with HudsonAutomating Perl deployments with Hudson
Automating Perl deployments with Hudsonnachbaur
 
Paris Kafka Meetup - patterns anti-patterns
Paris Kafka Meetup -  patterns anti-patternsParis Kafka Meetup -  patterns anti-patterns
Paris Kafka Meetup - patterns anti-patternsFlorent Ramiere
 
Advanced front-end automation with npm scripts
Advanced front-end automation with npm scriptsAdvanced front-end automation with npm scripts
Advanced front-end automation with npm scriptsk88hudson
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13Dave Gardner
 
Ansible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeAnsible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeSarah Z
 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
 
I Don't Test Often ...
I Don't Test Often ...I Don't Test Often ...
I Don't Test Often ...Gareth Bowles
 
I don't always test...but when I do I test in production - Gareth Bowles
I don't always test...but when I do I test in production - Gareth BowlesI don't always test...but when I do I test in production - Gareth Bowles
I don't always test...but when I do I test in production - Gareth BowlesQA or the Highway
 
MongoDB, Cloudformation and Chef
MongoDB, Cloudformation and ChefMongoDB, Cloudformation and Chef
MongoDB, Cloudformation and ChefMongoDB
 

Similar a Intro to Netflix's Chaos Monkey (20)

Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh VariaCloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...
Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...
Puppet Camp London 2014: Chasing AMI: baking Amazon machine images with Jenki...
 
Chasing AMI - Building Amazon machine images with Puppet, Packer and Jenkins
Chasing AMI - Building Amazon machine images with Puppet, Packer and JenkinsChasing AMI - Building Amazon machine images with Puppet, Packer and Jenkins
Chasing AMI - Building Amazon machine images with Puppet, Packer and Jenkins
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow Orchestration
 
Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)
 
Salt conf 2014 - Using SaltStack in high availability environments
Salt conf 2014 - Using SaltStack in high availability environmentsSalt conf 2014 - Using SaltStack in high availability environments
Salt conf 2014 - Using SaltStack in high availability environments
 
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ...
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
Automating Perl deployments with Hudson
Automating Perl deployments with HudsonAutomating Perl deployments with Hudson
Automating Perl deployments with Hudson
 
Paris Kafka Meetup - patterns anti-patterns
Paris Kafka Meetup -  patterns anti-patternsParis Kafka Meetup -  patterns anti-patterns
Paris Kafka Meetup - patterns anti-patterns
 
Advanced front-end automation with npm scripts
Advanced front-end automation with npm scriptsAdvanced front-end automation with npm scripts
Advanced front-end automation with npm scripts
 
ChaosEngineeringITEA.pptx
ChaosEngineeringITEA.pptxChaosEngineeringITEA.pptx
ChaosEngineeringITEA.pptx
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13
 
Ansible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeAnsible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less Coffee
 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in Production
 
I Don't Test Often ...
I Don't Test Often ...I Don't Test Often ...
I Don't Test Often ...
 
I don't always test...but when I do I test in production - Gareth Bowles
I don't always test...but when I do I test in production - Gareth BowlesI don't always test...but when I do I test in production - Gareth Bowles
I don't always test...but when I do I test in production - Gareth Bowles
 
How to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWSHow to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWS
 
MongoDB, Cloudformation and Chef
MongoDB, Cloudformation and ChefMongoDB, Cloudformation and Chef
MongoDB, Cloudformation and Chef
 

Último

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Último (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Intro to Netflix's Chaos Monkey

  • 2. “EVERYTHING FAILS ALL THE TIME” - WERNER VOGELS
  • 3. CHAOS MONKEY A service that causes failure and wreaks havoc on instances in Auto Scaling Groups A member of the Simian Army developed by Netflix
  • 4. WHY WOULD WE INTENTIONALLY CAUSE FAILURE?!?  It is inevitable  Infrastructure is Complex  Forcing failure puts you in control  Identify faults in your architecture • Does you load balancers reroute traffic correctly? • Do your instances function correctly when they come back up? • Are you monitoring tools alerting you on important events?
  • 5. GETTING STARTED WITH CHAOS MONKEY  Amazon Web Services  Must be using Auto Scaling Groups  Uses Amazon SimpleDB for event storage  Simple Email Service setup (optional for notifications)  Can be used with Netflix’s Asgard (optional)  Java 7 JDK or newer
  • 7. BUILDING & CONFIGURATION  Clone SimianArmy repo from Github  Builds using Gradle  Runs 6 times a day during business hours- 9am to 3pm  Does not run on holidays or weekends  Timeframes and frequency of runs can be configured
  • 8. IMPORTANT PROPERTIES  Enabling Chaos Monkey  Set simianarmy.chaos.enabled = true  Set simianarmy.chaos.leashed=false  Probability of 1 instance being terminated per day per ASG  simianarmy.chaos.ASG.probability = 1.0  Opt-in or Opt-out model
  • 9. OPT-IN / OPT-OUT MODEL  Set to False = Opt-in Set to True = Opt-out  simianarmy.chaos.ASG.enabled = false  When Opt-In (false) you must enable each auto scaling group you want to run Chaos Monkey in  simianarmy.chaos.<<auto scaling group name>>.enabled = true  When Opt-Out (true) you must disable each auto scaling group you do not want it to run in  simianarmy.chaos.<<auto scaling group name>>.enabled = false
  • 11. ARE TERMINATIONS ALL IT CAN DO?  Block all network traffic  Burn CPU  Burn IO  Fill Disk  Kill Processes  Network Loss  Null-Route • All EC2 <-> EC2 traffic SSH REQUIRED  Detach all EBS volumes  Fail DNS  Fail EC2 API  Fail S3 API  Fail DynamoDB API  Network Corruption  Network Latency
  • 12. LINKS  CloudFormation Template: https://github.com/joehack3r/aws/blob/master/cloudformation/te mplates/chaosMonkey.json  Chaos Monkey Announcement: http://techblog.netflix.com/2012/07/chaos-monkey-released-into- wild.html  Simian Army Quick Start Guide: https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide  Chaos Monkey Configuration: https://github.com/Netflix/SimianArmy/wiki/Chaos-Settings  Chaos Monkey Army: https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army