SlideShare a Scribd company logo
1 of 66
Download to read offline
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and TrademarkĀ© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Arun Gupta, @arungupta
Principal Open Source Technologist,
Amazon Web Services
Using Chaos to Bring Resiliency
to Your Applications in
Kubernetes
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Failures are a given and
everything will eventually
fail over time.
https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
Amazon 2006
GameDay: Creating
Resiliency Through
Destruction
Jesse Robbins
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Monkeys
https://github.com/Netflix/SimianArmy
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Resilience
Ability of a system to adapt
to changes, failures, and disturbances
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering is the discipline of
experimenting on a distributed system in
order to build confidence in the systemā€™s
capability to withstand turbulent
conditions in production
Credit: https://www.flickr.com/photos/loseryouthcrew/8775130600/
https://principlesofchaos.org/
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Bad things will happen to your system,
no matter how well designed it is
You cannot become ignorant to it
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Break your systems on purpose
Find out their weaknesses and
fix them before they break when least expected
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos doesnā€™t cause problems.
It reveals them.
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
ā€¢ Application level
ā€¢ Host failure
ā€¢ Resource attacks (CPU, memory, ā€¦)
ā€¢ Network attacks (dependencies, latency, ā€¦)
ā€¢ Region attacks!
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Where do you inject Chaos?
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero
ā€Normalā€ behavior of your system
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Business metric
https://medium.com/netflix-
techblog/sps-the-pulse-of-
netflix-streaming-
ae4db0e05f8a
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
ā€¢ a service gives 404 or 503?
ā€¢ latency increases by 300ms?
ā€¢ the port is not accessible?
ā€¢ security group rules changed?
ā€¢ the database stops?
ā€¢ excessive number of requests come?
ā€¢ iptables are wiped out?
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Pick hypothesis
Scope the experiment
Identify metrics
Notify the organization
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Start with very small
As close as possible to production
Minimize the blast radius.
Have an emergency STOP!
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Users
Canary deployment
99%
users
1%
users
Start with...
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Time to detect?
Time for notification? And escalation?
Time to public notification?
Time for graceful degradation to kick-in?
Time for self healing to happen?
Time to recoveryā€”partial and full?
Time to all-clear and stable?
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
DONā€™T blame that one personā€¦
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
PostMortemsā€”COE (Correction of Errors)
The 5 WHYs
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Fix
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Failure free operations require
experience with failure.
http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kubernetes cluster
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Reconciles desired and actual state for pods
Distributes pods across AZs
Automatic health-check based restarts
Rolling deployment of a service
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kubernetes cluster with Amazon EKS
AWS managed
Customer account
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kubernetes cluster with Amazon EKS
mycluster.eks.amazonaws.com
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Kubectl
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Region and Availability Zones
Control Plane is highly available
Master and Workers are configured in ASG
Master instance type auto-scaling
Etcd is HA and backed up every hour
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos in a Kubernetes cluster
mycluster.eks.amazonaws.com
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Kubectl
x
x
Health check?
Dead node?
x
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio
Chaos Toolkit
Kube Monkey
PowerfulSeal
Gremlin
Simian Army
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio
Intelligent routing
and load balancing
Resilience across
languages and
platforms
Fleet-wide policy
enforcement
In-depth
telemetry
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Timeouts
Bounded retries with timeout budget
Concurrent connections limit and request load
Active health checks (periodic)
Passive health checks (circuit breakers)
AZ-aware load balancing with automatic failover
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
ā€¢ Timing failures
ā€¢ Increased network latency
ā€¢ Overloaded upstream service
ā€¢ Crashes
ā€¢ HTTP error codes
ā€¢ TCP connection failures
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Fault injection using Istioā€”timeout
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: greeting
spec:
hosts:
- greeting
http:
- fault:
delay:
fixedDelay: 10s
percent: 100
route:
- destination:
host: greeting
subset: greeting-hello
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: greeting-destination-rule
spec:
host: greeting
subsets:
- name: greeting-hello
labels:
greeting: hello
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Fault injection using Istioā€”HTTP abort
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: greeting
spec:
hosts:
- greeting
http:
- fault:
abort:
httpStatus: 500
percent: 100
route:
- destination:
host: greeting
subset: greeting-hello
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio traffic management
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: greeting-virtual-service
spec:
hosts:
- greeting
http:
- route:
- destination:
host: greeting
subset: greeting-hello
weight: 75
- destination:
host: greeting
subset: greeting-howdy
weight: 25
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: greeting-destination-rule
spec:
host: greeting
subsets:
- name: greeting-hello
labels:
greeting: hello
- name: greeting-howdy
labels:
greeting: howdy
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio circuit breaker
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: greeting-destination-rule
spec:
host: greeting
subsets:
- name: greeting-hello
labels:
greeting: hello
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://istio.io/docs/
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit
Open API for Chaos Engineering
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
CLI-driven
Experiments declared in JSON/YAML files
Open specification
Extensible: Kubernetes, AWS, Spring, others
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit follows the principles of chaos
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
query a system to observe a behavior
ā€¢ Check state of a pod with a specific label
ā€¢ Multiple probes to define steady state
real-world events
ā€¢ Terminate a deployment
ā€¢ Multiple actions simulate events
Types of probe and method
ā€¢ Process: Run a binary
ā€¢ HTTP: Invoke a HTTP endpoint
ā€¢ Python: Call a Python function to perform richer operations
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit metadata
{
"version": "1.0.0",
"title": "Terminating the greeting service should not impact users",
"description": "How does the greeting service unavailbility impacts our users? Do they see
an error or does the webapp gets slower?",
"tags": [
"kubernetes",
"aws"
],
"configuration": {
"web_app_url": {
"type": "env",
"key": "WEBAPP_URL"
}
},
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit steady state & hypothesis
"steady-state-hypothesis": {
"title": "Services are all available and healthy",
"probes": [
{
"type": "probe",
"name": "alive-and-healthy",
"tolerance": true,
"provider": {
"type": "python",
"module": "chaosk8s.pod.probes",
"func": "pods_in_phase",
"arguments": {
"label_selector": "app=webapp-pod",
"phase": "Running",
"ns": "default"
}
}
},
{
"type": "probe",
"name": "application-must-respond-normally",
"tolerance": 200,
"provider": {
"type": "http",
"url": "${web_app_url}",
"timeout": 3
}
}
]
},
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit experiment & verify
"method": [
{
"type": "action",
"name": "terminate-greeting-service",
"provider": {
"type": "python",
"module": "chaosk8s.pod.actions",
"func": "terminate_pods",
"arguments": {
"label_selector": "app=greeter-pod",
"ns": "default"
}
}
},
{
"type": "probe",
"name": "fetch-application-logs",
"provider": {
"type": "python",
"module": "chaosk8s.pod.probes",
"func": "read_pod_logs",
"arguments": {
"label_selector": "app=webapp-pod",
"last": "20s",
"ns": "default"
}
}
}
],
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit run
$ chaos run experiments/experiment.json
[2018-03-10 14:42:38 INFO] Validating the experiment's syntax
[2018-03-10 14:42:38 INFO] Experiment looks valid
[2018-03-10 14:42:38 INFO] Running experiment: Terminate the greeting service should not impact users
[2018-03-10 14:42:38 INFO] Steady state hypothesis: Services are all available and healthy
[2018-03-10 14:42:38 INFO] Probe: application-should-be-alive-and-healthy
[2018-03-10 14:42:38 INFO] Probe: application-must-respond-normally
[2018-03-10 14:42:39 INFO] Steady state hypothesis is met!
[2018-03-10 14:42:39 INFO] Action: terminate-greeting-service
[2018-03-10 14:42:40 INFO] Probe: fetch-application-logs
[2018-03-10 14:42:41 INFO] Steady state hypothesis: Services are all available and healthy
[2018-03-10 14:42:41 INFO] Probe: application-should-be-alive-and-healthy
[2018-03-10 14:42:42 INFO] Probe: application-must-respond-normally
[2018-03-10 14:42:45 ERROR] => failed: activity took too long to complete
[2018-03-10 14:42:45 CRITICAL] Steady state probe 'application-must-respond-normally' is not in the
given tolerance so failing this experiment
[2018-03-10 14:42:45 INFO] Let's rollback...
[2018-03-10 14:42:45 INFO] No declared rollbacks, let's move on.
[2018-03-10 14:42:45 INFO] Experiment ended with status: failed
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://github.com/chaostoolkit/chaostoolkit/
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Implementation of Netflixā€™s Chaos Monkey for Kubernetes
Randomly deletes pods in the cluster
Applications opt-in using annotations
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Run Kube-Monkeyā€”create configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-monkey-config-map
namespace: kube-system
data:
config.toml: |
[kubemonkey]
run_hour = 8
start_hour = 10
end_hour = 16
blacklisted_namespaces = ["kube-system"]
whitelisted_namespaces = [""]
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kube-Monkey application opt-in
apiVersion: apps/v1
kind: Deployment
. . .
template:
metadata:
labels:
app: greeting
kube-monkey/enabled: enabled
kube-monkey/identifier: monkey-victim-pods
kube-monkey/mtbf: 2
kube-monkey/kill-mode: random-max-percent
kube-monkey/kill-value: 40
spec:
containers:
- name: greeting
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://github.com/asobti/kube-monkey
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering working group @ CNCF
https://github.com/chaoseng/wg-chaoseng
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering mind map
https://bit.ly/2uKOJMQ
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
You donā€™t chose the moment,
the moment chooses you.
You only choose how prepared
you are, when it does.
Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and TrademarkĀ© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Thank you!

More Related Content

What's hot

Chaos Engineering Kubernetes
Chaos Engineering KubernetesChaos Engineering Kubernetes
Chaos Engineering KubernetesAlex Soto
Ā 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos EngineeringGremlin
Ā 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
Ā 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgNils Meder
Ā 
Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsC4Media
Ā 
Chaos engineering and chaos testing
Chaos engineering and chaos testingChaos engineering and chaos testing
Chaos engineering and chaos testingjeetendra mandal
Ā 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos EngineeringSIGHUP
Ā 
Chaos Engineering: Injecting Failure for Building Resilience in Systems
Chaos Engineering: Injecting Failure for Building Resilience in SystemsChaos Engineering: Injecting Failure for Building Resilience in Systems
Chaos Engineering: Injecting Failure for Building Resilience in SystemsYury Roa
Ā 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureIntroduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureAna Medina
Ā 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-KnolxKnoldus Inc.
Ā 
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...Amazon Web Services
Ā 
Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017 Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017 Nora Jones
Ā 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformAnshul Patel
Ā 
K8s on AWS - Introducing Amazon EKS
K8s on AWS - Introducing Amazon EKSK8s on AWS - Introducing Amazon EKS
K8s on AWS - Introducing Amazon EKSAmazon Web Services
Ā 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixJosh Evans
Ā 
From Monolithic to Microservices
From Monolithic to Microservices From Monolithic to Microservices
From Monolithic to Microservices Amazon Web Services
Ā 
DEV204_Debugging Modern Applications Introduction to AWS X-Ray
DEV204_Debugging Modern Applications Introduction to AWS X-RayDEV204_Debugging Modern Applications Introduction to AWS X-Ray
DEV204_Debugging Modern Applications Introduction to AWS X-RayAmazon Web Services
Ā 
Introduction to Resilience4j
Introduction to Resilience4jIntroduction to Resilience4j
Introduction to Resilience4jKnoldus Inc.
Ā 
Build CICD Pipeline for Container Presentation Slides
Build CICD Pipeline for Container Presentation SlidesBuild CICD Pipeline for Container Presentation Slides
Build CICD Pipeline for Container Presentation SlidesAmazon Web Services
Ā 

What's hot (20)

Chaos Engineering Kubernetes
Chaos Engineering KubernetesChaos Engineering Kubernetes
Chaos Engineering Kubernetes
Ā 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos Engineering
Ā 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in Production
Ā 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Ā 
Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient Systems
Ā 
Chaos engineering and chaos testing
Chaos engineering and chaos testingChaos engineering and chaos testing
Chaos engineering and chaos testing
Ā 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos Engineering
Ā 
Chaos Engineering: Injecting Failure for Building Resilience in Systems
Chaos Engineering: Injecting Failure for Building Resilience in SystemsChaos Engineering: Injecting Failure for Building Resilience in Systems
Chaos Engineering: Injecting Failure for Building Resilience in Systems
Ā 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureIntroduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft Azure
Ā 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-Knolx
Ā 
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Ā 
Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017 Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017
Ā 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin Platform
Ā 
K8s on AWS - Introducing Amazon EKS
K8s on AWS - Introducing Amazon EKSK8s on AWS - Introducing Amazon EKS
K8s on AWS - Introducing Amazon EKS
Ā 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at Netflix
Ā 
From Monolithic to Microservices
From Monolithic to Microservices From Monolithic to Microservices
From Monolithic to Microservices
Ā 
DEV204_Debugging Modern Applications Introduction to AWS X-Ray
DEV204_Debugging Modern Applications Introduction to AWS X-RayDEV204_Debugging Modern Applications Introduction to AWS X-Ray
DEV204_Debugging Modern Applications Introduction to AWS X-Ray
Ā 
Introduction to Resilience4j
Introduction to Resilience4jIntroduction to Resilience4j
Introduction to Resilience4j
Ā 
Build CICD Pipeline for Container Presentation Slides
Build CICD Pipeline for Container Presentation SlidesBuild CICD Pipeline for Container Presentation Slides
Build CICD Pipeline for Container Presentation Slides
Ā 
DevOps on AWS
DevOps on AWSDevOps on AWS
DevOps on AWS
Ā 

Similar to Chaos Engineering with Kubernetes

Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringAmazon Web Services
Ā 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Amazon Web Services
Ā 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudAmazon Web Services
Ā 
Using chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsUsing chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsJohn Varghese
Ā 
Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Yan Cui
Ā 
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...Amazon Web Services
Ā 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Adrian Hornsby
Ā 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedAWS User Group Bengaluru
Ā 
Intro to SageMaker
Intro to SageMakerIntro to SageMaker
Intro to SageMakerSoji Adeshina
Ā 
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...Amazon Web Services
Ā 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSDThomas Delteil
Ā 
Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18CodeOps Technologies LLP
Ā 
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Amazon Web Services
Ā 
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018Amazon Web Services
Ā 
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech TalksImprove Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech TalksAmazon Web Services
Ā 
Come Out From Behind Your Firewall
Come Out From Behind Your FirewallCome Out From Behind Your Firewall
Come Out From Behind Your FirewallAmazon Web Services
Ā 
Modern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayModern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayAmazon Web Services
Ā 
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018Amazon Web Services
Ā 
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...Amazon Web Services
Ā 
Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018
Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018
Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018Amazon Web Services
Ā 

Similar to Chaos Engineering with Kubernetes (20)

Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
Ā 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Ā 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the Cloud
Ā 
Using chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsUsing chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applications
Ā 
Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)
Ā 
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Ā 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
Ā 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practiced
Ā 
Intro to SageMaker
Intro to SageMakerIntro to SageMaker
Intro to SageMaker
Ā 
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Ā 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
Ā 
Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18
Ā 
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Ā 
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Ā 
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech TalksImprove Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Ā 
Come Out From Behind Your Firewall
Come Out From Behind Your FirewallCome Out From Behind Your Firewall
Come Out From Behind Your Firewall
Ā 
Modern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayModern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat Way
Ā 
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
Ā 
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Ā 
Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018
Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018
Leadership Session: AWS Security (SEC305-L) - AWS re:Invent 2018
Ā 

More from Arun Gupta

5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdf5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdfArun Gupta
Ā 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Arun Gupta
Ā 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesArun Gupta
Ā 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerArun Gupta
Ā 
Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019Arun Gupta
Ā 
Why Amazon Cares about Open Source
Why Amazon Cares about Open SourceWhy Amazon Cares about Open Source
Why Amazon Cares about Open SourceArun Gupta
Ā 
Machine learning using Kubernetes
Machine learning using KubernetesMachine learning using Kubernetes
Machine learning using KubernetesArun Gupta
Ā 
Building Cloud Native Applications
Building Cloud Native ApplicationsBuilding Cloud Native Applications
Building Cloud Native ApplicationsArun Gupta
Ā 
How to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAMHow to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAMArun Gupta
Ā 
Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Arun Gupta
Ā 
The Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 KeynoteThe Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 KeynoteArun Gupta
Ā 
Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018Arun Gupta
Ā 
Mastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv SummitMastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv SummitArun Gupta
Ā 
Top 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's LandscapeTop 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's LandscapeArun Gupta
Ā 
Container Landscape in 2017
Container Landscape in 2017Container Landscape in 2017
Container Landscape in 2017Arun Gupta
Ā 
Java EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShiftJava EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShiftArun Gupta
Ā 
Docker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developersDocker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developersArun Gupta
Ā 
Thanks Managers!
Thanks Managers!Thanks Managers!
Thanks Managers!Arun Gupta
Ā 
Migrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to ContainersMigrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to ContainersArun Gupta
Ā 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessArun Gupta
Ā 

More from Arun Gupta (20)

5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdf5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdf
Ā 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019
Ā 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and Kubernetes
Ā 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using Firecracker
Ā 
Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019
Ā 
Why Amazon Cares about Open Source
Why Amazon Cares about Open SourceWhy Amazon Cares about Open Source
Why Amazon Cares about Open Source
Ā 
Machine learning using Kubernetes
Machine learning using KubernetesMachine learning using Kubernetes
Machine learning using Kubernetes
Ā 
Building Cloud Native Applications
Building Cloud Native ApplicationsBuilding Cloud Native Applications
Building Cloud Native Applications
Ā 
How to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAMHow to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAM
Ā 
Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018
Ā 
The Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 KeynoteThe Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 Keynote
Ā 
Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018
Ā 
Mastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv SummitMastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv Summit
Ā 
Top 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's LandscapeTop 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's Landscape
Ā 
Container Landscape in 2017
Container Landscape in 2017Container Landscape in 2017
Container Landscape in 2017
Ā 
Java EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShiftJava EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShift
Ā 
Docker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developersDocker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developers
Ā 
Thanks Managers!
Thanks Managers!Thanks Managers!
Thanks Managers!
Ā 
Migrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to ContainersMigrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to Containers
Ā 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern Success
Ā 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
Ā 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
Ā 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
Ā 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
Ā 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
Ā 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
Ā 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜RTylerCroy
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
Ā 
Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024The Digital Insurer
Ā 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationRadu Cotescu
Ā 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
Ā 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
Ā 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
Ā 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Ā 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Ā 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Ā 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Ā 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Ā 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
Ā 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
Ā 
Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024
Ā 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organization
Ā 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Ā 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
Ā 

Chaos Engineering with Kubernetes

  • 1. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and TrademarkĀ© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Arun Gupta, @arungupta Principal Open Source Technologist, Amazon Web Services Using Chaos to Bring Resiliency to Your Applications in Kubernetes
  • 2. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Failures are a given and everything will eventually fail over time. https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
  • 3. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://www.youtube.com/watch?v=zoz0ZjfrQ9s Amazon 2006 GameDay: Creating Resiliency Through Destruction Jesse Robbins
  • 4. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Monkeys https://github.com/Netflix/SimianArmy
  • 5. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering
  • 6. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Resilience Ability of a system to adapt to changes, failures, and disturbances
  • 7. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the systemā€™s capability to withstand turbulent conditions in production Credit: https://www.flickr.com/photos/loseryouthcrew/8775130600/ https://principlesofchaos.org/
  • 8. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Bad things will happen to your system, no matter how well designed it is You cannot become ignorant to it
  • 9. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Break your systems on purpose Find out their weaknesses and fix them before they break when least expected
  • 10. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
  • 11. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos doesnā€™t cause problems. It reveals them.
  • 12. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark ā€¢ Application level ā€¢ Host failure ā€¢ Resource attacks (CPU, memory, ā€¦) ā€¢ Network attacks (dependencies, latency, ā€¦) ā€¢ Region attacks!
  • 13. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Where do you inject Chaos?
  • 14. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 15. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 16. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero ā€Normalā€ behavior of your system
  • 17. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Business metric https://medium.com/netflix- techblog/sps-the-pulse-of- netflix-streaming- ae4db0e05f8a
  • 18. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 19. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 20. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark ā€¢ a service gives 404 or 503? ā€¢ latency increases by 300ms? ā€¢ the port is not accessible? ā€¢ security group rules changed? ā€¢ the database stops? ā€¢ excessive number of requests come? ā€¢ iptables are wiped out?
  • 21. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 22. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 23. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Pick hypothesis Scope the experiment Identify metrics Notify the organization
  • 24. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Start with very small As close as possible to production Minimize the blast radius. Have an emergency STOP!
  • 25. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Users Canary deployment 99% users 1% users Start with...
  • 26. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 27. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 28. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Time to detect? Time for notification? And escalation? Time to public notification? Time for graceful degradation to kick-in? Time for self healing to happen? Time to recoveryā€”partial and full? Time to all-clear and stable?
  • 29. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark DONā€™T blame that one personā€¦
  • 30. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark PostMortemsā€”COE (Correction of Errors) The 5 WHYs
  • 31. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 32. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 33. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Fix
  • 34. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Failure free operations require experience with failure. http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
  • 35. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kubernetes cluster
  • 36. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Reconciles desired and actual state for pods Distributes pods across AZs Automatic health-check based restarts Rolling deployment of a service
  • 37. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kubernetes cluster with Amazon EKS AWS managed Customer account
  • 38. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kubernetes cluster with Amazon EKS mycluster.eks.amazonaws.com Availability Zone 1 Availability Zone 2 Availability Zone 3 Kubectl
  • 39. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Region and Availability Zones Control Plane is highly available Master and Workers are configured in ASG Master instance type auto-scaling Etcd is HA and backed up every hour
  • 40. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos in a Kubernetes cluster mycluster.eks.amazonaws.com Availability Zone 1 Availability Zone 2 Availability Zone 3 Kubectl x x Health check? Dead node? x
  • 41. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio Chaos Toolkit Kube Monkey PowerfulSeal Gremlin Simian Army
  • 42. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio Intelligent routing and load balancing Resilience across languages and platforms Fleet-wide policy enforcement In-depth telemetry
  • 43. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Timeouts Bounded retries with timeout budget Concurrent connections limit and request load Active health checks (periodic) Passive health checks (circuit breakers) AZ-aware load balancing with automatic failover
  • 44. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark ā€¢ Timing failures ā€¢ Increased network latency ā€¢ Overloaded upstream service ā€¢ Crashes ā€¢ HTTP error codes ā€¢ TCP connection failures
  • 45. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Fault injection using Istioā€”timeout apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: greeting spec: hosts: - greeting http: - fault: delay: fixedDelay: 10s percent: 100 route: - destination: host: greeting subset: greeting-hello --- apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: greeting-destination-rule spec: host: greeting subsets: - name: greeting-hello labels: greeting: hello
  • 46. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Fault injection using Istioā€”HTTP abort apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: greeting spec: hosts: - greeting http: - fault: abort: httpStatus: 500 percent: 100 route: - destination: host: greeting subset: greeting-hello
  • 47. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio traffic management apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: greeting-virtual-service spec: hosts: - greeting http: - route: - destination: host: greeting subset: greeting-hello weight: 75 - destination: host: greeting subset: greeting-howdy weight: 25 --- apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: greeting-destination-rule spec: host: greeting subsets: - name: greeting-hello labels: greeting: hello - name: greeting-howdy labels: greeting: howdy
  • 48. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio circuit breaker apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: greeting-destination-rule spec: host: greeting subsets: - name: greeting-hello labels: greeting: hello trafficPolicy: connectionPool: tcp: maxConnections: 100
  • 49. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://istio.io/docs/
  • 50. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit Open API for Chaos Engineering
  • 51. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark CLI-driven Experiments declared in JSON/YAML files Open specification Extensible: Kubernetes, AWS, Spring, others
  • 52. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit follows the principles of chaos
  • 53. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark query a system to observe a behavior ā€¢ Check state of a pod with a specific label ā€¢ Multiple probes to define steady state real-world events ā€¢ Terminate a deployment ā€¢ Multiple actions simulate events Types of probe and method ā€¢ Process: Run a binary ā€¢ HTTP: Invoke a HTTP endpoint ā€¢ Python: Call a Python function to perform richer operations
  • 54. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit metadata { "version": "1.0.0", "title": "Terminating the greeting service should not impact users", "description": "How does the greeting service unavailbility impacts our users? Do they see an error or does the webapp gets slower?", "tags": [ "kubernetes", "aws" ], "configuration": { "web_app_url": { "type": "env", "key": "WEBAPP_URL" } },
  • 55. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit steady state & hypothesis "steady-state-hypothesis": { "title": "Services are all available and healthy", "probes": [ { "type": "probe", "name": "alive-and-healthy", "tolerance": true, "provider": { "type": "python", "module": "chaosk8s.pod.probes", "func": "pods_in_phase", "arguments": { "label_selector": "app=webapp-pod", "phase": "Running", "ns": "default" } } }, { "type": "probe", "name": "application-must-respond-normally", "tolerance": 200, "provider": { "type": "http", "url": "${web_app_url}", "timeout": 3 } } ] },
  • 56. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit experiment & verify "method": [ { "type": "action", "name": "terminate-greeting-service", "provider": { "type": "python", "module": "chaosk8s.pod.actions", "func": "terminate_pods", "arguments": { "label_selector": "app=greeter-pod", "ns": "default" } } }, { "type": "probe", "name": "fetch-application-logs", "provider": { "type": "python", "module": "chaosk8s.pod.probes", "func": "read_pod_logs", "arguments": { "label_selector": "app=webapp-pod", "last": "20s", "ns": "default" } } } ],
  • 57. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit run $ chaos run experiments/experiment.json [2018-03-10 14:42:38 INFO] Validating the experiment's syntax [2018-03-10 14:42:38 INFO] Experiment looks valid [2018-03-10 14:42:38 INFO] Running experiment: Terminate the greeting service should not impact users [2018-03-10 14:42:38 INFO] Steady state hypothesis: Services are all available and healthy [2018-03-10 14:42:38 INFO] Probe: application-should-be-alive-and-healthy [2018-03-10 14:42:38 INFO] Probe: application-must-respond-normally [2018-03-10 14:42:39 INFO] Steady state hypothesis is met! [2018-03-10 14:42:39 INFO] Action: terminate-greeting-service [2018-03-10 14:42:40 INFO] Probe: fetch-application-logs [2018-03-10 14:42:41 INFO] Steady state hypothesis: Services are all available and healthy [2018-03-10 14:42:41 INFO] Probe: application-should-be-alive-and-healthy [2018-03-10 14:42:42 INFO] Probe: application-must-respond-normally [2018-03-10 14:42:45 ERROR] => failed: activity took too long to complete [2018-03-10 14:42:45 CRITICAL] Steady state probe 'application-must-respond-normally' is not in the given tolerance so failing this experiment [2018-03-10 14:42:45 INFO] Let's rollback... [2018-03-10 14:42:45 INFO] No declared rollbacks, let's move on. [2018-03-10 14:42:45 INFO] Experiment ended with status: failed
  • 58. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://github.com/chaostoolkit/chaostoolkit/
  • 59. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Implementation of Netflixā€™s Chaos Monkey for Kubernetes Randomly deletes pods in the cluster Applications opt-in using annotations
  • 60. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Run Kube-Monkeyā€”create configuration apiVersion: v1 kind: ConfigMap metadata: name: kube-monkey-config-map namespace: kube-system data: config.toml: | [kubemonkey] run_hour = 8 start_hour = 10 end_hour = 16 blacklisted_namespaces = ["kube-system"] whitelisted_namespaces = [""]
  • 61. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kube-Monkey application opt-in apiVersion: apps/v1 kind: Deployment . . . template: metadata: labels: app: greeting kube-monkey/enabled: enabled kube-monkey/identifier: monkey-victim-pods kube-monkey/mtbf: 2 kube-monkey/kill-mode: random-max-percent kube-monkey/kill-value: 40 spec: containers: - name: greeting
  • 62. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://github.com/asobti/kube-monkey
  • 63. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering working group @ CNCF https://github.com/chaoseng/wg-chaoseng
  • 64. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering mind map https://bit.ly/2uKOJMQ
  • 65. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark You donā€™t chose the moment, the moment chooses you. You only choose how prepared you are, when it does.
  • 66. Ā© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and TrademarkĀ© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Thank you!