#ATAGTR2021
Chaos Engineering: Break It to Make It
Anupam Agarwal & Peeyush Girdhar
KNOW YOUR SPEAKERS
Anupam Agarwal, Cloud/DevOps Architect
Peeyush Girdhar, Cloud/DevOps Architect
AGENDA
01 Concept of Chaos Engineering
02 Need for Chaos Engineering
03 Chaos Engineering vs Normal Testing
04 Start your journey with Chaos Engineering
Why the World Needs More Resilient Systems?
Common issues faced by multiple organizations
1. BREACH (24%): Organizations that have confirmed or suspected breaches tied to their applications or infrastructure.
2. MATURITY (86%): Organizations that are in an immature or improving state with respect to environment resilience.
3. TEAMS (65%): Teams that have not incorporated resilience testing into their design during the initial stages of the SDLC.
4. TESTING (47%): Traditional testing still does not help them find the issues within their ecosystems.
Chaos Engineering: Where are we?
The art of breaking things purposefully
Ever since Netflix introduced Chaos Engineering through its Simian Army toolset (described publicly in 2011, with Chaos Monkey open-sourced in 2012), the idea of inducing failure as a preventative measure has become one of the preferred resilience techniques for cloud-native distributed systems.
“Chaos Engineering is the discipline of experimenting on a distributed system in order to induce artificial failures to build confidence in the system's capability to withstand turbulent conditions in production.”
Here's how Netflix describes why they built these chaos tools:
The cloud is all about redundancy and fault-tolerance. Since no single component can guarantee 100% uptime (and even the most expensive hardware eventually fails), we have to design a cloud architecture where individual components can fail without affecting the availability of the entire system. In effect, we have to be stronger than our weakest link.
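The weakest-link argument in the Netflix quote can be made concrete with a little arithmetic. The following is a minimal sketch, not something taken from the talk or from Netflix: it assumes replicas fail independently and that the system stays up as long as at least one replica is up. The function name combined_availability is made up for illustration.

```python
def combined_availability(component_availability: float, replicas: int) -> float:
    """Availability of a group of redundant components.

    Assumption (not from the slides): failures are independent and the
    group is available whenever at least one replica is available.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The group is down only when every replica is down at the same time.
    probability_all_down = (1.0 - component_availability) ** replicas
    return 1.0 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "replica(s):", combined_availability(0.99, n))
```

Under these assumptions, two replicas at 99% each give roughly 99.99% combined availability. Real systems tend to fall short of this idealized figure because failures are often correlated, which is the kind of weakness a chaos experiment can expose.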
Why Chaos Engineering?
Chaos Engineering is Preventive Medicine
Chaos Engineering is an approach for learning about how your system behaves by applying a discipline of empirical exploration.
Chaos engineering enables organizations to develop reliable and fault-tolerant software systems, building your team’s confidence in them. The more stable your systems are, the more confident you can be that they will function properly.
By designing and executing Chaos Engineering experiments, you will learn about weaknesses in your system that could potentially lead to outages in customer environments.
LEARN
PREVENT OUTAGES
BUILD CONFIDENCE
Getting Started with Chaos Engineering
A disciplined approach to finding failures before they become outages.
1. DEFINE ‘STEADY STATE’: Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
2. CREATE HYPOTHESIS: Hypothesize that this steady state will continue in both the control group and the experimental group.
3. RUN EXPERIMENTS: Introduce attacks that reflect real-world events such as a server crash, a malfunctioning hard drive, or a network outage.
4. INTERPRET THE RESULTS: Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
5. LEARN & IMPROVE: Improve the existing system based on the above experiments and their results.
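The five steps above can be sketched as a small harness. This is an illustrative sketch under stated assumptions rather than the speakers' tooling: the steady-state metric is a request success rate, the names run_experiment, make_service and ExperimentResult are hypothetical, and the "fault" is simulated by a service that always fails.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    control_mean: float
    experiment_mean: float
    hypothesis_held: bool


def make_service(success_rate: float, seed: int) -> Callable[[], float]:
    """Simulated service (an assumption): each call yields 1.0 on success, 0.0 on failure."""
    rng = random.Random(seed)

    def measure() -> float:
        return 1.0 if rng.random() < success_rate else 0.0

    return measure


def run_experiment(
    measure_control: Callable[[], float],
    measure_experiment: Callable[[], float],
    samples: int = 50,
    tolerance: float = 0.05,
) -> ExperimentResult:
    """Compare the steady-state metric between control and experimental groups.

    Hypothesis: the experimental group's mean stays within `tolerance`
    (relative) of the control group's mean despite the injected fault.
    """
    control_mean = statistics.mean(measure_control() for _ in range(samples))
    experiment_mean = statistics.mean(measure_experiment() for _ in range(samples))
    if control_mean == 0:
        raise ValueError("control group has no steady state to compare against")
    deviation = abs(experiment_mean - control_mean) / control_mean
    return ExperimentResult(control_mean, experiment_mean, deviation <= tolerance)


if __name__ == "__main__":
    control = make_service(success_rate=1.0, seed=1)
    faulty = make_service(success_rate=0.0, seed=2)  # stands in for an injected fault
    result = run_experiment(control, faulty)
    print("hypothesis held:", result.hypothesis_held)
```

A disproven hypothesis (hypothesis_held is False) is the useful outcome here: it points at a weakness to fix in step 5.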
Chaos Engineering Meets DevOps
Maximize benefits by practicing automated Chaos Engineering within your CI/CD pipelines
DEVOPS
• SLOs/Error Budget
• Documentation
• Architecture Model
• Runbooks
• Monitoring
• Network Provider
• CDNs
• Cloud & SaaS Providers
• Performance
• Error Handling
• Timeouts/Retries/Circuit Breakers
• Automated Testing
• Continuous Integration
• Continuous Deployment
• Feature Flagging/Progressive
• Continuous Chaos
Graduate chaos experiments into different phases
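One way to picture graduating experiments through phases inside a pipeline is a small state tracker. This is a sketch under assumptions, not a description of any particular CI/CD product: the phase names, the three-pass threshold, and the names ChaosExperiment, record_run and pipeline_exit_code are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical phase names; the slide only says experiments graduate
# "into different phases" without naming them.
PHASES: List[str] = ["dev", "staging", "production"]


@dataclass(frozen=True)
class ChaosExperiment:
    name: str
    phase: str = PHASES[0]
    consecutive_passes: int = 0


def record_run(
    exp: ChaosExperiment, passed: bool, passes_to_graduate: int = 3
) -> ChaosExperiment:
    """Return the experiment's new state after one pipeline run."""
    if not passed:
        # A failure resets the streak; the experiment stays in its phase.
        return replace(exp, consecutive_passes=0)
    streak = exp.consecutive_passes + 1
    is_last_phase = exp.phase == PHASES[-1]
    if streak >= passes_to_graduate and not is_last_phase:
        next_phase = PHASES[PHASES.index(exp.phase) + 1]
        return replace(exp, phase=next_phase, consecutive_passes=0)
    return replace(exp, consecutive_passes=streak)


def pipeline_exit_code(passed: bool) -> int:
    """Exit code a CI job could return: 0 lets the pipeline continue."""
    return 0 if passed else 1


if __name__ == "__main__":
    exp = ChaosExperiment("kill-one-instance")
    for outcome in (True, True, True, True, False):
        exp = record_run(exp, outcome)
        print(exp)
```

Keeping an experiment in a lower phase until it passes repeatedly limits the blast radius while confidence is still being built.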
What is Game Day?
Game Days are like fire drills: dedicated days for running chaos engineering experiments on our systems.
How to run a Game Day
• Define the timelines
• Whiteboarding
• Execution
• Review
• Define the Targets
Promote Chaos Days!
How Does Chaos Engineering Differ from Testing?
Practice for generating new information
• GENERATE NEW INFORMATION: Experiments propose a hypothesis, and if the hypothesis is not disproven, confidence grows in that hypothesis. If it is disproven, then we learn something new.
• DRAW DISTINCTION: An important distinction can be drawn between testing and experimentation. Tests make an assertion, based on existing knowledge, and then running the test collapses the valence of that assertion, usually into either true or false.
• EXPLORATION OF UNKNOWN: When you want to explore the many ways a complex system can misbehave, injecting communication failures like latency and errors is one good approach.
• COMPLEX ECOSYSTEM: Testing, strictly speaking, does not create new knowledge. Testing requires that the engineer writing the test knows, in advance, the specific properties of the system that they are looking for.
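Injecting communication failures such as latency and errors can be sketched as a wrapper around a dependency call. This is a minimal illustration, not the API of any chaos tool: with_faults and InjectedFault are hypothetical names, and the sleep function is passed in so the delay can be replaced in tests.

```python
import random
import time
from typing import Any, Callable, Optional


class InjectedFault(Exception):
    """Raised in place of a real communication error."""


def with_faults(
    call: Callable[..., Any],
    *,
    latency_s: float = 0.0,
    error_rate: float = 0.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., Any]:
    """Wrap `call` so each invocation may be delayed and may fail.

    latency_s  -- fixed delay added before every call
    error_rate -- probability (0..1) that the call raises InjectedFault
    """
    rng = rng or random.Random()

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        if latency_s > 0:
            sleep(latency_s)  # simulated network latency
        if rng.random() < error_rate:
            raise InjectedFault("injected communication failure")
        return call(*args, **kwargs)

    return wrapped


if __name__ == "__main__":
    def fetch_price(item: str) -> float:
        return 9.99  # stand-in for a remote call

    flaky_fetch = with_faults(fetch_price, latency_s=0.1, error_rate=0.3)
    try:
        print(flaky_fetch("book"))
    except InjectedFault as exc:
        print("caller must handle:", exc)
```

Running the caller against the wrapped dependency is an experiment rather than a test: the outcome (timeouts, retries, cascading errors) is not asserted in advance but observed.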
Tools to kickstart your Chaos Journey
AWS Fault Injection
Which one to choose?
Is it even worth embracing?
Opportunities & Obstacles
Pros
• Insights gained from running chaos tests can lead to a reduction in future production incidents.
• Helps improve the confidence and engagement of team members in carrying out disaster recovery methods and makes applications highly reliable.
• At a high level, Chaos Engineering provides an advantage by improving overall system availability.
• Production outages can lead to huge losses; therefore, chaos engineering helps prevent large losses in revenue.
• The team can verify the system's behavior on failure to take…
Cons
• Implementing chaos tools for a large-scale system and experimenting can lead to an increase in cost.
• Carelessness or incorrect steps in formulation and implementation can impact the application, thereby hampering the customer.
• It doesn't support all kinds of deployment.
• Most chaos engineering tools do not cover all types of environments and their components.
DEMO: QUICK INSIGHT
ANY QUESTIONS?
For any queries, please write to: anupamaggarwal.0611@gmail.com / girdhar.peeyush@gmail.com
THANK YOU

#ATAGTR2021 Presentation : "Chaos engineering: Break it to make it" by Anupam Agarwal, Peeyush Girdhar.
