While testing in demo and stage is good (indeed, essential), testing in production is all too often overlooked. Deploying to production and hoping for the best is a gamble, not a strategy.
In this talk, we discuss
1) Better production deployment and testing strategies including dark pool testing, canary releases and feature switching.
2) After deployment, your work is still not done. We'll talk about Observability, including monitoring, tracing and metrics.
3) Finally, even with the best deployment strategies and monitoring techniques, your software WILL fail in production. It's a question of when, not if. So why not simulate those failures first? We'll finish with game days and chaos engineering.
This talk should be of interest to all developers, QA and Ops folks who are responsible for getting working software in front of users.
We need Observability in our systems
Everything is sometimes broken
Something is always broken
If nothing seems broken...
…your monitoring is broken
It’s impossible to predict the myriad states of partial failures we’ll see
Testing in Production
Chaos Engineering
Carefully planned experiments designed to reveal weaknesses in our systems
aka Resilience Engineering
Game Days
An exercise where we place systems under stress to learn and improve resilience
(And even just getting the team together to discuss resilience can be worthwhile)
Chaos Engineering – a step by step guide
1. Hypothesis (steady state)
2. Minimize blast radius
3. Run
4. Analyze
5. Increase scope
Repeat, automate
Reading material
Chaos Engineering (free eBook)
https://www.oreilly.com/webops-perf/free/chaos-engineering.csp
Distributed Systems Observability (free eBook)
https://distributed-systems-observability-ebook.humio.com/
Reading material
shaunabram.com
Principles of Chaos Engineering
principlesofchaos.org
How to run a Game Day
https://www.gremlin.com/community/tutorials/how-to-run-a-gameday/
Testing in production:
https://medium.com/@copyconstruct/testing-in-production-the-safe-way-18ca102d0ef1
Monitoring in the time of Cloud Native
https://medium.com/@copyconstruct/monitoring-in-the-time-of-cloud-native-c87c7a5bfa3e
Deploy != Release
https://blog.turbinelabs.io/deploy-not-equal-release-part-one-4724bc1e726b
Joke… or Good?
Non-prod = pale imitation, like mocks, or “it works on my machine”
Prod is different; 4th trimester…
“Testing in production”
You may have seen this meme before: the Dos Equis guy saying
“I don’t always test, but when I do, I test in production”
“Testing in production” has been kind of a joke -> what you’re really saying is you don’t test anywhere.
And instead you’re just winging it: deploying to production and <CROSS FINGERS> hoping it all works.
But then I began to look at it differently.
The Dos Equis guy usually says “I don’t always drink beer, but when I do, I drink Dos Equis”
Meaning Dos Equis is the best beer to drink.
So, the implication here is not that testing in production is a joke, but that Production is actually the BEST place to test.
And I’m increasingly believing that to be the case. Or, at least, that production is an environment we shouldn’t be ignoring for testing.
After all, production is the only place your software has an impact on your customers.
But there has been this status quo of production being sacrosanct. Instead of testing there, it is common to keep a non-prod env, such as staging, as identical to production as possible, and test there.
Such environments are usually a pale imitation of production however.
Testing in staging is kind of like testing with mocks, an imitation, but not the real thing.
Saying “works in staging” is only one step better than “works on my machine”.
Production is a different beast!
I’ve heard software being released to production described as being like a baby’s 4th trimester:
When software leaves its artificial environments and slams into the real world
But what makes the real world of production so special?
Serious question: In what ways is Production different from other environments?
Hardware & Cluster size, Data
Configuration, Traffic, Monitoring
Some things we can only test in production
As our architecture becomes more complicated (particularly with Microservices), we need to consider all options to allow us to test and deliver working software to our customers. Including testing in production.
So should we skip testing in non-prod first? No!
Testing in production is by no means a substitute for pre-production testing
I’ve given talks on
unit testing, integration testing
Mocks
About code coverage and Continuous Integration
I believe very firmly in all those things.
Testing in Production is an addition to all those.
Most production testing is really validation only – although there is at least one exception (A/B testing)
Respect production
Beware of unwanted side effects
Stateless services are good candidates
Think SAFE methods e.g. GET, HEAD
Consider tests that rely on expected failures, e.g. a PUT that results in a 400 error (that still tells you something)
Or at least be able to tell the difference between test data and “real” prod data
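As a rough sketch of what a safe production test can look like (the /health and /orders endpoints and the X-Test-Request header here are hypothetical, and assume your services know to recognize tagged synthetic traffic):

    # A minimal sketch of a safe production test. Endpoints and the
    # X-Test-Request header are hypothetical; the point is SAFE methods
    # plus clearly tagged synthetic traffic.
    import requests

    BASE_URL = "https://api.example.com"  # hypothetical production host
    TEST_HEADERS = {"X-Test-Request": "true"}  # tag synthetic traffic

    def smoke_test():
        # SAFE method only: GET should have no side effects on prod state
        resp = requests.get(f"{BASE_URL}/health", headers=TEST_HEADERS, timeout=5)
        assert resp.status_code == 200, f"Health check failed: {resp.status_code}"

    def expected_failure_test():
        # An expected 4xx still tells you something: routing, auth and
        # validation are all working if we get the error we anticipated
        resp = requests.put(f"{BASE_URL}/orders/invalid-id",
                            headers=TEST_HEADERS, json={}, timeout=5)
        assert resp.status_code == 400, f"Expected 400, got {resp.status_code}"

    if __name__ == "__main__":
        smoke_test()
        expected_failure_test()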
Today we’re going to cover some of the different ways we can test in production
We’ll start with Observability, the foundation for any testing in production. Observability = Knowing what the heck your app is doing anyway. Going beyond just logs and alerting
Around deployment & release times
Chaos engineering. Perhaps the most advanced form of production testing, but I would argue it's actually not that advanced. I talk about what it is, some basic rules for doing it, and how we've been starting to use it where I work.
Observability is
The ability to answer questions that you have never thought of before
You can think of it as the next step beyond just monitoring and alerting
Systems have become more distributed, and in the case of containerization, more ephemeral. It is increasingly difficult to know what our software is doing
And Observability means bringing better visibility into systems
To have better visibility, we need to acknowledge that…
Everything is sometimes broken
Something is always broken
-> No complex system is ever fully healthy
If nothing seems broken… your monitoring is broken
Distributed systems are unpredictable. In particular, it’s impossible to predict all the ways a system might fail
Failure needs to be embraced at every phase (from design to implementation, testing, deployment, and operation)
Ease of debugging is of high importance
Logging:
Structured logging: plain text -> Splunk-friendly -> JSON
Eventlog can be a great source of logs for debugging too
Consider sampling rather than aggregation
Metrics: Time series metrics, like tracking system stats such as CPU and mem usage, stats like # logins
Tracing: Distributed traceability using Correlation ID lib; Zipkin etc
Alerting: Useful for proactively learning about (typically predictable) issues
Tools: e.g. Splunk, New Relic, OverOps, Wavefront
EPX, TDA (Thread Dump Analyzer) UX
OverOps / Honeycomb
Stacktraces and exception trackers?
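As a minimal sketch of the logging and tracing points above, using nothing beyond the standard library (the JSON field names are illustrative, not any particular tool's schema):

    # Structured (JSON) logging with a correlation ID: the building block
    # for Splunk-friendly logs and Zipkin-style distributed tracing.
    import json, logging, sys, time, uuid

    class JsonFormatter(logging.Formatter):
        def format(self, record):
            return json.dumps({
                "timestamp": time.time(),
                "level": record.levelname,
                "message": record.getMessage(),
                # The correlation ID lets us stitch one request's logs
                # together across services
                "correlation_id": getattr(record, "correlation_id", None),
            })

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("checkout")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    def handle_request(correlation_id=None):
        # Reuse the inbound correlation ID if a caller passed one on;
        # otherwise this service is the start of the trace
        cid = correlation_id or str(uuid.uuid4())
        logger.info("request received", extra={"correlation_id": cid})
        # ...and pass cid along in the headers of any downstream calls

    handle_request()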
OK, so that was Observability
The ability to answer questions about our application's behavior in production. Questions we may have never even thought of before.
And with Observability in place, what types of production testing can we do…
Let’s move onto…
Testing at Release time
Let's start by defining some terms:
Deployment vs release
Deployment is getting the new version of your code running on production infrastructure; release is the point where it starts receiving real user traffic. The two don't have to happen at the same time.
When talking with engineers, I usually use the term Chaos Engineering, because it sounds cool! When talking with management, I tend to use the term resilience engineering, since it sounds less scary. The terms are synonymous. In the past, terms such as Disaster Recovery and Contingency Planning have been used to describe somewhat similar processes.
Whatever term you use, it basically refers to
->
Conducting carefully planned experiments designed to reveal weaknesses in our systems.
In other words, CE is the practice of confirming that your applications work as you expect them to in production.
Despite the name, Chaos Engineering is not about introducing Chaos into your system! Instead it is about identifying any chaos already there, so that you can remediate.
For example, the canonical Chaos Engineering experiment is Netflix's Chaos Monkey, which randomly terminates production instances to verify that services tolerate instance failure.
In the same spirit: if you believe your application will fail over to something if X happens,
or can handle X requests per second before failing, run an experiment to confirm it.
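For the requests-per-second flavor of hypothesis, a minimal probe might look like the sketch below (the URL, request count and threshold are hypothetical; a real test would use a proper load tool):

    # A rough probe of a "handles X requests/sec" hypothesis.
    import concurrent.futures
    import requests

    URL = "https://api.example.com/health"  # hypothetical

    def hit():
        try:
            return requests.get(URL, timeout=2).status_code == 200
        except requests.RequestException:
            return False

    # Fire 500 requests across 50 workers and measure the failure rate
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(lambda _: hit(), range(500)))

    error_rate = 1 - sum(results) / len(results)
    # The hypothesis holds if the error rate stays under our threshold
    assert error_rate < 0.01, f"Error rate {error_rate:.1%} exceeds 1%"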
What are Game Days?
If Chaos Engineering is the theory, Game days are the practice; the execution
Game days are where you start with Chaos engineering
->
Game days are “An exercise where we place systems under stress to learn and improve resilience”
Systems here can mean technology, people, or processes
They are like fire drills – an opportunity to practice a potentially dangerous scenario in a safer environment
1. Pick a hypothesis
To start with, what are we trying to test?
Typically in Chaos Engineering experiments, the hypothesis is that if I do X (take out a server, kill a region), everything should be OK
But we need to be specific about how to measure that things are OK
If our hypothesis is "if we fail the primary DB, everything should be ok,"
Then we need to define what OK is!
And a big part of defining OK is to define “Steady State”
Steady state is essentially the set of key metrics for you to monitor as part of your test. It could be things like:
Loan applications remain constant
Or response times remain in an acceptable range
If you don't define steady state, how do you know your test is working or not? How do you know if you are breaking things?
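A minimal sketch of turning "OK" into a concrete steady-state definition; fetch_metric() is a hypothetical stand-in for querying whatever metrics backend you use (Splunk, Wavefront, etc.):

    # Steady state as code: named metrics with acceptable ranges.
    STEADY_STATE = {
        # metric name: (min acceptable, max acceptable)
        "loan_applications_per_min": (40, None),   # volume stays constant
        "p95_response_time_ms":      (None, 500),  # latency stays acceptable
        "error_rate_pct":            (None, 1.0),
    }

    def fetch_metric(name):
        # Hypothetical: query your metrics backend here
        raise NotImplementedError

    def steady_state_ok():
        for name, (lo, hi) in STEADY_STATE.items():
            value = fetch_metric(name)
            if lo is not None and value < lo:
                return False
            if hi is not None and value > hi:
                return False
        return True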
With a hypothesis in mind, and a way to test it, first think about the blast radius
2. Minimize the blast radius
The blast radius refers to how much damage can be done by the experiment
If you take out a server, and everything is in fact NOT OK, how bad might it be?
Try to ensure that you limit the possible damage
For example, if your hypothesis is that
When Foo service is running in a pool of 2 servers
And one of those servers dies, CPU and memory utilization should increase on the remaining server, but response times remain unaffected
That is a fine thing to test
But if you have 10 services depending on that service (even in non prod), and you're wrong that response times will be unaffected, you may have caused 10 other services to have problems
So a way to limit the blast radius in that test would be to test using a pool of Foo Service that only one other service relies on. Hopefully a service that you also control and that is closely monitored as part of the test.
Another way to minimize possible damage is to make sure that you have the equivalent of a big red Stop Test button!
If your metrics aren't looking good, have the ability to abort the test immediately.
Remember: our goal here is to build confidence in the resilience of the system, and we do that with controlled, contained experiments that grow in scope incrementally.
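In code, that big red button can be as simple as the sketch below, reusing steady_state_ok() from the earlier steady-state sketch; abort_experiment() is a hypothetical rollback hook:

    # While the experiment runs, keep checking steady state and abort
    # the moment it degrades.
    import time

    def abort_experiment():
        # Hypothetical rollback: restore the killed server, re-add it to
        # the VIP pool, flip the feature flag back, etc.
        ...

    def run_with_abort(duration_secs, check_interval=10):
        deadline = time.time() + duration_secs
        while time.time() < deadline:
            if not steady_state_ok():  # from the steady-state sketch
                abort_experiment()
                raise RuntimeError("Steady state violated; experiment aborted")
            time.sleep(check_interval)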
3. Run the experiment
Figure out the best way to test your hypothesis
If you plan to take out a server, how do you do it?
ssh in and kill -9? Orderly shutdown? Have Ops do it for you? Do you simulate failure by using bogus IP addresses, or simply removing a server from a VIP pool?
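Scripted, two of those options might look like this (host and service names are hypothetical; both assume SSH access to the box):

    # Two ways to take a server out during an experiment.
    import subprocess

    HOST = "foo-service-01.example.com"  # hypothetical

    def hard_kill():
        # Abrupt failure: no chance for graceful shutdown hooks to run
        subprocess.run(["ssh", HOST, "sudo pkill -9 -f foo-service"], check=True)

    def orderly_shutdown():
        # Graceful failure: a different, gentler scenario worth testing too
        subprocess.run(["ssh", HOST, "sudo systemctl stop foo-service"], check=True)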
And again, stop if metrics or alerts dictate
4. Analyze the results
Were your expectations correct?
Did your system handle things correctly?
Did you spot issues with your alerts or metrics that should be improved before any future tests?
5. Increase scope
The idea is to start small
1 service, in non-prod, and gradually expand to prod.
And the goal should be prod. Prod is where it's at!
That brings us to the end of the presentation
We have talked about Testing in production
No longer a joke, instead increasingly viewed as a best practice. It is not a replacement for the essential and high value non-prod testing we do, but instead an addition.
Observability: Testing in production, and indeed in all envs, requires being able to understand what our applications do. Conventional logs, monitoring and alerting are all good, but Observability is about more than that. It's about the ability to answer complex questions about our apps at run time. Questions we may not have even thought of before, like: Why is my app slow? Is it me or a downstream service? Where is all my memory being used? We can use metrics, tracing, any tools at our disposal so that we can see what's going on when things go wrong. Or better still, to proactively spot problems in advance.
And with Observability in place, we can actually start to test in production!
We ran through the different types of testing at release time we can do, including:
After deployment: config, smoke, load and shadow testing
At release time: canary and internal releases
After release: feature flags and A/B testing (see the sketch after this list)
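As promised above, here is a feature flag reduced to its essence. In practice you'd use a flag service rather than this hypothetical in-memory version, but the mechanics of deploying dark and then releasing gradually are the same:

    # Percentage-based feature flag: the code is deployed for everyone,
    # but released only to a fraction of users.
    import hashlib

    ROLLOUT_PCT = {"new_checkout_flow": 5}  # released to 5% of users

    def is_enabled(flag, user_id):
        # Hash so each user gets a stable yes/no answer across requests
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < ROLLOUT_PCT.get(flag, 0)

    def old_checkout(user_id): ...
    def new_checkout(user_id): ...

    def checkout(user_id):
        if is_enabled("new_checkout_flow", user_id):
            return new_checkout(user_id)  # deployed dark, released gradually
        return old_checkout(user_id)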
Finally, even when everything is up and running in prod, customers are using it, and all looks good, there is still more testing we can do
Chaos Engineering
Not introducing chaos, but exposing the already present chaos!
Carefully planned experiments designed to reveal weaknesses in our systems