
Resilient Architecture


Disruptive companies are approaching resiliency differently. Stop preventing mistakes. Embrace failure. Resilient architectures enhance observability, leverage resiliency patterns, and embrace chaos!

Published in: Software


  1. RESILIENT ARCHITECTURE Matt Stine (@mstine) http://www.mattstine.com
  2. HEADLINES
  3. A SYSTEM FAILURE COSTS A WELL-KNOWN RETAILER SIGNIFICANT REVENUE ON THE BIGGEST INTERNET SHOPPING DAY OF THE YEAR.
  4. A SYSTEM FAILURE CAUSES THE CANCELLATION OF HUNDREDS OF FLIGHTS, STRANDING THOUSANDS OF AIRLINE PASSENGERS AND ULTIMATELY COSTING THE AIRLINE MILLIONS IN REVENUE.
  5. A BEAUTIFULLY DESIGNED ONLINE STORE CRUMBLES UNDER THE PRESSURE OF A THUNDERING HERD OF CUSTOMERS TRYING TO PURCHASE THE LATEST TECH GADGET.
  6. A SECURITY BREACH EXPOSES THOUSANDS OF CUSTOMER CREDIT CARD NUMBERS, LEADING TO MILLIONS IN LOST REVENUE DUE TO THE RESULTING LOSS OF TRUST.
  7. WHAT CAN WE DO?
  8. DISRUPTIVE COMPANIES ARE ALSO APPROACHING RESILIENCY DIFFERENTLY.
  9. STOP TRYING TO PREVENT MISTAKES.
  10. EMBRACE FAILURE.
  11. FROM MTBF (MEAN TIME BETWEEN FAILURES) TO MTTR (MEAN TIME TO RECOVERY)
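The shift from MTBF to MTTR can be motivated with the standard steady-state availability formula. The figures below are illustrative only and do not come from the talk:

```latex
\begin{aligned}
A &= \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} = \frac{1}{1 + \mathrm{MTTR}/\mathrm{MTBF}} \\
\mathrm{MTBF} = 1000\,\mathrm{h},\ \mathrm{MTTR} = 1\,\mathrm{h}: \quad A &= \frac{1000}{1001} \approx 0.9990 \\
\mathrm{MTBF} = 1000\,\mathrm{h},\ \mathrm{MTTR} = 0.1\,\mathrm{h}: \quad A &= \frac{1000}{1000.1} \approx 0.9999
\end{aligned}
```

Because A depends only on the ratio MTTR/MTBF, cutting recovery time tenfold improves availability exactly as much as making failures ten times rarer. This talk emphasizes that lever.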
  12. WE NEED BETTER TOOLS AND TECHNIQUES.
  13. RESILIENT ARCHITECTURES: Enhance Observability; Leverage Resiliency Patterns; Embrace Chaos
  14. ENHANCE OBSERVABILITY
  15. SEE FAILURE WHEN IT HAPPENS
  16. MEASURE EVERYTHING
  17. WHAT IS NORMAL? Values; rates of change; the mean? P95/P99/P99.9?
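Deciding what counts as "normal" is where averages mislead. The hypothetical `LatencyStats` helper below computes the mean and nearest-rank percentiles over latency samples. Nearest-rank is one of several common percentile definitions, and the sample data is invented for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: summarizes latency samples (milliseconds) by their mean
// and nearest-rank percentiles, to show how an average can hide a slow tail.
final class LatencyStats {

    private LatencyStats() {}

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least p percent
     * of all samples are less than or equal to it. Requires 0 < p <= 100.
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Invented data: 98 requests at 10 ms and 2 slow ones at 1000 ms.
        double[] samples = new double[100];
        Arrays.fill(samples, 10.0);
        samples[0] = 1000.0;
        samples[1] = 1000.0;
        System.out.printf(Locale.ROOT, "mean=%.1f p50=%.1f p95=%.1f p99=%.1f%n",
                mean(samples), percentile(samples, 50),
                percentile(samples, 95), percentile(samples, 99));
    }
}
```

Running `main` prints `mean=29.8 p50=10.0 p95=10.0 p99=1000.0`. The mean describes no real request, and only the 99th percentile reveals the two slow ones.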
  18. WHAT IS NORMAL? http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
  19. SPRING BOOT HEALTH ENDPOINT

          {
            "diskSpace": {
              "status": "UP",
              "total": 1056858112,
              "free": 878850048,
              "threshold": 10485760
            },
            "refreshScope": {
              "status": "UP"
            },
            "configServer": {
              "status": "UP",
              "propertySources": [
                "configClient",
                "https://github.com/spring-cloud-services-samples/fortune-teller/configuration/application.yml"
              ]
            },
            "hystrix": {
            ...
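Each entry in the health response reports a component `status`, and Spring Boot rolls these up into an overall status for the service. The framework-free sketch below uses hypothetical names, not Spring Boot's implementation. It has a disk check that stays `UP` while free space is at or above a threshold, mirroring the `free` and `threshold` fields on the slide. It aggregates with a simple rule: any `DOWN` component makes the whole service `DOWN`. Spring Boot's real aggregation supports more statuses and a configurable order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, framework-free sketch of what a health endpoint aggregates.
// Not Spring Boot's implementation; names are illustrative.
final class HealthSketch {

    enum Status { UP, DOWN }

    private HealthSketch() {}

    /** Disk check: UP while free space stays at or above the threshold. */
    static Status diskSpace(long freeBytes, long thresholdBytes) {
        return freeBytes >= thresholdBytes ? Status.UP : Status.DOWN;
    }

    /** Overall status: DOWN if any component is DOWN, otherwise UP. */
    static Status aggregate(Map<String, Status> components) {
        return components.containsValue(Status.DOWN) ? Status.DOWN : Status.UP;
    }

    public static void main(String[] args) {
        Map<String, Status> components = new LinkedHashMap<>();
        // Free space and threshold values taken from the slide's JSON.
        components.put("diskSpace", diskSpace(878_850_048L, 10_485_760L));
        components.put("configServer", Status.UP);
        System.out.println(components + " -> " + aggregate(components));
    }
}
```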
  20. SPRING BOOT INFO ENDPOINT

          "git": {
            "build": {
              "host": "Matts-MacBook-Pro.local",
              "version": "0.0.1-SNAPSHOT",
              "time": 1489021333000,
              "user": {
                "name": "Matt Stine",
                "email": "mstine@pivotal.io"
              }
            },
            "branch": "master",
            "commit": {
              "message": {
                "short": "initial commit",
                "full": "initial commit"
              },
              "id": "9b624974e417693cf921b9abc50b5af4ea0b6dde",
              "id.describe-short": "9b62497-dirty",
              "id.abbrev": "9b62497",
              "id.describe": "9b62497-dirty",
              ...
  21. DISTRIBUTED TRACING: Zipkin
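Zipkin reconstructs a request's path through many services from IDs that each service propagates to the next. Spring Cloud Sleuth does this automatically. To show what is being propagated, here is a hand-rolled sketch using the header names from Zipkin's B3 propagation format (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`). The `TraceContext` class is hypothetical, not Sleuth's API. It omits sampling flags and does not report spans to a Zipkin server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative, hand-rolled trace context (hypothetical class, not Sleuth's API).
// It carries B3 identifiers from one service call to the next.
final class TraceContext {

    final String traceId;      // shared by every span in one request's journey
    final String spanId;       // identifies this unit of work
    final String parentSpanId; // span that caused this one; null at the root

    private TraceContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    /** Starts a new trace where a request enters the system (root span id = trace id). */
    static TraceContext newTrace() {
        String id = randomId();
        return new TraceContext(id, id, null);
    }

    /** Child span for an outgoing call: same trace, fresh span, parent = this span. */
    TraceContext child() {
        return new TraceContext(traceId, randomId(), spanId);
    }

    /** Copies the context onto an outgoing HTTP request as B3 headers. */
    HttpRequest.Builder inject(HttpRequest.Builder builder) {
        builder.header("X-B3-TraceId", traceId).header("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            builder.header("X-B3-ParentSpanId", parentSpanId);
        }
        return builder;
    }

    /** 64-bit identifier rendered as 16 lowercase hex characters. */
    private static String randomId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    public static void main(String[] args) {
        TraceContext downstream = newTrace().child();
        HttpRequest request = downstream
                .inject(HttpRequest.newBuilder(URI.create("http://localhost:8081/things")))
                .build();
        System.out.println(request.headers().map());
    }
}
```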
  22. EXAMPLES: Spring Boot Actuator (http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#production-ready); PCF Apps Manager (https://docs.pivotal.io/pivotalcf/1-9/console/using-actuators.html); Spring Cloud Sleuth (https://cloud.spring.io/spring-cloud-sleuth/); Zipkin (http://zipkin.io/)
  23. LEVERAGE RESILIENCY PATTERNS
  24. TIMEOUTS
  25. TIMEOUTS: Thinking is half the battle! Look for anything that blocks threads and for any method call with an optional timeout argument.
  26. ADDING TIMEOUTS TO RESTTEMPLATE

          import org.springframework.context.annotation.Bean;
          import org.springframework.http.client.SimpleClientHttpRequestFactory;
          import org.springframework.web.client.RestTemplate;

          // Declared inside a @Configuration class.
          @Bean
          public RestTemplate restTemplate() {
              SimpleClientHttpRequestFactory clientHttpRequestFactory = new SimpleClientHttpRequestFactory();
              // Give up if the connection cannot be established within ten seconds.
              clientHttpRequestFactory.setConnectTimeout(10 * 1000); // Ten seconds!
              // Give up if no response data arrives for ten seconds once connected.
              clientHttpRequestFactory.setReadTimeout(10 * 1000); // Ten seconds!
              return new RestTemplate(clientHttpRequestFactory);
          }
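The advice to look for "anything that blocks threads" applies beyond HTTP clients. As a framework-free sketch, the hypothetical `Timeouts.callWithTimeout` helper below bounds any blocking `Callable` with `Future.get(timeout, unit)`, which is itself a method with a timeout argument. It returns a fallback once the deadline passes. The executor setup and the fallback-on-error policy are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds any blocking call with a timeout and returns a
// fallback value instead of letting the caller's thread wait indefinitely.
final class Timeouts {

    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-worker");
        t.setDaemon(true); // don't keep the JVM alive for abandoned calls
        return t;
    });

    private Timeouts() {}

    static <T> T callWithTimeout(Callable<T> call, Duration timeout, T fallback) {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the result is no longer wanted
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        }
    }
}
```

Cancelling the future interrupts the worker thread. Code that ignores interrupts will keep running in the background, so client-level settings like the `RestTemplate` timeouts are still worth configuring.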
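  27. RETRIES
  28. RETRIES: For potentially transient failures. Retry immediately or with a backoff, cap the maximum number of attempts, and log all the things.
  29. SIMPLE RETRY

          import org.springframework.http.HttpStatus;
          import org.springframework.http.ResponseEntity;
          import org.springframework.retry.annotation.Recover;
          import org.springframework.retry.annotation.Retryable;
          import org.springframework.web.bind.annotation.RequestMapping;

          // Requires spring-retry on the classpath and @EnableRetry on a @Configuration class.
          @RequestMapping("/acquireThings")
          @Retryable // By default: up to 3 attempts
          public ResponseEntity<String> tryToAcquireThings() {
              logger.info("Attempting to acquire things...");
              String things = restTemplate
                      .getForObject("http://localhost:8081/things", String.class);
              return new ResponseEntity<String>(things, HttpStatus.OK);
          }

          // Called once all attempts have failed; must return the same type.
          @Recover
          public ResponseEntity<String> recover() {
              logger.warn("Returning default response...");
              return new ResponseEntity<String>("default things", HttpStatus.OK);
          }

  30. RETRY WITH BACKOFF

          import org.springframework.retry.annotation.Backoff;

          // Up to 5 attempts; delays start at 100 ms, double each time, are capped at 1 s, and are randomized.
          @RequestMapping("/acquireThings")
          @Retryable(maxAttempts = 5,
                  backoff = @Backoff(delay = 100L, maxDelay = 1000L, multiplier = 2, random = true))
          public ResponseEntity<String> tryToAcquireThings() {
              logger.info("Attempting to acquire things...");
              String things = restTemplate
                      .getForObject("http://localhost:8081/things", String.class);
              return new ResponseEntity<String>(things, HttpStatus.OK);
          }

  31. EXPONENTIAL BACKOFF

          import org.springframework.retry.backoff.BackOffPolicy;
          import org.springframework.retry.backoff.ExponentialBackOffPolicy;

          // Defaults: 100 ms initial interval, multiplier 2.0, 30 s maximum interval.
          @Bean
          public BackOffPolicy backOffPolicy() {
              return new ExponentialBackOffPolicy();
          }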
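Spring Retry's annotations hide the retry loop. The hypothetical framework-free helper below shows what the `maxAttempts`, `delay`, `multiplier`, `maxDelay` and `random` settings amount to. It uses "full jitter", a random sleep between zero and the current delay. That jitter scheme is an assumption and not necessarily how Spring Retry randomizes delays. Only `RuntimeException`s are treated as retryable.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical, framework-free retry helper (not the Spring Retry API):
// up to maxAttempts tries, a delay that starts at initialDelay, is multiplied
// after each failure and capped at maxDelay, plus random jitter.
final class Retry {

    private static final Logger LOG = Logger.getLogger(Retry.class.getName());

    private Retry() {}

    static <T> T withBackoff(Supplier<T> call, int maxAttempts, Duration initialDelay,
                             double multiplier, Duration maxDelay, Supplier<T> recover) {
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                LOG.warning("Attempt " + attempt + " of " + maxAttempts + " failed: " + e);
                if (attempt == maxAttempts) {
                    break; // out of attempts
                }
                // Full jitter: sleep a random time in [0, delay] so clients don't retry in lockstep.
                long sleepMs = ThreadLocalRandom.current().nextLong(delayMs + 1);
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if we are asked to shut down
                }
                delayMs = Math.min((long) (delayMs * multiplier), maxDelay.toMillis());
            }
        }
        LOG.warning("Returning default response...");
        return recover.get();
    }
}
```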
  32. BULKHEADS
  33. BULKHEADS: Microservices; Thread Pools; Availability Zones
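A bulkhead caps how much of a service any one dependency can consume. Hystrix supports both thread-pool and semaphore isolation. The hypothetical `Bulkhead` class below sketches the semaphore variant with plain `java.util.concurrent`. Once the compartment is full, it rejects callers immediately instead of queueing them.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Minimal bulkhead sketch using semaphore isolation (hypothetical class):
// at most maxConcurrent callers may use the protected dependency at once;
// extra callers are rejected immediately and get a fallback.
final class Bulkhead {

    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> task, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // compartment full: shed load instead of queueing
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always free the slot, even if the task throws
        }
    }
}
```

Give each downstream dependency its own instance, or its own `ExecutorService` in the thread-pool style. Then a slow dependency cannot exhaust the threads that serve everything else.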
  34. CIRCUIT BREAKERS
  35. CIRCUIT BREAKERS
  36. SPRING CLOUD HYSTRIX

          import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;

          // If the call fails, times out, or the circuit is open, Hystrix returns fallbackFortune() instead.
          @HystrixCommand(fallbackMethod = "fallbackFortune")
          public Fortune randomFortune() {
              return restTemplate.getForObject("http://fortunes/random", Fortune.class);
          }

          private Fortune fallbackFortune() {
              return new Fortune(42L, fortuneProperties.getFallbackFortune());
          }
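Hystrix implements the breaker behind the `@HystrixCommand` annotation. The hypothetical `CircuitBreaker` class below is a deliberately small sketch of the state machine:

- It opens after a configurable number of consecutive failures. Hystrix itself trips on an error percentage over a rolling window, so this is a simplification.
- It fails fast while open.
- It lets a trial call through once the open interval has elapsed.

The clock is injected so the behavior can be tested without sleeping.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal circuit-breaker sketch (hypothetical class, not Hystrix's API).
// CLOSED: calls pass through; consecutive failures are counted.
// OPEN: calls go straight to the fallback until the open interval elapses.
// HALF_OPEN: a trial call is allowed; success closes, failure re-opens.
final class CircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // milliseconds; injectable for tests

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreaker(int failureThreshold, Duration openInterval, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openInterval.toMillis();
        this.clock = clock;
    }

    // synchronized for simplicity, which also serializes calls; a production
    // breaker would not hold a lock while the protected call runs.
    synchronized <T> T call(Supplier<T> task, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // fail fast without touching the dependency
            }
            state = State.HALF_OPEN; // let a trial request through
        }
        try {
            T result = task.get();
            state = State.CLOSED;
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```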
  37. EXAMPLES: Spring Retry (https://github.com/spring-projects/spring-retry); Hystrix (https://github.com/Netflix/Hystrix), via Spring Cloud Netflix (https://cloud.spring.io/spring-cloud-netflix/)
  38. EMBRACE CHAOS
  39. HOW DO YOU KNOW YOUR SYSTEM WILL TOLERATE FAILURE IF IT HASN'T FAILED?
  40. GAME DAY EXERCISES
  41. CAN WE DIAL THAT UP A NOTCH?
  42. YAU AND CHEUNG: DESIGN OF SELF-CHECKING SOFTWARE (1975)
  43. DID SOMEBODY SAY...
  44. EXAMPLES: Chaos Lemur for BOSH (https://github.com/strepsirrhini-army/chaos-lemur); Chaos Loris for Cloud Foundry (https://github.com/strepsirrhini-army/chaos-loris)
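Chaos Lemur and Chaos Loris inject failure at the infrastructure level by destroying VMs and application instances. The same idea can be applied in-process during a game day. The hypothetical `ChaosInjector` below makes a wrapped dependency call fail with a configurable probability. It uses a seeded `Random` so a run can be reproduced. It is a sketch for illustration, not part of either tool.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical in-process fault injector: with the given probability it throws
// instead of calling the real dependency, so fallbacks, retries and breakers
// get exercised on purpose. A fixed seed makes a run reproducible.
final class ChaosInjector {

    private final double failureProbability;
    private final Random random;

    ChaosInjector(double failureProbability, long seed) {
        if (failureProbability < 0.0 || failureProbability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.failureProbability = failureProbability;
        this.random = new Random(seed);
    }

    <T> T call(Supplier<T> dependency) {
        // nextDouble() is in [0, 1), so p = 0.0 never fails and p = 1.0 always fails.
        if (random.nextDouble() < failureProbability) {
            throw new IllegalStateException("chaos: injected failure");
        }
        return dependency.get();
    }
}
```

During a game day, wrap a client call in `call(...)`. The team can then watch whether fallbacks, alerts and dashboards behave as expected while failures occur on purpose.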
  45. REVIEW TIME! Stop trying to prevent mistakes; focus on MTTR; enhance observability; leverage resiliency patterns; embrace chaos!
  46. THANKS! Matt Stine (@mstine) http://www.mattstine.com
