Observability for developer (Inny So & Andrew Jones, ThoughtWorks), Kafka Summit SF 2019

Have you ever tried to debug a production outage when your system comprises apps your team has written and third-party apps your team runs, with logs going into one system, application performance metrics going into another, and cloud platform metrics going somewhere else? Did you find yourself switching tabs, trying to correlate metrics with logs and alerts, and ending up in a huge tangle? It is a nightmare. In the data world, we talk about aggregating all our data so we can derive new insights quickly, but what about our operational data? Observability is your ability to ask questions of your system without having to write new code or grab new data. When you've got an observable system, it feels like you have debugging superpowers, but it can be challenging to even know where to start. Even if you can convince your colleagues to start, finding the right tools can be challenging. In this talk, Inny and Andrew explain why monitoring and logging are no longer sufficient (if they ever were), cover observability basics, and demo an observability platform that you can use to start your observability journey today.

Published in: Technology

  1. Observability for Everyone. Andrew Jones & Inny So (@whereismytaco, @mini_inny), ThoughtWorks Australia. Kafka Summit San Francisco 2019
  2. Buzzword?
  3. "A system is observable if the behaviour of the entire system can be determined by only looking at its inputs and outputs." (Kalman, 1961, On the general theory of control systems)
  4. Observable == logging && metrics
  5. Let us tell you a story
  6. [Diagram: The host, The app, Another host, The db]
  7. [Diagram: The host, The app, Another host, The db, Nagios, host metrics ×2]
  8. [Diagram: The host ×5, The app, Another host, The db, Nagios, Logs ×2, ELK, App logs ×2]
  9. Dev-ops responsibility split
  10. [Diagram: The host ×5, The app, Another host, The db, Nagios, Logs ×2, ELK, StatsD, App metric ×4]
  11. [Diagram: The host ×7, The app, Another host, The db, Nagios, Logs ×2, ELK, StatsD, New Relic]
  12. [Diagram: The host ×7, The app, Another host, The db, Nagios, Logs ×2, Splunk, New Relic]
  13. One tool to rule them all
  14. [Diagram: The host ×7, The app, Another host, The db, Nagios, Logs ×2, Splunk, New Relic, A host, ms ×2, Logs]
  15. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, CloudWatch, New Relic, k8s, ms ×2, RDS, mdb]
  16. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, CloudWatch, New Relic, k8s, ms ×2, RDS, md, PagerDuty]
  17. Developer experience
  18. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, CloudWatch, New Relic, k8s, ms ×2, RDS, md, PagerDuty, Prometheus, Sumo Logic]
  19. Dashboard culture
  20. (╯°□°)╯︵ ┻━┻
  21. Sound familiar?
  22. The evolution
  23. The evolution: Monolithic → Microservice
  24. The evolution: Monolithic, Static Infrastructure → Microservice, Dynamic Infrastructure
  25. The evolution: Monolithic, Static Infrastructure, Logging & Metrics → Microservice, Dynamic Infrastructure, ?
  26. We have suffered enough!
  27. Let’s make on-call fun!
  28. How do we fix it?
  29. Event-Driven Architecture
  30. Event-Driven Architecture [Diagram: Event stream]
  31. Event-Driven Architecture [Diagram: Event A, APP A, Event stream]
  32. Event-Driven Architecture [Diagram: Event A ×2, APP B, APP C, APP A, Event stream]
  33. Logging and Metrics
  34. Events
  35. Customer with id c892745 tried to log in to our system at 10:15:31 on 7/7/2019; it took our system 300ms to return a 401 response
  36. Event vs Metrics
      Event: { "method": "POST", "request_time_ms": 300, "endpoint": "/login", "status": 401, "customer_id": "c892745", "api_version": "1.1.2", "build_number": 125, "env": "prod" }
      Metrics: { timer_data: { 'stat.timer': { mean: 250, median: 250, sum: 300, upper: 300, … }, Tags: ["c892745", "401", "POST"] } }
  37. Event vs Logs
      Event: { "method": "POST", "request_time_ms": 300, "endpoint": "/login", "status": 401, "customer_id": "c892745", "api_version": "1.1.2", "build_number": 125, "env": "prod" }
      Logs:
      2019/7/7 10:15:20 [INFO] Starting application
      2019/7/7 10:15:22 [INFO] Reading Configuration
      2019/7/7 10:15:23 [WARNING] undeclared region, default to Australia
      2019/7/7 10:15:25 [INFO] /login success
      2019/7/7 10:15:26 [WARNING] Incorrect type conversion, will use default
      2019/7/7 10:15:27 [INFO] /getProductList/ 87923407/
      2019/7/7 10:15:31 [INFO] /login error for customer c892745
  38. Event = (logging + metric) * 100000
  39. Event-Driven Architecture [Diagram: Event A, APP A, Event stream, Event A, ?]
  40. All the events
  41. What do we need?
  42. We need Kafka!
  43. Kafka can give us: Evolvability, Scalability, Reliability, Integrity
  44. Alerts? Dashboard?
  45. Event-Driven Architecture [Diagram: Event A, APP A, Kafka]
  46. Event-Driven Architecture [Diagram: Event A, APP A, Kafka, Event A, Consumer]
  47. Event-Driven Architecture [Diagram: Event A, APP A, Kafka, Event A, Consumer, Alert B]
  48. Event-Driven Architecture [Diagram: Alert B, Event A, APP A, Kafka, Event A, Consumer, Alerts, Alert B]
  49. Event-Driven Architecture [Diagram: Alert B, Event A, APP A, Kafka, Event A, Consumer, Email, Alerts, Dashboard, …, Alert B]
  50. Benefit of alerts as code
  51. Benefit of alerts as code
  52. Benefit of alerts as code: It's just code!
  53. Do we still need dashboards?
  54. Instrument everything (or else)
  55. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, CloudWatch, New Relic, k8s, ms ×2, RDS, md, PagerDuty, Prometheus, Sumo Logic]
  56. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md]
  57. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md]
  58. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md, Kafka]
  59. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md, Kafka, Events]
  60. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md, Kafka, Datalake, Events, BigQuery]
  61. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md, Kafka, Datalake, Consumer, Events, BigQuery]
  62. [Diagram: k8s ×2, The host ×6, k8s, The app, RDS, The db, Logs ×2, k8s, ms ×2, RDS, md, PagerDuty, Kafka, Datalake, Consumer, Events, BigQuery]
  63. 500ms request time?!
  64. 500ms request time?!
  65. 500ms request time?! 450ms request time?!
  66. Timing to the database looks OK!
  67. Timing to the database looks OK!
  68. Timing to the database looks OK! New version!
  69. Everyone can be a superhero
  70. Observability for everyone
  71. Andrew Jones & Inny So (@whereismytaco, @mini_inny). Thank you
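
The ideas on slides 36, 45 to 52 and 63 to 68 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform demoed in the talk: a plain list stands in for the Kafka topic, the alert rule `slow_login_alert` and its 400 ms threshold are hypothetical, and every event other than the one on slide 36 is made-up sample data. It shows one wide event per request, an alert written as ordinary code, and a question ("which build is slow?") answered from the events alone.

```python
from collections import defaultdict
from statistics import mean

# Stand-in for a Kafka topic (assumption: a real setup would produce to and
# consume from Kafka; a list keeps this sketch self-contained and runnable).
event_stream = []


def emit(event):
    """Append one wide, structured event per request to the stream."""
    event_stream.append(event)


def slow_login_alert(event, threshold_ms=400):
    """Hypothetical alert rule. It is just code, so it can be reviewed and tested."""
    if event["endpoint"] == "/login" and event["request_time_ms"] > threshold_ms:
        return f"slow /login: {event['request_time_ms']}ms on build {event['build_number']}"
    return None


def consume(events, rules):
    """Run every rule over every event and collect the alerts that fire."""
    alerts = []
    for event in events:
        for rule in rules:
            alert = rule(event)
            if alert is not None:
                alerts.append(alert)
    return alerts


def mean_request_time_by(events, field):
    """Group events by any field and average the request time per group."""
    groups = defaultdict(list)
    for event in events:
        groups[event[field]].append(event["request_time_ms"])
    return {key: mean(times) for key, times in groups.items()}


# The event from slide 36, followed by two made-up events from a newer build.
emit({"method": "POST", "request_time_ms": 300, "endpoint": "/login",
      "status": 401, "customer_id": "c892745", "api_version": "1.1.2",
      "build_number": 125, "env": "prod"})
emit({"method": "POST", "request_time_ms": 500, "endpoint": "/login",
      "status": 200, "customer_id": "c000001", "api_version": "1.1.2",
      "build_number": 126, "env": "prod"})
emit({"method": "POST", "request_time_ms": 450, "endpoint": "/login",
      "status": 200, "customer_id": "c000002", "api_version": "1.1.2",
      "build_number": 126, "env": "prod"})

alerts = consume(event_stream, [slow_login_alert])
by_build = mean_request_time_by(event_stream, "build_number")

print(alerts)
print(by_build)
```

Because each event carries `build_number` alongside the timing, grouping by that field points at the new version without any new instrumentation. The same function can group by `customer_id`, `status`, or `env`, which is the kind of ad hoc question that pre-aggregated metrics cannot answer.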
