
You Can't Test Everything, but You Should Monitor It

We had an incident in our warehouse at KRUU: from one day to the next, downloading the photos became very slow. At least, we thought that was the day the problem started.

In fact, the problem had started two years earlier. We only noticed it so late because rentals were down during Covid-19.

This will never happen again thanks to our new metrics and alerting, powered by OpenSearch!

  1. Hi, I am Michi! Head of Code at KRUU, @michilehr
  2. YOU CAN'T TEST EVERYTHING BUT YOU SHOULD MONITOR IT! 11 October 2022 @ Nerd-BBQ
  3. What is KRUU doing? "As Europe's leading photo booth provider, we have made it our mission to help our brides and grooms with their complete journey to their dream wedding. This is something we work tirelessly on with our team." (Philipp Schreiber, Co-Founder, KRUU.com)
  4. Photo Booth Cycle
  5. The Incident
  6. Good: ~12 MB/s
  7. Bad: 0.95 MB/s
  8. Investigate: (1) When did it start? (2) Why did it happen? (3) How to prevent it? (4) How to notice it early?
  9. When did it start? We had data in our Slack channel, but…
  10. (1) Write a script to extract the data as CSV. (2) Import the data into MySQL. (3) Write a query to aggregate by day. (4) Create a nice graph.
  11. Started a long time ago…
  12. What happened? Network configuration error
  13. How to prevent? No idea. Things like this happen.
  14. How to notice early?
  15. How to notice early?
  16. Alerts!
  17. Query
  18. Trigger
  19. Notification
  20. What next? 404 alerts by threshold; auth failure alerts by threshold to detect brute force; …
  21. Thank you for your time! Questions? Feedback? Notes?
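
The investigation steps on slide 10 and the query / trigger / notification chain on slides 17 to 19 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script or the OpenSearch monitor used at KRUU: the CSV column names (`date`, `speed_mb_s`), the sample rows, and the 6.0 MB/s threshold are all made up, and the aggregation is done in plain Python instead of MySQL.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV export: one row per photo download, with the day and the
# measured speed in MB/s. Column names and values are invented for this sketch.
SAMPLE_CSV = """date,speed_mb_s
2022-09-01,12.1
2022-09-01,11.9
2022-09-02,1.0
2022-09-02,0.9
"""


def daily_average_speed(csv_text):
    """Query: aggregate the download speeds by day and average them."""
    speeds_by_day = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        speeds_by_day[row["date"]].append(float(row["speed_mb_s"]))
    # Sort by day so the result reads like a time series.
    return {day: mean(values) for day, values in sorted(speeds_by_day.items())}


def days_below_threshold(daily_averages, threshold_mb_s):
    """Trigger: return the days whose average speed is below the threshold."""
    return [day for day, avg in daily_averages.items() if avg < threshold_mb_s]


averages = daily_average_speed(SAMPLE_CSV)
# The threshold is an assumed value, chosen only for this example.
slow_days = days_below_threshold(averages, threshold_mb_s=6.0)

for day in slow_days:
    # Notification: a real setup would post to a chat channel instead of printing.
    print(f"ALERT: average download speed on {day} was {averages[day]:.2f} MB/s")
```

Averaging per day before comparing against the threshold keeps a single slow download from firing an alert, while a sustained drop like the one in the incident still shows up on the first affected day.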