Dataday Texas 2016 - Datadog

Data scientist Homin Lee talks about automatically detecting infrastructure anomalies and outliers in real time with Datadog at Dataday Texas 2016.

  1. 1. Automatically Detecting Anomalies and Outliers in Real-Time Homin Lee, Data Scientist
  2. 2. Outline ● Monitoring ● Alerting ● Outlier vs. Anomaly Detection ● Outlier Detection Algorithms ● Anomaly Detection Algorithms
  3. 3. Monitor Everything
  4. 4. Monitor Everything Datadog gathers performance data from all your application components.
  5. 5. Monitor Everything
  6. 6. Monitor Everything
  7. 7. Monitor Everything?
  8. 8. Alerting
  9. 9. Alerting?
  10. 10. Alerting?
  11. 11. Outlier and Anomaly Detection
  12. 12. Outlier Detection
  13. 13. Outlier Detection
  14. 14. Outlier Detection
  15. 15. Outlier Detection
  16. 16. Outlier Detection Algorithms ● MAD: median absolute deviation ● DBSCAN: density-based spatial clustering of applications with noise
  17. 17. Robust Outlier Detection Algorithms
  18. 18. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } )
  19. 19. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } ) D = { 1, 2, 3, 4, 5, 6, 100 }
  20. 20. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } ) D = { 1, 2, 3, 4, 5, 6, 100 } median = 4
  21. 21. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } ) D = { 1, 2, 3, 4, 5, 6, 100 } median = 4 deviations = { -3, -2, -1, 0, 1, 2, 96 }
  22. 22. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } ) D = { 1, 2, 3, 4, 5, 6, 100 } median = 4 deviations = { -3, -2, -1, 0, 1, 2, 96 } abs deviations (sorted) = { 0, 1, 1, 2, 2, 3, 96 }
  23. 23. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } ) D = { 1, 2, 3, 4, 5, 6, 100 } median = 4 deviations = { -3, -2, -1, 0, 1, 2, 96 } abs deviations (sorted) = { 0, 1, 1, 2, 2, 3, 96 } MAD = 2
  24. 24. Median Absolute Deviation MAD(D) = median( { |d_i - median(D)| } ) D = { 1, 2, 3, 4, 5, 6, 100 } median = 4 deviations = { -3, -2, -1, 0, 1, 2, 96 } abs deviations (sorted) = { 0, 1, 1, 2, 2, 3, 96 } MAD = 2 (std dev = 33.8)
  25. 25. Median Absolute Deviation Parameters: Tolerance, Pct (tol. = 3.0)
  26. 26. DBSCAN
  27. 27. DBSCAN Parameters: epsilon, min_samples
  28. 28. DBSCAN [figure; scale labels: 1, d, d/2, d/4, 3d/4]
  29. 29. DBSCAN [figure; scale labels: 1, d, d/2, d/4, 3d/4]
  30. 30. DBSCAN [figure; scale labels: 1, d, d/2, d/4, 3d/4] ~ median(dist from median series) × tolerance
  31. 31. MAD or DBSCAN?
  32. 32. MAD or DBSCAN?
  33. 33. Some subtleties
  34. 34. Some subtleties
  35. 35. Some subtleties
  36. 36. Anomaly Detection
  37. 37. An Investigation past 30 minutes
  38. 38. An Investigation past 30 minutes past day
  39. 39. An Investigation past day past week past 30 minutes
  40. 40. An Investigation past day past week past 30 minutes past 5 weeks
  41. 41. An Investigation past day ANOMALY past 30 minutes past week past 5 weeks
  42. 42. Anomalies A time series point is an anomaly if: ● Given the past points in the series, the point in question is unlikely given your model of the past;
  43. 43. Anomalies A time series point is an anomaly if: ● Given the past points in the series, the point in question is unlikely given your model of the past; and you should alert on a set of anomalies if: ● they are a symptom of an issue you care about.
  44. 44. Our Approach 1. Extract as much signal as we can from the time series. 2. Use robust statistical measures when creating the model. 3. Give the user control over when they get alerted.
  45. 45. What’s Normal?
  46. 46. What’s Normal?
  47. 47. What’s Normal?
  48. 48. What’s Normal?
  49. 49. Past Performance... past 30 minutes past day past week past 5 weeks
  50. 50. Past Performance...
  51. 51. Decomposition
  52. 52. Decomposition
  53. 53. Decomposition
  54. 54. Decomposition
  55. 55. Decomposition
  56. 56. Autocorrelation
  57. 57. Signal vs. Noise
  58. 58. Signal vs. Noise
  59. 59. Signal vs. Noise vs. Signal
  60. 60. Real-time Anomaly Detection
  61. 61. Anomaly Detection
  62. 62. Robust Anomaly Detection
  63. 63. Robust Anomaly Detection
  64. 64. Robust Anomaly Detection
  65. 65. Robust Anomaly Detection
  66. 66. Alerting
  67. 67. Alerting
  68. 68. Recap ● Extract as much signal as you can. ● Use robust statistical measures. ● Alert judiciously. ● Don’t over-optimize.
  69. 69. Anomalies or Noise?
  70. 70. Thanks!
  71. 71. Appendix
  72. 72. DASHBOARDS: Build Real-Time Interactive Dashboards ● CORRELATION: Search And Correlate Metrics And Events ● See It All In One Place: Your Servers, Your Clouds, Your Metrics, Your Apps, Your Team. Together.
  73. 73. COLLABORATION: Share What You Saw, Write What You Did ● METRIC ALERTS: Get Alerted On Critical Issues ● DEVELOPER API: Instrument Your Apps, Write New Integrations
  74. 74. Flexible Pricing To Match Your Dynamic Infrastructure. ● Free: Up to 5 Hosts; 1 Day retention; Custom metrics and events; Discussion group supported ● Pro: Up to 500 Hosts; $15 Per Host / Month; 13 Month retention; Custom metrics and events; Metric alerts*; Email supported ● Enterprise: 500+ Hosts; Contact us for pricing: +1 866 329 4466 sales@datadoghq.com; Customized retention; Custom metrics and events; Metric alerts*; Email and phone supported
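
Slides 18 to 25 walk through the median absolute deviation by hand. The arithmetic can be reproduced with a minimal Python sketch using only the standard library. The function names `mad` and `mad_outliers` are hypothetical, and the flagging rule (more than tolerance × MAD away from the median) is an assumption about how the Tolerance parameter might be applied; the slides do not spell it out, and the Pct parameter is not modeled here.

```python
from statistics import median, pstdev

def mad(data):
    """Median absolute deviation: median of |d_i - median(data)|."""
    center = median(data)
    return median(abs(d - center) for d in data)

def mad_outliers(data, tolerance=3.0):
    """Assumed rule: flag points farther than tolerance * MAD from the median."""
    center = median(data)
    spread = mad(data)
    return [d for d in data if abs(d - center) > tolerance * spread]

D = [1, 2, 3, 4, 5, 6, 100]
print(median(D))            # 4
print(mad(D))               # 2
print(round(pstdev(D), 1))  # 33.8 (population standard deviation)
print(mad_outliers(D))      # [100]
```

The single extreme value 100 inflates the standard deviation to 33.8 but leaves the MAD at 2, which is why the deck calls the measure robust.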
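
Slides 26 to 30 name DBSCAN's parameters (epsilon, min_samples) and hint that the distance scale is roughly median(dist from median series) × tolerance. The sketch below is one possible reading of that hint, not the deck's implementation: it builds a pointwise median series across hosts, sets epsilon from the median distance to it, and runs a small textbook DBSCAN. The host names, sample values, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from math import dist
from statistics import median

def dbscan(points, epsilon, min_samples):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Neighborhoods include the point itself, so min_samples counts it too.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= epsilon]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Hypothetical metric values for five hosts over three time steps.
hosts = {
    "a": [1.0, 2.0, 3.0],
    "b": [1.1, 2.1, 2.9],
    "c": [0.9, 1.9, 3.1],
    "d": [1.0, 2.2, 3.0],
    "e": [9.0, 9.5, 10.0],
}
series = list(hosts.values())
median_series = [median(column) for column in zip(*series)]
tolerance = 3.0
# Assumed reading of slide 30: scale epsilon by the median distance to the median series.
epsilon = median(dist(s, median_series) for s in series) * tolerance

labels = dbscan(series, epsilon, min_samples=3)
outliers = [name for name, label in zip(hosts, labels) if label == -1]
print(outliers)  # ['e']
```

Deriving epsilon from the data keeps the threshold relative to how tightly the group normally moves together, so one tolerance setting can apply to metrics on very different scales.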
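
Slide 56 is titled only "Autocorrelation", so its role in the talk is not recorded in this transcript. One common use, assumed here, is finding the seasonal period of a metric before decomposing it. The sketch below computes the standard sample autocorrelation at a given lag and picks the lag with the highest value; the function name and the toy series are hypothetical.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag`."""
    mu = mean(series)
    denominator = sum((x - mu) ** 2 for x in series)
    numerator = sum(
        (series[t] - mu) * (series[t + lag] - mu)
        for t in range(len(series) - lag)
    )
    return numerator / denominator

# Toy series that repeats every 4 samples.
series = [0, 1, 0, -1] * 6
best_lag = max(range(1, 9), key=lambda k: autocorrelation(series, k))
print(best_lag)  # 4
```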
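
Slide 44 lists the approach: extract signal, use robust statistics, and give the user control over alerting. The transcript does not record the actual model, so the following is only a sketch under stated assumptions: the seasonal period is known, "normal" is the median of past values at the same phase of the cycle, and a point is flagged when it deviates from that baseline by more than tolerance × MAD of the historical residuals. The function name `seasonal_anomalies` and the sample data are hypothetical.

```python
from statistics import median

def seasonal_anomalies(history, period, new_points, tolerance=3.0):
    """Flag new points that stray from a seasonal median baseline."""
    # Baseline: median of past values observed at the same phase of the cycle.
    baseline = [median(history[phase::period]) for phase in range(period)]
    # Robust spread: MAD of the residuals left after removing the baseline.
    residuals = [x - baseline[i % period] for i, x in enumerate(history)]
    center = median(residuals)
    spread = median(abs(r - center) for r in residuals)
    start = len(history)
    flags = []
    for offset, x in enumerate(new_points):
        expected = baseline[(start + offset) % period]
        flags.append(abs(x - expected) > tolerance * spread)
    return flags

# Three past cycles of a metric with period 4, then four new points.
history = [10, 20, 30, 20, 11, 21, 29, 19, 10, 19, 31, 21]
flags = seasonal_anomalies(history, period=4, new_points=[10, 21, 45, 20])
print(flags)  # [False, False, True, False]
```

The `tolerance` argument is the user-facing control in this sketch: raising it trades sensitivity for fewer alerts, in the spirit of "Alert judiciously" on slide 68.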
