Watch everything, Watch anything

Get your feet wet doing anomaly detection with Kapacitor and the TICK stack

Published in: Software
1. WATCH ANYTHING, WATCH EVERYTHING
   Anomaly Detection by Nathaniel Cook (@nathanielvcook)
2. In DevOps we are good at collecting metrics.
   Why? Because the tooling makes it easy and it's in our culture.
   It is not hard to collect millions of unique metrics at tens of terabytes a month.
3. The Problem: Scalability
   ● Dashboarding doesn’t scale
   ● Static thresholds don’t scale
   ● Tooling isn’t easy enough
   We need to automate watching metrics, aka anomaly detection.
4. How many anomalies does this graph have?
5. How many anomalies does this graph have?
6. TICK Stack
7. Ways we can “watch” metrics
   ● With our eyes
   ● Static thresholds
   ● Machine learning / statistical models
8. Machine Learning 101
   1. Get a set of training data
   2. Create a model from the data
   3. Compare new raw metrics to the model
   4. (If you are cool, update the model again)
9. Standard Deviation Model
   1. Use yesterday’s data at the same time of day as the training data.
   2. Compute the mean and standard deviation of the training data.
   3. The current data is anomalous if:
      abs(data - mean) > (threshold * stddev)
   Threshold is the number of standard deviations to expect around the mean. Typically it’s greater than 2.
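The standard deviation model above fits in a few lines outside of Kapacitor as well. A minimal Python sketch (the function name and the latency numbers are invented for illustration; in practice Kapacitor computes the mean and stddev for you):

```python
import statistics

def is_anomalous(current, training, threshold=3.5):
    """Flag `current` if it lies more than `threshold` standard
    deviations away from the mean of the training window."""
    mean = statistics.mean(training)
    stddev = statistics.pstdev(training)
    return abs(current - mean) > threshold * stddev

# Yesterday's request latencies at the same time of day (made-up numbers).
yesterday = [101, 98, 103, 99, 100, 102, 97, 100]
print(is_anomalous(100.5, yesterday))  # inside the band -> False
print(is_anomalous(150.0, yesterday))  # far outside the band -> True
```

This mirrors step 3 of the slide: the threshold of 3.5 matches the TICKscript alert on the next slides, and anything within the error band passes silently.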
10. Visualizing error bands. How would you express this process in code?
11. var yesterday = batch
        |query('SELECT mean(value), stddev(value) FROM request_latency')
            .offset(1d)
            .period(1h)
            .every(5m)
            .align()
        |shift(1d)

    var today = batch
        |query('SELECT mean(value) FROM request_latency')
            .period(1h)
            .every(5m)
            .align()

    yesterday
        |join(today)
            .as('yesterday', 'today')
        |alert()
            .crit(lambda: abs("today.mean" - "yesterday.mean") > (3.5 * "yesterday.stddev"))

    This code is TICKscript, the DSL Kapacitor uses to define tasks.
12. Predictive Model
    Holt-Winters: a forecasting method from the 1960s.
    Find anomalies by predicting a trend for our current data.
    1. Get the previous 30 days of data.
    2. Using Holt-Winters, forecast today’s data.
    3. If the predicted values differ significantly from the real values, we found an anomaly.
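To make the forecasting step concrete, here is a minimal additive Holt-Winters sketch in Python (level + trend + a seasonal component). The initialization, smoothing parameters, and the synthetic weekly data are all invented for illustration; Kapacitor's built-in holtWinters node is the real implementation to use:

```python
def holt_winters_additive(series, m, alpha=0.5, beta=0.1, gamma=0.3):
    """Fit additive Holt-Winters (level + trend + seasonal of length m)
    and return a one-step-ahead forecast past the end of `series`."""
    # Crude initialization from the first two seasons.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonal = [series[i] - level for i in range(m)]

    for t in range(len(series)):
        last_level = level
        level = alpha * (series[t] - seasonal[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (series[t] - level) + (1 - gamma) * seasonal[t % m]

    n = len(series)
    return level + trend + seasonal[n % m]

# Synthetic daily max request counts with a weekly (m=7) pattern.
week = [120, 130, 125, 140, 150, 90, 80]
history = week * 4  # roughly a month of history

predicted = holt_winters_additive(history, m=7)
actual = 300  # today's observed value, deliberately anomalous
if abs(predicted - actual) / predicted > 0.2:
    print("anomaly")
```

With perfectly periodic history the forecast lands on the next seasonal value (120 here), so the deliberately spiked observation trips the same 20% relative-error check the next slide's TICKscript uses.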
13. Predictive model for detecting unexpected data.

    var training = batch
        |query('SELECT max(value) FROM request_count')
            .offset(1d)
            .groupBy(time(1d))
            .period(30d)
            .every(1d)

    var predicted = training
        |holtWinters('max', 1, 7, 1d)
        |last('max')
            .as('value')

    var current = batch
        |query('SELECT max(value) FROM request_count')
            .period(1d)
            .every(1d)
        |last('max')
            .as('value')

    predicted
        |join(current)
            .as('predicted', 'current')
        |alert()
            .crit(lambda: abs("predicted.value" - "current.value") / "predicted.value" > 0.2)
14. Custom Model
    Morgoth: an unsupervised anomaly detection framework.
    Find anomalies by using a custom anomaly detection algorithm.
    1. Training data: not needed (Morgoth is unsupervised).
    2. Give each window an anomaly score via Morgoth.
    3. Check the anomaly score.
15. Custom algorithm

    stream
        |from()
            .measurement('request_count')
        |window()
            .period(5m)
            .every(5m)
        @morgoth()
            .field('value')
            .scoreField('anomaly_score')
            .sigma(3.5)
        |alert()
            .crit(lambda: "anomaly_score" > 0.9)
16. How do you pick a model?
    ● This is the golden question.
    ● There is no one model that does best.
    ● Simple is better; start with something simple.
    ● Let data help you choose a model.
17. Properties of an Anomaly Detection Method
    ● False Positive Rate (FPR) -- the boy who cried wolf
    ● False Negative Rate (FNR) -- missed anomalies
    ● Detection Delay (DD)
    Ask yourself: what is the cost of each?
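Given labeled historical data, all three properties can be measured directly. A minimal Python sketch, assuming per-point boolean labels (`truth` for real anomalies, `flags` for detector alerts); the helper name and example data are my own:

```python
def rate_detector(truth, flags):
    """Compute FPR, FNR, and detection delay for per-point labels.

    truth: list of bools, True where a real anomaly occurred.
    flags: list of bools, True where the detector fired.
    """
    fp = sum(1 for t, f in zip(truth, flags) if f and not t)
    fn = sum(1 for t, f in zip(truth, flags) if t and not f)
    normals = sum(1 for t in truth if not t)
    anomalies = sum(1 for t in truth if t)
    fpr = fp / normals if normals else 0.0
    fnr = fn / anomalies if anomalies else 0.0
    # Detection delay: points between the first real anomaly and the
    # first alert at or after it (None if never detected).
    try:
        first_true = truth.index(True)
        first_flag = next(i for i in range(first_true, len(flags)) if flags[i])
        delay = first_flag - first_true
    except (ValueError, StopIteration):
        delay = None
    return fpr, fnr, delay

truth = [False, False, True, True, True, False, False, False]
flags = [False, False, False, True, True, False, True, False]
print(rate_detector(truth, flags))  # -> (0.2, 0.3333333333333333, 1)
```

Whether a 20% FPR or a one-window delay is acceptable is exactly the cost question the slide asks; the numbers only become useful once you weigh them against what a missed or spurious alert costs you.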
18. Try it out
    1. Pick a metric.
    2. Pick a model.
    3. Evaluate the model on a set of historical data.
    4. Rate the model based on its FPR, FNR, and DD values.
    If the model isn’t good enough, try a different one or improve your existing one.
19. Kapacitor makes this easy
    ● Select historical data and replay it against your task:
      kapacitor replay-live batch -task request_count_alert -past 180d -rec-time
    ● Save static data sets to use as test fixtures:
      kapacitor record batch -task request_count_alert -past 180d
    ● Store anomalies back into InfluxDB to compute FPR and FNR.
20. Automate “watching” your metrics
21. Q&A / More Resources
    ● Anomaly Detection 101 -- Elizabeth (Betsy) Nichols, Ph.D. https://www.youtube.com/watch?v=5vrY4RbeWkM
    ● Kapacitor is open source; check it out on GitHub: https://github.com/influxdata/kapacitor
    ● Wikipedia is your friend. There are many good explanations of how to employ various anomaly detection techniques.
