Anomaly Detection - New York Machine Learning
1. © 2014 MapR Technologies 1
Anomaly Detection
How to Find What You Didn’t
Know to Look For
© MapR Technologies, confidential
October 14, 2014
2. © 2014 MapR Technologies 2
Anomaly Detection:
How To Find What You Didn’t Know to Look For
Ted Dunning, Chief Applications Architect MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Ellen Friedman, Consultant and Commentator
Email ellenf@apache.org
Twitter @Ellen_Friedman
3. A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
© 2014 MapR Technologies 3
4. Practical Machine Learning series (O’Reilly)
• Machine learning is becoming mainstream
• Need pragmatic approaches that take into account real-world
business settings:
– Time to value
– Limited resources
– Availability of data
– Expertise and cost of team to develop and to maintain system
• Look for approaches with big benefits for the effort expended
© 2014 MapR Technologies 4
6. © 2014 MapR Technologies 6
Who Needs Anomaly Detection?
Utility providers using
smart meters
7. © 2014 MapR Technologies 7
Who Needs Anomaly Detection?
Feedback from
manufacturing assembly
lines
8. © 2014 MapR Technologies 8
Who Needs Anomaly Detection?
Monitoring data traffic on
communication networks
9. © 2014 MapR Technologies 9
What is Anomaly Detection?
• The goal is to discover rare events
– especially those that shouldn’t have happened
• Find a problem before other people see it
– especially before it causes a problem for customers
• Why is this a challenge?
– I don’t know what an anomaly looks like (yet)
11. © 2014 MapR Technologies 11
Spot the Anomaly
Looks pretty
anomalous
to me
12. © 2014 MapR Technologies 12
Spot the Anomaly
Will the real anomaly
please stand up?
13. © 2014 MapR Technologies 13
Basic idea:
Find “normal” first
14. © 2014 MapR Technologies 14
Steps in Anomaly Detection
• Build a model: Collect and process data for training a model
• Use the machine learning model to determine what is the normal
pattern
• Decide how far away from this normal pattern you’ll consider to
be anomalous
• Use the AD model to detect anomalies in new data
– Methods such as clustering for discovery can be helpful
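The steps above can be sketched in a few lines. This is a minimal illustration, assuming the simplest possible model of "normal" (a mean and a standard deviation) and a hypothetical cutoff of 3 standard deviations; the function names and the sample data are made up for the example and are not from the talk.

```python
import statistics

def fit_normal_model(training):
    """Step 1 and 2: summarize 'normal' as the mean and standard deviation."""
    return statistics.fmean(training), statistics.pstdev(training)

def anomaly_score(x, model):
    """Distance from the normal pattern, measured in standard deviations."""
    mean, std = model
    return abs(x - mean) / std

def detect(new_data, model, max_score=3.0):
    """Step 3 and 4: flag new values that are further away than max_score."""
    return [x for x in new_data if anomaly_score(x, model) > max_score]

# Hypothetical training data collected during normal operation.
training = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
model = fit_normal_model(training)

flagged = detect([10.1, 9.9, 14.0, 10.2], model)
print(flagged)
```

A fixed cutoff like `max_score` is exactly the part that is hard to choose by hand, which is what the threshold slides that follow address.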
15. How hard is it to set an alert for anomalies?
Grey data is from normal events; x’s are anomalies.
Where would you set the threshold?
© 2014 MapR Technologies 15
16. © 2014 MapR Technologies 16
Basic idea:
Set adaptive thresholds
17. © 2014 MapR Technologies 17
What Are We Really Doing
• We want action when something breaks
(dies/falls over/otherwise gets in trouble)
• But action is expensive
• So we don’t want too many false alarms
• And we don’t want too many false negatives
• What’s the right threshold to set for alerts?
– We need to trade off costs
19. © 2014 MapR Technologies 19
A Second Look
99.9%-ile
21. © 2014 MapR Technologies 21
How Hard Can it Be?
[Diagram: each input x goes to an online summarizer that estimates the 99.9%-ile, which is used as the threshold t; an alarm is raised when x > t]
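A rough sketch of the "online summarizer plus threshold" loop follows. It assumes a stand-in summarizer that keeps every value in a sorted list, which is exact but uses unbounded memory; a real deployment would use a bounded sketch such as t-digest instead. The class name, the warm-up length and the synthetic stream are all assumptions for illustration.

```python
import bisect
import random

class ExactQuantileSummarizer:
    """Stand-in for an online percentile estimator (NOT t-digest).

    Keeps all values sorted, so quantiles are exact but memory grows
    with the stream.
    """

    def __init__(self):
        self._sorted = []

    def add(self, x):
        bisect.insort(self._sorted, x)

    def quantile(self, q):
        if not self._sorted:
            raise ValueError("no data yet")
        index = min(int(q * len(self._sorted)), len(self._sorted) - 1)
        return self._sorted[index]

def run_detector(stream, q=0.999, warmup=1000):
    """Yield (x, t) for every value x that exceeds the current threshold t."""
    summarizer = ExactQuantileSummarizer()
    for n, x in enumerate(stream):
        if n >= warmup:                 # wait until the estimate is usable
            t = summarizer.quantile(q)
            if x > t:
                yield x, t
        summarizer.add(x)               # the threshold keeps adapting

rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stream[4000] = 25.0                     # injected spike
alarms = list(run_detector(stream))
print(len(alarms), max(x for x, _ in alarms))
```

Because the threshold is a percentile of the data seen so far, the alarm rate on normal data stays near 0.1% without anyone choosing a numeric cutoff.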
22. © 2014 MapR Technologies 22
Detecting Anomalies in Sporadic Events
[Plot: counts[order(centroids)] (0 to 20000) against pnorm(centroids[order(centroids)]) (0.0 to 1.0)]
23. © 2014 MapR Technologies 23
Using t-Digest
• Apache Mahout uses t-digest as an on-line percentile estimator
– very high accuracy for extreme tails
– new in Mahout v0.9
• t-digest also available elsewhere
– in streamlib (open source library on github)
– standalone (github and Maven Central)
• What’s the big deal with anomaly detection?
• This looks like a solved problem
24. © 2014 MapR Technologies 24
Already Done? Etsy Skyline?
25. © 2014 MapR Technologies 25
What About This?
[Plot: offset + noise + pulse1 + pulse2; axis ticks 0 to 15 and −2 to 10; points labeled A and B]
26. © 2014 MapR Technologies 26
Model Delta Anomaly Detection
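[Diagram: the model output is subtracted from the input to give δ; an online summarizer estimates the 99.9%-ile of δ, which is used as the threshold t; an alarm is raised when δ > t]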
27. The Real Inside Scoop
© 2014 MapR Technologies 27
• The model-delta anomaly detector is really just a sum of random
variables
– the model we know about already
– and a normally distributed error
• The output (delta) is (roughly) the log probability of the sum
distribution (really δ²)
• Thinking about probability distributions is good
• But how do you handle AD in systems with sporadic events?
28. © 2014 MapR Technologies 28
Spot the Anomaly
Anomaly?
30. © 2014 MapR Technologies 30
Where’s Waldo?
This is the real
anomaly
31. © 2014 MapR Technologies 31
Normal Isn’t Just Normal
• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
x ~ N(0, ε)
• The real world is rarely so accommodating
47. © 2014 MapR Technologies 47
Windows on the World
• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
– Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding
shapes from the dictionary
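The idea of a shape dictionary can be sketched as below. This is a deliberately simplified illustration: it uses non-overlapping windows, plain k-means, and matches each window to a single nearest shape, with no shifting, scaling or adding and no sparse coding. The `beat` function is a hypothetical heartbeat-like signal invented for the example.

```python
import math
import random

def windows(signal, width):
    """Cut the signal into consecutive, non-overlapping windows."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, width)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10, seed=0):
    """Plain k-means; the k centroids are the 'dictionary of shapes'."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def reconstruction_error(window, dictionary):
    """Squared error of the best single-shape match for this window."""
    return min(sq_dist(window, shape) for shape in dictionary)

def beat(phase):
    """Hypothetical normal shape: one smooth pulse per 20 samples."""
    return math.exp(-((phase - 10) ** 2) / 4.0)

rng = random.Random(42)
normal = [beat(i % 20) + rng.gauss(0, 0.02) for i in range(2000)]
dictionary = kmeans(windows(normal, 20), k=3)

test_signal = [beat(i % 20) + rng.gauss(0, 0.02) for i in range(200)]
for i in range(100, 120):
    test_signal[i] = 0.5        # a flat stretch that no dictionary shape matches

errors = [reconstruction_error(w, dictionary) for w in windows(test_signal, 20)]
worst = max(range(len(errors)), key=errors.__getitem__)
print(worst, errors[worst])
```

The reconstruction error is a one-dimensional signal, so a percentile threshold can be applied to it in the same way as to any simple measurement.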
48. © 2014 MapR Technologies 48
Most Common Shapes (for EKG)
49. < 1 bit / sample
© 2014 MapR Technologies 49
Reconstructed signal
[Figure panels: original signal, reconstructed signal, reconstruction error]
50. © 2014 MapR Technologies 50
An Anomaly
Original technique for finding
1-d anomaly works against
reconstruction error
51. © 2014 MapR Technologies 51
Close-up of anomaly
Not what you want your
heart to do.
And not what the model
expects it to do.
52. © 2014 MapR Technologies 52
A Different Kind of Anomaly
53. © 2014 MapR Technologies 53
Model Delta Anomaly Detection
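[Diagram: the model output is subtracted from the input to give δ; an online summarizer estimates the 99.9%-ile of δ, which is used as the threshold t; an alarm is raised when δ > t]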
54. © 2014 MapR Technologies 54
The Real Inside Scoop
• The model-delta anomaly detector is really just a sum of random
variables
– the model we know about already
– and a normally distributed error
• The output (delta) is (roughly) the log probability of the sum
distribution (really δ²)
• Thinking about probability distributions is good
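A model-delta detector can be sketched as follows, assuming a hypothetical known model (a daily sine wave) plus normally distributed error. The score is δ², and the threshold is taken empirically from the 99.9%-ile of scores seen during a training stretch. The model, the noise level and the injected bump are all invented for the example.

```python
import math
import random

def model(t):
    """Hypothetical 'known' model of normal behaviour: a 24-hour sine wave."""
    return 10.0 + 5.0 * math.sin(2 * math.pi * t / 24.0)

def delta_scores(times, values):
    """Squared difference between each observation and the model prediction."""
    return [(x - model(t)) ** 2 for t, x in zip(times, values)]

def percentile(values, q):
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

rng = random.Random(1)
times = list(range(2000))
values = [model(t) + rng.gauss(0.0, 0.5) for t in times]
values[1506] += 6.0     # a bump at a low point of the daily cycle

scores = delta_scores(times, values)
threshold = percentile(scores[:1000], 0.999)        # learned on the first half
alarms = [t for t, s in zip(times[1000:], scores[1000:]) if s > threshold]
print(alarms)
```

The bumped value stays inside the range the raw signal normally covers, so a threshold on the raw signal would miss it; only the difference from the model stands out.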
56. Sporadic Web Traffic to an e-Business Site
© 2014 MapR Technologies 56
It’s important to know if traffic is stopped or
delayed because of a problem…
But visits to site normally come at
varying intervals.
How long after the last event
should you begin to worry?
57. Sporadic Web Traffic to an e-Business Site
© 2014 MapR Technologies 57
It’s important to know if traffic is stopped or
delayed because of a problem…
But visits to site normally come at
varying intervals.
And how do you let your CEO
sleep through the night?
58. © 2014 MapR Technologies 58
Basic idea:
The time interval between events is how you
convert sporadic events into something useful
you can measure
59. Sporadic Events: Finding Normal and Anomalous Patterns
© 2014 MapR Technologies 59
• Time between events is much more usable than absolute times
• Counts don’t link as directly to probability models
• Time interval is −log p, up to scale
• This is a big deal
60. © 2014 MapR Technologies 60
Event Stream (timing)
• Events of various types arrive at irregular intervals
– we can assume Poisson distribution
• The key question is whether frequency has changed relative to
expected values
– This shows up as a change in interval
• Want alert as soon as possible
61. © 2014 MapR Technologies 61
Converting Event Times to Anomaly
99.9%-ile
99.99%-ile
62. But in the real world, event rates often change
© 2014 MapR Technologies 62
64. © 2014 MapR Technologies 64
Model-Scaled Intervals Solve the Problem
65. © 2014 MapR Technologies 65
Model Delta Anomaly Detection
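[Diagram: the model output is subtracted from the input to give δ; an online summarizer estimates the 99.9%-ile of δ, which is used as the threshold t; an alarm is raised when δ > t; this version of the diagram carries the additional label log p]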
66. © 2014 MapR Technologies 66
Detecting Anomalies in Sporadic Events
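[Diagram: incoming events with arrival times t_i feed an n-event difference (Δn) and a rate history; a rate predictor supplies λ; δ = λ(t_i − t_{i−n}); a t-digest estimates the 99.97%-ile of δ, which is used as the threshold t; an alarm is raised when δ > t]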
67. © 2014 MapR Technologies 67
Detecting Anomalies in Sporadic Events
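[Diagram: incoming events with arrival times t_i feed an n-event difference (Δn) and a rate history; a rate predictor supplies λ; δ = λ(t_i − t_{i−n}); a t-digest estimates the 99.97%-ile of δ, which is used as the threshold t; an alarm is raised when δ > t]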
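The rate-scaled interval δ = λ(t_i − t_{i−n}) can be sketched like this. It assumes the predicted rate is evaluated at the arrival time of each event and uses a hypothetical two-level rate model (busy by day, quiet at night); both choices are simplifications made for the example.

```python
from collections import deque

def scaled_intervals(event_times, rate_at, n=1):
    """Yield (t_i, delta) with delta = rate_at(t_i) * (t_i - t_{i-n})."""
    recent = deque(maxlen=n)            # the last n arrival times
    for t in event_times:
        if len(recent) == n:
            yield t, rate_at(t) * (t - recent[0])
        recent.append(t)

def rate_at(t):
    """Hypothetical rate model: 10 events/hour by day, 1 event/hour at night."""
    return 10.0 if 8 <= t % 24 < 20 else 1.0

# Times are in hours. Night traffic: one event per hour, as expected.
night = list(scaled_intervals([0.0, 1.0, 2.0, 3.0], rate_at))

# Day traffic: one event every 0.1 hours, then a one-hour gap.
day = list(scaled_intervals([9.0, 9.1, 9.2, 10.2, 10.3], rate_at))

print([round(d, 3) for _, d in night])
print([round(d, 3) for _, d in day])
```

A one-hour gap scores about 1 at night and about 10 in the daytime, so a single threshold on δ works around the clock where a threshold on the raw interval would not.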
68. © 2014 MapR Technologies 68
Slipped Week: Simple Rate Predictor
[Plot: Main Page Traffic; x axis Date (Nov 02 to Dec 02), y axis Hits (x 1000), 0 to 500; markers A, B, C, D]
69. © 2014 MapR Technologies 69
Poisson Distribution
• Time between events is exponentially distributed
$\Delta t \sim \lambda e^{-\lambda t}$
• This means that long delays are exponentially rare
$P(\Delta t > T) = e^{-\lambda T}$
$-\log P(\Delta t > T) = \lambda T$
• If we know λ we can select a good threshold
– or we can pick a threshold empirically
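As a worked example of selecting a threshold from a known λ: setting the tail probability of the exponential distribution equal to an acceptable false-alarm probability p gives T = −log(p) / λ. The rate of 2 events per second and the 0.1% false-alarm level below are assumed values for illustration.

```python
import math

def interval_threshold(rate, false_alarm_prob):
    """Gap T such that P(interval > T) equals false_alarm_prob at this rate."""
    return -math.log(false_alarm_prob) / rate

def surprise(rate, gap):
    """Anomaly score -log P(interval > gap), which is simply rate * gap."""
    return rate * gap

rate = 2.0                              # assumed: 2 events per second
T = interval_threshold(rate, 0.001)     # alarm if the gap exceeds T seconds
print(round(T, 4), round(surprise(rate, T), 4))
```

With these numbers a gap longer than about 3.45 seconds between events would be expected only once in a thousand intervals.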
70. © 2014 MapR Technologies 70
Seasonality Poses a Challenge
[Plot: Christmas Traffic; x axis Date (Nov 17 to Dec 27), y axis Hits / 1000, 0 to 8]
71. © 2014 MapR Technologies 71
Something more is needed …
[Plot: Christmas Traffic; x axis Date (Nov 17 to Dec 27), y axis Hits / 1000, 0 to 8]
72. © 2014 MapR Technologies 72
We need a better rate predictor…
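[Diagram: incoming events with arrival times t_i feed an n-event difference (Δn) and a rate history; a rate predictor supplies λ; δ = λ(t_i − t_{i−n}); a t-digest estimates the 99.97%-ile of δ, which is used as the threshold t; an alarm is raised when δ > t]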
73. © 2014 MapR Technologies 73
A New Rate Predictor for Sporadic Events
74. Improved Prediction with Adaptive Modeling
© 2014 MapR Technologies 74
[Plot: Christmas Prediction; x axis Date (Dec 17 to Dec 29), y axis Hits (x 1000), 0 to 8]
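One simple way to picture an adaptive rate predictor is to take the daily shape from previous days and rescale it by how busy today has been so far. The sketch below is an assumption made for illustration only, not the predictor used for the plot; the function name, the hourly granularity and the traffic numbers are all hypothetical.

```python
def predict_rate(previous_days, today_so_far, hour):
    """Predict the count for `hour` today.

    previous_days: list of 24-element hourly count lists from earlier days
    today_so_far:  hourly counts observed today, for hours 0..len-1
    """
    # Shape: the average count at each hour over the previous days.
    shape = [sum(day[h] for day in previous_days) / len(previous_days)
             for h in range(24)]
    # Scale: how today's traffic so far compares with the usual level.
    seen = len(today_so_far)
    baseline = sum(shape[:seen])
    scale = sum(today_so_far) / baseline if baseline > 0 else 1.0
    return scale * shape[hour]

yesterday = [100] * 8 + [400] * 12 + [100] * 4
day_before = [100] * 8 + [400] * 12 + [100] * 4
today = [200] * 8 + [800] * 2       # running at double the usual level

prediction = predict_rate([yesterday, day_before], today, hour=10)
print(prediction)
```

The predicted rate can then serve as λ when scaling the intervals between events, so a seasonal surge raises the expected rate instead of triggering alarms.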
75. Anomaly Detection + Classification Useful Pair
© 2014 MapR Technologies 75
• Use the AD model to detect anomalies in new data
– Methods such as clustering for discovery can be helpful
• Once you have well-defined models in your system, you may
also want to use classification to tag those
• Continue to use the AD model to find new anomalies
76. © 2014 MapR Technologies 76
Recap (out of order)
• Anomaly detection is best done with a probability model
• -log p is a good way to convert to anomaly measure
• Adaptive quantile estimation (t-digest) works for auto-setting
thresholds
77. © 2014 MapR Technologies 77
Recap
• Different systems require different models
• Continuous time-series
– sparse coding to build signal model
• Events in time
– rate model based on variable-rate Poisson
– segregated rate model
• Events with labels
– language modeling
– hidden Markov models
78. © 2014 MapR Technologies 78
Why Use Anomaly Detection?
79. © 2014 MapR Technologies 79
Keep in mind…
• Model normal, then find anomalies
• t-digest for adaptive threshold
• Probabilistic models for complex patterns
[Plot: offset + noise + pulse1 + pulse2; axis ticks 0 to 15 and −2 to 10; points labeled A and B]
80. © 2014 MapR Technologies 80
Keep in mind…
• Time intervals are key for sporadic events
• Complex time shift to predict rate with seasonality
• Sequence of events reveals phishing attack
[Plot: Christmas Prediction; x axis Date (Dec 17 to Dec 29), y axis Hits (x 1000), 0 to 8]
81. A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
© 2014 MapR Technologies 81
82. Coming in October: Time Series Databases
by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly)
© 2014 MapR Technologies 82
Editor's notes
Talk track: Second in the series; the first was on how to build a simple recommender. This one, on anomaly detection, is being sold by O’Reilly on Amazon, but for a limited time MapR is giving away the e-book for free. Here’s the link where you can register to get one.
Talk track: ELLEN New ways to do it that take into account real-world business goals, realistic resources, new types of data and best time to value…
Talk track: Mistakes affect huge numbers of people…
Talk track: … even more so on an automated assembly line
STILL ELLEN
Talk track: Say “Build a model”
Talk track: Say “Build a model; model what is normal. Then determine what is not…”
ELLEN/TRANSITION SLIDE
TED
Ellen talking point: Ted authored it and contributed it to open source; others are now contributing adjustments, and it is used in several places
Talk track: Now where do you put the threshold? An adaptive model is the solution…
Ellen:
Talk track: We talk about this in the book with the EKG example, where the normal pattern is fairly regular but has a very complex shape.
Ellen comment for transition:
Talk track: How do you handle that, and in what situations does it matter?
ELLEN: set up TO TED/ CEO story
Talk track: This is what it looks like to have events, such as those on a website, that come in at randomized times (people come when they want to), but the underlying average rate in this case is constant; in other words, a fairly steady stream of traffic.
This looks a lot like the first signal we talked about: a randomized but even signal… We can use t-digest on it to set thresholds, and everything works just grand. (Like the clicks of a Geiger counter measuring radioactivity.)
Talk track: (Describe figure) The horizontal axis is days, with noon in the middle of each day. The faint shadow shows the underlying rate of events. The vertical axis is the time interval between events. Notice that when the rate of events is high, the time interval between events is small, but when the rate of events slows down, the time between events is much larger.
Ellen: For this reason, we cannot set a simple threshold: if we set it low, to suit the daytime, we get an alert every night even though we expect a longer interval then. If we set it too high, we miss the real problems, when traffic really is abnormally delayed or stopped altogether. What can you do to solve this?
Ted: We build a model and multiply the modelled rate by the interval; that gives us a number we can threshold accurately.
Talk track: (Description of graph) Shadow
Ted: This was figure 5-2 in the book
Talk track: You need a rate predictor
Ellen: Sometimes simple is good enough
Ted: This was figure 5.3
Talk track: This slide is here for reference when you download the slides
Ted: This was figure 5.4
Ted: This was figure 5.4
Ted: This was figure 5-2 in the book
We can look at yesterday and the day before, but we need to look at the shape from previous days … but look at today for whether traffic is scaling
Ted: This was figure 5.4