Introduction to Anomaly Detection - Data Science ATL Meetup Presentation, 07-31-2013

Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC

Anomalies
Data Science Fairy Tale
Topics in Anomaly Detection
Seizure Detection Example
Summary


anomaly something that deviates from what is standard, normal, or expected

data cleansing
3-5% mislabeled ground truth in MNIST database
[Figure: sample MNIST digits with their ground-truth labels]

stock price
Volkswagen (VOW.DE) short squeeze, 10/28/2008

transactions

video surveillance

email
Date: Sat, 12 Aug 2012 14:39:59 UTC
From: "Iglobal"
<tryme@yourdomain.com>
To: "Mr. Foo1" <foo1@freemail.com>
Subject: Foo1, Please Confirm Your
Position!
Hi Foo1,
Welcome To The $7 Plan. I Bring in 3 to 5
New Members In Every Day, I can show you
how easily. Its to much Fun.
Solution #1 It costs too much every month.
Not with the $7 Plan! The TOTAL cost is $7
per month. The $7.00 Plan is still holding
your position and we have people that are
waiting to place under you. That's right only

Credit Card Fraud
Campaign Response
Traffic
Persons of Interest
Spam
Intrusion / Malware

counterfeit
healthcare
condition
seizures



Many names



One key (counter-intuitive) idea: focus on the hay… not the needle.





Machine learning (ooh): unsupervised*, classification*

[Figure: detection pipeline. User / Device / Sensors → Signals (Data) → Features → Detector (Classifier) → Outputs: Alerts | intervention; runs Online | batch]

Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.







Advantages
Data haystacks .01%
Unusual = interesting
Models $$$
Labels $$$
…
Disadvantages?

We sell healthy, green apples!



Bob ... knows apples

common (n=13)

rare (n=1)


Bob “The 8th Dwarf”
8 Dwarf Orchards, Inc.

… sells healthy apples



… studies data science



… does “Big Apple Data”

Goal: label instances (green vs. red)

[Figure: feature space of mass density (g/cm3) with labels green = +1 (greens) and red = -1 (reds); a watercore apple is marked]

Training: examples $z_i = (x_i, y_i)$ with inputs $x_i$ and labels $y_i$; learn $f : X \to Y$

Test Examples [figure: includes a watercore apple]

Test Examples – Results

Confusion Matrix (columns are the true classes)

            | Green (G) | not-green (NG)
Label G     | 13 (TP)   | 4 (FP)
Label NG    | 1 (FN)    | 1 (TN)


Key idea: trade off mislabeling one class against the other (errors on P vs. errors on N).

Confusion matrix (columns are the true classes: Green (G) = P, not-green (NG) = N)

            | G (P)    | NG (N)
Label G     | 13 (TP)  | 4 (FP)
Label NG    | 1 (FN)   | 1 (TN)

Sensitivity (errors on the “positive” class, Green)
TPR = TP / (TP+FN) = 13/14

Specificity (errors on the “negative” class, not-green)
SPC = TN / (FP+TN) = 1/5

False Positive Rate
FPR = FP / (FP+TN) = 4/5 = 1 - SPC
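The rates can be checked with a few lines of Python. This is a minimal sketch; the function name `rates` is our own. It also computes FP / (TP+FP), which is the false discovery rate (4/17 for these counts), to show how that quantity differs from the false positive rate.

```python
def rates(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard rates derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # TPR: share of true Greens labeled G
        "specificity": tn / (fp + tn),  # SPC: share of true not-greens labeled NG
        "fpr": fp / (fp + tn),          # false positive rate = 1 - specificity
        "fdr": fp / (tp + fp),          # false discovery rate: share of G labels that are wrong
    }


# Counts from the apple confusion matrix.
r = rates(tp=13, fp=4, fn=1, tn=1)
for name, value in r.items():
    print(f"{name}: {value:.3f}")
```

Note how a classifier can look excellent on sensitivity while getting most of the rare class wrong: specificity is only 0.2 here.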

Idea: distance to the “average” example (centroid-based anomaly detection)

[Figure: examples, their centroid, a threshold, and the anomalies it flags (the watercore apple, plus one false positive), ranked by anomaly score]
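A minimal sketch of centroid-based detection, assuming Euclidean distance as the anomaly score. The densities are hypothetical, and the threshold rule (the largest distance seen on the normal examples) is just one possible "magic number", not the one used in the talk. Note that only normal examples are needed to fit it.

```python
import math


def fit_centroid(examples):
    """Mean of the normal examples (each a tuple of features)."""
    n = len(examples)
    return tuple(sum(col) / n for col in zip(*examples))


def anomaly_score(x, centroid):
    """Euclidean distance from x to the centroid."""
    return math.dist(x, centroid)


# Hypothetical mass densities (g/cm^3) for normal apples, one feature each.
normal = [(0.80,), (0.82,), (0.81,), (0.79,), (0.83,), (0.80,)]
centroid = fit_centroid(normal)

# Assumed threshold: the largest distance observed on the normal data.
threshold = max(anomaly_score(x, centroid) for x in normal)


def is_anomaly(x):
    return anomaly_score(x, centroid) > threshold


print(is_anomaly((0.81,)), is_anomaly((0.95,)))  # False True
```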

Trait        | classic | anomaly
Sensitivity  | .928    | 1.00
Specificity  | .200    | .833

Feature dependent?
Require labels?
Magic numbers?

Performance

Goal: find densest regions in feature space

Standard deviation




Tukey statistic (IQR)
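A sketch of the two univariate rules using only Python's `statistics` module. The cutoffs k=3 (standard deviations) and k=1.5 (Tukey's fences) are common defaults, assumed here, and the data are hypothetical. The example shows why robustness matters: the outlier inflates the standard deviation enough to hide itself, while the quartiles barely move.

```python
import statistics


def std_outliers(values, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mu) > k * sd]


def tukey_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical mass densities (g/cm^3); 0.95 plays the watercore apple.
densities = [0.79, 0.80, 0.80, 0.81, 0.81, 0.82, 0.82, 0.83, 0.95]
print(std_outliers(densities))    # [] : the outlier masks itself
print(tukey_outliers(densities))  # [0.95]
```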






Mahalanobis distance
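The Mahalanobis distance, sqrt((x - μ)ᵀ Σ⁻¹ (x - μ)), rescales distance by the covariance of the normal data. Below is a two-feature sketch with a closed-form 2x2 inverse; the data points are hypothetical. Two test points at the same Euclidean distance from the mean get very different scores depending on whether they follow the correlation.

```python
import math


def fit_mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(p, mean, cov):
    """sqrt((p - mean)^T S^-1 (p - mean)) using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be > 0 (non-degenerate data)
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Hypothetical, strongly correlated 2-D features.
normal = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.0)]
mean, cov = fit_mean_cov(normal)

# Both test points lie at Euclidean distance sqrt(2) from the mean (2, 2).
along = (3, 3)   # follows the correlation
across = (3, 1)  # breaks it
print(mahalanobis(along, mean, cov), mahalanobis(across, mean, cov))
```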


Flexible



Density based



Robust






Tunable


How? The one-class support vector machine








“Flood” graph: pick a fraction, e.g. 0.5; mark the waterlines; note the support

The One-class Support Vector Machine Does This
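The "flood" picture can be sketched without an SVM. The code below is a swapped-in technique, a kernel density level set, not the one-class SVM itself: it estimates density with a Gaussian kernel, then raises the waterline until the chosen fraction of training points is under water. In the ν-formulation of the one-class SVM, ν plays a similar role (an upper bound on the fraction of training points treated as outliers). The bandwidth `h`, the data, and the function names are hypothetical.

```python
import math


def kde(x, data, h):
    """1-D Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def waterline(data, h, fraction):
    """Density level below which `fraction` (0 <= fraction < 1) of training points fall."""
    densities = sorted(kde(d, data, h) for d in data)
    k = int(fraction * len(data))  # how many training points we allow under water
    return densities[k]


# Hypothetical 1-D training data (e.g. apple densities, g/cm^3).
train = [0.77, 0.79, 0.80, 0.805, 0.81, 0.815, 0.82, 0.83, 0.85, 0.88]
h = 0.02  # bandwidth: a tunable "magic number"
level = waterline(train, h, fraction=0.5)


def in_support(x):
    """True if x is on 'dry land', i.e. its estimated density reaches the waterline."""
    return kde(x, train, h) >= level


print(in_support(0.81), in_support(0.95))
```

Lowering `fraction` floods less of the graph and widens the support; raising it keeps only the densest region.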




Outlier impact
Rich data
Graphs
Spatio-temporal
Text
Use labels
Online / latency
Features
Clustering & alternatives


You Are Here

APPROACHES
Statistical methods
Distance-based methods
Rule systems
Profiling methods
Model-based approaches

SAMPLE METHODS
Kernel methods
PCA & subspace methods
OCNM & OCSVM
CUSUM
Nearest neighbors
Decision trees
Replicator Neural Networks
Clustering

V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)
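CUSUM appears among the sample methods. As an illustration, here is a minimal one-sided (upper) CUSUM sketch for detecting an upward shift in the mean; the parameter names `target`, `slack`, and `threshold` and the series are hypothetical choices, not values from the survey.

```python
def cusum_alarm(xs, target, slack, threshold):
    """One-sided (upper) CUSUM: index of the first alarm, or None.

    S_t = max(0, S_{t-1} + x_t - target - slack); alarm when S_t > threshold.
    """
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + x - target - slack)
        if s > threshold:
            return t
    return None


# Hypothetical series whose mean shifts from 0 to 1 at index 5.
series = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cusum_alarm(series, target=0.0, slack=0.5, threshold=1.9))  # 8
```

The alarm fires three samples after the shift: `slack` suppresses small fluctuations, and `threshold` trades detection latency against false alarms.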



Problem: Detect seizures in patients from IEEG



Solution: use a one-class SVM trained on 15 minutes of baseline



Performance: improves the state-of-the-art detection latency from 5 s to -13 s (i.e., 13 s before the marked onset); automatic channel selection; unsupervised technique; …



Reference: “One-Class Novelty Detection for Seizure Analysis from Intracranial EEG,” Journal of Machine Learning Research, 2006









Neurological disorder
Electrographic seizures
1% of population
30% non-controllable
EEG, IEEG, MRI, fMRI, PET, etc.
Cyberonics, NeuroPace, NeuroVista,…

[Figure: an “obvious” electrographic seizure, 9 minutes shown]

Traditional Model: Brain Electrical Activity divided into baseline, pre-seizure, and seizure
Novelty Model: Brain Electrical Activity divided into baseline and other (e.g., seizures, artifacts, etc.)

Idea: Capture Spectral Changes
[Figure: sliding windows over the EEG in time, each transformed into a spectrum over frequency]
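A sketch of the sliding-window idea: cut the signal into overlapping windows and compute a power spectrum for each, so spectral changes show up as changes between consecutive windows. The window size, step, and test signal are hypothetical, and the naive O(n²) DFT is for illustration only; a practical version would use an FFT (e.g. `numpy.fft.rfft`) and usually a tapering window.

```python
import cmath
import math


def sliding_windows(signal, size, step):
    """Yield consecutive (possibly overlapping) windows of the signal."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def power_spectrum(frame):
    """Power at DFT bins 0..n/2 (naive DFT, fine for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


# Hypothetical signal: a sinusoid completing 4 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
for frame in sliding_windows(signal, size=32, step=16):
    spectrum = power_spectrum(frame)
    print(max(range(len(spectrum)), key=spectrum.__getitem__))  # dominant bin: 4
```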



Features (slide & compute): Teager Energy, Curve Length, Short-Term Energy
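A sketch of the three window features. The definitions are the common textbook ones and are assumptions here: curve length as the sum of absolute sample-to-sample differences, short-term energy as the mean squared amplitude, and Teager energy as the mean of x[n]² - x[n-1]·x[n+1]. The exact normalizations in the referenced JMLR paper may differ. Windows must have at least 3 samples for the Teager term.

```python
def curve_length(w):
    """Sum of absolute sample-to-sample differences (line length)."""
    return sum(abs(b - a) for a, b in zip(w, w[1:]))


def short_term_energy(w):
    """Mean squared amplitude over the window."""
    return sum(x * x for x in w) / len(w)


def teager_energy(w):
    """Mean Teager-Kaiser operator x[n]^2 - x[n-1]*x[n+1] over interior samples."""
    vals = [w[n] ** 2 - w[n - 1] * w[n + 1] for n in range(1, len(w) - 1)]
    return sum(vals) / len(vals)


def sliding_features(signal, size, step):
    """(curve length, energy, Teager energy) for each window: 'slide & compute'."""
    feats = []
    for start in range(0, len(signal) - size + 1, step):
        w = signal[start:start + size]
        feats.append((curve_length(w), short_term_energy(w), teager_energy(w)))
    return feats


# Hypothetical samples: sin(pi * n / 2) for n = 0..4, one full cycle.
window = [0, 1, 0, -1, 0]
print(curve_length(window), short_term_energy(window), teager_energy(window))
```

In the novelty model, such feature vectors computed on baseline windows would be the "hay" that a one-class detector is fit to.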

[Figure: Baseline IEEG and Ictal IEEG, each shown with the detector output P(seizure), from 0 to 1, against time (minutes)]

Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.

Advantages






Data haystacks .01%
Unusual = interesting
Models $$$
Labels $$$
…

Challenges







Features FTW
Normal = ?
Deviation = ?
False positives
Adaptation
…




Questions?
Connect!

Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC

