Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC
 Anomalies

Data Science Fairy Tale
 Topics in Anomaly Detection
 Seizure Detection Example
 Summary

anomaly something that deviates from what is standard, normal, or expected
data cleansing
3-5% mislabeled ground truth in MNIST database
[Figure: 20 MNIST handwritten digit samples, labeled 9 1 0 1 7 2 3 9 5 0 3 6 6 0 7 5 0 7 6 3]

stock price
Volkswagen (VOW.DE) short squeeze, 10/28/2008
transactions

video surveillance

email
Date: Sat, 12 Aug 2012 14:39:59 UTC
From: "Iglobal" <tryme@yourdomain.com>
To: "Mr. Foo1" <foo1@freemail.com>
Subject: Foo1, Please Confirm Your Position!
Hi Foo1,
Welcome To The $7 Plan. I Bring in 3 to 5
New Members In Every Day, I can show you
how easily. Its to much Fun.
Solution #1 It costs too much every month.
Not with the $7 Plan! The TOTAL cost is $7
per month. The $7.00 Plan is still holding
your position and we have people that are
waiting to place under you. That's right only

Credit Card Fraud
 Campaign Response





Traffic
Persons of Interest




Spam
Intrusion / Malware
counterfeit
healthcare condition
seizures


Many names



One key (counter-intuitive) idea:
focus on the hay…

… not the needle




Machine learning (ooh)
Unsupervised*
Classification*
User
Device
Sensors

Signals
(Data)




Alerts | intervention
Online | batch

Features

Outputs

Detector
(Classifier)
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.







Advantages
Data haystacks .01%
Unusual = interesting
Models $$$
Labels $$$
…
Disadvantages?
We sell healthy, green apples!



Bob ... knows apples

common (n=13)

rare (n=1)


Bob “The 8th Dwarf”
8 Dwarf Orchards, Inc.

… sells healthy apples



… studies data science



… does “Big Apple Data”
Goal: label instances
(green vs. red)

watercore

greens


green = +1

red = -1

Feature Space

Labels



mass density (g/cm3)





reds

Training

zi

Inputs
xi

zi

yi
f : X → Y
Test Examples

watercore

Test Examples – Results

Confusion Matrix

            Green (G)   not-green (NG)
Label G     13 (TP)     4 (FP)
Label NG    1 (FN)      1 (TN)


mass density (g/cm3)
Key idea: trade-off mislabeling each class (P vs. N)

Confusion matrix (True Classes)

            Green (G)   not-green (NG)
Label G     13 (TP)     4 (FP)
Label NG    1 (FN)      1 (TN)
Total       P           N

Sensitivity: TPR = TP / (TP + FN) = 13/14 (errors on the “positive” class, Green)
Specificity: SPC = TN / (FP + TN) = 1/5 (errors on the “negative” class, not-green)
False Positive Rate: FPR = FP / (FP + TN) = 4/5 = 1 - SPC
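
As a quick check of the arithmetic, the rates can be computed directly from the four counts in the apple confusion matrix. This is a minimal sketch: the function name `rates` is made up for illustration, and the false positive rate uses the standard definition FP / (FP + TN), which is the complement of specificity.

```python
# Counts taken from the apple confusion matrix on the slide.
TP, FP, FN, TN = 13, 4, 1, 1


def rates(tp, fp, fn, tn):
    """Return (sensitivity, specificity, false positive rate)."""
    sensitivity = tp / (tp + fn)  # share of true greens labeled green
    specificity = tn / (fp + tn)  # share of true not-greens labeled not-green
    fpr = fp / (fp + tn)          # standard definition; equals 1 - specificity
    return sensitivity, specificity, fpr


sens, spec, fpr = rates(TP, FP, FN, TN)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} FPR={fpr:.3f}")

# Degenerate detector: label every apple green, so nothing is labeled not-green.
all_green = rates(tp=14, fp=5, fn=0, tn=0)
print(all_green)
```

The second call illustrates the trade-off: labeling everything green catches every green apple but gets every not-green apple wrong.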
Idea: distance to “average” example
centroid based anomaly detection

examples
 centroid
 threshold
 anomaly

watercore




mass density (g/cm3)

false positive
anomaly score
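
The centroid idea can be sketched in a few lines. Everything concrete here is an assumption for illustration: the (mass density, watercore) values are invented, and the threshold is simply set to the largest distance seen among the normal training examples, which is one of several reasonable choices.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def anomaly_score(x, center):
    """Euclidean distance from x to the centroid."""
    return math.dist(x, center)


# Hypothetical (mass density in g/cm3, watercore) measurements of healthy apples.
normal = [(0.80, 0.10), (0.82, 0.12), (0.79, 0.08), (0.81, 0.11), (0.83, 0.09)]

center = centroid(normal)
# Assumed threshold rule: the farthest normal example defines the boundary.
threshold = max(anomaly_score(p, center) for p in normal)


def is_anomaly(x):
    """Flag x when it lies farther from the centroid than the threshold."""
    return anomaly_score(x, center) > threshold


print(is_anomaly((0.81, 0.10)), is_anomaly((0.95, 0.40)))
```

Note that only "normal" examples are needed to fit the centroid and threshold; no labeled anomalies are required.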
Performance

Trait                 classic   anomaly
Sensitivity           .929      1.00
Specificity           .200      .833
Feature dependent?
Require labels?
Magic numbers?

Goal: find densest regions in feature space

Standard deviation



mass density (g/cm3)

Tukey statistic (IQR)



watercore



Mahalanobis distance
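
The three rules named on this slide can be sketched with the standard library alone. The sample values, the multipliers `k`, and the function names are assumptions for illustration; the Mahalanobis version is written out for two features only, using the sample covariance matrix and its closed-form 2x2 inverse.

```python
import math
import statistics


def stddev_outliers(xs, k=3.0):
    """Values more than k sample standard deviations from the mean."""
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - mu) > k * sigma]


def tukey_outliers(xs, k=1.5):
    """Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return [x for x in xs if x < q1 - k * iqr or x > q3 + k * iqr]


def mahalanobis_2d(point, data):
    """Mahalanobis distance of a 2-D point from the sample in `data`."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] S^-1 [dx dy]^T, with S^-1 = (1/det) [[syy, -sxy], [-sxy, sxx]]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical mass densities (g/cm3) with one unusual apple at 0.95.
densities = [0.78, 0.79, 0.80, 0.80, 0.81, 0.81, 0.82, 0.82, 0.83, 0.84, 0.95]
print(stddev_outliers(densities, k=3.0), tukey_outliers(densities))

# Hypothetical correlated 2-D features.
pairs = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8)]
print(mahalanobis_2d((4, 4), pairs), mahalanobis_2d((4, 2), pairs))
```

On this made-up sample the 3-sigma rule does not flag 0.95, because the unusual value itself inflates the standard deviation, while the Tukey fences do flag it. The two test points in the last line are equally far from the mean in Euclidean terms, but the one that breaks the correlation between the features gets a much larger Mahalanobis distance.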
Goal: find densest regions in feature space

Flexible



Density based



Robust



watercore



Tunable

mass density (g/cm3)

How? the one-class support vector machine
Goal: find densest regions in feature space







“Flood” graph


Pick fraction, e.g. 0.5

Mark waterlines



Note support

The One-class Support Vector Machine Does This
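
The flood picture can be imitated with a plain kernel density estimate. To be clear, this is not a one-class SVM: it is a stand-in sketch of the analogy, where the density plays the role of terrain height and the chosen fraction sets the waterline. The sample values, the bandwidth, and the function names are assumptions. In scikit-learn's `OneClassSVM`, the `nu` parameter plays a role similar to the fraction here.

```python
import math


def kde(x, sample, bandwidth):
    """Gaussian kernel density estimate at x for a 1-D sample."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def waterline(sample, fraction, bandwidth):
    """Density level below which roughly `fraction` of the sample lies."""
    heights = sorted(kde(s, sample, bandwidth) for s in sample)
    k = int(fraction * len(heights))
    return heights[min(k, len(heights) - 1)]


# Hypothetical mass densities (g/cm3); 0.95 sits far from the rest.
sample = [0.78, 0.79, 0.80, 0.80, 0.81, 0.81, 0.82, 0.82, 0.83, 0.84, 0.95]
bandwidth = 0.02  # assumed smoothing width

level = waterline(sample, fraction=0.5, bandwidth=bandwidth)
dry = [s for s in sample if kde(s, sample, bandwidth) >= level]     # dense "islands"
flooded = [s for s in sample if kde(s, sample, bandwidth) < level]  # low-density points

print("waterline:", level)
print("dry:", dry)
print("flooded:", flooded)
```

Raising the fraction raises the waterline and shrinks the islands; the points left near the waterline are the ones that determine the boundary.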



Outlier impact
Rich data
 Graphs

 Spatio-temporal
 Text

Use labels
 Online / latency
 Features
 Clustering & alternatives


You Are Here
APPROACHES

SAMPLE METHODS

Statistical methods
 Distance based methods
 Rule systems
 Profiling Methods
 Model based approaches












Kernel methods
PCA & subspace methods
OCNM & OCSVM
CUSUM
Nearest neighbors
Decision trees
Replicator Neural Networks
Clustering

V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)


Problem: Detect seizures in patients from IEEG



Solution: Use a one-class SVM trained on 15 minutes of baseline



Performance: Improve state-of-the-art latency (5 secs) to -13 secs, auto channel selection, unsupervised technique, …



Reference: “One-Class Novelty Detection for Seizure
Analysis from Intracranial EEG,” Journal of Machine
Learning Research ‘06








Neurological disorder
Electrographic seizures
1% of population
30% non-controllable
EEG, IEEG, MRI, fMRI, PET, etc.
Cyberonics, Neuropace, NeuroVista,…
an “obvious” electrographic seizure

9 minutes
Traditional Model
Brain Electrical Activity

Novelty Model
Brain Electrical Activity

baseline

baseline

pre-seizure

seizure

other
(e.g., seizures, artifacts,
etc.)
Idea: Capture Spectral Changes

Sliding Windows

Spectrum
frequency

EEG

time



Teager Energy



Curve Length



Short-Term Energy
slide & compute
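
A minimal sketch of "slide & compute" for the three features named above, using their usual textbook definitions. The window width, step, per-window averaging, and the synthetic test signal are all assumptions for illustration and are not the settings used in the seizure study.

```python
import math


def teager_energy(x):
    """Mean of x[n]^2 - x[n-1]*x[n+1] over the interior samples of the window."""
    return sum(x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)) / (len(x) - 2)


def curve_length(x):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(x[n] - x[n - 1]) for n in range(1, len(x)))


def short_term_energy(x):
    """Mean squared amplitude of the window."""
    return sum(v * v for v in x) / len(x)


def sliding_features(signal, width, step):
    """Compute one (teager, curve length, energy) tuple per window."""
    feats = []
    for start in range(0, len(signal) - width + 1, step):
        w = signal[start:start + width]
        feats.append((teager_energy(w), curve_length(w), short_term_energy(w)))
    return feats


# Synthetic stand-in for EEG: a quiet, slow oscillation followed by a larger, faster one.
quiet = [math.sin(0.2 * n) for n in range(200)]
active = [5 * math.sin(1.0 * n) for n in range(200)]
feats = sliding_features(quiet + active, width=100, step=100)

for i, f in enumerate(feats):
    print(i, [round(v, 3) for v in f])
```

All three features rise sharply in the later windows, which is the kind of shift in feature distribution a novelty detector trained on baseline windows is meant to pick up.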
[Figure: Baseline IEEG. Top panel: IEEG amplitude (-2000 to 2000); bottom panel: P(seizure) (0 to 1); x-axis: time (minutes)]
[Figure: Ictal IEEG. Top panel: IEEG amplitude (-2000 to 2000); bottom panel: P(seizure) (0 to 1); x-axis: time (minutes)]
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.

Advantages






Data haystacks .01%
Unusual = interesting
Models $$$
Labels $$$
…

Challenges







Features FTW
Normal = ?
Deviation = ?
False positives
Adaptation
…



Questions?
Connect!

Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC


Introduction to Anomaly Detection - Data Science ATL Meetup Presentation, 07-31-2013

Editor's notes

  1. (1:00) Thank organizers & attendees. My background thesis. Invitation to connect.
  2. (1:00) Anomaly detection is intuitive. Requires a context. Requires a measure.
  3. (0:45) MNIST database of handwritten digits. Longstanding story about accuracy of the data set. Volkswagen share price from 210 EUR -> 1005 EUR. Porsche disclosed holdings, including options intended to acquire the underlying. This was going to deplete the float, which caused a run by short sellers. (http://www.risk.net/risk-magazine/feature/1498381/the-volkswagen-squeeze) Anomalies focus our attention.
  4. (0:45) Anomalies have intrinsic value: business, social and scientific value. Transactions, like insurance, purchases, returns, etc., looking for unusual good and bad behavior. Canonical example is credit card fraud, for instance my recent “purchase” of wine in Spain. Video surveillance, directly examining people, vehicles, and scenes for gait, position, counts, etc. to determine unusual traffic, intent, direction. Email – canonical example is the spam scam. Anomalous to me individually by content, sender, etc. Anomalous to recipients of an ISP because of the number of spread. Malware – anomalous mailings by me.
  5. (0:45) Often overlooked. Two axes. Expensive to acquire examples. Expensive to miss anomalies. Currency – secret service TV episode. Conditions – life safety, services, etc. Seizures.
  6. Anomalies everywhere. Changing perspective.
  7. Machine learning makes it happen. Ideal vs. real system. Alerts because of intervention cost. Online is rare. Workflow is similar.
  8. Data growth. Unusual events. Expensive to model. Labeled examples are rare, expensive. Prioritized focus.
  9. Meet Bob. Red apples are “poison” so build a healthy (green) apple detector.
  10. RFA request for apples. Count all combinations of “what I said it was” x “what it actually was” -> confusion matrix. Note the unforeseen apple examples: rotten, yellow, etc. These unanticipated counter-examples are one reason why traditional classification “breaks”.
  11. Confusion matrices are … confusing. Reduce to two statistics (sens, spec). FPR is related to spec. Sens: how well do we do on green apples. Spec: how well do we do on the others. Example: can build a perfect green apple detector by labeling all apples green. That’s highly sensitive, but not specific.
  12. Watercore is a real produce feature! This works pretty well for some problems, but there are issues as we will see…
  13. Tukey = nonparametric, spherical region of support. Stddev = parametric, spherical region of support. Mahalanobis = elliptical, generalization of stddev, tighter bounds but more expensive to compute. In practice, Mahalanobis performs nicely.
  14. Ideal case: find statistically significant “islands”. Curiously, outliers distort this task. The one-class SVM is the canonical, golden algorithm to achieve this. Oracle Data Mining implements one-class SVM. There are better variants, now, like OCNM.
  15. Outlier pruning before modeling can help. Rich data has representation challenges. How do you encode feature vectors? What is an anomaly? How do you define normal? Semisupervised technique: do anomaly detection + use labels for classifying. If online system, concerned with latency. Features matter, even more so for anomaly detection. Clustering is an alternative and related problem. Many other related problems. Maybe worth considering.
  16. Good survey paper. They create a taxonomy of techniques. Examples of AD techniques listed. Note familiar methods: lots of ML algorithms can be reworked as anomaly detection. Strategies: Find a technique that works for your data. Map your data so it works with your favorite technique. Invent your own technique.
  17. When non-controllable, looking at: surgical brain resection (gold standard), implantable device (experimental), alternative.
  18. Real 20-min ictal EEG. Seizures not so obvious in raw time series form.
  19. We pick simple but robust features from the speech and signal processing literature. Time series almost never useful in raw form. Use sliding window approaches. How to pick window width? What about multiscale phenomena?
  20. Interictal (baseline) features vs. ictal (seizure). Notice that feature distributions shift during seizure = anomaly.
  21. Data growth. Unusual events. Expensive to model. Labeled examples are rare, expensive. Prioritized focus.