2nd edition | July 4-6, 2022
BigML, Inc #DutchMLSchool
Shallow and Deep Methods for Anomaly Detection
Thomas G. Dietterich
Chief Scientist, BigML
Outline
• Anomaly Detection Use Cases
• Four Basic Methods for Anomaly Detection with Engineered Features
• Benchmarking Study
• Incorporating Feedback
• Deep Versions of the Four Basic Methods
• Classifier-Based Anomaly Detection using the Max Logit Score
• Familiarity Hypothesis
• Challenges for the Future
Anomaly Detection Use Cases
Use Cases
• Data Cleaning
  • Remove corrupted data from the training data
  • Example: Typos in feature values, feature values interchanged, test results from two patients combined
• Fault Detection, Fraud Detection, Cyber Attack
  • At training or test time, faulty or illegal behavior creates anomalous data
• Open Category Detection
  • At test time, the classifier is given an instance of a novel category
  • Example: Self-driving car (trained in Europe) encounters a kangaroo (in Australia)
• Out-of-Distribution Detection
  • At test time, the classifier is given an instance collected in a different way
  • Example: Chest X-Ray classifier trained only on front views is shown a side view
  • Example: Self-driving car trained in clear conditions must operate during rainy conditions
Protecting a Classifier
• Claim: Every deployed ML classifier should include an anomaly detector to detect queries that lie outside the classifier's region of competence
• Also useful as a performance indicator that signals when the classifier needs to be retrained
(Diagram: training examples $(x_i, y_i)$ are used to build the anomaly detector and the classifier $f$. A query $x_q$ first goes to the anomaly detector, which tests $A(x_q) > \tau$. If yes, the query is rejected; if no, the classifier returns $\hat{y} = f(x_q)$.)
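The reject-option wrapper in the diagram can be sketched in a few lines. This is a minimal illustration under assumptions not taken from the slides: $A(x_q)$ is the Euclidean distance to the nearest training point, $f$ is a 1-nearest-neighbor classifier, and the function name `make_protected_classifier` and the string `"reject"` are hypothetical.

```python
import numpy as np

def make_protected_classifier(X_train, y_train, tau):
    """Wrap a classifier f with an anomaly detector A and a reject threshold tau.

    Hypothetical choices for illustration: A(x_q) is the distance to the
    nearest training point, and f is a 1-nearest-neighbor classifier.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)

    def predict(x_q):
        d = np.linalg.norm(X_train - np.asarray(x_q, dtype=float), axis=1)
        if d.min() > tau:               # A(x_q) > tau: outside the region of competence
            return "reject"
        return y_train[np.argmin(d)]    # y_hat = f(x_q)

    return predict
```

Any detector and classifier could be plugged in instead; the threshold $\tau$ trades false rejections of nominal queries against anomalies that slip through to the classifier.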
Anomaly Detection Definitions
• Definition: An “anomaly” is a data point generated by a process that is different from the process generating the “nominal” data
• Let $D_0$ be the probability distribution of the nominal process
• Let $D_a$ be the probability distribution of the anomaly process
• Two formal settings
  • Clean training data
  • Contaminated training data
Clean Training Data
• Given:
  • Training data: $x_1, x_2, \ldots, x_N$
  • All data come from $D_0$, the “nominal” distribution
  • Test data: $x_{N+1}, \ldots, x_{N+M}$ from a mixture of $D_0$ and $D_a$ (the anomaly distribution)
• Find:
  • The data points in the test data that belong to $D_a$
• Examples:
  • Protecting a classifier
  • Detecting manufacturing defects / equipment failure
Contaminated Training Data
• Given:
  • Training data: $x_1, x_2, \ldots, x_N$ from a mixture of $D_0$ and $D_a$ (the anomaly distribution)
• Find:
  • The data points in the training data that belong to $D_a$
• Use Cases:
  • Data cleaning
  • Fraud detection, insider threat detection
• These two settings can be combined
  • Contaminated training data + separate contaminated test data
Four Basic Methods for Anomaly Detection with Engineered Features
Theoretical Approaches to Anomaly Detection
• Distance-Based Methods
  • Anomaly score: $A(x_q) = \min_{x \in D} \|x_q - x\|$
• Density Estimation Methods
  • Model the joint distribution $P_D(x)$ of the input data points $x_1, \ldots \in D$
  • Surprise: $A(x_q) = -\log P_D(x_q)$
• Quantile Methods
  • Find a smooth function $f$ such that $\{x : f(x) \ge 0\}$ contains $1 - \alpha$ of the training data
  • Anomaly score: $A(x) = -f(x)$
• Reconstruction Methods
  • Train an auto-encoder: $x \approx D(E(x))$, where $E$ is the encoder and $D$ is the decoder (not the data set $D$)
  • Anomaly score: $A(x_q) = \|x_q - D(E(x_q))\|$
Approach 1: Distance-Based Methods
• Define a distance $d(x_i, x_j)$
• $A(x_q) = \min_{x \in D} d(x_q, x)$
• Requires a good distance metric
(Figure: two example query points $x_q$ plotted against the training data.)
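A minimal NumPy sketch of the distance-based score, assuming Euclidean distance for $d$; the function name `knn_anomaly_scores` and the `k` parameter are illustrative. With `k=1` it computes exactly $A(x_q) = \min_{x \in D} d(x_q, x)$; a larger `k` gives the mean-distance-to-$k$-nearest-neighbors variant used in the benchmarking study.

```python
import numpy as np

def knn_anomaly_scores(X_train, X_query, k=1):
    """Mean Euclidean distance from each query to its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # pairwise distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # the k smallest distances in each row (in no particular order), then their mean
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

This brute-force version costs $O(N)$ per query. When scoring the training points themselves (the contaminated setting), each point should be excluded from its own neighbor set; otherwise its score is 0.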
Isolation Forest [Liu, Ting, Zhou, 2011]
• Approximates L1 (Manhattan) distance
  • (Guha, et al., ICML 2016)
• Construct a fully random binary tree
  • choose attribute $j$ at random
  • choose splitting threshold $\theta$ uniformly from $[\min x_{\cdot j}, \max x_{\cdot j}]$
  • until every data point is in its own leaf
• Let $d(x_i)$ be the depth of point $x_i$
• Repeat $L$ times
• Let $\bar{d}(x_i)$ be the average depth of $x_i$
• $A(x_i) = 2^{-\bar{d}(x_i)/r(x_i)}$
  • $r(x_i)$ is the expected depth
(Figure: a random isolation tree whose internal nodes test splits such as $x_{\cdot j} > \theta$, $x_{\cdot 2} > \theta_2$, $x_{\cdot 8} > \theta_3$, $x_{\cdot 3} > \theta_4$, $x_{\cdot 1} > \theta_5$ until the point $x_i$ is isolated in a leaf.)
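A compact from-scratch sketch of the procedure above, under stated assumptions: the attribute is drawn only among those that still vary within the node, trees are grown to full depth on the whole data set (the published Isolation Forest also subsamples and caps tree height), and $r(x_i)$ is taken to be the average-path-length normalizer $c(n)$ from Liu et al. All function names are hypothetical.

```python
import numpy as np

def _expected_depth(n):
    """Normalizer c(n): average path length of an unsuccessful search
    in a binary search tree over n points (used here as r)."""
    if n <= 1:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n

def _record_depths(X, idx, depth, rng, out):
    """Grow one fully random tree over the points idx; write each leaf depth into out."""
    if len(idx) <= 1:
        out[idx] = depth
        return
    lo, hi = X[idx].min(axis=0), X[idx].max(axis=0)
    splittable = np.flatnonzero(hi > lo)
    if splittable.size == 0:              # duplicate points cannot be separated
        out[idx] = depth
        return
    j = rng.choice(splittable)            # random attribute j
    theta = rng.uniform(lo[j], hi[j])     # threshold uniform in [min, max)
    left = X[idx, j] <= theta
    _record_depths(X, idx[left], depth + 1, rng, out)
    _record_depths(X, idx[~left], depth + 1, rng, out)

def isolation_scores(X, n_trees=100, seed=0):
    """A(x_i) = 2 ** (-mean_depth(x_i) / r), with depths averaged over n_trees trees."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    total = np.zeros(n)
    for _ in range(n_trees):
        depths = np.zeros(n)
        _record_depths(X, np.arange(n), 0, rng, depths)
        total += depths
    return 2.0 ** (-(total / n_trees) / _expected_depth(n))
```

Points that are isolated after only a few random splits get a small average depth and hence a score close to 1.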
Approach 2: Density Estimation
• Given a data set $x_1, \ldots, x_N$ where $x_i \in \mathbb{R}^d$
• We assume the data have been drawn iid from an unknown probability density: $x_i \sim P$
• Goal: Estimate $P$
• Anomaly score: $A(x_q) = -\log P(x_q)$
  • the “surprisal” from information theory
• Why density estimation?
  • It gives a more global view by combining distances to all data points
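As one concrete density estimator, here is a sketch that uses an isotropic Gaussian kernel density estimate for $P$ and returns the surprisal $-\log P(x_q)$. The kernel choice, the default `bandwidth`, and the function name are assumptions for illustration; the log-sum-exp step keeps very small densities from underflowing to zero.

```python
import numpy as np

def kde_surprisal(X_train, X_query, bandwidth=0.5):
    """Surprisal A(x_q) = -log P(x_q), with P an isotropic Gaussian KDE."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    n, d = X_train.shape
    sq = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # log of each normalized Gaussian kernel N(x_q; x_i, bandwidth^2 I)
    log_k = -sq / (2.0 * bandwidth ** 2) - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    # log P(x_q) = log((1/n) * sum_i kernel_i), computed with log-sum-exp
    m = log_k.max(axis=1)
    log_p = m + np.log(np.exp(log_k - m[:, None]).sum(axis=1)) - np.log(n)
    return -log_p
```

Because every training point contributes to $P(x_q)$, the score reflects distances to all data points, not just the nearest one.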
Example: LODA (Pevny, 2015)
• Introduce sparse random projections $\Pi_l$ into 1-dimensional space
• Fit a density estimator $P_l(\Pi_l x)$ in each 1-d space
• $A(x_q) = \frac{1}{L} \sum_{l=1}^{L} -\log P_l(\Pi_l x_q)$
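A simplified LODA-style sketch, assuming histogram density estimators with a fixed number of bins and projections with roughly $\sqrt{d}$ nonzero Gaussian weights. The bin count, number of projections, the `eps` floor for empty bins, and the function names are assumptions rather than Pevny's exact settings.

```python
import numpy as np

def fit_loda(X, n_projections=50, n_bins=20, seed=0):
    """Fit one histogram density per sparse random 1-d projection."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    n_nonzero = max(1, int(round(np.sqrt(d))))   # sparsity level (assumed)
    models = []
    for _ in range(n_projections):
        w = np.zeros(d)
        w[rng.choice(d, size=n_nonzero, replace=False)] = rng.standard_normal(n_nonzero)
        counts, edges = np.histogram(X @ w, bins=n_bins)
        density = counts / (counts.sum() * np.diff(edges))   # integrates to 1
        models.append((w, edges, density))
    return models

def loda_scores(models, X_query, eps=1e-12):
    """A(x_q) = (1/L) * sum_l -log P_l(Pi_l x_q)."""
    X_query = np.asarray(X_query, dtype=float)
    total = np.zeros(len(X_query))
    for w, edges, density in models:
        proj = X_query @ w
        bins = np.clip(np.searchsorted(edges, proj, side="right") - 1, 0, len(density) - 1)
        inside = (proj >= edges[0]) & (proj <= edges[-1])
        p = np.where(inside, density[bins], 0.0)   # zero density outside the training range
        total += -np.log(p + eps)
    return total / len(models)
```

Each 1-d histogram is cheap to fit and update, which is what makes the ensemble “lightweight”; averaging over many projections compensates for each one seeing only a few attributes.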
Approach 3: Quantile Methods
• Vapnik’s principle: We only need to estimate the “decision boundary” between nominal and anomalous points
• Surround the data by a function $f$ that captures $1 - \epsilon$ of the training data
• One-Class Support Vector Machine (OCSVM)
  • $f$ is a hyperplane in “kernel space”
• Support Vector Data Description (SVDD)
  • $f$ is a sphere in “kernel space”
• Issue
  • $\epsilon$ must be chosen at learning time rather than at run time
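To make the quantile idea concrete, here is a deliberately simplified stand-in. It is not OCSVM or SVDD: there is no kernel, and the center is just the data mean. It sets $f(x) = R^2 - \|x - c\|^2$ and picks $R$ so that $\{x : f(x) \ge 0\}$ holds about $1 - \epsilon$ of the training data. Function names are hypothetical.

```python
import numpy as np

def fit_quantile_ball(X, eps=0.05):
    """Center c = data mean; radius R = (1 - eps) quantile of distances to c."""
    X = np.asarray(X, dtype=float)
    c = X.mean(axis=0)
    R = np.quantile(np.linalg.norm(X - c, axis=1), 1.0 - eps)
    return c, R

def quantile_anomaly_score(X, c, R):
    """A(x) = -f(x) with f(x) = R**2 - ||x - c||**2; positive means outside the ball."""
    X = np.asarray(X, dtype=float)
    return np.sum((X - c) ** 2, axis=1) - R ** 2
```

The sketch also shows the issue listed on the slide: $\epsilon$ is baked into $R$ at fit time, so changing the desired false-alarm rate means refitting.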
Approach 4: Reconstruction Methods
• NavLab self-driving van (Pomerleau, 1992)
  • Primary head: Predict the steering angle from the input image
  • Secondary head: Predict the input image (“auto-encoder”)
  • $A(x_q) = \|x_q - \hat{x}_q\|$
  • If the reconstruction is poor, this suggests that the steering angle should not be trusted
• Principle: Anomaly Detection through Failure
  • Define a task on which the learned system should fail for anomalies
Pomerleau, NIPS 1992
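The simplest instance of a reconstruction method is a linear autoencoder: PCA with $k$ components, where the encoder $E$ projects onto the top principal directions and the decoder $D$ maps back. This sketch (function names and `k` are illustrative) computes $A(x_q) = \|x_q - D(E(x_q))\|$.

```python
import numpy as np

def fit_pca_autoencoder(X, k):
    """Linear autoencoder via PCA: the data mean and the top-k principal directions."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruction_scores(X, mu, V):
    """A(x) = ||x - D(E(x))||."""
    X = np.asarray(X, dtype=float)
    Z = (X - mu) @ V.T        # encoder E: coordinates in the k-dim subspace
    X_hat = Z @ V + mu        # decoder D: back to input space
    return np.linalg.norm(X - X_hat, axis=1)
```

Points that lie in the subspace spanned by the nominal data are reconstructed almost perfectly; the score is the distance from a point to that subspace.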
Application: Finding Unusual Chemical Spectra
• NASA Mars Science Laboratory ChemCam instrument
  • Collects 6144 spectral bands on rock samples from 7 m distance using laser stimulation
• Goal: active learning to find interesting spectra
• DEMUD
  • Incremental PCA applied to samples one at a time
  • Fit only to the samples labeled as “uninteresting” by the user
  • Show the user the most un-uninteresting sample (the sample with the highest PCA reconstruction error)
  • Rapidly discovers interesting samples
• Wagstaff, et al. (2013)
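One DEMUD-style selection step can be sketched as follows. This is a simplification: it refits a batch PCA with an SVD on the samples labeled “uninteresting” instead of updating the model incrementally, and the function name and `k` are assumptions. The next sample to show is the unlabeled one with the highest reconstruction error.

```python
import numpy as np

def next_to_show(X, uninteresting, k=1):
    """Index of the unseen sample with the largest reconstruction error under a
    rank-k PCA fit only to the samples the user labeled 'uninteresting'."""
    X = np.asarray(X, dtype=float)
    U = X[uninteresting]
    mu = U.mean(axis=0)
    _, _, Vt = np.linalg.svd(U - mu, full_matrices=False)
    V = Vt[:k]
    R = X - mu
    err = np.linalg.norm(R - (R @ V.T) @ V, axis=1)   # PCA reconstruction error
    err[uninteresting] = -np.inf                      # never re-show a labeled sample
    return int(np.argmax(err))
```

In a full loop, each sample the user dismisses would be appended to `uninteresting` and the model refit, so the next pick is whatever the current model of “boring” explains worst.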
Benchmarking Study [Andrew Emmott, 2015, 2020]
• Distance-Based Methods
  • k-NN: Mean distance to $k$-nearest neighbors
  • LOF: Local Outlier Factor (Breunig, et al., 2000)
  • ABOD: kNN Angle-Based Outlier Detector (Kriegel, et al., 2008)
  • IFOR: Isolation Forest (Liu, et al., 2008)
• Density-Based Approaches
  • RKDE: Robust Kernel Density Estimation (Kim & Scott, 2008)
  • EGMM: Ensemble Gaussian Mixture Model (our group)
  • LODA: Lightweight Online Detector of Anomalies (Pevny, 2016)
• Quantile-Based Methods
  • OCSVM: One-class SVM (Schoelkopf, et al., 1999)
  • SVDD: Support Vector Data Description (Tax & Duin, 2004)
Benchmarking Methodology
• Select 19 data sets from the UC Irvine repository
• Choose one or more classes to be “anomalies”; the rest are “nominals”
• Manipulate
  • Relative frequency
  • Point difficulty
  • Irrelevant features
  • Clusteredness
• 20 replicates of each configuration
• Result: 11,888 non-trivial benchmark datasets
Analysis of Variance
• Linear ANOVA
  • $\log \frac{AUC}{1 - AUC} \sim rf + pd + cl + ir + pset + algo$
  • rf: relative frequency
  • pd: point difficulty
  • cl: normalized clusteredness
  • ir: irrelevant features
  • pset: “Parent” set
  • algo: anomaly detection algorithm
• Assess the algo effect while controlling for all other factors
• $AUC$: area under the ROC curve for the nominal vs. anomaly binary decision
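The response in this ANOVA is the logit of the AUC. Here is a small sketch of both pieces, computing AUC as the probability that a randomly chosen anomaly scores higher than a randomly chosen nominal point (ties count one half); the function names are illustrative.

```python
import numpy as np

def auc(scores_nominal, scores_anomaly):
    """P(score of a random anomaly > score of a random nominal); ties count 1/2."""
    a = np.asarray(scores_anomaly, dtype=float)[:, None]
    n = np.asarray(scores_nominal, dtype=float)[None, :]
    return float(np.mean(a > n) + 0.5 * np.mean(a == n))

def logit(p):
    """log(p / (1 - p)), the ANOVA response; undefined at p = 0 or p = 1."""
    return float(np.log(p / (1.0 - p)))
```

For example, with nominal scores 0.1, 0.2, 0.3 and anomaly scores 0.25, 0.9, five of the six anomaly/nominal pairs are ordered correctly, so AUC = 5/6 and its logit is $\log 5$. The logit maps AUC from (0, 1) onto the whole real line, which suits a linear model.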
Benchmarking Study Results
• 19 UCI datasets
• 9 leading “feature-based” algorithms
• 11,888 non-trivial benchmark datasets
• Mean AUC effect for “nominal” vs. “anomaly” decisions
  • Controlling for
    • Parent data set
    • Difficulty of individual queries
    • Fraction of anomalies
    • Irrelevant features
    • Clusteredness of anomalies
• Baseline method: Distance to nominal mean (“tmd”)
• Best methods: K-nearest neighbors and Isolation Forest
• Worst methods: Kernel-based OCSVM and SVDD
(Figure: bar chart of the mean AUC effect, axis range 0.62 to 0.78, for knn, iforest, egmm, rkde, lof, abod, loda, svdd, tmd, ocsvm, in that order.)
Incorporating User Feedback: Initial Work
• Show the top-ranked candidate to the user
• User labels the candidate
• The label is used to update the anomaly detector
• Two methods
  • AAD [Das, et al., ICDM 2016]
  • GLAD-OMD (modified version of iForest) [Siddiqui, et al., KDD 2018]
(Diagram: Data → Anomaly Detection → Best Candidate → User, with the user's yes/no label fed back to the detector, and an Anomaly Analysis step.)
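The feedback loop can be sketched generically. This is not AAD or GLAD-OMD: it assumes a hypothetical linear ensemble of base detector scores `S` (one column per detector) whose weights receive a simple additive update after each label, and a callable `is_anomaly` standing in for the user.

```python
import numpy as np

def feedback_loop(S, is_anomaly, budget, eta=0.5, use_feedback=True):
    """Query the top-ranked unlabeled item `budget` times; return how many were anomalies.

    S: (n_items, n_detectors) scores from base detectors (hypothetical ensemble).
    is_anomaly: callable standing in for the user's yes/no label.
    """
    S = np.asarray(S, dtype=float)
    w = np.full(S.shape[1], 1.0 / S.shape[1])   # start with equal detector weights
    labeled = np.zeros(len(S), dtype=bool)
    found = 0
    for _ in range(budget):
        scores = S @ w
        scores[labeled] = -np.inf                # never show the same item twice
        i = int(np.argmax(scores))               # best candidate
        labeled[i] = True
        y = bool(is_anomaly(i))                  # user feedback
        found += y
        if use_feedback:
            # shift weight toward detectors that scored a confirmed anomaly high,
            # and away from detectors that scored a nominal item high
            w = np.clip(w + eta * S[i] if y else w - eta * S[i], 0.0, None)
    return found
```

When one base detector is fooled by unusual but nominal items, a single “no” label can be enough to down-weight it, which is the intuition behind the large gains from feedback.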
User Feedback Yields Big Improvements in Anomaly Discovery
(Figure: APT Engagement 3 results.)
Deep Versions of the Four Basic Methods
Deep Anomaly Detection in Image Classification
• Input image $x$
• Network backbone, also called the “encoder”: $z = E(x)$
• Latent representation $z$
• “Logits”: $\ell_k = w_k \cdot z$
• Predicted probabilities:
$$\hat{p}(y = k \mid x) = \frac{\exp \ell_k(z)}{\sum_{k'} \exp \ell_{k'}(z)}$$
(Figure: convolutional neural network classifier. Image $x$ → “backbone” encoder $E$ → penultimate layer $z$ → logits $\ell_k = w_k^\top z$ → probabilities $\hat{p}(y = k \mid x)$.)
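The last two stages of the network can be sketched in NumPy, assuming `W` holds one weight vector $w_k$ per column and `z` holds one latent vector per row (both names are illustrative). Subtracting the row maximum before exponentiating leaves $\hat{p}$ unchanged but avoids overflow.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow without changing the result."""
    logits = np.asarray(logits, dtype=float)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_proba(z, W):
    """p_hat(y = k | x) from latent vectors z (rows) and weights W (one column w_k per class)."""
    logits = np.asarray(z, dtype=float) @ np.asarray(W, dtype=float)   # l_k = w_k . z
    return softmax(logits)
```

For instance, logits of $0$ and $\log 3$ give probabilities 0.25 and 0.75.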
Distance-Based Methods
• K-nearest neighbors in the latent space
• Issue: What distance metric should we use?
• Cosine distance is the most popular:
$$d(z_1, z_2) = 1 - \frac{z_1 \cdot z_2}{\|z_1\| \, \|z_2\|}$$
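A sketch of k-NN in the latent space with the cosine distance $1 - \cos(z_1, z_2)$. The embeddings would come from the backbone $E$; the function name and `k` are illustrative.

```python
import numpy as np

def cosine_knn_scores(Z_train, Z_query, k=1):
    """Mean cosine distance d = 1 - cos(z_q, z) to the k nearest training embeddings."""
    A = np.asarray(Z_train, dtype=float)
    B = np.asarray(Z_query, dtype=float)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    dist = 1.0 - B @ A.T                                # shape (n_query, n_train)
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]  # k smallest distances per query
    return nearest.mean(axis=1)
```

Normalizing first means only the direction of each embedding matters; its length is ignored, which is worth keeping in mind given the later evidence that novel images produce smaller activation vectors.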
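Density-Based Methods
• Mahalanobis Method
  • Fit a joint multivariate Gaussian model
  • Each class $k$ has its own mean $\mu_k$
  • Shared covariance matrix $\Sigma$
• Given a new $x$, approximate its log density using the nearest class:
$$-\log P(x) \approx \tfrac{1}{2} \min_k (x - \mu_k)^\top \Sigma^{-1} (x - \mu_k) + \text{const}$$
  The quadratic form $(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)$ is known as the squared Mahalanobis distance; its minimum over $k$ serves as the anomaly score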
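A sketch of the Mahalanobis method on latent vectors `Z` with class labels `y`: class means, one pooled (shared) covariance, and the minimum squared Mahalanobis distance as the anomaly score. Function names are hypothetical; in practice a small ridge term or `np.linalg.pinv` may be needed if $\Sigma$ is singular.

```python
import numpy as np

def fit_mahalanobis(Z, y):
    """Class means mu_k and the inverse of the shared (pooled within-class) covariance."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mus = np.stack([Z[y == k].mean(axis=0) for k in classes])
    centered = Z - mus[np.searchsorted(classes, y)]   # subtract each point's class mean
    sigma = centered.T @ centered / len(Z)
    return mus, np.linalg.inv(sigma)

def mahalanobis_scores(Z_query, mus, sigma_inv):
    """A(z) = min_k (z - mu_k)^T Sigma^{-1} (z - mu_k)."""
    diffs = np.asarray(Z_query, dtype=float)[:, None, :] - mus[None, :, :]
    d2 = np.einsum("nkp,pq,nkq->nk", diffs, sigma_inv, diffs)
    return d2.min(axis=1)
```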
Open Hybrid: Classification + Density Estimation (Zhang, Li, Guo, Guo, 2020)
• Residual Flow deep density estimator
  • (Chen, Behrmann, Duvenaud, et al., NeurIPS 2019)
• Standard cross-entropy supervised loss
  • Claim: This helps focus $P(x)$ on relevant aspects of the images
• Anomaly score: $A(x_q) = -\log P(x_q)$
Quantile Method: Deep SVDD (Ruff, et al. ICML 2018)
• The method is somewhat tricky to work with
  • Set the center $c$ to the mean of the network outputs for a small set of points passed through the untrained network
  • Use no bias weights
  • These measures help prevent “hypersphere collapse”
Reconstruction Methods: Deep Autoencoders
• Encoder: $z = E(x)$
• Decoder: $\hat{x} = D(z)$
• Challenge: How can we constrain $E$ and $D$ so that the autoencoder fails on anomalies but succeeds on nominal images?
• Autoencoders often learn general-purpose image compression methods, which may reconstruct anomalous images just as well
(Figure: $x \xrightarrow{E} z \xrightarrow{D} \hat{x}$)
Classifier-Based Anomaly Detection using the Max Logit Score
Surprise: The Max Logit Score
• Garrepalli (2020)
  • Train the classifier to optimize the softmax likelihood (minimize “cross-entropy loss”)
  • The maximum logit score is better than two distance methods:
    • Isolation Forest
    • LOF (a nearest-neighbor method)
(Figure: “Anomaly Measures on Latent Representations for CIFAR-100”, AUROC of each measure: H(y|x) 0.68, Max SoftMax-prob. 0.67, Max BCE-prob 0.63, Max-logit 0.72, Iforest 0.51, LOF 0.44.)
More Evidence for Max Logit
• Vaze, Han, Vedaldi, Zisserman (2021): “Open Set Recognition: A Good Classifier is All You Need” (ICLR 2022; arXiv 2110.06207)
• Carefully train a classifier using the latest tricks
  • Standard cross-entropy combined with the following:
    • Cosine learning rate schedule
    • Learning rate warmup
    • RandAugment augmentations
    • Label smoothing
• Anomaly score: max logit
  • $A(x) = -\max_k \ell_k$
Protocol from Lawrence Neal et al. (2018)
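The classifier-based scores compared in these slides can all be computed from the logits. In this sketch (function name illustrative), each score is oriented so that larger means more anomalous. Adding a constant to every logit leaves the softmax, and therefore the entropy and max-softmax scores, unchanged, but it shifts the max logit; only the max logit retains the overall magnitude of the logits.

```python
import numpy as np

def logit_anomaly_scores(logits):
    """Classifier-based anomaly scores; for each one, larger means more anomalous."""
    logits = np.asarray(logits, dtype=float)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                                   # softmax probabilities
    return {
        "entropy": -(p * np.log(np.clip(p, 1e-300, None))).sum(axis=1),  # H(y|x)
        "neg_max_softmax": -p.max(axis=1),
        "neg_max_logit": -logits.max(axis=1),                            # A(x) = -max_k l_k
    }
```

For logits (10, 0) and (1, -9), the entropy and max-softmax scores are identical, while the max-logit score marks the second input, whose logits are uniformly smaller, as more anomalous.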
Still More Evidence for Max Logit
• Novel class difficulty based on semantic distance
  • CUB: Bird species
  • Air: Aircraft
  • ImageNet
Why? Let’s Examine the Learned Representations
How are open set images represented by deep learning?
• DenseNet with 384-dimensional latent space
• CIFAR-10: 6 known classes, 4 novel classes
• UMAP visualization
  • Light green: novel classes
  • Darker greens: known classes
• Note that many novel classes stay toward the center of the space; others overlap with known classes
• Training was not required to “pull them out” so that they could be discriminated
(Figure by Alex Guyer: UMAP embedding of the 6 known classes and the 4 novel classes.)
Similar Results from Other Groups
[Tack, et al. NeurIPS 2020] [Vaze, et al. arXiv 2110.06207]
The Familiarity Hypothesis
• A convolutional neural network learns “features” that detect image patches relevant to the classification task
• The logit layer weights these features to make the classification decision
• Novel classes activate fewer of these features, so their activation vectors are smaller
• Hypothesis: The network does not detect that an elephant is novel because of its trunk and tusks, but because its head does not activate known features
The network doesn’t detect novelty; it detects the absence of familiarity
Evidence: Number of Activated Features
Novel images strongly activate fewer features
• CIFAR-10: 6 known classes; 4 novel classes
• DenseNet ($z$ has 324 dimensions)
• Activation threshold $\theta$
• Count the number of features whose activation exceeds $\theta$
• OOD images activate fewer features
Alex Guyer (unpublished)
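Counting activated features takes one line. This sketch assumes `Z` holds one latent vector per row and that `theta` is a chosen threshold (the slides do not give its value); the negated count is a hypothetical familiarity-based score in which fewer activated features means more anomalous.

```python
import numpy as np

def count_active_features(Z, theta):
    """Number of latent features per image (row of Z) whose activation exceeds theta."""
    return (np.asarray(Z, dtype=float) > theta).sum(axis=1)

def familiarity_anomaly_score(Z, theta):
    """Hypothetical score: fewer strongly activated features -> higher anomaly score."""
    return -count_active_features(Z, theta)
```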
Which features are responsible for the drop in activation?
Are they features “on” the object or on the background?
• Strategy: blur the object and see how the feature activations change
  • Activations that change must be on the object
• Details:
  • PASCAL VOC segmented images
  • Blur the original image (31x31 kernel; sd=31)
  • Form a composite image where the blurred region replaces the segmented region
https://www.peko-step.com/en/tool/blur.html
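The compositing step can be sketched for a single-channel image with NumPy alone, assuming a separable Gaussian blur with the stated 31x31 kernel and sd=31, reflect padding at the borders, and a boolean segmentation `mask`. The slides credit an online blur tool, so this is an illustrative substitute, not the procedure actually used; for color images, apply it to each channel.

```python
import numpy as np

def gaussian_kernel_1d(size=31, sd=31.0):
    """Normalized 1-d Gaussian weights; the 2-d kernel is their outer product."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sd) ** 2)
    return k / k.sum()

def gaussian_blur(img, size=31, sd=31.0):
    """Separable Gaussian blur of a 2-d array; reflect padding keeps the shape."""
    k = gaussian_kernel_1d(size, sd)
    r = size // 2
    padded = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def blur_object(img, mask):
    """Composite image: blurred pixels inside the segmentation mask, original pixels outside."""
    img = np.asarray(img, dtype=float)
    return np.where(mask, gaussian_blur(img), img)
```

With sd equal to the kernel width, the kernel is close to a box filter, so fine texture inside the mask is almost completely averaged away.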
Blurring Examples
Note: This does not remove all object-related information (e.g., the object boundary), so we don’t detect all on-object features
Blurring Effect
• Define the “blurring effect” of feature $j$ on image $i$:
$$BE(i, j) = z_{ij} - \tilde{z}_{ij}$$
  where
  • $z_{ij}$ is the activation of latent feature $j$ on image $i$
  • $\tilde{z}_{ij}$ is the activation of latent feature $j$ on blurred image $i$
• “Presence feature”
  • $BE(i, j) > 0$: Blurring decreases the activity of the feature. Its net effect is to measure the presence of one or more image patterns
  • Its activity is high when those patterns are present
• “Absence feature”
  • $BE(i, j) < 0$: Blurring increases the activity of the feature. Its net effect is to measure the absence of one or more image patterns
  • Its activity is high when those patterns are absent
“On Object” score of feature $j$ for class $k$
• The average change in the activation of feature $j$ when the object (of class $k$) is blurred:
$$OO(j, k) = \frac{1}{N_k} \sum_{i : y_i = k} (z_{ij} - \tilde{z}_{ij})$$
• Feature $j$ is a net presence feature for class $k$ if $OO(j, k) > 0.02$
• Feature $j$ is a net absence feature for class $k$ if $OO(j, k) < -0.02$
• Otherwise $j$ is net neutral for class $k$
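A sketch of the blurring effect and the on-object score, assuming `Z` and `Z_blur` are arrays of latent activations for the original and blurred images (rows are images, columns are features) and `y` holds integer class labels. The ±0.02 threshold comes from the slide; the function names are hypothetical, and a class with no images would produce NaN means.

```python
import numpy as np

def on_object_scores(Z, Z_blur, y, n_classes):
    """OO[j, k]: mean over images i of class k of BE(i, j) = z_ij - z~_ij."""
    BE = np.asarray(Z, dtype=float) - np.asarray(Z_blur, dtype=float)   # blurring effect
    y = np.asarray(y)
    OO = np.zeros((BE.shape[1], n_classes))
    for k in range(n_classes):
        OO[:, k] = BE[y == k].mean(axis=0)
    return OO

def feature_type(OO, thresh=0.02):
    """'presence' if OO > thresh, 'absence' if OO < -thresh, otherwise 'neutral'."""
    return np.where(OO > thresh, "presence", np.where(OO < -thresh, "absence", "neutral"))
```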
Feature Taxonomy
• Logit score: $\ell_{ik} = \sum_j w_{jk} z_{ij}$
• Contribution of feature $j$ in image $i$ to class $k$:
  • $c_{ijk} = w_{jk} z_{ij}$ (in normal images)
  • $\tilde{c}_{ijk} = w_{jk} \tilde{z}_{ij}$ (in blurred images)
• Mean contribution:
  • $\bar{c}_{jk} = \frac{1}{N_k} \sum_{i : y_i = k} c_{ijk}$
  • $\bar{\tilde{c}}_{jk} = \frac{1}{N_k} \sum_{i : y_i = k} \tilde{c}_{ijk}$
• Four feature types:
  • $OO(j, k) > 0.02$ and $w_{jk} > 0$: positive presence
  • $OO(j, k) > 0.02$ and $w_{jk} < 0$: negative presence
  • $OO(j, k) < -0.02$ and $w_{jk} > 0$: positive absence
  • $OO(j, k) < -0.02$ and $w_{jk} < 0$: negative absence
Sun & Li: On the Effectiveness of Sparsification for Detecting the Deep Unknowns. arXiv 2111.09805
Mean feature types for class 3
(Figure: mean on-object index, on a 0.00 to 1.00 scale, for presence and absence features, shown separately for positive and negative features; red = presence, blue = absence.)
Zoomed View: Blurring reduces $\bar{c}_{jk}$
• Blurring…
  • reduces the contribution of positive presence features (red dots)
  • reduces the contribution of negative absence features (blue dots)
(Figure: mean blurred contribution vs. mean unblurred contribution, colored by on-object index (presence/absence) on a 0.00 to 1.00 scale.)
Decomposing the Logit Score: Four Cases
• Positive presence: $w_{jk} > 0$ and $OO(j, k) > 0$
• Positive absence: $w_{jk} > 0$ and $OO(j, k) < 0$
• Negative presence: $w_{jk} < 0$ and $OO(j, k) > 0$
• Negative absence: $w_{jk} < 0$ and $OO(j, k) < 0$
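For a single image, the logit $\ell_k = \sum_j w_{jk} z_j$ splits into the four groups. This sketch assumes `w_k` is column $k$ of the logit weights (for the max logit, $k$ is the argmax class) and `oo_k` holds the on-object scores $OO(\cdot, k)$; the function name is hypothetical, and features with $w_{jk} = 0$ or $OO(j, k) = 0$ fall into no group.

```python
import numpy as np

def decompose_logit(z, w_k, oo_k):
    """Split l_k = sum_j w_jk z_j for one image into the four feature groups."""
    z, w_k, oo_k = (np.asarray(a, dtype=float) for a in (z, w_k, oo_k))
    c = w_k * z                                   # per-feature contributions c_jk
    groups = {
        "positive_presence": (w_k > 0) & (oo_k > 0),
        "positive_absence": (w_k > 0) & (oo_k < 0),
        "negative_presence": (w_k < 0) & (oo_k > 0),
        "negative_absence": (w_k < 0) & (oo_k < 0),
    }
    return {name: float(c[mask].sum()) for name, mask in groups.items()}
```

When no feature has a zero weight or zero on-object score, the four parts sum exactly to $\ell_k$, so comparing them across known and novel images shows which group drives the drop in the max logit.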
Visualizing Individual Images: OOD Instance 838
OOD Instance 770
OOD Instance 432
Decomposing the Novelty Scores
• Note that the Positive Presence features dominate the max logit score
• The Negative Absence and Positive Absence features (purple and blue lines) make a small contribution
• Negative Presence features make no contribution
• Conclusion: Decreases in the activations of positive presence features account for most of the change in the max logit score
Decreases in Positive Presence Features Account for Novelty Detection Accuracy
• Red line: trend of the Positive Presence contribution to the max logit score
• Black line: smoothed estimate of classification accuracy (“known” vs. “novel”)
Can we expect computer vision systems to perceive things they have not been trained on?
• Blakemore, Colin, and Grahame F. Cooper. “Development of the brain depends on the visual environment.” (1970): 477-478.
  • Kittens raised in environments with only horizontal or only vertical lines
  • “They were virtually blind for contours perpendicular to the orientation they had experienced.”
• Chomsky: “Poverty of the stimulus”
Source: Li Yang Ku
https://computervisionblog.wordpress.com/2013/06/01/cats-and-vision-is-vision-acquired-or-innate/
Implications
• Familiarity-based anomaly detection advantages:
  • Easy to implement: the anomaly signal (max logit) can be extracted from the classifier, so no separate anomaly detection model is needed
  • Training on additional, auxiliary classes improves both classification and anomaly detection performance
• Familiarity-based anomaly detection weaknesses:
  • Partially occluded nominal objects will be flagged as anomalies
  • If an image contains both a novel object and a known object, the novel object will not be detected
  • Adversarial attacks can easily cause false anomalies and missed anomalies
Open Challenges
Challenges for Anomaly Detection
• Can we learn deep representations that can represent outliers?
• Nonstationarity
  • As the world changes, the anomaly detection model must also change
• Explanation
  • Users often want explanations of why something is labeled as anomalous in order to provide feedback or take other actions
• Setting alarm thresholds
  • How can we set a threshold to control the false alarm and missed alarm rates?
• Incremental (continual) learning in deep networks
  • How can we efficiently update a trained neural network to incorporate user feedback?
• Anomaly detection in temporal, spatial, and spatio-temporal data, in video data, etc.
• Anomaly detection at multiple scales
Summary
• Four Basic Methods
  • Distances, densities, density quantiles, and reconstruction
  • Distances work best; Isolation Forest is very robust
• Anomaly Detection in Deep Learning
  • The four basic methods have been extended to deep learning
  • They often do not work well when applied to learned representations
• Classifier Max Logit Score Gives Very Competitive Performance
  • Computed as a side effect of standard deep classifiers
  • Measures familiarity rather than novelty, which makes it risky in many settings
• Advances in Deep Anomaly Detection Require Learning Better Representations
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
 

Último

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Último (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

DutchMLSchool 2022 - History and Developments in ML

  • 1. 2nd edition | July 4-6, 2022
  • 2. BigML, Inc #DutchMLSchool. Shallow and Deep Methods for Anomaly Detection. Thomas G. Dietterich, Chief Scientist, BigML
  • 3. Outline • Anomaly Detection Use Cases • Four Basic Methods for Anomaly Detection with Engineered Features • Benchmarking Study • Incorporating Feedback • Deep Versions of the Four Basic Methods • Classifier-Based Anomaly Detection using the Max Logit Score • Familiarity Hypothesis • Challenges for the Future
  • 4. Anomaly Detection Use Cases
  • 5. Use Cases • Data Cleaning: remove corrupted data from the training data. Example: typos in feature values, feature values interchanged, test results from two patients combined. • Fault Detection, Fraud Detection, Cyber Attack: at training or test time, faulty or illegal behavior creates anomalous data. • Open Category Detection: at test time, the classifier is given an instance of a novel category. Example: a self-driving car trained in Europe encounters a kangaroo in Australia. • Out-of-Distribution Detection: at test time, the classifier is given an instance collected in a different way. Example: a chest X-ray classifier trained only on front views is shown a side view. Example: a self-driving car trained in clear conditions must operate in rainy conditions.
  • 6. Protecting a Classifier • Claim: every deployed ML classifier should include an anomaly detector to detect queries that lie outside the classifier's region of competence. • Also useful as a performance indicator that signals when the classifier needs to be retrained. [Diagram: a query $x_q$ is passed to the anomaly detector; if $A(x_q) > \tau$, the query is rejected; otherwise the classifier $f$, trained on examples $(x_i, y_i)$, outputs $\hat{y} = f(x_q)$.]
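A minimal Python sketch of the reject-or-classify pipeline in the diagram, under stated assumptions: `calibrate_threshold`, `protected_predict`, the toy data, and the distance-to-mean score are hypothetical. Setting $\tau$ to an empirical quantile of anomaly scores on nominal data is one common heuristic for targeting a false alarm rate, not a procedure given on the slide, and rates measured on the training data itself tend to be optimistic.

```python
import math
from typing import Any, Callable, Optional, Sequence


def calibrate_threshold(nominal_scores: Sequence[float], false_alarm_rate: float) -> float:
    """Choose tau so that at most `false_alarm_rate` of the nominal scores exceed it."""
    s = sorted(nominal_scores)
    n = len(s)
    k = min(math.floor(false_alarm_rate * n), n - 1)  # how many nominal scores may lie above tau
    return s[n - 1 - k]


def protected_predict(x: Any, f: Callable[[Any], Any],
                      A: Callable[[Any], float], tau: float) -> Optional[Any]:
    """Return f(x), or None to signal 'reject' when the anomaly score A(x) exceeds tau."""
    if A(x) > tau:
        return None
    return f(x)


# Toy usage (hypothetical): 1-d inputs, anomaly score = distance to the nominal mean.
train = [-1.0, -0.5, 0.0, 0.4, 0.9, 1.2, -0.8, 0.3, -0.2, 0.6]
mu = sum(train) / len(train)


def A(x: float) -> float:
    return abs(x - mu)


def f(x: float) -> str:
    return "pos" if x >= 0 else "neg"


tau = calibrate_threshold([A(x) for x in train], false_alarm_rate=0.1)
print(protected_predict(0.5, f, A, tau))   # pos
print(protected_predict(25.0, f, A, tau))  # None (rejected)
```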
  • 7. Anomaly Detection Definitions • Definition: an “anomaly” is a data point generated by a process that is different from the process generating the “nominal” data. • Let $D_0$ be the probability distribution of the nominal process. • Let $D_a$ be the probability distribution of the anomaly process. • Two formal settings: clean training data; contaminated training data.
  • 8. Clean Training Data • Given: training data $x_1, x_2, \ldots, x_N$, all drawn from $D_0$, the “nominal” distribution; test data $x_{N+1}, \ldots, x_{N+M}$ drawn from a mixture of $D_0$ and $D_a$ (the anomaly distribution). • Find: the data points in the test data that belong to $D_a$. • Examples: protecting a classifier; detecting manufacturing defects / equipment failure.
  • 9. Contaminated Training Data • Given: training data $x_1, x_2, \ldots, x_N$ drawn from a mixture of $D_0$ and $D_a$ (the anomaly distribution). • Find: the data points in the training data that belong to $D_a$. • Use cases: data cleaning; fraud detection, insider threat detection. • These two cases can be combined: contaminated training data plus separate contaminated test data.
  • 10. Four Basic Methods for Anomaly Detection with Engineered Features
  • 11. Theoretical Approaches to Anomaly Detection • Distance-Based Methods: anomaly score $A(x_q) = \min_{x \in D} \|x_q - x\|$. • Density Estimation Methods: model the joint distribution $P_D(x)$ of the input data points $x_1, \ldots \in D$; the anomaly score is the surprise $A(x_q) = -\log P_D(x_q)$. • Quantile Methods: find a smooth function $f$ such that $\{x : f(x) \ge 0\}$ contains $1 - \alpha$ of the training data; anomaly score $A(x) = -f(x)$. • Reconstruction Methods: train an auto-encoder $x \approx D(E(x))$, where $E$ is the encoder and $D$ is the decoder; anomaly score $A(x_q) = \|x_q - D(E(x_q))\|$.
  • 12. Approach 1: Distance-Based Methods • Define a distance $d(x_i, x_j)$. • $A(x_q) = \min_{x \in D} d(x_q, x)$. • Requires a good distance metric.
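A small sketch of the distance-based score, assuming Euclidean distance as $d$. With `k=1` it is the $\min_{x \in D} d(x_q, x)$ score above; larger `k` gives the "mean distance to the $k$ nearest neighbors" variant used in the benchmark. The helper names are hypothetical. When scoring the training points themselves (the contaminated setting), each point has to be left out of its own neighbor search, or every score would be zero.

```python
import heapq
import math
from typing import List, Sequence


def knn_score(xq: Sequence[float], data: Sequence[Sequence[float]], k: int = 1) -> float:
    """Mean Euclidean distance from x_q to its k nearest points; k=1 gives min_x d(x_q, x)."""
    nearest = heapq.nsmallest(k, (math.dist(xq, x) for x in data))
    return sum(nearest) / len(nearest)


def knn_scores_training(data: List[Sequence[float]], k: int = 1) -> List[float]:
    """Scores for the training points themselves, leaving each point out of its own search."""
    return [knn_score(x, data[:i] + data[i + 1:], k) for i, x in enumerate(data)]


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(knn_scores_training(data))  # the isolated point (10, 10) gets the largest score
```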
  • 13. Isolation Forest [Liu, Ting, Zhou, 2011] • Approximates L1 (Manhattan) distance (Guha, et al., ICML 2016). • Construct a fully random binary tree: choose attribute $j$ at random; choose the splitting threshold $\theta$ uniformly from $[\min x_{\cdot j}, \max x_{\cdot j}]$ over the points at the current node; repeat until every data point is in its own leaf. • Let $d(x_i)$ be the depth of point $x_i$. • Repeat $L$ times and let $\bar{d}(x_i)$ be the average depth of $x_i$. • $A(x_i) = 2^{-\bar{d}(x_i)/r(x_i)}$, where $r(x_i)$ is the expected depth. [Figure: a random tree whose threshold tests $x_{\cdot j} > \theta$ (e.g., $x_{\cdot 2} > \theta_2$, $x_{\cdot 8} > \theta_3$, $x_{\cdot 3} > \theta_4$, $x_{\cdot 1} > \theta_5$) isolate $x_i$ in a leaf.]
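A from-scratch sketch of the recipe on this slide, under a few assumptions: trees are grown on all the data until every point is isolated; only the root-to-leaf path of each point is simulated, which has the same depth distribution as growing the whole tree; attributes with zero range at a node are skipped; and the normalizer $r$ is the $c(n) = 2H(n-1) - 2(n-1)/n$ term from Liu et al. The published Isolation Forest also subsamples the data and limits tree height; this sketch does neither, so it costs roughly $O(L \cdot n^2 \cdot d)$ and suits only small data sets.

```python
import math
import random
from typing import List, Sequence

EULER_GAMMA = 0.5772156649015329


def expected_depth(n: int) -> float:
    """Normalizer c(n) from Liu et al.: average path length of an unsuccessful BST search."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def isolation_depth(i: int, data: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Depth of data[i] in one fully random tree, simulating only its root-to-leaf path."""
    node = list(range(len(data)))  # indices of the points sharing data[i]'s current node
    depth = 0
    while len(node) > 1:
        # Only attributes with a non-zero range at this node can split it.
        splittable = []
        for j in range(len(data[i])):
            vals = [data[p][j] for p in node]
            if max(vals) > min(vals):
                splittable.append((j, min(vals), max(vals)))
        if not splittable:  # the remaining points are exact duplicates
            break
        j, lo, hi = rng.choice(splittable)
        theta = rng.uniform(lo, hi)
        goes_right = data[i][j] > theta
        node = [p for p in node if (data[p][j] > theta) == goes_right]
        depth += 1
    return depth


def iforest_scores(data: Sequence[Sequence[float]], n_trees: int = 100,
                   seed: int = 0) -> List[float]:
    """A(x_i) = 2 ** (-mean depth / c(n)); needs n >= 2. Values near 1 suggest anomalies."""
    rng = random.Random(seed)
    c = expected_depth(len(data))
    scores = []
    for i in range(len(data)):
        mean_depth = sum(isolation_depth(i, data, rng) for _ in range(n_trees)) / n_trees
        scores.append(2.0 ** (-mean_depth / c))
    return scores


data = [[0.0], [0.1], [0.2], [0.15], [0.05], [10.0]]
print(iforest_scores(data, n_trees=200))  # the point 10.0 is isolated fastest
```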
  • 14. Approach 2: Density Estimation • Given a data set $x_1, \ldots, x_N$ where $x_i \in \mathbb{R}^d$. • We assume the data have been drawn iid from an unknown probability density: $x_i \sim P(x)$. • Goal: estimate $P$. • Anomaly score: $A(x_q) = -\log P(x_q)$, the “surprisal” from information theory. • Why density estimation? It gives a more global view by combining distances to all data points.
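One concrete density estimator makes the "combining distances to all data points" remark visible: a Gaussian kernel density estimate sums a kernel over every training point. This sketch assumes an isotropic Gaussian kernel with a hand-picked `bandwidth` (a hypothetical tuning parameter) and uses log-sum-exp so that far-away queries do not underflow to $\log 0$.

```python
import math
from typing import Sequence


def kde_surprisal(xq: Sequence[float], data: Sequence[Sequence[float]],
                  bandwidth: float) -> float:
    """A(x_q) = -log P(x_q) under an isotropic Gaussian kernel density estimate."""
    d = len(xq)
    # Log of each kernel term, without the shared normalizing constant.
    terms = [-math.dist(xq, x) ** 2 / (2.0 * bandwidth ** 2) for x in data]
    m = max(terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in terms))  # log-sum-exp
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    log_p = log_sum + log_norm - math.log(len(data))
    return -log_p


cluster = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2)]
print(kde_surprisal((0.0, 0.0), cluster, 0.5), kde_surprisal((5.0, 5.0), cluster, 0.5))
```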
  • 15. Example: LODA (Pevny, 2015) • Introduce sparse random projections $\Pi_l$ into 1-dimensional space. • Fit a density estimator $P_l(\Pi_l x)$ in each 1-d space. • $A(x_q) = \frac{1}{L} \sum_{l=1}^{L} -\log P_l(\Pi_l x_q)$.
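A LODA-style sketch of the score above. The specific choices here are assumptions rather than a copy of Pevny's implementation: each projection has about $\sqrt{d}$ non-zero Gaussian weights, each $P_l$ is a fixed-width histogram over the training projections, and queries that fall outside a histogram get a small floor density `eps` so the logarithm stays finite.

```python
import math
import random
from typing import Sequence


class Loda:
    """LODA-style detector sketch: sparse random 1-d projections, one histogram each."""

    def __init__(self, data: Sequence[Sequence[float]], n_projections: int = 50,
                 n_bins: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)
        d = len(data[0])
        k = max(1, round(math.sqrt(d)))  # assumed: about sqrt(d) non-zero weights per projection
        self.n_bins = n_bins
        self.models = []
        for _ in range(n_projections):
            w = [0.0] * d
            for j in rng.sample(range(d), k):
                w[j] = rng.gauss(0.0, 1.0)
            proj = [self._dot(w, x) for x in data]
            lo, hi = min(proj), max(proj)
            width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant projection
            counts = [0] * n_bins
            for v in proj:
                counts[self._bin(v, lo, width)] += 1
            self.models.append((w, lo, hi, width, counts, len(data)))

    @staticmethod
    def _dot(w: Sequence[float], x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(w, x))

    def _bin(self, v: float, lo: float, width: float) -> int:
        return min(int((v - lo) / width), self.n_bins - 1)

    def score(self, xq: Sequence[float], eps: float = 1e-6) -> float:
        """A(x_q) = (1/L) * sum_l -log P_l(Pi_l x_q); larger means more anomalous."""
        total = 0.0
        for w, lo, hi, width, counts, n in self.models:
            v = self._dot(w, xq)
            c = counts[self._bin(v, lo, width)] if lo <= v <= hi else 0
            density = max(c / (n * width), eps)  # floor avoids log(0) outside the histogram
            total -= math.log(density)
        return total / len(self.models)


grid = [(0.1 * a, 0.1 * b) for a in range(10) for b in range(10)]
model = Loda(grid)
print(model.score((0.45, 0.45)), model.score((5.0, 5.0)))
```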
  • 16. Approach 3: Quantile Methods • Vapnik’s principle: we only need to estimate the “decision boundary” between nominal and anomalous. • Surround the data by a function $f$ that captures $1 - \epsilon$ of the training data. • One-Class Support Vector Machine (OCSVM): $f$ is a hyperplane in “kernel space”. • Support Vector Data Description (SVDD): $f$ is a sphere in “kernel space”. • Issue: $\epsilon$ must be chosen at learning time rather than at run time.
  • 17. Approach 4: Reconstruction Methods (Pomerleau, NIPS 1992) • NavLab self-driving van (Pomerleau, 1992). • Primary head: predict the steering angle from the input image. • Secondary head: predict the input image (“auto-encoder”). • $A(x_q) = \|x_q - \hat{x}_q\|$. • If the reconstruction is poor, the predicted steering angle should not be trusted. • Principle: anomaly detection through failure. Define a task on which the learned system should fail for anomalies.
  • 18. Application: Finding Unusual Chemical Spectra • NASA Mars Science Laboratory ChemCam instrument: collects 6144 spectral bands on rock samples from a distance of 7 m using laser stimulation. • Goal: active learning to find interesting spectra. • DEMUD: incremental PCA applied to samples one at a time; fit only to the samples labeled as “uninteresting” by the user; show the user the most un-uninteresting sample (the sample with the highest PCA reconstruction error). • Rapidly discovers interesting samples. • Wagstaff, et al. (2013)
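The scoring rule DEMUD relies on, PCA reconstruction error, can be sketched in a few lines. This is not DEMUD itself (which updates its PCA incrementally and fits only the samples the user marks uninteresting); it is a batch sketch that assumes a single principal component found by power iteration, with hypothetical helper names.

```python
import math
import random
from typing import List, Sequence, Tuple


def top_component(data: Sequence[Sequence[float]], n_iter: int = 200,
                  seed: int = 0) -> Tuple[List[float], List[float]]:
    """Mean and leading principal direction, via power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in data]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        if norm == 0.0:  # no variance along v; stop iterating
            break
        v = [t / norm for t in v]
    return mean, v


def reconstruction_error(xq: Sequence[float], mean: Sequence[float],
                         v: Sequence[float]) -> float:
    """||x_q - x_hat_q||, where x_hat_q is x_q projected onto the line mean + t * v."""
    c = [a - m for a, m in zip(xq, mean)]
    t = sum(ci * vi for ci, vi in zip(c, v))
    return math.sqrt(sum((ci - t * vi) ** 2 for ci, vi in zip(c, v)))


# "Uninteresting" samples lie on the line y = x; (1, -1) leaves that line.
line = [(float(t), float(t)) for t in range(-5, 6)]
mean, v = top_component(line)
print(reconstruction_error((2.0, 2.0), mean, v), reconstruction_error((1.0, -1.0), mean, v))
```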
  • 19. Benchmarking Study [Andrew Emmott, 2015, 2020] • Distance-Based Methods: k-NN: mean distance to the $k$ nearest neighbors; LOF: Local Outlier Factor (Breunig, et al., 2000); ABOD: kNN Angle-Based Outlier Detector (Kriegel, et al., 2008); IFOR: Isolation Forest (Liu, et al., 2008). • Density-Based Approaches: RKDE: Robust Kernel Density Estimation (Kim & Scott, 2008); EGMM: Ensemble Gaussian Mixture Model (our group); LODA: Lightweight Online Detector of Anomalies (Pevny, 2016). • Quantile-Based Methods: OCSVM: One-class SVM (Schoelkopf, et al., 1999); SVDD: Support Vector Data Description (Tax & Duin, 2004).
  • 20. Benchmarking Methodology • Select 19 data sets from the UC Irvine repository. • Choose one or more classes to be “anomalies”; the rest are “nominals”. • Manipulate: relative frequency; point difficulty; irrelevant features; clusteredness. • 20 replicates of each configuration. • Result: 11,888 non-trivial benchmark datasets.
  • 21. Analysis of Variance • Linear ANOVA: $\log \frac{AUC}{1 - AUC} \sim rf + pd + cl + ir + pset + algo$, where rf: relative frequency; pd: point difficulty; cl: normalized clusteredness; ir: irrelevant features; pset: “parent” set; algo: anomaly detection algorithm. • Assess the algo effect while controlling for all other factors. • $AUC$: area under the ROC curve for the nominal vs. anomaly binary decision.
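The response variable in this ANOVA is the logit of the AUC. A short sketch of both pieces, assuming the AUC is computed by its rank-statistic definition: the probability that a randomly chosen anomaly receives a higher score than a randomly chosen nominal point, with ties counted as one half. The pairwise loop is quadratic, which is fine for illustration.

```python
import math
from typing import Sequence


def auc(anomaly_scores: Sequence[float], nominal_scores: Sequence[float]) -> float:
    """P(score of a random anomaly > score of a random nominal), ties counted as 1/2."""
    wins = 0.0
    for a in anomaly_scores:
        for b in nominal_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(nominal_scores))


def logit(p: float) -> float:
    """log(p / (1 - p)): maps an AUC in (0, 1) onto the whole real line for the linear model."""
    return math.log(p / (1.0 - p))


print(auc([2.0, 4.0], [1.0, 3.0]), logit(0.75))
```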
  • 22. Benchmarking Study Results • 19 UCI datasets • 9 leading “feature-based” algorithms • 11,888 non-trivial benchmark datasets • Mean AUC effect for “nominal” vs. “anomaly” decisions, controlling for: parent data set; difficulty of individual queries; fraction of anomalies; irrelevant features; clusteredness of anomalies. • Baseline method: distance to nominal mean (“tmd”). • Best methods: k-nearest neighbors and Isolation Forest. • Worst methods: kernel-based OCSVM and SVDD. [Figure: mean AUC effect (axis 0.62 to 0.78) for knn, iforest, egmm, rkde, lof, abod, loda, svdd, tmd, ocsvm.]
  • 23. Incorporating User Feedback: Initial Work • Show the top-ranked candidate to the user. • The user labels the candidate. • The label is used to update the anomaly detector. • Two methods: AAD [Das, et al., ICDM 2016]; GLAD-OMD (modified version of iForest) [Siddiqui, et al., KDD 2018]. [Diagram: Data → Anomaly Detection → Best Candidate → User → Anomaly Analysis, with the user's yes/no label fed back to the anomaly detector.]
  • 24. User Feedback Yields Big Improvements in Anomaly Discovery [Figure: APT Engagement 3 results]
  • 25. Deep Versions of the Four Basic Methods
  • 26. Deep Anomaly Detection in Image Classification • Input image $x$. • Network backbone, also called the “encoder”: $z = E(x)$. • Latent representation $z$ (the penultimate layer). • “Logits” $\ell_k = w_k^\top z$. • Predicted probabilities $\hat{p}(y = k \mid x) = \frac{\exp \ell_k(z)}{\sum_{k'} \exp \ell_{k'}(z)}$. [Diagram: convolutional neural network classifier: image $x$ → “backbone” encoder $E$ → penultimate layer $z$ → logits $\ell_k = w_k^\top z$ → probabilities $\hat{p}(y = k \mid x)$.]
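The head of the classifier, from latent vector to probabilities, is small enough to write out. This sketch assumes the latent vector `z` has already been produced by some encoder $E$ (not shown) and, like the slide, uses no bias term; subtracting the largest logit before exponentiating leaves the softmax unchanged but avoids overflow.

```python
import math
from typing import List, Sequence


def logits(z: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """l_k = w_k . z for each class weight vector w_k (the rows of W); no bias term."""
    return [sum(wi * zi for wi, zi in zip(w, z)) for w in W]


def softmax(l: Sequence[float]) -> List[float]:
    """p(y = k | x) = exp(l_k) / sum_k' exp(l_k'), computed stably."""
    m = max(l)
    e = [math.exp(v - m) for v in l]
    s = sum(e)
    return [v / s for v in e]


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical weights for 3 classes
print(softmax(logits([2.0, 1.0], W)))
```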
  • 27. Distance-Based Methods • K-nearest neighbor in the latent space. • Issue: what distance metric to use? • Cosine distance is the most popular: $d(z_1, z_2) = 1 - \frac{z_1 \cdot z_2}{\|z_1\| \, \|z_2\|}$.
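A sketch of k-nearest-neighbor scoring in latent space with cosine distance (one minus cosine similarity). The latent vectors are assumed to come from the trained encoder and must be non-zero; the function names are illustrative.

```python
import heapq
import math
from typing import Sequence


def cosine_distance(z1: Sequence[float], z2: Sequence[float]) -> float:
    """1 - cos(angle between z1 and z2); ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    return 1.0 - dot / (n1 * n2)


def cosine_knn_score(zq: Sequence[float], train_z: Sequence[Sequence[float]], k: int = 1) -> float:
    """Mean cosine distance from z_q to its k nearest training latents."""
    nearest = heapq.nsmallest(k, (cosine_distance(zq, z) for z in train_z))
    return sum(nearest) / len(nearest)


train_z = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
print(cosine_knn_score([2.0, 0.0], train_z), cosine_knn_score([0.0, 1.0], train_z))
```

Because cosine distance ignores vector length, it cannot see the "smaller activation vectors" effect discussed later in the deck; only direction matters.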
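  • 28. Density-Based Methods • Mahalanobis method: fit a joint multivariate Gaussian in which each class $k$ has its own mean $\mu_k$ and all classes share one covariance matrix $\Sigma$. • Given a new $x$, approximating the class mixture by its closest component, $\log P(x) \approx -\tfrac{1}{2} \min_k (x - \mu_k)^\top \Sigma^{-1} (x - \mu_k) + \text{const}$, so the anomaly score is $A(x) = \min_k (x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)$. The quadratic form $(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)$ is known as the squared Mahalanobis distance.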
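A dependency-free sketch of the Mahalanobis score. It assumes the method is applied to latent vectors (called `Z` here) with class labels `y`, estimates the shared covariance by pooling each point's deviation from its own class mean, and solves $\Sigma s = (z - \mu_k)$ by Gaussian elimination instead of forming $\Sigma^{-1}$ explicitly. In practice $\Sigma$ may need regularization if it is near-singular; that is omitted here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_class_gaussians(Z: Sequence[Sequence[float]], y: Sequence[int]
                        ) -> Tuple[Dict[int, List[float]], List[List[float]]]:
    """Class means mu_k and the shared (pooled) covariance Sigma."""
    d = len(Z[0])
    means = {}
    for k in sorted(set(y)):
        rows = [z for z, lab in zip(Z, y) if lab == k]
        means[k] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    sigma = [[0.0] * d for _ in range(d)]
    for z, lab in zip(Z, y):
        diff = [z[j] - means[lab][j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                sigma[a][b] += diff[a] * diff[b] / len(Z)
    return means, sigma


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mahalanobis_score(z: Sequence[float], means: Dict[int, List[float]],
                      sigma: List[List[float]]) -> float:
    """A(z) = min_k (z - mu_k)^T Sigma^{-1} (z - mu_k)."""
    best = math.inf
    for mu in means.values():
        diff = [a - m for a, m in zip(z, mu)]
        best = min(best, sum(a * s for a, s in zip(diff, solve(sigma, diff))))
    return best


Z = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0],
     [9.0, 0.0], [11.0, 0.0], [10.0, -1.0], [10.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
means, sigma = fit_class_gaussians(Z, y)
print(mahalanobis_score([1.0, 1.0], means, sigma))  # Sigma = 0.5 * I here, so 4.0
```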
  • 29. Open Hybrid: Classification + Density Estimation (Zhang, Li, Guo, Guo, 2020) • Residual Flow deep density estimator (Chen, Behrmann, Duvenaud, et al., NeurIPS 2019). • Standard cross-entropy supervised loss. • Claim: this helps focus $P(x)$ on relevant aspects of the images. • Anomaly score: $A(x_q) = -\log P(x_q)$.
  • 30. Quantile Method: Deep SVDD (Ruff, et al., ICML 2018) • The method is somewhat tricky to work with. • Set the center $c$ as the mean of a small set of points passed through the untrained network. • Use no bias weights. • These two measures help prevent “hypersphere collapse”.
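A sketch of only the two steps this slide mentions: fixing the center $c$ from an untrained network and scoring by squared distance to it, $\|\phi(x) - c\|^2$. The "network" here is a hypothetical stand-in (a bias-free linear map `W`); the training step, which would adjust `W` to shrink the mean score over nominal data, is omitted.

```python
from typing import List, Sequence


def features(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Stand-in for the network phi(x): a bias-free linear map (hypothetical)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def init_center(W: Sequence[Sequence[float]], xs: Sequence[Sequence[float]]) -> List[float]:
    """c = mean of phi(x) over a small set of points passed through the untrained network."""
    feats = [features(W, x) for x in xs]
    return [sum(f[j] for f in feats) / len(feats) for j in range(len(feats[0]))]


def svdd_score(W: Sequence[Sequence[float]], c: Sequence[float], x: Sequence[float]) -> float:
    """Anomaly score: squared distance of phi(x) from the center c."""
    return sum((a - b) ** 2 for a, b in zip(features(W, x), c))


W = [[1.0, 0.0], [0.0, 1.0]]
c = init_center(W, [[1.0, 1.0], [3.0, 3.0]])
print(c, svdd_score(W, c, [5.0, 6.0]))
```

The center is fixed rather than learned because, if $c$ were also optimized, the network could simply map every input to it.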
  • 31. Reconstruction Methods: Deep Autoencoders • Encoder: $z = E(x)$. • Decoder: $\hat{x} = D(z)$. • Challenge: how to constrain $E$ and $D$ so that the autoencoder fails on anomalies but succeeds on nominal images? • Autoencoders often learn general-purpose image compression methods, so they reconstruct anomalies well too. [Diagram: $x$ → $E$ → $z$ → $D$ → $\hat{x}$.]
  • 32. Classifier-Based Anomaly Detection using the Max Logit Score
  • 33. Surprise: The Max Logit Score • Garrepalli (2020): train a classifier to optimize softmax likelihood (minimize “cross-entropy loss”). • The maximum logit score is better than two distance methods: Isolation Forest and LOF (a nearest-neighbor method). [Figure: AUROC of anomaly measures on latent representations for CIFAR-100: $H(y \mid x)$ 0.68, max softmax prob. 0.67, max BCE prob. 0.63, max logit 0.72, Isolation Forest 0.51, LOF 0.44.]
  • 34. More Evidence for Max Logit • Vaze, Han, Vedaldi, Zisserman (2021): “Open Set Recognition: A Good Classifier is All You Need” (ICLR 2022; arXiv 2110.06207). • Carefully train a classifier using the latest tricks: standard cross-entropy combined with a cosine learning rate schedule, learning rate warmup, RandAugment augmentations, and label smoothing. • Anomaly score: the negated max logit, $-\max_k \ell_k$. [Figure: results under the protocol from Lawrence Neal et al. (2018).]
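A short sketch contrasting the max logit score with the max softmax probability, assuming the logits come from a trained classifier. The softmax is unchanged when every logit shifts by the same amount, so two inputs whose logits are all 10 or all 1 look identical to it; the max logit keeps that difference in overall magnitude.

```python
import math
from typing import Sequence


def max_logit_score(logits: Sequence[float]) -> float:
    """Anomaly score A = -max_k l_k: a low top logit signals an unfamiliar input."""
    return -max(logits)


def max_softmax_score(logits: Sequence[float]) -> float:
    """Baseline anomaly score: minus the largest softmax probability."""
    m = max(logits)
    return -1.0 / sum(math.exp(v - m) for v in logits)  # max_k softmax_k = 1 / sum_k exp(l_k - m)


strong, weak = [10.0, 10.0, 10.0], [1.0, 1.0, 1.0]
print(max_softmax_score(strong), max_softmax_score(weak))  # both -1/3
print(max_logit_score(strong), max_logit_score(weak))      # -10.0 and -1.0
```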
  • 35. Still More Evidence for Max Logit • Novel class difficulty based on semantic distance: CUB (bird species); Air (aircraft); ImageNet.
  • 36. Why? Let’s Examine the Learned Representations
  • 37. How are open set images represented by deep learning? • DenseNet with 384-dimensional latent space. • CIFAR-10: 6 known classes, 4 novel classes. • UMAP visualization: light green = novel classes; darker greens = known classes. • Note that many novel classes stay toward the center of the space; others overlap with known classes. • Training was not required to “pull them out” so that they could be discriminated. [Figure (Alex Guyer): UMAP panels for the 6 known classes and the 4 novel classes.]
  • 38. Similar Results from Other Groups [Tack, et al. NeurIPS 2020] [Vaze, et al. arXiv 2110.06207]
  • 39. The Familiarity Hypothesis: the network doesn’t detect novelty, it detects the absence of familiarity • A convolutional neural network learns “features” that detect image patches relevant to the classification task. • The logit layer weights these features to make the classification decision. • Novel classes activate fewer of these features, so their activation vectors are smaller. • Hypothesis: the network doesn’t detect that an elephant is novel because of its trunk and tusks, but because its head doesn’t activate known features.
  • 40. Evidence: Number of Activated Features (Alex Guyer, unpublished) • Novel images strongly activate fewer features. • CIFAR-10: 6 known classes; 4 novel classes. • DenseNet ($z$ has 324 dimensions). • Activation threshold $\theta$: count the number of features whose activation exceeds $\theta$. • OOD images activate fewer features.
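The measurement on this slide is a simple count, sketched here under the assumption that `z` is the penultimate-layer activation vector and `theta` a chosen threshold; comparing `mean_active` over known-class images and over novel images reproduces the kind of comparison the slide reports.

```python
from typing import Sequence


def n_active(z: Sequence[float], theta: float) -> int:
    """Number of latent features whose activation exceeds theta."""
    return sum(1 for v in z if v > theta)


def mean_active(Z: Sequence[Sequence[float]], theta: float) -> float:
    """Average count of strongly activated features over a set of latent vectors."""
    return sum(n_active(z, theta) for z in Z) / len(Z)


print(n_active([0.1, 0.6, 0.9, 0.0], theta=0.5))  # 2
```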
  • 41. Which features are responsible for the drop in activation? • Are they features “on” the object or on the background? • Strategy: blur the object and see how the feature activations change; activations that change must be on the object. • Details: PASCAL VOC segmented images; blur the original image (31x31 kernel; sd=31); form a composite image in which the blurred region replaces the segmented region. (Blur tool: https://www.peko-step.com/en/tool/blur.html)
  • 42. Blurring Examples. Note: this does not remove all object-related information (e.g., the object boundary), so we don’t detect all on-object features.
  • 43. Blurring Effect • Define the “blurring effect” of feature $j$ on image $i$ as $BE(i, j) = z_{ij} - \tilde{z}_{ij}$, where $z_{ij}$ is the activation of latent feature $j$ on image $i$ and $\tilde{z}_{ij}$ is the activation of latent feature $j$ on blurred image $i$. • “Presence feature”: $BE(i, j) > 0$. Blurring decreases the activity of the feature. Its net effect is to measure the presence of one or more image patterns; its activity is high when those patterns are present. • “Absence feature”: $BE(i, j) < 0$. Blurring increases the activity of the feature. Its net effect is to measure the absence of one or more image patterns; its activity is high when those patterns are absent.
  • 44. “On Object” score of feature $j$ for class $k$ • The average change in a feature's activation when the object (of class $k$) is blurred: $OO(j, k) = \frac{1}{N_k} \sum_{i : y_i = k} (z_{ij} - \tilde{z}_{ij})$. • Feature $j$ is a net presence feature for class $k$ if $OO(j, k) > 0.02$. • Feature $j$ is a net absence feature for class $k$ if $OO(j, k) < -0.02$. • Otherwise $j$ is net neutral for class $k$.
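A sketch of the $OO(j, k)$ computation and the ±0.02 classification rule, assuming `Z` and `Z_blur` hold the latent activations of the original and blurred versions of the same images (row $i$ in each) and `y` holds their class labels. Obtaining the blurred images and activations is not shown.

```python
from typing import List, Sequence


def on_object_scores(Z: Sequence[Sequence[float]], Z_blur: Sequence[Sequence[float]],
                     y: Sequence[int], k: int) -> List[float]:
    """OO(j, k) = mean over images of class k of (z_ij - z~_ij), for every feature j."""
    idx = [i for i, lab in enumerate(y) if lab == k]
    d = len(Z[0])
    return [sum(Z[i][j] - Z_blur[i][j] for i in idx) / len(idx) for j in range(d)]


def net_type(oo: float, margin: float = 0.02) -> str:
    """Net presence / absence / neutral label for one feature and class."""
    if oo > margin:
        return "presence"
    if oo < -margin:
        return "absence"
    return "neutral"


Z = [[1.0, 0.2, 0.5], [0.8, 0.0, 0.5]]       # hypothetical activations, two class-3 images
Z_blur = [[0.5, 0.6, 0.5], [0.3, 0.2, 0.5]]  # the same images with the object blurred
print([net_type(v) for v in on_object_scores(Z, Z_blur, [3, 3], k=3)])
```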
  • 45. Feature Taxonomy • The logit score of image $i$ for class $k$ is $\ell_{ik} = \sum_j w_{jk} z_{ij}$. • Contribution of feature $j$ in image $i$ to class $k$: $c_{ijk} = w_{jk} z_{ij}$ (in normal images); $\tilde{c}_{ijk} = w_{jk} \tilde{z}_{ij}$ (in blurred images). • Mean contribution: $\bar{c}_{jk} = \frac{1}{N_k} \sum_{i : y_i = k} c_{ijk}$; $\bar{\tilde{c}}_{jk} = \frac{1}{N_k} \sum_{i : y_i = k} \tilde{c}_{ijk}$. • Taxonomy: $w_{jk} > 0$ and $OO(j, k) > 0.02$: positive presence; $w_{jk} < 0$ and $OO(j, k) > 0.02$: negative presence; $w_{jk} > 0$ and $OO(j, k) < -0.02$: positive absence; $w_{jk} < 0$ and $OO(j, k) < -0.02$: negative absence. (Sun & Li: On the Effectiveness of Sparsification for Detecting the Deep Unknowns. arXiv 2111.09805)
  • 46. Mean feature types for class 3 [Figure: features plotted by On-Object Index (presence) and On-Object Index (absence), with panels for positive and negative features; red = presence, blue = absence.]
  • 47. Zoomed View: Blurring reduces $\bar{c}_{jk}$ [Figure: mean unblurred contribution vs. mean blurred contribution, by On-Object Index (presence/absence).] • Blurring reduces the contribution of positive presence features (red dots) and of negative absence features (blue dots).
  • 48. Decomposing the Logit Score: Four Cases • Positive presence: $w_{jk} > 0$ and $OO(j, k) > 0$. • Positive absence: $w_{jk} > 0$ and $OO(j, k) < 0$. • Negative presence: $w_{jk} < 0$ and $OO(j, k) > 0$. • Negative absence: $w_{jk} < 0$ and $OO(j, k) < 0$.
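A sketch of the four-way decomposition: each feature's contribution $c_{ijk} = w_{jk} z_{ij}$ is assigned a type from the sign of $w_{jk}$ and its on-object score, and the contributions are summed per type, so the parts add back up to $\ell_{ik}$. The `margin` parameter is an illustrative choice: 0.02 matches the net presence/absence rule, and `margin=0` gives the strict sign conditions listed on this slide.

```python
from typing import Dict, Sequence


def taxonomy(w_jk: float, oo_jk: float, margin: float = 0.02) -> str:
    """Feature type from the sign of the weight and the on-object score OO(j, k)."""
    if w_jk == 0 or -margin <= oo_jk <= margin:
        return "neutral"
    sign = "positive" if w_jk > 0 else "negative"
    kind = "presence" if oo_jk > margin else "absence"
    return f"{sign} {kind}"


def decompose_logit(z_i: Sequence[float], w_k: Sequence[float], oo_k: Sequence[float],
                    margin: float = 0.02) -> Dict[str, float]:
    """Split l_ik = sum_j w_jk z_ij into the summed contributions of each feature type."""
    parts: Dict[str, float] = {}
    for z, w, oo in zip(z_i, w_k, oo_k):
        t = taxonomy(w, oo, margin)
        parts[t] = parts.get(t, 0.0) + w * z
    return parts


z_i = [1.0, 2.0, 0.5, 1.0]   # hypothetical activations of one image
w_k = [2.0, -1.0, 1.0, 0.5]  # hypothetical weights for class k
oo_k = [0.3, 0.1, -0.2, 0.0]
print(decompose_logit(z_i, w_k, oo_k))
```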
  • 49. Visualizing Individual Images: OOD Instance 838
  • 50. OOD Instance 770
  • 51. OOD Instance 432
  • 52. Decomposing the Novelty Scores • Note that the positive presence features dominate the max logit score. • The negative absence and positive absence features (purple and blue lines) make a small contribution. • Negative presence features make no contribution. • Conclusion: decreases in the activations of positive presence features account for most of the max logit score.
  • 53. Decreases in Positive Presence Features Account for Novelty Detection Accuracy
    • Red line: trend of the Positive Presence contribution to the max logit score
    • Black line: smoothed estimate of classification accuracy (“known” vs. “novel”)
  • 54. Can we expect computer vision systems to perceive things they have not been trained on?
    • Blakemore, Colin, and Grahame F. Cooper. “Development of the brain depends on the visual environment.” Nature 228 (1970): 477-478.
      • Kittens raised in environments with only horizontal or only vertical lines
      • “They were virtually blind for contours perpendicular to the orientation they had experienced.”
    • Chomsky: “Poverty of the stimulus”
    Source: Li Yang Ku https://computervisionblog.wordpress.com/2013/06/01/cats-and-vision-is-vision-acquired-or-innate/
  • 55. Implications
    • Advantages of familiarity-based anomaly detection:
      • Easy to implement: the anomaly signal (max logit) can be extracted from the classifier, so no separate anomaly detection model is needed
      • Training on additional, auxiliary classes improves both classification and anomaly detection performance
    • Weaknesses of familiarity-based anomaly detection:
      • Partially occluded nominal objects will be flagged as anomalies
      • If an image contains both a novel object and a known object, the novel object will not be detected
      • Adversarial attacks can easily cause false anomalies and missed anomalies
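Since the max logit comes for free from the classifier, a reject option of the form $A(x_q) > \tau$ (as in the "Protecting a Classifier" slide) needs only a few lines. This is a sketch under stated assumptions: the anomaly score is taken as the negated max logit, so that higher means more anomalous, and `tau` is an arbitrary illustrative value that would in practice have to be chosen separately. Both function names are hypothetical.

```python
import numpy as np

def max_logit_score(logits):
    """Anomaly score A(x) = -max_k l_k(x); a low max logit means an unfamiliar input."""
    return float(-np.max(np.asarray(logits, dtype=float)))

def classify_or_reject(logits, tau):
    """Return the predicted class index, or None to reject when A(x) > tau."""
    if max_logit_score(logits) > tau:
        return None
    return int(np.argmax(logits))

print(classify_or_reject([4.0, 1.0, 0.5], tau=-2.0))  # prints 0 (accepted)
print(classify_or_reject([0.8, 0.6, 0.7], tau=-2.0))  # prints None (rejected)
```

The weaknesses listed on the slide apply unchanged: the score only reflects how strongly familiar features fire, so a known object next to a novel one can still yield a high max logit and pass the threshold.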
  • 57. Challenges for Anomaly Detection
    • Can we learn deep representations that can represent outliers?
    • Nonstationarity
      • As the world changes, the anomaly detection model must also change
    • Explanation
      • Users often want explanations of why something is labeled as anomalous in order to provide feedback or take other actions
    • Setting alarm thresholds
      • How can we set a threshold to control the false alarm and missed alarm rates?
    • Incremental (continual) learning in deep networks
      • How can we efficiently update a trained neural network to incorporate user feedback?
    • Anomaly detection in temporal, spatial, and spatio-temporal data, in video data, etc.
    • Anomaly detection at multiple scales
  • 59. Shallow and Deep Methods for Anomaly Detection
    • Four Basic Methods
      • Distances, densities, density quantiles, and reconstruction
      • Distances work best; Isolation Forest is very robust
    • Anomaly Detection in Deep Learning
      • The four basic methods have been extended to deep learning
      • They often do not work well when applied to learned representations
    • The Classifier Max Logit Score Gives Very Competitive Performance
      • Computed as a side effect of standard deep classifiers
      • Measures familiarity rather than novelty, which makes it risky in many settings
    • Advances in Deep Anomaly Detection Require Learning Better Representations