The document describes the Stochastic Outlier Selection (SOS) algorithm. SOS is an unsupervised outlier selection method that computes outlier probabilities based on the concept of affinity between data points. It takes a data point matrix as input and outputs outlier probabilities for each point. SOS works by calculating the affinity each data point has with other points based on their distances. A point is more likely to be an outlier if it has low overall affinity with other points. The algorithm has one parameter, perplexity, which controls the number of neighbors considered in affinity calculations. SOS is inspired by the t-SNE dimensionality reduction technique and can identify outliers without supervision.
18. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Euler diagram
Set legend
Experiments and results
. . . . . . . . . . .
Set notation
real-world observations
labeled by expert as anomalous
labeled by expert as normal
A
X ∩A
.
X
(a)
(b)
X
A
Outlier Selection and One-Class Classification
X
Jeroen Janssens
19. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Euler diagram
Set legend
Experiments and results
. . . . . . . . . . .
Set notation
real-world observations
A
X
X
(c)
labeled by expert as anomalous
labeled by expert as normal
A
X ∩A
data points
unrecorded
D
X ∩D
.
X
(a)
(b)
X
A
Outlier Selection and One-Class Classification
D
Jeroen Janssens
20. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Euler diagram
Set legend
X
(d)
A
.
Outlier Selection and One-Class Classification
D
anomalies represented as data points
normalities represented as data points
Experiments and results
. . . . . . . . . . .
Set notation
CA = D ∩ A
CN = D ∩ A
Jeroen Janssens
21. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Euler diagram
Set legend
X
(d)
A
.
D
X
(e)
A
CO D
Outlier Selection and One-Class Classification
Experiments and results
. . . . . . . . . . .
Set notation
anomalies represented as data points
normalities represented as data points
CA = D ∩ A
CN = D ∩ A
classified by algorithm as an outlier
classified by algorithm as an inlier
CO
CI = D ∩ CO
Jeroen Janssens
22. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Euler diagram
Set legend
X
(d)
A
.
D
X
(e)
CO D
A
X
(f)
A
CO D
Outlier Selection and One-Class Classification
Experiments and results
. . . . . . . . . . .
Set notation
anomalies represented as data points
normalities represented as data points
CA = D ∩ A
CN = D ∩ A
classified by algorithm as an outlier
classified by algorithm as an inlier
CO
CI = D ∩ CO
hits
false alarms
misses
correct rejects
H = CA ∩ CO
FA = CN ∩ CO
M= CA ∩ CI
CR = CN ∩ CI
Jeroen Janssens
23. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Experiments and results
. . . . . . . . . . .
Confusion matrix
Expert labels the observation as a(n)
Outlier Selection and One-Class Classification
Outlier (CO )
Normality (CN )
hit
.
Hi
false alarm
.
FA
.
Inlier (CI )
Algorithm classi es the data point as an
Anomaly (CA )
miss
.
Mi
correct. reject
CR
Jeroen Janssens
24. Labels by the expert
Data set
. .anomaly .normality
.
. .data point
.
.
.
.
Classi cations by the algorithm
. .inlier .outlier
.
.
Outcome
. .hit .false alarm .miss .correct reject
.
.
.
.
27. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Experiments and results
. . . . . . . . . . .
A data point is selected as an outlier when all the other data points
have insufficient affinity with it.
Outlier Selection and One-Class Classification
Jeroen Janssens
60. Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
. LSOD
. . . ..
...
.
.
. . . ..
..
.. .
.. .
.
. .
. . ..
.
.
. .. .
.
. .
.
. ..
. ..
. ..
..
.. .
.
.
. ..
.
..
.
.
.
.
..
0.5
.
Eco Bre Del Wa Win Co Bio Veh Gla Bos He Arrh Ha Bre Liv SPE Hep
b
e
ast ft P ve
t a
M
a
s
e lo
li
C
a
W. um form Recon Ge ed icle S s Iden on Hort Dis ythm erma st W. r Diso TF H titis
ilho ti
Ori p
usi ease ia n’s S New rde ear
Ge gnit ne
gin
uet cat ng
rs t
ur v
ner ion
al
tes ion
iva
ato
l
r
Outlier Selection and One-Class Classification
. .
..
.
.
Iris
.
0.
. LOCI .
.
.
.. .
.
.
..
.
0.6
. LOF .
.
.
.. .
.. . .
.
.. . .
.
. . ..
.
... . .
.
...
.. . .
. ..
.
.. . .
0.7
Experiments and results
. . . . . . . . . . .
. KNNDD .
.
weighted AUC
. SOS .
.
. ..
.
0.8
.
. .. . .
.. .
0.9
.
. .. .
1
....
Real-world datasets
. .. ..
Anomalies and outliers
. . . . . . . . . . . .
Jeroen Janssens
61. Anomalies and outliers
. . . . . . . . . . . .
Stochastic Outlier Selection
. . . . . . . . . . . . . . . . . . . . . . .
Experiments and results
. . . . . . . . . . .
Synthetic datasets
Parameter λ
Data set
Determines
(a)
(b)
(c)
(d)
(e)
(f)
(g)
Radius of ring
Cardinality of cluster and ring
Distance between clusters
Cardinality of one cluster and ring
Density of one cluster and ring
Radius of square ring
Radius of square ring
Outlier Selection and One-Class Classification
λ start
step size
λ end
5
100
4
0
1
2
2
0.1
5
0.1
5
0.05
0.05
0.05
2
5
0
45
0
0.8
0.8
Jeroen Janssens