Anomaly Detection using
Deep One-Class Classifier
Proceedings of the 35th International Conference on Machine
Learning, Stockholm, Sweden, PMLR 80, 2018
Anomaly Detection and Localization
Using GAN and One-Class Classifier
Satellite Image Forgery Detection and Localization
Using GAN and One-Class Classifier
https://arxiv.org/abs/1802.04881
Previous Approach I
Anomaly Detection
• Detect observations that deviate from the normal data → one-class classification,
also called one-class description.
In this work:
• A generative adversarial network or an auto-encoder maps normal images to features,
and a one-class support vector machine (SVM) then models their distribution. For a
query image, we check whether it lies inside the learned distribution.
Problem formulation
• When an unseen or unfamiliar object that was not part of the training images appears,
its region is marked with a binary mask, as shown in the figure.
(Figure: trained image and its mask; query image with an unfamiliar object and its mask.)
Method
(Figure: encoder A_e maps the input X to a latent code h, decoder A_d reconstructs X̂; a discriminator D compares X with the reconstruction.)

$$\min_G \max_D V(D,G) = \mathbb{E}_{X\sim p_{data}}\big[\log D(X) + \log\big(1 - D(G(X))\big)\big], \qquad \hat{X} = G(X) = A_d(h) = A_d(A_e(X))$$

• An auto-encoder extracts the feature h from an image and reconstructs the image from it.
A GAN is then trained with the reconstructed and original images → slightly better
performance than the auto-encoder alone.
• This learns the distribution of normal images in the latent space.
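As a rough illustration of the objective above, here is a minimal sketch of one adversarial training step for the generator G = A_d(A_e(·)) and a discriminator D. The module names, the optimizers, and the added reconstruction term are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical modules: enc = A_e, dec = A_d, disc = D (disc outputs a probability in (0, 1)).
def adversarial_ae_step(enc, dec, disc, opt_g, opt_d, x):
    """One step of min_G max_D V(D, G) with G(X) = A_d(A_e(X))."""
    x_hat = dec(enc(x))                                   # reconstruction G(X)

    # Discriminator update: maximize log D(X) + log(1 - D(G(X)))
    d_real = disc(x)
    d_fake = disc(x_hat.detach())
    loss_d = -(torch.log(d_real + 1e-8) + torch.log(1 - d_fake + 1e-8)).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator (auto-encoder) update: fool D; the reconstruction term is an added assumption
    d_fake = disc(x_hat)
    loss_g = -torch.log(d_fake + 1e-8).mean() + F.mse_loss(x_hat, x)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```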
Method
One-class Classifier
Cluster of normal-image features vs. features of abnormal images:
- Feed the query image to the encoder of the trained auto-encoder to compute its latent vector.
- Decide whether the computed latent vector falls inside the cluster of normal images.
→ Here, a one-class SVM using radial basis functions (a Gaussian kernel, i.e. a parametric
model of the cluster) is used.
Features from normal patches (i.e., red dots) cluster together, whereas
features from abnormal patches (i.e., blue dots) are more distant.
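A minimal sketch of this scoring step using scikit-learn's OneClassSVM with an RBF kernel. The `encode` call, the feature file, and the hyper-parameters are placeholders standing in for the trained auto-encoder; the 2,048-d feature size follows the Simulation section later in the deck.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# h_normal: latent vectors of normal training patches, shape (n_patches, 2048),
# produced by the trained encoder A_e (hypothetical source file).
h_normal = np.load("normal_latents.npy")

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)   # illustrative hyper-parameters
ocsvm.fit(h_normal)

def is_normal(patch, encode):
    """Return True if the encoded patch falls inside the learned normal cluster."""
    h = encode(patch).reshape(1, -1)
    return ocsvm.predict(h)[0] == 1                        # +1 = inlier, -1 = anomaly
```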
We solve the problem of classifying nonlinearly separable patterns in a hybrid
manner involving two stages:
• First: Transform a given set of nonlinearly separable patterns into a new
set for which, under certain conditions, the likelihood of the transformed
patterns becoming linearly separable is high.
• Second: Complete the solution of the classification problem using
Stochastic Gradient Descent.
Non-linear SVM Classifier
using the RBF(Radial-basis function) kernel
We find w and b by solving the following objective function using Quadratic
Programming.
To define an optimal hyperplane we need to maximize the width of the
margin(w).
Linear SVM(Support Vector Machines)
Support vector
• The simplest way to separate two groups of data is with a straight line (1
dimension), flat plane (2 dimensions) or an N-dimensional hyperplane.
• However, there are situations where a nonlinear region can separate the
groups more efficiently.
• The kernel function transforms the data into a higher-dimensional feature
space to make it possible to perform the linear separation.
Non-Linear SVM(Support Vector Machines)
kernel trick
Map from the input space to a feature space to simplify the classification task.
A non-linear SVM classifier using the RBF (Radial-basis function) kernel is
adopted.
Non-Linear SVM(Support Vector Machines)
Inner product in the feature space (a measure of similarity)
Key Idea of Kernel Methods
K(x_i, x_j) = Φ(x_i) · Φ(x_j)
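For concreteness, a small sketch of the RBF kernel used as K(x_i, x_j); the bandwidth value is illustrative.

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel: K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)).
    Acts like an inner product <Phi(xi), Phi(xj)> in an implicit feature space."""
    diff = np.asarray(xi) - np.asarray(xj)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

# Nearby points give values close to 1, distant points close to 0.
print(rbf_kernel([0.0, 0.0], [0.1, 0.0]))   # ~0.995
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # ~3.7e-06
```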
Normal Condition:
$$\exp\!\left\{-\frac{(x_1-c_1)^2 + (x_2-c_2)^2}{2\sigma^2}\right\} \ge \text{Threshold}, \qquad 0 < \text{Threshold} \ll 1$$
Cluster bound:
$$(x_1-c_1)^2 + (x_2-c_2)^2 \le r^2 \quad\Longleftrightarrow\quad K_1 + K_2 \le r^2$$
(Figure: a circle of radius r centered at (c1, c2) in the (x1, x2) plane; K1 and K2 are the squared distances along each axis.)
Key Idea of Kernel Methods
RBFN architecture
(Figure: input layer x1 … xn feeds, with no weights, into a hidden layer of M RBF units; the output layer combines the RBF outputs with weights W1 … WM into f(x).)
Each of the n components of the input vector x feeds forward to the M basis
functions, whose outputs are linearly combined with the weights w (a dot
product with w) into the network output f(x).
The output layer performs a simple weighted sum. If the RBFN is used for
regression then this output is fine. However, if pattern classification is
required, then a hard-limiter or sigmoid function could be placed on the
output neurons to give 0/1 output values.
Input data set: X = {x1, x2, …, xN}
RBFN architecture
 For Gaussian basis functions:
$$s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\!\left\{-\sum_{j=1}^{n} \frac{(x_{pj}-c_{ij})^2}{2\sigma_{ij}^2}\right\}$$
 Assume the variance σ across each dimension is equal:
$$s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\!\left\{-\frac{1}{2\sigma_i^2}\sum_{j=1}^{n} (x_{pj}-c_{ij})^2\right\}$$
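A minimal numpy sketch of the forward pass s(x_p) above, assuming equal variance per hidden unit; the centers, widths, and weights would come from the learning procedures described on the following slides, and the toy values here are only illustrative.

```python
import numpy as np

def rbfn_forward(x, centers, sigmas, w0, w):
    """s(x) = w0 + sum_i w_i * exp(-||x - c_i||^2 / (2 sigma_i^2))."""
    x = np.asarray(x, dtype=float)
    sq_dist = np.sum((centers - x) ** 2, axis=1)        # ||x - c_i||^2 for each hidden unit
    phi = np.exp(-sq_dist / (2.0 * sigmas ** 2))        # RBF activations
    return w0 + phi @ w                                  # weighted sum

# Toy example with M = 2 hidden units in n = 2 dimensions.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.array([0.5, 0.5])
print(rbfn_forward([0.1, 0.0], centers, sigmas, w0=0.0, w=np.array([1.0, -1.0])))
```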
RBFN for classification
(Figure: two RBF networks, each summing its basis-function outputs with a Σ unit, one output per class: Category 1 vs. Category 2.)
RBFN Learning
• Design decision
• number of hidden neurons
• max of neurons = number of input patterns
• more neurons – more complex, smaller tolerance
• Parameters to be learnt
• centers
• radii
• A hidden neuron is more sensitive to data points near its center.
This sensitivity may be tuned by adjusting the radius.
• smaller radius  fits training data better (overfitting)
• larger radius  less sensitivity, less overfitting, network of
smaller size, faster execution
• weights between hidden and output layers
The question now is:
How to train the RBF network?
In other words, how to find:
 The number and the parameters of hidden units (the basis functions)
using unlabeled data (unsupervised learning).
 K-Means Clustering Algorithm
 The weights between the hidden layer and the output layer.
 Recursive Least-Squares Estimation Algorithm
RBFN Learning
(Figure: training pipeline: K-means finds the centers c_i from the inputs x_p, K-nearest neighbors sets the basis-function widths σ_i, and linear regression fits the output weights w.)
RBFN Learning
 Use the K-means algorithm to find the centers c_i
RBFN Learning
K-means Algorithm
step1: K initial clusters are chosen randomly from the samples
to form K groups.
step2: Each new sample is added to the group whose mean is
the closest to this sample.
step3: Adjust the mean of the group to take account of the new
points.
step4: Repeat step2 until the distance between the old means
and the new means of all clusters is smaller than a
predefined tolerance.
Outcome: There are K clusters with means representing
the centroid of each clusters.
Advantages: (1) A fast and simple algorithm.
(2) Reduce the effects of noisy samples.
 Use the K-nearest-neighbor rule to find the function width σ:
$$\sigma_i = \left(\frac{1}{K}\sum_{k=1}^{K} \lVert c_k - c_i \rVert^2\right)^{1/2}, \qquad c_k:\ k\text{-th nearest neighbor of } c_i$$
 The objective is to cover the training points so that a
smooth fit of the training samples can be achieved.
 RBF learning by gradient descent
 Let
$$e(x_p) = d(x_p) - s(x_p), \qquad \varphi_i(x_p) = \exp\!\left\{-\sum_{j=1}^{n}\frac{(x_{pj}-c_{ij})^2}{2\sigma_{ij}^2}\right\}, \qquad E = \frac{1}{2}\sum_{p=1}^{N} e(x_p)^2$$
 We then compute
$$\frac{\partial E}{\partial w_i}, \qquad \frac{\partial E}{\partial c_{ij}}, \qquad \frac{\partial E}{\partial \sigma_{ij}}$$
and apply gradient descent (N: number of samples in the batch), which gives the following update equations.
 RBF learning by gradient descent
Gaussian Mixture Models and
Expectation-Maximization
Algorithm
Normal Distribution (1D Gaussian)
$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \qquad \mu:\ \text{mean}, \quad \sigma^2:\ \text{variance (std}^2)$$
2D Gaussians
 d = 2
 x = random data point (2D vector)
 μ = mean value (2D vector)
 Σ = covariance matrix (2D matrix)
$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}}\exp\!\left\{-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right\}$$
 The same equation holds for a 3D Gaussian
Exploring Covariance Matrix
$$x_i = \text{random vector } (w_i, h_i), \qquad \Sigma = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)(x_i-\mu)^{T} = \begin{bmatrix}\sigma_w^2 & \mathrm{cov}(w,h)\\ \mathrm{cov}(h,w) & \sigma_h^2\end{bmatrix}$$
 Σ is symmetric
 Σ has an eigendecomposition (SVD): Σ = V D V^T with eigenvalues λ1 ≥ λ2 ≥ … ≥ λd
Covariance Matrix Geometry
$$\Sigma = V\,D\,V^{T}, \qquad a = \sqrt{\lambda_1}\,v_1, \qquad b = \sqrt{\lambda_2}\,v_2$$
(Figure: the ellipse of the Gaussian, with semi-axes a and b along the eigenvectors v1 and v2.)
3D Gaussians
$$x_i = \text{random vector } (r_i, g_i, b_i), \qquad \Sigma = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)(x_i-\mu)^{T} = \begin{bmatrix}\sigma_r^2 & \mathrm{cov}(g,r) & \mathrm{cov}(b,r)\\ \mathrm{cov}(r,g) & \sigma_g^2 & \mathrm{cov}(b,g)\\ \mathrm{cov}(r,b) & \mathrm{cov}(g,b) & \sigma_b^2\end{bmatrix}$$
GMMs – Gaussian Mixture Models
 Suppose we have 1000 data points in 2D space (w, h)
 Assume each data point is normally distributed
 Obviously, there are 5 sets of underlying Gaussians
(Figure: scatter plot of the data in the (W, H) plane with five visible clusters.)
The GMM assumption
 There are K components (Gaussians)
 Each component j is specified with three parameters: weight, mean, covariance matrix, θ_j = {α_j, μ_j, Σ_j}
 The total density function is:
$$f_{\theta}(x) = \sum_{j=1}^{K} \alpha_j\,\frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma_j)}}\exp\!\left\{-\tfrac{1}{2}(x-\mu_j)^{T}\Sigma_j^{-1}(x-\mu_j)\right\}, \qquad \alpha_j \ge 0\ (\text{weight of component } j), \quad \sum_{j=1}^{K}\alpha_j = 1$$
The EM algorithm (Dempster, Laird and Rubin, 1977)
(Figure: raw data, the fitted GMMs (K = 6), and the resulting total density function.)
EM Basics
 Objective:
Given N data points, find the maximum-likelihood estimate of θ:
$$\theta^{*} = \arg\max_{\theta} f_{\theta}(x_1, \ldots, x_N)$$
 Algorithm:
1. Guess initial θ
2. Perform E step (expectation)
 Based on θ, associate each data point with a specific Gaussian
3. Perform M step (maximization)
 Based on the data-point clustering, update θ to maximize the likelihood
4. Repeat 2-3 until convergence (~tens of iterations)
EM Details
 E-Step (estimate the probability that point t is associated with Gaussian j):
$$w_{t,j} = \frac{\alpha_j\, f(x_t \mid \mu_j, \Sigma_j)}{\sum_{i=1}^{K}\alpha_i\, f(x_t \mid \mu_i, \Sigma_i)}, \qquad j = 1,\ldots,K, \quad t = 1,\ldots,N$$
 M-Step (estimate new parameters):
$$\alpha_j^{new} = \frac{1}{N}\sum_{t=1}^{N} w_{t,j}, \qquad \mu_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\, x_t}{\sum_{t=1}^{N} w_{t,j}}, \qquad \Sigma_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\,(x_t-\mu_j^{new})(x_t-\mu_j^{new})^{T}}{\sum_{t=1}^{N} w_{t,j}}$$
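A compact numpy sketch of these E and M steps for a full-covariance GMM; the initialization, iteration count, and numerical safeguards are simplified for illustration and are not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """Fit a K-component GMM to X (N x d) with the E/M updates above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    alpha = np.full(K, 1.0 / K)                         # mixing weights
    mu = X[rng.choice(N, K, replace=False)]             # initial means from random points
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])

    for _ in range(n_iter):
        # E-step: responsibilities w[t, j]
        w = np.column_stack([alpha[j] * multivariate_normal.pdf(X, mu[j], sigma[j])
                             for j in range(K)])
        w /= w.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, covariances
        Nj = w.sum(axis=0)
        alpha = Nj / N
        mu = (w.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            sigma[j] = (w[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return alpha, mu, sigma

# Usage: alpha, mu, sigma = em_gmm(np.random.randn(500, 2), K=2)
```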
EM Example
(Figures over several slides: successive EM iterations refining the Gaussian components; the blue shading shows the responsibility w_{t,j} of Gaussian j for data point t.)
RBF networks vs. MLP
• Learning speed: RBF networks very fast; MLP very slow
• Convergence: RBF networks almost guaranteed; MLP not guaranteed
• Response time: RBF networks slow; MLP fast
• Memory requirement: RBF networks very large; MLP small
• Hardware implementation: RBF networks: IBM ZISC036, Nestor Ni1000
(www-5.ibm.com/fr/cdlab/zisc.html); MLP: Voice Direct 364 (www.sensoryinc.com)
• Generalization: RBF networks usually better; MLP usually poorer
• Hyper-parameters: RBF networks ?; MLP: initial values are given!
Simulation
• The color image under analysis is split into patches (either
overlapping or not) of size 64x64 pixels.
• An adversarially trained auto-encoder encodes the patches into a low-
dimensional representation called the feature vector h (a 2,048-
dimensional vector).
• A one-class SVM fed with h is used to detect forged patches as
anomalies with respect to the feature distribution learned from normal
patches.
• Once all patches are classified, a label mask for the entire image is
obtained by grouping together all the patch labels (see the sketch below).
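A hedged end-to-end sketch of this patch pipeline; the patch size and feature dimension follow the text, while the `encode` function, the trained OneClassSVM, and the image handling are placeholders.

```python
import numpy as np

PATCH = 64  # patch size stated in the text

def classify_image(image, encode, ocsvm):
    """Split an image into non-overlapping 64x64 patches, score each patch with the
    one-class SVM on its feature h, and assemble a binary label mask."""
    H, W = image.shape[:2]
    mask = np.zeros((H // PATCH, W // PATCH), dtype=np.uint8)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            patch = image[i * PATCH:(i + 1) * PATCH, j * PATCH:(j + 1) * PATCH]
            h = encode(patch).reshape(1, -1)                     # feature vector h (e.g. 2,048-d)
            mask[i, j] = 1 if ocsvm.predict(h)[0] == -1 else 0   # 1 = forged/anomalous patch
    return mask
```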
• Small - Object size is smaller than the patch size (approximately 32
pixels per side).
• Medium - Object size is comparable to the patch size (approximately 64
pixels per side).
• Large - Object size is larger than the patch size (approximately 128 pixels
per side).
Simulation
Performance evaluation according to the size of the object to be detected
Simulation
Query Image I w/
unfamiliar object
Query Image II w/
unfamiliar object
GT mask I GT mask II
Unsupervised Anomaly Detection with
GANs to Guide Marker Discovery
https://arxiv.org/abs/1703.05921
TensorFlow implementation by Doyup Lee (POSTECH):
https://github.com/LeeDoYup/AnoGAN
Previous Approach II
In this work, as shown in the figure below, a GAN model trained only on normal data
is used to decide whether query data is normal and, if not, to localize the abnormal region.
1. Train the generator & discriminator with normal data
- Using a deep convolutional generative adversarial network, train the discriminator
to distinguish real images from images produced by the generator from the latent space (z)
→ This learns the latent-space (z) distribution of the normal data
2. Decide whether the data is abnormal and locate the abnormal region
- With the parameters of the trained generator & discriminator fixed, map the query
image back to the latent space (z)
- A normal query image maps into the learned latent space (z) of the normal data,
whereas an abnormal one falls outside of it
→ the cost function shows a larger error
Anomaly detection proceeds in two stages:
1. Modeling the normal data with a GAN
: the generative model (distribution) of the normal data is learned with a GAN
Normal data I_m, with m = 1, 2, ..., M, where I_m ∈ R^{a×b}
K 2-D image patches of size c×c are extracted at random positions:
x = x_{k,m} ∈ X, with k = 1, 2, ..., K.
D and G are simultaneously optimized through the following two-
player minimax game with value function V(G, D).
The discriminator is trained to maximize the probability of assigning
real training examples the "real" label and samples from p_g the "fake" label.
2. Mapping query data to the latent space
Given a query image x, find the point z in the latent space whose generated image G(z)
is most similar to x.
Whether x and G(z) are similar depends on how closely the query image follows the
distribution p_g of the normal data used to train the generator.
To find z, a point z1 is randomly sampled from the latent distribution Z and fed into the
trained generator; the difference (loss function) between the output G(z1) and x is then
minimized via backpropagation, updating z to a new point z2 in the latent space.
Assume the latent space (z) of normal images is one-dimensional and that Z follows a
distribution such as the one shown (with mean μ_z).
Mapping a query image to the latent space (z) then works as follows:
i) Start from an arbitrary value z1 and update it to minimize the loss function.
ii) After a given number Γ of iterations, classify the image as normal or abnormal
depending on whether z_Γ has entered the allowable range.
(Figure: the sequence z1 → z2 → … → z_Γ approaching the allowable range around μ_z.)
• Overall loss or Anomaly score:
• Anomaly score consists of two parts:
• Residual Loss - visual similarity
• Discrimination Loss - enforces the generated image to lie on the manifold
Definition of the loss function for mapping the query image
Improved discrimination loss based on feature matching
• f(.) – output of intermediate layer of the discriminator
• It is some statistics of an input image
This approach utilizes the trained discriminator not as a classifier
but as a feature extractor.
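A minimal sketch (assuming PyTorch and pre-trained, frozen G and D) of the iterative mapping to the latent space with the combined residual and feature-matching discrimination loss. The weighting λ = 0.1 and the 500 steps follow the experiment settings later in the deck; everything else is illustrative.

```python
import torch

def map_to_latent(x, G, D_features, z_dim=100, n_steps=500, lam=0.1, lr=0.1):
    """Find z that makes G(z) resemble the query x; G and D stay fixed."""
    z = torch.randn(1, z_dim, requires_grad=True)             # z1 sampled from Z
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):                                  # Gamma backpropagation steps
        gz = G(z)
        residual = torch.sum(torch.abs(x - gz))               # residual loss (visual similarity)
        discrim = torch.sum(torch.abs(D_features(x) - D_features(gz)))  # feature matching f(x) vs f(G(z))
        loss = (1 - lam) * residual + lam * discrim           # overall loss / anomaly score A(x)
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach(), loss.item()                            # z_Gamma and the final score
```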
3. Anomaly Detection
Anomaly score: how well the query image x fits the model of normal images
R(x): residual loss after Γ backpropagation steps
D(x): discrimination loss after Γ backpropagation steps
Abnormal image: A(x) is large
Normal image: A(x) is small
x_R = x − G(z_Γ)
Residual error: indicates the abnormal region within the image
4. Experiments
The experiments use optical coherence tomography (OCT) images, which capture the retinal layers in 3D.
• Data, Data Selection and Preprocessing
i) Training sets :
- 2D image patches extracted from 270 clinical OCT volumes of healthy subjects
- The gray values were normalized to range from -1 to 1.
- Extracted in total 1,000,000 2D training patches with an image resolution of
64x64 pixels at randomly sampled positions.
ii) Testing sets :
- patches were extracted from 10 additional healthy cases and 10
pathological cases, which contained retinal fluid
- Test set in total consisted of 8,192 image patches and comprised
normal and pathological samples
iii) Model description
- Adopt DCGAN architecture that resulted in stable GAN training on
images of sizes 64x64 pixels.
- Utilized intermediate representations with 512-256-128-64 channels
(instead of 1024-512-256-128)
- Discrimination loss: feature representations of the last convolution
layer of the discriminator were used.
- Training was performed for 20 epochs utilizing the Adam optimizer.
- Ran 500 backpropagation steps for the mapping of new images to the
latent space.
- Used λ = 0.1 in the loss function.
i) Generative capability of the DCGAN
5. Experiments
Given image
Generated image
Residual overlay
Pixel-level annotations
of retinal fluid
Normal image Anomalous image
ii) Detection performance
ROC curves
Distribution of the residual score(c)
and of the discrimination score(d)
In the latent space, the distributions of the normal data (training data and the normal
portion of the test data) are similar, while the abnormal test data shows a clear difference.
Problems in Previous Approach
- Can't control the shape and boundary of the cluster
- Can't control the ambiguous points at the boundary
 Let's find a way to control the shape of the cluster
and the ambiguous points at the boundary
SVDD is the smallest enclosing ball problem, and its alternatives are
• The minimum enclosing ball problem with errors
• The minimum enclosing ball problem in a RKHS (Reproducing
Kernel Hilbert Spaces)
• The two class Support vector data description (SVDD)
Support Vector Data Description (SVDD)
• One class is the target class, and all other data is outlier data.
• Create a spherically shaped boundary around the complete target set.
• To minimize the chance of accepting outliers, the volume of this description
is minimized.
• Outlier sensitivity can be controlled by changing the ball-shaped boundary
into a more flexible boundary.
• Example outliers can be included into the training procedure to find a more
efficient description.
SOLUTIONS FOR SOLVING DATA DESCRIPTION
1. The minimum enclosing ball problem [Tax and Duin, 2004]
(Figure: a sphere with center a and radius R enclosing the data.)
2. The minimum enclosing ball problem with errors
- We assume vectors x are column vectors.
- We have a training set {xi }, i = 1, . . , N for which we want to obtain a description.
- We further assume that the data shows variances in all feature directions.
NORMAL DATA DESCRIPTION
• The sphere is characterized by center a and radius R > 0.
• We minimize the volume of the sphere by minimizing R², and demand that
the sphere contains all training objects xi.
• To allow the possibility of outliers in the training set, the distance from xi to
the center a should not be strictly smaller than R², but larger distances should
be penalized.
- Minimization problem:
F(R, a) = R² + C∑ξi
with constraints ||xi − a||² ≤ R² + ξi, ξi ≥ 0
2. The minimum enclosing ball problem with errors
NORMAL DATA DESCRIPTION
Lagrange function :
L(R, a, αi, γi, ξi ) = R² + C∑ξi − ∑αi {R² + ξi − (‖xi‖² − 2a · xi + ‖a‖²)} − ∑γi ξi
L should be minimized with respect to R, a, ξi and maximized
with respect to αi and γi,
subject to: 0 ≤ αi ≤ C
2. The minimum enclosing ball problem with errors
2. The minimum enclosing ball problem with errors
NORMAL DATA DESCRIPTION
Support vectors: there are 3 cases.
The hypersphere's center can be determined as
$$a = \sum_i \alpha_i X_i$$
and its radius by selecting an arbitrary support vector X_b on the boundary:
$$R^2 = \lVert X_b - a \rVert^2 = (X_b \cdot X_b) - 2\sum_i \alpha_i\,(X_i \cdot X_b) + \sum_{i,j}\alpha_i\alpha_j\,(X_i \cdot X_j)$$
TEST A NEW DATA Xk
To test whether a new data point Xk is within the sphere, the distance to the center
of the sphere has to be calculated. A test point Xk is normal when this
distance is smaller than the radius:
||xk − a||² ≤ R²
2. The minimum enclosing ball problem with errors
2. The minimum enclosing ball problem with errors
Please refer to Python Code for SVDD :
https://wikidocs.net/3431
SVDD with negative examples
- When negative examples (objects which should be rejected) are available,
they can be incorporated in the training to improve the description.
- In contrast with the training (target) examples which should be within the
sphere, the negative examples should be outside it.
 Minimization problem:
With constraints:
2. The minimum enclosing ball problem with errors
3. The minimum enclosing ball problem in a RKHS
Gaussian kernel:
With subject to: 0 ≤ αi ≤ C
• Minimum enclosing ball problem with errors
• Inner product can be substituted by a general kernel function like
Gaussian kernel
$$\lVert X_k - a \rVert^2 = K(X_k, X_k) - 2\sum_i \alpha_i K(X_i, X_k) + \sum_{i,j}\alpha_i\alpha_j K(X_i, X_j) \le R^2$$
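A small numpy sketch of this kernelized acceptance test, assuming the dual coefficients α_i, the support vectors, and R² were already obtained from the QP above; the Gaussian kernel width s is illustrative.

```python
import numpy as np

def gauss_kernel(a, b, s=1.0):
    """K(a, b) = exp(-||a - b||^2 / s^2) (Gaussian kernel with width s)."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / s ** 2)

def svdd_accepts(x_new, sv, alpha, R2, s=1.0):
    """Accept x_new as normal if ||Phi(x_new) - a||^2 <= R^2 in the kernel space."""
    k_xx = gauss_kernel(x_new, x_new, s)                       # K(x_k, x_k)
    k_xs = gauss_kernel(sv, x_new, s)                          # K(x_i, x_k) for all support vectors
    K_ss = gauss_kernel(sv[:, None, :], sv[None, :, :], s)     # K(x_i, x_j)
    dist2 = k_xx - 2 * alpha @ k_xs + alpha @ K_ss @ alpha
    return dist2 <= R2
```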
3. The minimum enclosing ball problem in a RKHS
A test object is accepted when the inequality above holds.
- For small values of s all objects become support vectors.
- For very large s the solution approximates the original
spherically shaped solution.
- Decreasing the parameter C constrains the values of αi
more, and more objects become support vectors.
- Also, with decreasing C the error on the target class
increases, but the covered volume of the data description
decreases.
4. The two class Support vector data description (SVDD)
The two class SVDD vs. one class SVDD
Deep SVDD learns a neural network transformation Φ(· ; W) with weights W
from input space X ∈ R^d to output space F ∈ R^p that attempts to map most of the
data network representations into a hypersphere characterized by center c
and radius R of minimum volume.
Mappings of normal examples fall within, whereas mappings of anomalies fall
outside the hypersphere.
Deep Support Vector Data Description (Deep SVDD)
Given some training data on X, we define
the soft-boundary Deep SVDD objective as
- First term : minimizing R2 minimizes the volume of the hypersphere.
- Second term is a penalty term for points lying outside the sphere after
being passed through the network, i.e. if its distance to the center
is greater than radius R
- The last term is a regularizer on the network parameters W
Deep Support Vector Data Description (Deep SVDD)
To achieve this the network must extract the common factors of variation
of the data.
As a result, normal examples of the data are closely mapped to center c,
whereas anomalous examples are mapped further away from the center
or outside of the hypersphere.
Through this we obtain a compact description of the normal class.
(Figure: normal data mapped inside the hypersphere, anomalous data mapped outside.)
Deep Support Vector Data Description (Deep SVDD)
One-Class Deep SVDD objective
One-Class Deep SVDD simply employs a quadratic loss for penalizing the distance of
every network representation to c.
One-Class Deep SVDD contracts the sphere by minimizing the mean
distance of all data representations to the center.
For a given test point x ϵ X,
anomaly score s can be defined for both variants of Deep SVDD by
the distance of the point to the center of the hypersphere
(Figure: distribution of anomaly scores for normal vs. anomalous samples, conventional approach vs. Deep SVDD.)
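A brief PyTorch-style sketch of the One-Class Deep SVDD loss and the corresponding anomaly score; the framework choice, the network module phi, and the pre-computed center c are assumptions for illustration, not the authors' code.

```python
import torch

def one_class_deep_svdd_loss(phi, x, c, weight_decay=1e-6):
    """Mean squared distance of representations phi(x; W) to the fixed center c,
    plus a weight-decay regularizer on the network parameters W (phi is an nn.Module)."""
    dist2 = torch.sum((phi(x) - c) ** 2, dim=1)
    reg = sum(torch.sum(p ** 2) for p in phi.parameters())
    return dist2.mean() + weight_decay / 2 * reg

def anomaly_score(phi, x, c):
    """s(x) = ||phi(x; W) - c||^2: larger means more anomalous."""
    return torch.sum((phi(x) - c) ** 2, dim=1)
```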
One-class classification on MNIST and CIFAR-10
Each convolutional module consists of a convolutional layer followed by
leaky ReLU activations and 2x2 max-pooling.
On MNIST, a CNN with two modules, 8x(5x5x1)-filters followed by 4x(5x5x1)-
filters, and a final dense layer of 32 units.
On CIFAR-10, a CNN with three modules, 32x(5x5x3)-filters,
64x(5x5x3)-filters, and 128x(5x5x3)-filters, followed by a final dense layer of
128 units.
a batch size of 200 and set the weight decay hyper-parameter λ = 10⁻⁶
Network architectures
Both MNIST and CIFAR-10 have ten different classes from which we
create ten one-class classification setups.
In each setup, one of the classes is the normal class and samples from the
remaining classes are used to represent anomalies.
Only train with training set examples from the respective normal class.
Training set sizes of n≈6,000 for MNIST and n=5,000 for CIFAR-10.
Both test sets have 10,000 samples including samples from the nine
anomalous classes for each setup.
Pre-process all images with global contrast normalization using the L1
norm and finally rescale to [0; 1] via min-max-scaling.
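A sketch of this preprocessing under the stated description (global contrast normalization with the L1 norm, then min-max scaling to [0, 1]); the exact constants used by the authors are not given, so the details here are only illustrative.

```python
import numpy as np

def preprocess(img, eps=1e-8):
    """Global contrast normalization using the L1 norm, then min-max rescale to [0, 1]."""
    x = img.astype(np.float64)
    x = x - x.mean()                                   # remove the mean (global contrast)
    x = x / (np.abs(x).mean() + eps)                   # normalize by the mean absolute value (L1)
    return (x - x.min()) / (x.max() - x.min() + eps)   # min-max scaling to [0, 1]
```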
One-class classification on MNIST and CIFAR-10
Data setup
One-class classification on MNIST and CIFAR-10
Average AUCs in % with StdDevs (over 10 seeds) per method and one-class
experiment on MNIST and CIFAR-10
Anomaly Detection using
One-Class Neural Networks
arXiv:1802.06360v1
Code : https://github.com/raghavchalapathy/oc-nn
We want to make a NN like this!
Model architecture of Auto-encoder and
the proposed one-class neural networks
One-Class Support Vector Machine
The objective is to find a hyperplane and its distance from the origin such that the
decision function is positive on the subset A and negative everywhere outside A.
Maximize the distance from the hyperplane to the origin.
(Figure: hypersphere enclosing subset A; separating hyperplane with normal w at distance r from the origin; the region beyond it is negative.)
In order to obtain w and r , we need to solve the following
optimization problem,
One-Class Support Vector Machine
where w is the norm perpendicular to the hyper-plane and r is the
distance of the hyper-plane from origin.
Distance of Feature vector from origin
A simple feed forward network with one hidden layer
having linear or sigmoid activation g(·) and one output node
OC-NN objective can be formulated as:
where w is the weight vector from the hidden layer to the output node (so
⟨w, g(V·Xn)⟩ is the scalar output), V is the weight matrix from the input to the
hidden units, and Xn is an input vector.
One-Class NN
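A sketch of the OC-NN objective under my reading of the cited paper (min over w, V, r of ½||w||² + ½||V||²_F + (1/ν)·mean(max(0, r − ⟨w, g(V x_n)⟩)) − r); the activation, ν, and the hidden size are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def oc_nn_objective(w, V, r, X, g=sigmoid, nu=0.1):
    """OC-NN objective for a one-hidden-layer network with activation g."""
    scores = g(X @ V.T) @ w                  # <w, g(V x_n)> for each input row x_n
    hinge = np.maximum(0.0, r - scores)      # points falling below r are penalized
    return 0.5 * np.dot(w, w) + 0.5 * np.sum(V ** 2) + hinge.mean() / nu - r
```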
Discriminative Feature Learning
A Discriminative Feature Learning
For generic object, scene, or action recognition, the deeply learned
features need to be not only separable but also discriminative.
• Only the softmax loss has been considered in classification problems
 SOFTMAX LOSS : encouraging the separability of features.
• Discriminative feature learning approach considers center loss as well
 CENTER LOSS: simultaneously learning a center for deep
features of each class and penalizing the distances between
the deep features and their corresponding class centers.
 JOINT SUPERVISION: minimizing the intra-class variations while
keeping the features of different classes separable
A Discriminative Feature Learning
A Discriminative Feature Learning
Detailed Discussion on Center Loss
• Easy-to-Implement. The gradient and update equation
are easy to derive and the resulting CNN model is
trainable.
• Easy-to-Train. Centers are updated based on mini-batch
with an adjustable learning rate.
• Easy-to-Input. Center loss enjoys the same requirement as
the softmax loss and needs no complex sample mining
and recombination, which is inevitable in contrastive loss
and triplet loss.
• Easy-to-Converge. Converges faster than with the softmax loss alone.
• With only softmax loss (λ=0), the deeply learned features are
separable, but not discriminative (significant intra-class variations).
• With a proper λ, the discriminative power of deep features can be
significantly enhanced, which is crucial for the classification problem.
A Discriminative Feature Learning
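A short sketch of the joint supervision described above (softmax loss plus λ-weighted center loss), assuming a PyTorch setting; the mini-batch center update rule and the value of λ are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def joint_loss(features, logits, labels, centers, lam=0.003):
    """L = L_softmax + lam * L_center, with L_center = 0.5 * ||x_i - c_{y_i}||^2."""
    softmax_loss = F.cross_entropy(logits, labels)
    center_loss = 0.5 * ((features - centers[labels]) ** 2).sum(dim=1).mean()
    return softmax_loss + lam * center_loss

@torch.no_grad()
def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the mean of its features in the mini-batch."""
    for cls in labels.unique():
        mask = labels == cls
        centers[cls] += alpha * (features[mask].mean(dim=0) - centers[cls])
    return centers
```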

Más contenido relacionado

La actualidad más candente

Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationVikas Jain
 
Activation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkActivation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkGayatri Khanvilkar
 
밑바닥부터 시작하는딥러닝 8장
밑바닥부터 시작하는딥러닝 8장밑바닥부터 시작하는딥러닝 8장
밑바닥부터 시작하는딥러닝 8장Sunggon Song
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentationBushra Jbawi
 
Deep Learning Introduction Lecture
Deep Learning Introduction LectureDeep Learning Introduction Lecture
Deep Learning Introduction Lectureshivam chaurasia
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection철 김
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHarshana Madusanka Jayamaha
 
Visual Object Category Recognition
Visual Object Category RecognitionVisual Object Category Recognition
Visual Object Category RecognitionAshish Gupta
 
Metric learning ICML2010 tutorial
Metric learning  ICML2010 tutorialMetric learning  ICML2010 tutorial
Metric learning ICML2010 tutorialzukun
 
Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021Vincenzo Lomonaco
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro9xdot
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...Sebastian Raschka
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning AlgorithmsDezyreAcademy
 

La actualidad más candente (20)

Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Activation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkActivation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural network
 
Machine learning
Machine learningMachine learning
Machine learning
 
밑바닥부터 시작하는딥러닝 8장
밑바닥부터 시작하는딥러닝 8장밑바닥부터 시작하는딥러닝 8장
밑바닥부터 시작하는딥러닝 8장
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentation
 
Deep Learning Introduction Lecture
Deep Learning Introduction LectureDeep Learning Introduction Lecture
Deep Learning Introduction Lecture
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural network
 
Visual Object Category Recognition
Visual Object Category RecognitionVisual Object Category Recognition
Visual Object Category Recognition
 
MobileViTv1
MobileViTv1MobileViTv1
MobileViTv1
 
Metric learning ICML2010 tutorial
Metric learning  ICML2010 tutorialMetric learning  ICML2010 tutorial
Metric learning ICML2010 tutorial
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning Algorithms
 

Similar a Anomaly detection using deep one class classifier

The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzerbutest
 
Introduction to Support Vector Machines
Introduction to Support Vector MachinesIntroduction to Support Vector Machines
Introduction to Support Vector MachinesSilicon Mentor
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier홍배 김
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineSoma Boubou
 
Support Vector Machine.pptx
Support Vector Machine.pptxSupport Vector Machine.pptx
Support Vector Machine.pptxHarishNayak44
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modeljins0618
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Support Vector Machines Simply
Support Vector Machines SimplySupport Vector Machines Simply
Support Vector Machines SimplyEmad Nabil
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningVahid Mirjalili
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slidesSara Asher
 

Similar a Anomaly detection using deep one class classifier (20)

The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
tutorial.ppt
tutorial.ppttutorial.ppt
tutorial.ppt
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Introduction to Support Vector Machines
Introduction to Support Vector MachinesIntroduction to Support Vector Machines
Introduction to Support Vector Machines
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
Support Vector Machine.pptx
Support Vector Machine.pptxSupport Vector Machine.pptx
Support Vector Machine.pptx
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture model
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Support Vector Machines Simply
Support Vector Machines SimplySupport Vector Machines Simply
Support Vector Machines Simply
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 

Más de 홍배 김

Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...홍배 김
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing홍배 김
 
Lecture Summary : Camera Projection
Lecture Summary : Camera Projection Lecture Summary : Camera Projection
Lecture Summary : Camera Projection 홍배 김
 
Learning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robotsLearning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robots홍배 김
 
Robotics of Quadruped Robot
Robotics of Quadruped RobotRobotics of Quadruped Robot
Robotics of Quadruped Robot홍배 김
 
Basics of Robotics
Basics of RoboticsBasics of Robotics
Basics of Robotics홍배 김
 
Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명홍배 김
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용홍배 김
 
Optimal real-time landing using DNN
Optimal real-time landing using DNNOptimal real-time landing using DNN
Optimal real-time landing using DNN홍배 김
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain홍배 김
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...홍배 김
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance홍배 김
 
Anomaly Detection with GANs
Anomaly Detection with GANsAnomaly Detection with GANs
Anomaly Detection with GANs홍배 김
 
Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)홍배 김
 
Convolution 종류 설명
Convolution 종류 설명Convolution 종류 설명
Convolution 종류 설명홍배 김
 
Learning by association
Learning by associationLearning by association
Learning by association홍배 김
 
알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder홍배 김
 
Binarized CNN on FPGA
Binarized CNN on FPGABinarized CNN on FPGA
Binarized CNN on FPGA홍배 김
 
Visualizing data using t-SNE
Visualizing data using t-SNEVisualizing data using t-SNE
Visualizing data using t-SNE홍배 김
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법 홍배 김
 

Más de 홍배 김 (20)

Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing
 
Lecture Summary : Camera Projection
Lecture Summary : Camera Projection Lecture Summary : Camera Projection
Lecture Summary : Camera Projection
 
Learning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robotsLearning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robots
 
Robotics of Quadruped Robot
Robotics of Quadruped RobotRobotics of Quadruped Robot
Robotics of Quadruped Robot
 
Basics of Robotics
Basics of RoboticsBasics of Robotics
Basics of Robotics
 
Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 
Optimal real-time landing using DNN
Optimal real-time landing using DNNOptimal real-time landing using DNN
Optimal real-time landing using DNN
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance
 
Anomaly Detection with GANs
Anomaly Detection with GANsAnomaly Detection with GANs
Anomaly Detection with GANs
 
Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)
 
Convolution 종류 설명
Convolution 종류 설명Convolution 종류 설명
Convolution 종류 설명
 
Learning by association
Learning by associationLearning by association
Learning by association
 
알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder
 
Binarized CNN on FPGA
Binarized CNN on FPGABinarized CNN on FPGA
Binarized CNN on FPGA
 
Visualizing data using t-SNE
Visualizing data using t-SNEVisualizing data using t-SNE
Visualizing data using t-SNE
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법
 

Último

Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Último (20)

Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Anomaly detection using deep one class classifier

  • 1. Anomaly Detection using Deep One-Class Classifier Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018
  • 2. Anomaly Detection and Localization Using GAN and One-Class Classifier Satellite Image Forgery Detection and Localization Using GAN and One-Class Classifier https://arxiv.org/abs/1802.04881 Previous Approach I
  • 3. Anomaly Detection • 정상치에서 벗어난 관측치들을 detect  One-class classification 혹은 one-class description 여기서는 • Generative adversarial network 또는 Auto-encoder를 이용하여 정상 image에 대한 feature를 mapping한 후 one-class support vector machine (SVM)으로 분포를 결정. Query image에 대하여 결정된 분 포내에 존재하는지 여부 확인
  • 4. Problem formulation • 학습된 image외에 unseen or unfamiliar object가 발견될 경우, 그 림과 같이 binary mask로 영역을 표시 Trained Image Trained Image mask Query Image w/ unfamiliar object Query Image mask w/ unfamiliar object
  • 5. Method 𝐴 𝑒 X h 𝐴 𝑑 𝑋 X min 𝐺 max 𝐷 𝑉(𝐷, 𝐺) = 𝐸 𝑋~𝑝 𝑑𝑎𝑡𝑎 log 𝐷 𝑋 + log(1 − 𝐷 𝐺 𝑋 ) 𝑋 = 𝐺 𝑋 = 𝐴 𝑑 ℎ = 𝐴 𝑑 𝐴 𝑒(𝑋) • Auto-encoder를 이용하여 image로부터 feature(h) 구하고 이를 다시 복원. 복원된 image와 원 image를 이용하여 GAN을 훈련  Auto- encoder 보다 약간의 성능향상 • 정상 image에 대한 latent space의 distribution을 찾아 냄.
  • 7. Method Normal image의 cluster Abnormal image의 features - Training된 Auto-encode의 Encoder에 Query image를 입력하여 latent vector를 계산 - 계산 된 latent vector가 정상 image의 cluster내에 포함되는지 여부 판단  여기서는 RADIAL BASES FUNCTIONS(Gauss Kernel, Parametric modeling of Cluster) 을 사용한 One class SVM을 사용 Features from normal patches(i.e., red dots) cluster together, whereas features from abnormal patches (i.e., blue dots) are more distant.
  • 8. we solve the problem of classifying nonlinearly separable pattern in a hybrid manner involving two stages: • First: Transform a given set of nonlinearly separable patterns into a new set for which, under certain conditions, the likelihood of the transformed patterns becoming linearly separable is high. • Second: the solution of the classification problem is completed by using Stochastic Gradient Descent. Non-linear SVM Classifier using the RBF(Radial-basis function) kernel
  • 9. We find w and b by solving the following objective function using Quadratic Programming. To define an optimal hyperplane we need to maximize the width of the margin(w). Linear SVM(Support Vector Machines) Support vector
  • 10. • The simplest way to separate two groups of data is with a straight line (1 dimension), flat plane (2 dimensions) or an N-dimensional hyperplane. • However, there are situations where a nonlinear region can separate the groups more efficiently. • The kernel function transform the data into a higher dimensional feature space to make it possible to perform the linear separation. Non-Linear SVM(Support Vector Machines) kernel trick
  • 11. To Map from input space to feature space to simplify classification task Non-linear SVM Classifier using the RBF(Radial-basis function) kernel is adopted Non-Linear SVM(Support Vector Machines) Feature space에서의 inner product(a measure of similarity)
  • 12. Key Idea of Kernel Methods K(𝑥𝑖, 𝑥𝑗) K(𝑥𝑖, 𝑥𝑗) = Φ(𝑥𝑖)· Φ(𝑥𝑗)
  • 13. Normal Condition : Cluster bound : exp{− [ 𝑥1−𝑐1 2+ 𝑥2−𝑐2 2] 2𝜎2 } ≥ {0<Threshold<<1} 𝑥1 − 𝑐1 2 + 𝑥2 − 𝑐2 2 ≤ r2 𝐾1 + 𝐾2 ≤ r2 x1 x2 .(c1,c2) r K1 K2 r2 r2 Key Idea of Kernel Methods
  • 14. RBFN architecture Σ Input layer Hidden layer (RBFs) Output layer W1 W2 WM x1 x2 xn No weight f(x) Each of n components of the input vector x feeds forward to m basis functions whose outputs are linearly combined with weights w (i.e. dot product x∙w) into the network output f(x). The output layer performs a simple weighted sum (i.e. w ∙x). If the RBFN is used for regression then this output is fine. However, if pattern classification is required, then a hard- limiter or sigmoid function could be placed on the output neurons to give 0/1 output values Input data set ∶ 𝑋 = { 𝑥1 𝑥2 … 𝑥 𝑁}
  • 16. RBFN architecture  For Gaussian basis functions  s x w w x c w w x c p i i p i i M i pj ij ijj n i M ( ) exp ( )                    0 1 0 2 2 11 2    Assume the variance  across each dimension are equal s x w w x cp i i pj ij j n i M ( ) exp ( )            0 2 2 11 1 2 → → → →
  • 17. Σ Σ Category 1 Category 2 Category 1 Category 2 RBFN for classification
  • 18. RBFN Learning • Design decision • number of hidden neurons • max of neurons = number of input patterns • more neurons – more complex, smaller tolerance • Parameters to be learnt • centers • radii • A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the radius. • smaller radius  fits training data better (overfitting) • larger radius  less sensitivity, less overfitting, network of smaller size, faster execution • weights between hidden and output layers
  • 19. The question now is: how do we train the RBF network? In other words, how do we find:  the number and the parameters of the hidden units (the basis functions), using unlabeled data (unsupervised learning)  K-means clustering algorithm;  the weights between the hidden layer and the output layer  recursive least-squares estimation algorithm. RBFN Learning
  • 21.  Use the K-means algorithm to find the centers c_i. RBFN Learning
  • 22. K-means Algorithm — step 1: K initial cluster centers are chosen randomly from the samples to form K groups. step 2: Each new sample is added to the group whose mean is closest to this sample. step 3: Adjust the mean of the group to take account of the new point. step 4: Repeat step 2 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance.
  • 23. Outcome: there are K clusters, with means representing the centroid of each cluster. Advantages: (1) a fast and simple algorithm; (2) reduces the effect of noisy samples. A minimal code sketch follows below.
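A minimal sketch of finding RBF centers with K-means (a batch Lloyd's variant rather than the sequential update described on the slide; the random data, tolerance and seed are illustrative):

```python
import numpy as np

def kmeans_centers(X, K, tol=1e-4, seed=0):
    """Lloyd's algorithm: returns K cluster means, usable as RBF centers c_i."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]          # step 1: random initialization
    while True:
        # step 2: assign each sample to the group whose mean is closest
        labels = np.argmin(((X[:, None, :] - centers)**2).sum(-1), axis=1)
        # step 3: recompute each group's mean (keep the old center if a group is empty)
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(K)])
        # step 4: stop when the means move less than the tolerance
        if np.linalg.norm(new_centers - centers) < tol:
            return new_centers
        centers = new_centers

centers = kmeans_centers(np.random.randn(200, 2), K=4)
print(centers)
```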
  • 24.  Use the K-nearest-neighbor rule to find the function width σ:
$$\sigma_i = \sqrt{\frac{1}{K}\sum_{k=1}^{K} \lVert c_k - c_i \rVert^2},$$
where $c_k$ is the k-th nearest neighbor of $c_i$.  The objective is to cover the training points so that a smooth fit of the training samples can be achieved.
  • 25.  RBF learning by gradient descent.  Let
$$\varphi_i(x_p) = \exp\!\left(-\sum_{j=1}^{n}\frac{(x_{pj}-c_{ij})^2}{2\sigma_{ij}^2}\right), \qquad e(x_p) = d(x_p) - s(x_p),$$
$$E = \frac{1}{2}\sum_{p=1}^{N} e(x_p)^2 .$$
 We then compute $\partial E/\partial w_i$, $\partial E/\partial c_{ij}$ and $\partial E/\partial \sigma_{ij}$ and apply gradient descent (N: number of samples in the batch).
  • 26.  RBF learning by gradient descent — we then have the following update equations.
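The update equations on the original slide are an image; a standard reconstruction from the error function E and basis functions φ_i defined on the previous slide (learning rates η are introduced here for notation) is:

$$
\begin{aligned}
w_i &\leftarrow w_i + \eta_w \sum_{p=1}^{N} e(x_p)\,\varphi_i(x_p),\\
c_{ij} &\leftarrow c_{ij} + \eta_c \sum_{p=1}^{N} e(x_p)\, w_i\, \varphi_i(x_p)\,\frac{x_{pj}-c_{ij}}{\sigma_{ij}^2},\\
\sigma_{ij} &\leftarrow \sigma_{ij} + \eta_\sigma \sum_{p=1}^{N} e(x_p)\, w_i\, \varphi_i(x_p)\,\frac{(x_{pj}-c_{ij})^2}{\sigma_{ij}^3}.
\end{aligned}
$$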
  • 27. Gaussian Mixture Models and Expectation-Maximization Algorithm
  • 28. Normal Distribution (1D Gaussian)
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where $\mu$ is the mean and $\sigma$ the standard deviation.
  • 29. 2D Gaussians —  d = 2,  x = random data point (2D vector),  μ = mean value (2D vector),  Σ = covariance matrix (2×2 matrix):
$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right).$$
The same equation holds for a 3D Gaussian.
  • 30. 2D Gaussians (figure).
  • 31. Exploring the Covariance Matrix
$$x_i = \begin{pmatrix} w_i \\ h_i \end{pmatrix} \ \text{(random vector)}, \qquad \Sigma = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)(x_i-\mu)^{T} = \begin{pmatrix} \sigma_w^2 & \operatorname{cov}(w,h) \\ \operatorname{cov}(h,w) & \sigma_h^2 \end{pmatrix}.$$
 Σ is symmetric  and has an eigendecomposition (SVD): $\Sigma = V D V^{T}$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$.
  • 32. Covariance Matrix Geometry — from $\Sigma = V D V^{T}$, the principal axes of the covariance ellipse are $a = \sqrt{\lambda_1}\, v_1$ and $b = \sqrt{\lambda_2}\, v_2$. (Figure: ellipse with semi-axes a and b.)
  • 33. 3D Gaussians
$$x_i = \begin{pmatrix} r_i \\ g_i \\ b_i \end{pmatrix}, \qquad \Sigma = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)(x_i-\mu)^{T} = \begin{pmatrix} \sigma_r^2 & \operatorname{cov}(r,g) & \operatorname{cov}(r,b) \\ \operatorname{cov}(g,r) & \sigma_g^2 & \operatorname{cov}(g,b) \\ \operatorname{cov}(b,r) & \operatorname{cov}(b,g) & \sigma_b^2 \end{pmatrix}.$$
  • 34. GMMs – Gaussian Mixture Models — suppose we have 1,000 data points in a 2D space (w, h). (Figure: scatter plot with axes W and H.)
  • 35. GMMs – Gaussian Mixture Models —  assume each data point is normally distributed;  obviously, there are 5 sets of underlying Gaussians. (Figure: the same scatter plot with 5 visible clusters.)
  • 36. The GMM assumption —  there are K components (Gaussians);  each component j is specified by three parameters: weight, mean and covariance matrix;  the total density function is
$$f_\theta(x) = \sum_{j=1}^{K} \alpha_j \,\frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma_j)}} \exp\!\left(-\frac{1}{2}(x-\mu_j)^{T}\Sigma_j^{-1}(x-\mu_j)\right),$$
$$\theta = \{\alpha_j, \mu_j, \Sigma_j\}_{j=1}^{K}, \qquad \alpha_j > 0 \ (\text{weight of component } j), \qquad \sum_{j=1}^{K}\alpha_j = 1.$$
  • 37. The EM algorithm (Dempster, Laird and Rubin, 1977). (Figure: raw data, fitted GMMs with K = 6, and the total density function.)
  • 38. EM Basics —  Objective: given N data points, find the maximum-likelihood estimate of θ:
$$\theta^{*} = \arg\max_{\theta} f_\theta(x_1, \dots, x_N).$$
 Algorithm: 1. guess an initial θ; 2. perform the E step (expectation): based on θ, associate each data point with a specific Gaussian; 3. perform the M step (maximization): based on the clustering of the data points, maximize the likelihood over θ; 4. repeat steps 2–3 until convergence (typically tens of iterations).
  • 39. EM Details —  E-step (estimate the probability that point t is associated with Gaussian j):
$$w_{t,j} = \frac{\alpha_j\, f(x_t \mid \mu_j, \Sigma_j)}{\sum_{i=1}^{K}\alpha_i\, f(x_t \mid \mu_i, \Sigma_i)}, \qquad j = 1,\dots,K, \quad t = 1,\dots,N.$$
 M-step (estimate the new parameters):
$$N_j = \sum_{t=1}^{N} w_{t,j}, \qquad \alpha_j^{new} = \frac{N_j}{N}, \qquad \mu_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\, x_t}{\sum_{t=1}^{N} w_{t,j}}, \qquad \Sigma_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\,(x_t-\mu_j^{new})(x_t-\mu_j^{new})^{T}}{\sum_{t=1}^{N} w_{t,j}}.$$
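A minimal sketch of one EM iteration under these equations, assuming SciPy is available for the Gaussian density (array shapes, variable names, and the initialization are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    """One EM iteration for a GMM. X: (N, d); weights: (K,); means: (K, d); covs: (K, d, d)."""
    N, K = len(X), len(weights)
    # E-step: responsibility w[t, j] of component j for point t
    dens = np.stack([a * multivariate_normal.pdf(X, m, c)
                     for a, m, c in zip(weights, means, covs)], axis=1)   # (N, K)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and covariances from the responsibilities
    Nk = resp.sum(axis=0)                                                 # (K,)
    new_weights = Nk / N
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = np.stack([(resp[:, j, None] * (X - new_means[j])).T @ (X - new_means[j]) / Nk[j]
                         for j in range(K)])
    return new_weights, new_means, new_covs

# Toy usage: 2 components on random 2-D data, repeated until convergence in practice
X = np.random.randn(500, 2)
w, mu, S = np.array([0.5, 0.5]), np.array([[-1.0, 0.0], [1.0, 0.0]]), np.stack([np.eye(2)] * 2)
w, mu, S = em_step(X, w, mu, S)
```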
  • 40. EM Example. (Figure: data points t colored by their responsibility w_{t,j} for Gaussian j; blue intensity encodes w_{t,j}.)
  • 48. RBF networks vs. MLP:
  – Learning speed: RBFN very fast / MLP very slow
  – Convergence: RBFN almost guaranteed / MLP not guaranteed
  – Response time: RBFN slow / MLP fast
  – Memory requirement: RBFN very large / MLP small
  – Hardware implementation: RBFN IBM ZISC036, Nestor Ni1000 (www-5.ibm.com/fr/cdlab/zisc.html) / MLP Voice Direct 364 (www.sensoryinc.com)
  – Generalization: RBFN usually better / MLP usually poorer
  – Hyper-parameters: RBFN ? / MLP initial values are given!
  • 49. Simulation • The color image under analysis is split into patches (either overlapping or not) of size 64x64 pixels. • An adversarially trained auto-encoder encodes each patch into a low-dimensional representation called the feature vector h (a 2,048-dimensional vector). • A one-class SVM fed with h is used to detect forged patches as anomalies with respect to the feature distribution learned from normal patches. • Once all patches are classified, a label mask for the entire image is obtained by grouping together all the patch labels. A sketch of this pipeline follows below.
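A minimal, hypothetical sketch of this pipeline (the stand-in encoder, the toy image arrays, and the OneClassSVM hyper-parameters nu and gamma are illustrative placeholders, not values from the paper):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def extract_patches(image, size=64, stride=64):
    """Split an HxWxC image into square patches (non-overlapping when stride == size)."""
    H, W = image.shape[:2]
    return [image[r:r + size, c:c + size]
            for r in range(0, H - size + 1, stride)
            for c in range(0, W - size + 1, stride)]

# Stand-in for the adversarially trained encoder A_e, which in the paper maps a
# 64x64 patch to a 2,048-dimensional feature vector h.
def encoder(patch):
    return patch.reshape(-1)[:2048]

normal_image = np.random.rand(256, 256, 3)   # toy placeholders for real satellite images
query_image  = np.random.rand(256, 256, 3)

# Training: fit the one-class SVM on features of normal patches only
h_train = np.stack([encoder(p) for p in extract_patches(normal_image)])
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(h_train)

# Testing: +1 = normal patch, -1 = anomalous (forged) patch; the per-patch labels
# are then grouped back into a label mask for the whole image.
patch_labels = ocsvm.predict(np.stack([encoder(p) for p in extract_patches(query_image)]))
```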
  • 50. • Small – object size is smaller than the patch size (approximately 32 pixels per side). • Medium – object size is comparable to the patch size (approximately 64 pixels per side). • Large – object size is larger than the patch size (approximately 128 pixels per side). Simulation: performance is evaluated according to the size of the object to be detected.
  • 51. Simulation. (Figure: query image I with an unfamiliar object and its GT mask I; query image II with an unfamiliar object and its GT mask II.)
  • 52. Unsupervised Anomaly Detection with GANs to Guide Marker Discovery https://arxiv.org/abs/1703.05921 TensorFlow implementation by Lee Do-Yup (POSTECH): https://github.com/LeeDoYup/AnoGAN Previous Approach II
  • 53. In this work, as illustrated in the figure below, a GAN model trained only on normal data is used to decide whether query data is normal and, if it is abnormal, to localize the abnormal region.
  • 54. Anomaly detection is performed in the following two steps. 1. Train the generator and discriminator on normal data: using a deep convolutional generative adversarial network (DCGAN), the discriminator is trained to distinguish real images from images generated from the latent space z by the generator  the distribution of the normal data in the latent space z is learned. 2. Decide whether the data is abnormal and localize the abnormal region: with the parameters of the trained generator and discriminator fixed, the query image is mapped back to the latent space z. A query image from the normal data maps into the previously learned latent-space distribution of normal data, whereas an abnormal image falls outside it  an error appears in the cost function.
  • 55. 1. Modeling normal data with a GAN: the generative model (distribution) of the normal data is learned with a GAN. Given normal data I_m, with m = 1, 2, ..., M, where I_m ∈ R^{a×b}, K 2-D image patches of size c×c are extracted at random positions: x = x_{k,m} ∈ X (the set of training patches), with k = 1, 2, ..., K. D and G are simultaneously optimized through the following two-player minimax game with value function V(G, D). The discriminator is trained to maximize the probability of assigning the "real" label to real training examples and the "fake" label to samples from p_g.
  • 56. 2. Mapping query data to the latent space: given a query image x, find the point z in the latent space whose generated image G(z) is most similar to x. How similar x and G(z) are is determined by the degree to which the query image follows the distribution p_g of the normal data used to train the generator. To find z, a point z_1 sampled at random from the latent distribution Z is fed to the trained generator; the difference (loss function) between the output G(z_1) and x is then minimized by updating, via backpropagation, to the next latent point z_2, and so on.
  • 57. Assume the latent space z of normal images is one-dimensional and distributed as shown (mean μ_z). Mapping a query image to the latent space then proceeds as follows: i) start from an arbitrary value z_1 and update it so as to minimize the loss function; ii) after a given number Γ of iterations, classify the image as normal or abnormal depending on whether z_Γ has entered the allowable range. (Figure: iterates z_1, z_2, ..., z_Γ approaching the allowable range around μ_z.)
  • 58. • Overall loss, or anomaly score. • The anomaly score consists of two parts: • Residual loss – visual similarity. • Discrimination loss – enforces the generated image to lie on the manifold. Definition of the loss function for mapping the query image.
  • 59. Improved discrimination loss based on feature matching • f(·) – output of an intermediate layer of the discriminator • It acts as a statistic of the input image. This approach uses the trained discriminator not as a classifier but as a feature extractor. The loss terms are summarized below.
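For reference (the equations on the original slides are images), the loss terms as defined in the AnoGAN paper read, up to notation:

$$\mathcal{L}_R(z_\gamma) = \sum \lvert x - G(z_\gamma) \rvert, \qquad \mathcal{L}_D(z_\gamma) = \sum \lvert f(x) - f(G(z_\gamma)) \rvert,$$
$$\mathcal{L}(z_\gamma) = (1-\lambda)\,\mathcal{L}_R(z_\gamma) + \lambda\,\mathcal{L}_D(z_\gamma).$$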
  • 60. 3. Anomaly Detection. Anomaly score A(x): how well the query image x conforms to the normal images, where R(x) is the residual loss and D(x) the discrimination loss after Γ backpropagation steps. For an abnormal image A(x) is large; for a normal image A(x) is small. The residual error x_R = |x − G(z_Γ)| highlights the abnormal region within the image. A minimal mapping sketch follows below.
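A minimal sketch of the latent-space mapping, assuming PyTorch and assuming G and f are the trained, frozen generator and an intermediate discriminator feature extractor passed in by the caller; the Adam optimizer, learning rate and tensor shapes are illustrative choices (the step count 500 and λ = 0.1 follow the values quoted later in the deck):

```python
import torch

def map_to_latent(x, G, f, z_dim=100, n_steps=500, lr=0.1, lam=0.1):
    """AnoGAN-style mapping of a query image x to the latent space.
    G, f: trained (frozen) generator and discriminator feature map."""
    z = torch.randn(1, z_dim, requires_grad=True)          # z_1: random starting point
    opt = torch.optim.Adam([z], lr=lr)                     # illustrative optimizer choice
    for _ in range(n_steps):                               # z_1 -> z_2 -> ... -> z_Gamma
        opt.zero_grad()
        residual = torch.sum(torch.abs(x - G(z)))                    # R: visual similarity
        discrimination = torch.sum(torch.abs(f(x) - f(G(z))))        # D: feature matching
        loss = (1 - lam) * residual + lam * discrimination
        loss.backward()                                    # only z is updated; G, f stay fixed
        opt.step()
    with torch.no_grad():                                  # final anomaly score A(x)
        residual = torch.sum(torch.abs(x - G(z)))
        discrimination = torch.sum(torch.abs(f(x) - f(G(z))))
    return z.detach(), ((1 - lam) * residual + lam * discrimination).item()
```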
  • 61. 4. Experiments. The experiments use optical coherence tomography (OCT) images, which capture the retinal layers in 3D. • Data, data selection and preprocessing. i) Training sets: - 2D image patches extracted from 270 clinical OCT volumes of healthy subjects. - The gray values were normalized to the range −1 to 1. - In total 1,000,000 2D training patches with an image resolution of 64x64 pixels were extracted at randomly sampled positions.
  • 62. ii) Testing sets: - Patches were extracted from 10 additional healthy cases and 10 pathological cases, which contained retinal fluid. - The test set consisted of 8,192 image patches in total and comprised normal and pathological samples.
  • 63. iii) Model description: - The DCGAN architecture was adopted, which resulted in stable GAN training on images of size 64x64 pixels. - Intermediate representations with 512-256-128-64 channels were used (instead of 1024-512-256-128). - Discrimination loss: feature representations of the last convolution layer of the discriminator were used. - Training was performed for 20 epochs using the Adam optimizer. - 500 backpropagation steps were run for the mapping of new images to the latent space. - λ = 0.1 was used in the loss function.
  • 64. 5. Experiments. i) Generative capability of the DCGAN. (Figure: given image, generated image, residual overlay, and pixel-level annotations of retinal fluid for a normal and an anomalous image.)
  • 65. ii) Detection performance. (Figure: ROC curves; distributions of the residual score (c) and of the discrimination score (d).) In the latent space, the distributions of the normal data (training data and the normal part of the test data) are similar to each other, while the abnormal part of the test data shows a clear difference.
  • 66. Problems with the previous approaches: - The shape and boundary of the cluster cannot be controlled. - Ambiguous points at the boundary cannot be controlled.  Let's find a way to control the shape of the cluster and the ambiguous points at the boundary.
  • 67. Support Vector Data Description (SVDD). SVDD is the smallest-enclosing-ball problem, and its alternatives are: • the minimum enclosing ball problem with errors; • the minimum enclosing ball problem in an RKHS (Reproducing Kernel Hilbert Space); • the two-class support vector data description (SVDD).
  • 68. SOLUTIONS FOR SOLVING DATA DESCRIPTION • One class is the target class, and all other data are outlier data. • Create a spherically shaped boundary around the complete target set. • To minimize the chance of accepting outliers, the volume of this description is minimized. • Outlier sensitivity can be controlled by changing the ball-shaped boundary into a more flexible boundary. • Example outliers can be included in the training procedure to find a more efficient description.
  • 69. 1. The minimum enclosing ball problem [Tax and Duin, 2004]. (Figure: a sphere with center a and radius R enclosing the data.)
  • 70. 2. The minimum enclosing ball problem with errors
  • 71. - We assume vectors x are column vectors. - We have a training set {x_i}, i = 1, ..., N, for which we want to obtain a description. - We further assume that the data shows variance in all feature directions. NORMAL DATA DESCRIPTION • The sphere is characterized by center a and radius R > 0. • We minimize the volume of the sphere by minimizing R², and demand that the sphere contains all training objects x_i. • To allow for the possibility of outliers in the training set, the distance from x_i to the center a is not required to be strictly smaller than R², but larger distances are penalized. - Minimization problem: F(R, a) = R² + C∑ξ_i, with constraints ‖x_i − a‖² ≤ R² + ξ_i, ξ_i ≥ 0. 2. The minimum enclosing ball problem with errors
  • 72. NORMAL DATA DESCRIPTION. Lagrange function: L(R, a, α_i, γ_i, ξ_i) = R² + C∑ξ_i − ∑α_i {R² + ξ_i − (‖x_i‖² − 2a·x_i + ‖a‖²)} − ∑γ_i ξ_i. L should be minimized with respect to R, a, ξ_i and maximized with respect to α_i and γ_i, subject to 0 ≤ α_i ≤ C. 2. The minimum enclosing ball problem with errors
  • 73. 2. The minimum enclosing ball problem with errors. NORMAL DATA DESCRIPTION — support vectors. There are 3 cases: α_i = 0 (the object lies inside the sphere), 0 < α_i < C (the object lies on the boundary and is a support vector), and α_i = C (the object lies outside the sphere and is penalized). The hypersphere's center can be determined as
$$a = \sum_i \alpha_i x_i,$$
and its radius by selecting an arbitrary support vector x_b on the boundary:
$$R^2 = \lVert x_b - a\rVert^2 = x_b \cdot x_b - 2\sum_i \alpha_i (x_i \cdot x_b) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j).$$
  • 74. TEST A NEW DATA POINT x_k. To test whether a new data point x_k is within the sphere, its distance to the center of the sphere has to be calculated. A test point x_k is normal when this distance is smaller than the radius: ‖x_k − a‖² ≤ R². 2. The minimum enclosing ball problem with errors
  • 75. 2. The minimum enclosing ball problem with errors Please refer to Python Code for SVDD : https://wikidocs.net/3431
  • 76. SVDD with negative examples - When negative examples (objects that should be rejected) are available, they can be incorporated into the training to improve the description. - In contrast to the training (target) examples, which should be inside the sphere, the negative examples should be outside it.  Minimization problem: with constraints: 2. The minimum enclosing ball problem with errors
  • 77. 3. The minimum enclosing ball problem in an RKHS. • Minimum enclosing ball problem with errors, subject to 0 ≤ α_i ≤ C. • The inner product can be substituted by a general kernel function such as the Gaussian kernel:
$$\lVert x_k - a\rVert^2 = K(x_k, x_k) - 2\sum_i \alpha_i K(x_i, x_k) + \sum_{i,j}\alpha_i\alpha_j K(x_i, x_j) \;\le\; R^2.$$
  • 78. 3. The minimum enclosing ball problem in an RKHS. - For small values of s all objects become support vectors. A test object is accepted when it satisfies the kernelized distance condition above. - For very large s the solution approximates the original spherically shaped solution. - Decreasing the parameter C constrains the values of α_i more, and more objects become support vectors. - Also, with decreasing C the error on the target class increases, but the covered volume of the data description decreases.
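A minimal sketch of this kernelized distance test, assuming the multipliers alphas and the support vectors X_sv have already been obtained from the dual QP (e.g. via the Python code linked above); the kernel width s and all names are illustrative:

```python
import numpy as np

def gauss_kernel(a, b, s=1.0):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / s^2)."""
    return np.exp(-np.sum((a - b)**2) / (s**2))

def svdd_distance_sq(x_k, X_sv, alphas, s=1.0):
    """||Phi(x_k) - a||^2 expressed purely through kernel evaluations."""
    term1 = gauss_kernel(x_k, x_k, s)                                    # equals 1 for the Gaussian kernel
    term2 = 2.0 * sum(a_i * gauss_kernel(x_i, x_k, s) for a_i, x_i in zip(alphas, X_sv))
    term3 = sum(a_i * a_j * gauss_kernel(x_i, x_j, s)
                for a_i, x_i in zip(alphas, X_sv)
                for a_j, x_j in zip(alphas, X_sv))
    return term1 - term2 + term3

# Accept x_k as normal when svdd_distance_sq(x_k, X_sv, alphas, s) <= R**2,
# with R**2 computed the same way at any boundary support vector x_b.
```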
  • 79. 4. The two class Support vector data description (SVDD)
  • 80. The two class SVDD vs. one class SVDD
  • 81. Deep SVDD learns a neural network transformation Ф(·; W) with weights W from input space X ⊆ R^d to output space F ⊆ R^p that attempts to map most of the data representations into a hypersphere of minimum volume characterized by center c and radius R. Mappings of normal examples fall inside the hypersphere, whereas mappings of anomalies fall outside it. Deep Support Vector Data Description (Deep SVDD)
  • 82. Given some training data on X, we define the soft-boundary Deep SVDD objective as shown below. - First term: minimizing R² minimizes the volume of the hypersphere. - The second term is a penalty for points lying outside the sphere after being passed through the network, i.e. whose distance to the center is greater than the radius R. - The last term is a regularizer on the network parameters W. Deep Support Vector Data Description (Deep SVDD)
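For reference (the objective on the original slide is an image), the soft-boundary Deep SVDD objective as given in the ICML 2018 paper reads, up to notation, with ν ∈ (0, 1] trading off sphere volume against boundary violations:

$$
\min_{R,\,\mathcal{W}} \;\; R^2 \;+\; \frac{1}{\nu n}\sum_{i=1}^{n} \max\!\big\{0,\; \lVert\phi(x_i;\mathcal{W}) - c\rVert^2 - R^2\big\} \;+\; \frac{\lambda}{2}\sum_{\ell=1}^{L} \lVert W^{\ell}\rVert_F^2 .
$$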
  • 83. To achieve this the network must extract the common factors of variation of the data. As a result, normal examples are mapped close to the center c, whereas anomalous examples are mapped further away from the center or outside of the hypersphere. Through this we obtain a compact description of the normal class. (Figure: normal data mapped inside, anomalous data mapped outside the hypersphere.) Deep Support Vector Data Description (Deep SVDD)
  • 84. One-Class Deep SVDD objective. The one-class variant simply employs a quadratic loss for penalizing the distance of every network representation to c: One-Class Deep SVDD contracts the sphere by minimizing the mean distance of all data representations to the center (see the objective below).
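Likewise, the One-Class Deep SVDD objective from the same paper reads, up to notation:

$$
\min_{\mathcal{W}} \;\; \frac{1}{n}\sum_{i=1}^{n} \lVert\phi(x_i;\mathcal{W}) - c\rVert^2 \;+\; \frac{\lambda}{2}\sum_{\ell=1}^{L} \lVert W^{\ell}\rVert_F^2 .
$$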
  • 85. Anomaly Score. For a given test point x ∈ X, an anomaly score s can be defined for both variants of Deep SVDD as the distance of the point's representation to the center of the hypersphere. (Figure: anomaly-score distributions for normal and anomalous data under the conventional approach and under Deep SVDD.)
  • 86. One-class classification on MNIST and CIFAR-10 — network architectures. Each convolutional module consists of a convolutional layer followed by leaky ReLU activations and 2x2 max-pooling. On MNIST, a CNN with two modules, 8x(5x5x1) filters followed by 4x(5x5x1) filters, and a final dense layer of 32 units. On CIFAR-10, a CNN with three modules, 32x(5x5x3), 64x(5x5x3) and 128x(5x5x3) filters, followed by a final dense layer of 128 units. A batch size of 200 is used and the weight-decay hyperparameter is set to λ = 10⁻⁶.
  • 87. One-class classification on MNIST and CIFAR-10 — data setup. Both MNIST and CIFAR-10 have ten different classes, from which ten one-class classification setups are created. In each setup, one of the classes is the normal class and samples from the remaining classes are used to represent anomalies. Training uses only training-set examples from the respective normal class, giving training set sizes of n ≈ 6,000 for MNIST and n = 5,000 for CIFAR-10. Both test sets have 10,000 samples, including samples from the nine anomalous classes for each setup. All images are pre-processed with global contrast normalization using the L1 norm and finally rescaled to [0, 1] via min-max scaling.
  • 88. One-class classification on MNIST and CIFAR-10 Average AUCs in % with StdDevs (over 10 seeds) per method and one-class experiment on MNIST and CIFAR-10
  • 89. Anomaly Detection using One-Class Neural Networks arXiv:1802.06360v1 Code : https://github.com/raghavchalapathy/oc-nn
  • 90. We want to build a neural network that does exactly this!
  • 91. Model architecture of Auto-encoder and the proposed one-class neural networks
  • 92. One-Class Support Vector Machine. The objective is to find a hyperplane, at distance r from the origin, whose decision function is positive on subset A and negative on everything outside A, while maximizing the distance from the hyperplane to the origin. (Figure: subset A, the separating hyperplane with normal w at distance r from the origin, the corresponding hypersphere, and the negative region toward the origin.)
  • 93. One-Class Support Vector Machine. In order to obtain w and r, we need to solve the following optimization problem (see below), where w is the normal vector perpendicular to the hyperplane and r is the distance of the hyperplane from the origin; the constraint involves the distance of the feature vector from the origin.
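For reference (the optimization problem on the slide is an image), the standard one-class SVM primal of Schölkopf et al. reads, up to notation:

$$
\min_{w,\,\xi,\,r} \;\; \frac{1}{2}\lVert w\rVert^2 \;+\; \frac{1}{\nu N}\sum_{n=1}^{N}\xi_n \;-\; r
\quad\text{s.t.}\quad \langle w, \Phi(x_n)\rangle \ge r - \xi_n, \;\; \xi_n \ge 0,
$$

with decision function $f(x) = \operatorname{sgn}\big(\langle w, \Phi(x)\rangle - r\big)$.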
  • 94. One-Class NN. A simple feed-forward network with one hidden layer having a linear or sigmoid activation g(·) and one output node. The OC-NN objective can be formulated as shown below, where w is the scalar output weight vector from the hidden to the output layer, V is the weight matrix from input to hidden units, and x_n is an input vector.
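For reference, the OC-NN objective as stated in the paper (arXiv:1802.06360) reads, up to notation:

$$
\min_{w,\,V,\,r} \;\; \frac{1}{2}\lVert w\rVert_2^2 \;+\; \frac{1}{2}\lVert V\rVert_F^2 \;+\; \frac{1}{\nu}\cdot\frac{1}{N}\sum_{n=1}^{N}\max\!\big(0,\; r - \langle w,\, g(Vx_n)\rangle\big) \;-\; r .
$$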
  • 96. A Discriminative Feature Learning approach — for generic object, scene or action recognition, the deeply learned features need to be not only separable but also discriminative.
  • 97. • Classification problems have usually considered only the softmax loss  SOFTMAX LOSS: encourages the separability of features. • The discriminative feature learning approach considers a center loss as well  CENTER LOSS: simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers.  JOINT SUPERVISION: minimizes the intra-class variations while keeping the features of different classes separable (see the loss below). A Discriminative Feature Learning
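For reference, the joint supervision combines the softmax loss L_S with the center loss L_C; the formula below is reconstructed from the original center-loss paper, up to notation, where x_i is the deep feature of the i-th sample in the mini-batch, c_{y_i} is the center of its class, and λ balances the two terms:

$$
\mathcal{L} = \mathcal{L}_S + \lambda\,\mathcal{L}_C,
\qquad
\mathcal{L}_C = \frac{1}{2}\sum_{i=1}^{m} \big\lVert x_i - c_{y_i}\big\rVert_2^2 .
$$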
  • 98. A Discriminative Feature Learning — detailed discussion on center loss. • Easy to implement: the gradient and update equation are easy to derive and the resulting CNN model is trainable. • Easy to train: centers are updated based on mini-batches with an adjustable learning rate. • Easy to input: center loss has the same input requirement as the softmax loss and needs no complex sample mining and recombination, which are inevitable with contrastive loss and triplet loss. • Easy to converge: converges faster than the softmax loss alone.
  • 99. • With only the softmax loss (λ = 0), the deeply learned features are separable, but not discriminative (significant intra-class variations remain). • With a proper λ, the discriminative power of the deep features can be significantly enhanced, which is crucial for the classification problem. A Discriminative Feature Learning