Entity Embedding-based Anomaly Detection
for Heterogeneous Categorical Events
Ting Chen♠¹, Lu-An Tang♣, Yizhou Sun♠, Zhengzhang Chen♣, Kai Zhang♣
♠Northeastern University, ♣NEC Labs America
{tingchen, yzsun}@ccs.neu.edu, {ltang, zchen, kzhang}@nec-labs.com
July 14, 2016
¹ Part of this work was done during the first author's internship at NEC Labs America.
Introduction
Anomaly detection is important in many applications, for example, detecting anomalous activities in enterprise networks.
Problem statement
Heterogeneous Categorical Event. A heterogeneous categorical event e = (a1, · · · , am) is a record containing m different categorical attributes, where the i-th attribute value ai denotes an entity of type Ai.
Table 1: Example events in a process-to-process interaction network.
index day hour uid src. proc. dst. proc. src. folder dst. folder
0 3 16 8 init python /sbin/ /usr/bin/
1 5 3 4 init mingetty /sbin/ /sbin/
Problem: abnormal event detection. Given a set of training events D = {e1, · · · , en}, and assuming that most events in D are normal, the problem is to learn a model M such that, when a new event en+1 arrives, M can accurately predict whether the event is abnormal.
Traditional anomaly detection by density estimation
Estimate a probability distribution over the data space using kernel density estimation, and flag events with low probability/density as anomalies/outliers.
Challenges
There are several challenges associated with heterogeneous categorical events:
Large event space: with m different entity types, we face an O(exp(m)) event space.
Lack of an intrinsic distance measure among entities and events: what is the similarity between two users or machines? Or between two events involving different entities?
No labeled data.
Motivations for our model
To overcome the lack of distance measures: use entity embedding.
To alleviate the large event space issue:
At the model level, consider only pairwise interactions. For example:
A maintenance program is usually triggered at midnight, but suddenly it is triggered during the day.
A user usually connects to servers with low privilege, but suddenly tries to access a server with high privilege.
At the learning level, propose noise-contrastive estimation (NCE) with a "context-dependent" noise distribution.
APE model
We model the probability of a single event e = (a1, · · · , am) in event space Ω using the following parametric form:

$$P_\theta(e) = \frac{\exp S_\theta(e)}{\sum_{e' \in \Omega} \exp S_\theta(e')} \quad (1)$$

where

$$S_\theta(e) = \sum_{i,j:\,1 \le i < j \le m} w_{ij}\,(v_{a_i} \cdot v_{a_j}) \quad (2)$$

Here $w_{ij}$ is the weight for the pairwise interaction between entity types Ai and Aj, constrained to be non-negative, i.e. $\forall i,j,\ w_{ij} \ge 0$, and $v_{a_i}$ is the embedding vector for entity ai.
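The pairwise scoring function $S_\theta(e)$ can be sketched in a few lines of Python. This is an illustrative toy implementation, not the authors' code; the data structures (a per-type embedding dict, one weight per pair of entity types) are assumptions made for clarity.

```python
import itertools

def score(event, embeddings, weights):
    """Unnormalized score S_theta(e): the weighted sum of dot products
    between the embeddings of every pair of entities in the event."""
    s = 0.0
    for (i, ai), (j, aj) in itertools.combinations(enumerate(event), 2):
        w = weights[(i, j)]        # w_ij >= 0, one weight per type pair
        vi = embeddings[i][ai]     # embedding vector of entity a_i
        vj = embeddings[j][aj]
        s += w * sum(x * y for x, y in zip(vi, vj))
    return s
```

An event with a high score (hence high probability) has entity pairs whose embeddings point in similar directions for the strongly weighted type pairs.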
APE model
[Figure: event → embedding lookup table → entity embeddings → pairwise interactions → probability]
Figure 1: The framework of the proposed method.
APE model
The maximum likelihood learning objective:

$$\arg\max_\theta \sum_{e \in D} \log P_\theta(e) \quad (3)$$

where we maximize the likelihood of each observed event. Recall

$$P_\theta(e) = \frac{\exp S_\theta(e)}{\sum_{e' \in \Omega} \exp S_\theta(e')} \quad (4)$$

where the event space Ω can be prohibitively large, so we resort to noise-contrastive estimation (NCE).
Noise-contrastive Estimation
With NCE, we make two main modifications to the original learning objective:

Treat the normalization term in $P_\theta(e)$ as a parameter c:

$$P_\theta(e) = \exp\Big(\sum_{i,j:\,1 \le i < j \le m} w_{ij}\,(v_{a_i} \cdot v_{a_j}) + c\Big) \quad (5)$$

Introduce a noise distribution $P_n(e)$ and construct a binary classification problem, discriminating samples of the data distribution from samples of a known, artificial noise distribution:

$$J(\theta) = \mathbb{E}_{e \sim P_d}\left[\log \frac{P_\theta(e)}{P_\theta(e) + kP_n(e)}\right] + k\,\mathbb{E}_{e' \sim P_n}\left[\log \frac{kP_n(e')}{P_\theta(e') + kP_n(e')}\right] \quad (6)$$
Stochastic gradient descent
In practice, we can use SGD for fast and online training. For each observed event e, sample k artificial events e′ from $P_n(e')$, and update the parameters according to stochastic gradients of:

$$\log \sigma\big(\log P_\theta(e) - \log kP_n(e)\big) + \sum_{e'} \log \sigma\big({-\log P_\theta(e')} + \log kP_n(e')\big) \quad (7)$$

Here σ(x) = 1/(1 + exp(−x)) is the sigmoid function.
The complexity of our algorithm is O(Nkm²d), where N is the total number of observed events it is trained on, m is the number of entity types, and d is the embedding dimension.
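The per-event objective above is a logistic loss over one observed event and its k noise samples. A minimal sketch, assuming the log model scores and log noise terms are computed elsewhere (the function signature is illustrative, not the authors' API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nce_objective(log_p_model, log_kpn, noise_log_p_model, noise_log_kpn):
    """Per-event NCE objective (to be maximized): the observed event should
    be classified as data, each of the k noise samples as noise."""
    obj = math.log(sigmoid(log_p_model - log_kpn))
    for lp, ln in zip(noise_log_p_model, noise_log_kpn):
        obj += math.log(sigmoid(ln - lp))
    return obj
```

With k noise samples per observed event, one SGD step takes the gradient of this quantity with respect to the embeddings, the weights w_ij, and the constant c.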
Context-independent noise distribution
"Context-independent" noise distribution: draw an artificial event, independently of the given event e, according to

$$P_n^{\text{factorized}}(e) = p_{A_1}(a_1) \cdots p_{A_m}(a_m)$$

Table 2: Generation of an example artificial event in the process-to-process interaction network under the "context-independent" noise distribution.
day hour uid src. proc. dst. proc. src. folder dst. folder
observed event e: 3 16 8 init python /sbin/ /usr/bin/
generated event e′: 5 3 4 bash mingetty / /sbin/
+ Simple and tractable.
− Generated samples might be too easy for the classifier.
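Sampling from the context-independent (factorized) distribution just draws each attribute from its marginal. A toy sketch; the representation of the marginals as (values, probabilities) pairs is an assumption:

```python
import random

def sample_context_independent(marginals, rng=random):
    """Draw an artificial event by sampling each attribute independently
    from its per-type marginal distribution p_{A_i}."""
    event = []
    for values, probs in marginals:
        event.append(rng.choices(values, weights=probs, k=1)[0])
    return tuple(event)
```

Because every attribute is resampled, the generated event usually shares nothing with any observed event, which is why such negatives are easy to discriminate.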
Context-dependent noise distribution
"Context-dependent" noise distribution: first sample an observed event e, then randomly sample an entity type Ai, and finally sample a new entity $a_i' \sim p_{A_i}(a_i')$ to replace ai and form a new negative sample e′.

Table 3: Generation of an example artificial event in the process-to-process interaction network under the "context-dependent" noise distribution.
day hour uid src. proc. dst. proc. src. folder dst. folder
observed event e: 3 16 8 init python /sbin/ /usr/bin/
generated event e′: 3 16 8 init mingetty /sbin/ /usr/bin/
+ Generated samples are harder for the classifier.
− $P_n(e') \approx P_d(e)\,p_{A_i}(a_i')/m$ is intractable.
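A context-dependent negative sample replaces one randomly chosen attribute of an observed event while keeping the rest fixed. A sketch under the same assumed marginal representation as before:

```python
import random

def sample_context_dependent(event, marginals, rng=random):
    """Corrupt an observed event: pick one entity type uniformly at random
    and resample that attribute from its marginal p_{A_i}."""
    i = rng.randrange(len(event))
    values, probs = marginals[i]
    corrupted = list(event)
    corrupted[i] = rng.choices(values, weights=probs, k=1)[0]
    return tuple(corrupted), i
```

Such negatives agree with a real event on all but one attribute, so the classifier must learn which entity combinations are genuinely plausible.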
Context-dependent noise distribution
We further approximate the "context-dependent" noise term $P_n(e') \approx P_d(e)\,p_{A_i}(a_i')/m$ as follows.

Since $P_d(e)$ is small for most events, we simply set it to some constant l, so

$$\log kP_n(e') \approx \log p_{A_i}(a_i') + z,$$

where z = log(kl/m) is a constant term (simply set to 0).

To compute $P_n(e)$ for an observed event e, we use the expectation over all entity types:

$$\log kP_n(e) \approx \frac{1}{m}\sum_i \log p_{A_i}(a_i) + z.$$

Again, the constant z is ignored.
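With the constant z dropped, the two approximations above reduce to simple log-marginal computations; a minimal sketch (function names are illustrative):

```python
import math

def log_kpn_noise(replaced_marginal_prob):
    """log k*Pn(e') for a corrupted event, up to the constant z (set to 0):
    just the log marginal of the replaced entity."""
    return math.log(replaced_marginal_prob)

def log_kpn_observed(marginal_probs):
    """log k*Pn(e) for an observed event, up to z: the average of the
    log marginals over all m entity types."""
    m = len(marginal_probs)
    return sum(math.log(p) for p in marginal_probs) / m
```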
Experimental settings
We utilize two data sets from an enterprise network:
P2P: process-to-process event data set.
P2I: process-to-Internet-socket event data set.
We split the two weeks of data into two one-week periods. The events in the first week are used as the training set, and the new events that appear only in the second week are used as the test set.
Generate artificial anomalies: for each observed event in the test set, we generate a corresponding anomaly by replacing 1-3 entities of different types in the event by random sampling.
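The artificial-anomaly generation described above (replacing 1-3 entities of distinct types with randomly sampled ones) might be sketched as follows; the function and parameter names are hypothetical:

```python
import random

def make_artificial_anomaly(event, marginals, rng, n_replace=None):
    """Perturb an observed test event by replacing n_replace (default:
    a random 1-3) entities of distinct types, each resampled from its
    per-type marginal distribution."""
    n = n_replace if n_replace is not None else rng.randint(1, 3)
    idxs = rng.sample(range(len(event)), n)   # distinct entity types
    anomaly = list(event)
    for i in idxs:
        values, probs = marginals[i]
        anomaly[i] = rng.choices(values, weights=probs, k=1)[0]
    return tuple(anomaly)
```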
Data statistics
Table 4: Entity types in data sets.
Data sets Types of entity and their arities
P2P day (7), hour (24), uid (361), src proc
(778), dst proc (1752), src folder (255), dst
folder (415)
P2I day (7), hour (24), src ip (59), dst ip (184),
dst port (283), proc (91), proc folder (70),
uid (162), connection type (3)
Table 5: Statistics of the collected two-week events.
Data # week 1 # week 2 # week 2 new
P2P 95,434 107,619 53,478 (49.69%)
P2I 1,316,357 1,330,376 498,029 (37.44%)
Methods for comparison
We compare the following models in the experiments:
Condition. This method is proposed in [Das and Schneider 2007].
CompreX. This method is proposed in [Akoglu et al. 2012].
APE. The proposed method. Note that we use the negative of its likelihood output as the abnormal score.
APE (no weight). The same as APE, except that instead of learning wij we simply set ∀i, j, wij = 1, i.e. APE without automatic weight learning on pairwise interactions; all types of interactions are weighted equally.
Performance comparison for abnormal event detection
Table 6: Values left of the slash are AUC of ROC; values on the right are average precision. The last two rows of each block (marked ∗) are averaged over 3 smaller (1%) test samples due to the long runtime of CompreX.

P2P
Models c=1 c=2 c=3
Condition 0.6296 / 0.6777 0.6795 / 0.7321 0.7137 / 0.7672
APE (no weight) 0.8797 / 0.8404 0.9377 / 0.9072 0.9688 / 0.9449
APE 0.8995 / 0.8845 0.9540 / 0.9378 0.9779 / 0.9639
CompreX∗ 0.8230 / 0.7683 0.8208 / 0.7566 0.8390 / 0.7978
APE∗ 0.9003 / 0.8892 0.9589 / 0.9394 0.9732 / 0.9616

P2I
Models c=1 c=2 c=3
Condition 0.7733 / 0.7127 0.8300 / 0.7688 0.8699 / 0.8165
APE (no weight) 0.8912 / 0.8784 0.9412 / 0.9398 0.9665 / 0.9671
APE 0.9267 / 0.9383 0.9669 / 0.9717 0.9838 / 0.9861
CompreX∗ 0.7749 / 0.8391 0.7834 / 0.8525 0.7832 / 0.8497
APE∗ 0.9291 / 0.9411 0.9656 / 0.9729 0.9829 / 0.9854
Performance comparison for abnormal event detection
[Figure: ROC (TPR vs. FPR) and precision-recall curves; (a) P2P abnormal event detection; (b) P2I abnormal event detection; methods: Condition, CompreX, APE (no weight), APE]
Figure 2: Receiver operating characteristic curves and precision-recall curves for abnormal event detection.
Parameter study
Table 7: Average precision under different choices of noise distribution.
P2P P2I
context-independent 0.8463 0.7534
context-dependent, log kPn(e) = 0 0.8176 0.7868
context-dependent, log kPn(e) = appx 0.8845 0.9383
[Figure: average precision vs. number of negative samples drawn per entity type (1-5), for P2P and P2I]
Figure 3: Performance versus number of negative samples drawn per entity type.
Case study
Here are some examples of detected anomalies (events assigned low probability):
Table 8: Detected abnormal events examples.
Data Abnormal event Ab. entity
P2P ..., src proc: bash, src folder: /home/, ... src proc
P2P ..., uid: 9 (some main user), hour: 1, ... uid
P2I ..., proc: ssh, dst port: 80, ... dst port
Pairwise weights visualization
[Figure: heat maps of learned pairwise weights between entity types, many of them zero; (a) P2P: day, hour, uid, src proc, dst proc, src folder, dst folder; (b) P2I: day, hour, src ip, dst ip, dport, sproc, proc folder, uid, conn. type]
Figure 4: Learned pairwise weights.
Embedding Visualization
Figure 5: 2d visualization of user embeddings. Each color indicates a
user type in the system. The left-most and right-most points are Linux
and Windows root users, respectively.
Embedding Visualization
[Figure: scatter plot of the 24 hour embeddings, separating into a working-hours cluster and a non-working-hours cluster]
Figure 6: 2d visualization of hour embeddings.
Q & A
Thank you!

Más contenido relacionado

Similar a Entity Embedding-based Anomaly Detection for Heterogeneous Categorical Events

NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionKazuki Fujikawa
 
A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...
A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...
A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...Md Mahbubur Rahman
 
Machine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud EnvironmentMachine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud Environmentjins0618
 
Mythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event ProcessingMythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event ProcessingTim Bass
 
Situation based analysis and control for supporting Event-web applications
Situation based analysis and control for supporting Event-web applicationsSituation based analysis and control for supporting Event-web applications
Situation based analysis and control for supporting Event-web applicationsVivek Singh
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdfnyomans1
 
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Turi, Inc.
 
WISS 2015 - Machine Learning lecture by Ludovic Samper
WISS 2015 - Machine Learning lecture by Ludovic Samper WISS 2015 - Machine Learning lecture by Ludovic Samper
WISS 2015 - Machine Learning lecture by Ludovic Samper Antidot
 
Large scale landuse classification of satellite imagery
Large scale landuse classification of satellite imageryLarge scale landuse classification of satellite imagery
Large scale landuse classification of satellite imagerySuneel Marthi
 
Particle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer VisionParticle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer Visionzukun
 
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...Rafael Nogueras
 
An Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRANAn Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRANTom Mens
 
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...MLconf
 
Proposed Event Processing Definitions ,September 20, 2006
Proposed Event Processing Definitions,September 20, 2006Proposed Event Processing Definitions,September 20, 2006
Proposed Event Processing Definitions ,September 20, 2006Tim Bass
 
Debs 2011 pattern rewritingforeventprocessingoptimization
Debs 2011  pattern rewritingforeventprocessingoptimizationDebs 2011  pattern rewritingforeventprocessingoptimization
Debs 2011 pattern rewritingforeventprocessingoptimizationOpher Etzion
 
Project 2: Baseband Data Communication
Project 2: Baseband Data CommunicationProject 2: Baseband Data Communication
Project 2: Baseband Data CommunicationDanish Bangash
 
Landuse Classification from Satellite Imagery using Deep Learning
Landuse Classification from Satellite Imagery using Deep LearningLanduse Classification from Satellite Imagery using Deep Learning
Landuse Classification from Satellite Imagery using Deep LearningDataWorks Summit
 
Event Stream Processing with Multiple Threads
Event Stream Processing with Multiple ThreadsEvent Stream Processing with Multiple Threads
Event Stream Processing with Multiple ThreadsSylvain Hallé
 

Similar a Entity Embedding-based Anomaly Detection for Heterogeneous Categorical Events (20)

NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...
A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...
A Coupled Discrete-Event and Motion Planning Methodology for Automated Safety...
 
Machine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud EnvironmentMachine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud Environment
 
Mythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event ProcessingMythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event Processing
 
Situation based analysis and control for supporting Event-web applications
Situation based analysis and control for supporting Event-web applicationsSituation based analysis and control for supporting Event-web applications
Situation based analysis and control for supporting Event-web applications
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
 
WISS 2015 - Machine Learning lecture by Ludovic Samper
WISS 2015 - Machine Learning lecture by Ludovic Samper WISS 2015 - Machine Learning lecture by Ludovic Samper
WISS 2015 - Machine Learning lecture by Ludovic Samper
 
Manual orange
Manual orangeManual orange
Manual orange
 
Large scale landuse classification of satellite imagery
Large scale landuse classification of satellite imageryLarge scale landuse classification of satellite imagery
Large scale landuse classification of satellite imagery
 
Particle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer VisionParticle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer Vision
 
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
 
An Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRANAn Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRAN
 
2020 12-2-detr
2020 12-2-detr2020 12-2-detr
2020 12-2-detr
 
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
 
Proposed Event Processing Definitions ,September 20, 2006
Proposed Event Processing Definitions,September 20, 2006Proposed Event Processing Definitions,September 20, 2006
Proposed Event Processing Definitions ,September 20, 2006
 
Debs 2011 pattern rewritingforeventprocessingoptimization
Debs 2011  pattern rewritingforeventprocessingoptimizationDebs 2011  pattern rewritingforeventprocessingoptimization
Debs 2011 pattern rewritingforeventprocessingoptimization
 
Project 2: Baseband Data Communication
Project 2: Baseband Data CommunicationProject 2: Baseband Data Communication
Project 2: Baseband Data Communication
 
Landuse Classification from Satellite Imagery using Deep Learning
Landuse Classification from Satellite Imagery using Deep LearningLanduse Classification from Satellite Imagery using Deep Learning
Landuse Classification from Satellite Imagery using Deep Learning
 
Event Stream Processing with Multiple Threads
Event Stream Processing with Multiple ThreadsEvent Stream Processing with Multiple Threads
Event Stream Processing with Multiple Threads
 

Último

CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxdhiyaneswaranv1
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsThinkInnovation
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxFinatron037
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 

Último (16)

CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...

Entity Embedding-based Anomaly Detection for Heterogeneous Categorical Events

  • 1. Entity Embedding-based Anomaly Detection for Heterogeneous Categorical Events
    Ting Chen♠¹, Lu-An Tang♣, Yizhou Sun♠, Zhengzhang Chen♣, Kai Zhang♣
    ♠ Northeastern University, ♣ NEC Labs America
    {tingchen, yzsun}@ccs.neu.edu, {ltang, zchen, kzhang}@nec-labs.com
    July 14, 2016
    ¹ Part of the work was done during the first author's internship at NEC Labs America.
  • 2. Introduction
    Anomaly detection is important in many applications, for example, detecting anomalous activities in enterprise networks.
  • 3. Problem statement
    Heterogeneous categorical event. A heterogeneous categorical event e = (a1, ..., am) is a record containing m different categorical attributes, where the i-th attribute value ai denotes an entity of type Ai.

    Table 1: Example events in a process-to-process interaction network.

      index  day  hour  uid  src. proc.  dst. proc.  src. folder  dst. folder
      0      3    16    8    init        python      /sbin/       /usr/bin/
      1      5    3     4    init        mingetty    /sbin/       /sbin/

    Problem: abnormal event detection. Given a set of training events D = {e1, ..., en}, and assuming that most events in D are normal, learn a model M so that when a new event e_{n+1} arrives, M can accurately predict whether it is abnormal.
  • 4. Traditional anomaly detection by density estimation
    Estimate a probability distribution over the data space using kernel density estimation, and flag points with low probability/density as anomalies/outliers.
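The density-estimation baseline can be sketched in a few lines; below is a minimal 1-D Gaussian-kernel density estimator (not the paper's method; all names and numbers are illustrative):

```python
import math

def gaussian_kde_score(x, data, bandwidth=1.0):
    """Density estimate at x: average of Gaussian kernels centered at each training point."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    return sum(
        norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
    ) / len(data)

train = [0.9, 1.0, 1.1, 1.2, 5.0]  # training points, mostly clustered near 1.0
density_normal = gaussian_kde_score(1.05, train, bandwidth=0.3)
density_outlier = gaussian_kde_score(9.0, train, bandwidth=0.3)
# a point far from the training mass gets a much lower density, so it is flagged
assert density_outlier < density_normal
```

This works for continuous data, but as the next slide argues, categorical events lack the distance measure such kernels rely on.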
  • 5. Challenges
    There are several challenges associated with heterogeneous categorical events:
    - Large event space: with m different entity types, we face an O(exp(m)) event space.
    - Lack of an intrinsic distance measure among entities and events: what is the similarity between two users/machines? Between two events with different entities?
    - No label data.
  • 6. Motivations for our model
    - To overcome the lack of distance measures: use entity embedding.
    - To alleviate the large event space issue:
      - At the model level, consider only pairwise interactions. Anomalies then surface as unusual pairings, e.g.:
        - A maintenance program is usually triggered at midnight, but suddenly it is triggered during the day.
        - A user usually connects to servers with low privilege, but suddenly it tries to access a server with high privilege.
      - At the learning level, propose noise-contrastive estimation (NCE) with a "context-dependent" noise distribution.
  • 7. APE model
    We model the probability of a single event e = (a1, ..., am) in the event space Ω with the parametric form:

      P_θ(e) = exp(S_θ(e)) / Σ_{e'∈Ω} exp(S_θ(e'))          (1)

    where

      S_θ(e) = Σ_{i,j: 1≤i<j≤m} w_ij (v_{a_i} · v_{a_j})    (2)

    Here w_ij is the weight for the pairwise interaction between entity types Ai and Aj, constrained to be non-negative (∀i, j: w_ij ≥ 0), and v_{a_i} is the embedding vector for entity a_i.
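Eq. (2) is just a weighted sum of dot products over attribute-type pairs. A minimal sketch of the score S_θ(e), with toy embeddings and weights (all values hypothetical, not from the paper):

```python
# toy 2-d embeddings per (entity type, entity value); values are illustrative
emb = {
    ("uid", 8): [0.5, 0.1],
    ("src_proc", "init"): [0.4, 0.2],
    ("dst_proc", "python"): [0.1, 0.9],
}
# non-negative pairwise weights w_ij between entity types
w = {("uid", "src_proc"): 1.0, ("uid", "dst_proc"): 0.5, ("src_proc", "dst_proc"): 2.0}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(event):
    """S(e) = sum over type pairs i<j of w_ij * (v_ai . v_aj), as in Eq. (2)."""
    types = list(event)
    s = 0.0
    for i in range(len(types)):
        for j in range(i + 1, len(types)):
            ti, tj = types[i], types[j]
            s += w[(ti, tj)] * dot(emb[(ti, event[ti])], emb[(tj, event[tj])])
    return s

e = {"uid": 8, "src_proc": "init", "dst_proc": "python"}
s = score(e)  # 1.0*0.22 + 0.5*0.14 + 2.0*0.22 = 0.73
```

In the real model the embeddings and weights are learned; here they are fixed toy numbers to show the scoring arithmetic.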
  • 9. APE model
    The maximum likelihood learning objective:

      arg max_θ Σ_{e∈D} log P_θ(e)          (3)

    i.e., we maximize the likelihood of each observed event. Recall

      P_θ(e) = exp(S_θ(e)) / Σ_{e'∈Ω} exp(S_θ(e'))          (4)

    where the event space Ω can be prohibitively large, so we resort to noise-contrastive estimation (NCE).
  • 10. Noise-contrastive estimation
    With NCE, we make two main modifications to the original learning objective:
    - Treat the normalization term in P_θ(e) as a parameter c:

        P_θ(e) = exp( Σ_{i,j: 1≤i<j≤m} w_ij (v_{a_i} · v_{a_j}) + c )          (5)

    - Introduce a noise distribution P_n(e) and construct a binary classification problem, discriminating samples of the data distribution from samples of a known artificial noise distribution:

        J(θ) = E_{e∼P_d}[ log( P_θ(e) / (P_θ(e) + k P_n(e)) ) ]
             + k E_{e'∼P_n}[ log( k P_n(e') / (P_θ(e') + k P_n(e')) ) ]          (6)
  • 11. Stochastic gradient descent
    In practice, we can use SGD for fast, online training. For each observed event e, sample k artificial events e' from P_n(e'), and update the parameters according to stochastic gradients of:

      log σ( log P_θ(e) − log k P_n(e) ) + Σ_{e'} log σ( −log P_θ(e') + log k P_n(e') )          (7)

    Here σ(x) = 1/(1 + exp(−x)) is the sigmoid function. The complexity of the algorithm is O(N k m² d), where N is the total number of observed events it is trained on, m is the number of entity types, and d is the embedding dimension.
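The per-event objective in Eq. (7) can be computed directly once the model and noise log-probabilities are available; a minimal sketch (the log-probability values below are illustrative placeholders, not model outputs):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nce_objective(log_p_obs, log_kpn_obs, noise_terms):
    """Per-event NCE objective as in Eq. (7): the observed event should score
    above its noise baseline, and each artificial event below its own."""
    obj = math.log(sigmoid(log_p_obs - log_kpn_obs))
    for log_p_neg, log_kpn_neg in noise_terms:
        obj += math.log(sigmoid(-log_p_neg + log_kpn_neg))
    return obj

# well-separated case: observed event likely, k=2 negatives unlikely
good = nce_objective(2.0, -1.0, [(-3.0, 0.0), (-2.0, 0.5)])
# poorly-separated case: observed event scores below its noise baseline
bad = nce_objective(-1.0, 2.0, [(2.0, -1.0)])
assert good > bad  # the objective rewards separating data from noise
```

Maximizing this objective with SGD pushes log P_θ up for observed events and down for sampled negatives.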
  • 12. Context-independent noise distribution
    A "context-independent" noise distribution draws an artificial event independently of the given event e, according to

      P_n^factorized(e') = p_{A_1}(a'_1) · · · p_{A_m}(a'_m)

    Table 2: Generation of an example artificial event in the process-to-process interaction network under the "context-independent" noise distribution.

                          day  hour  uid  src. proc.  dst. proc.  src. folder  dst. folder
      observed event e    3    16    8    init        python      /sbin/       /usr/bin/
      generated event e'  5    3     4    bash        mingetty    /            /sbin/

    + Simple and tractable.
    − Generated samples might be too easy for the classifier.
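Sampling from the factorized noise distribution amounts to drawing each attribute independently from its empirical marginal. A minimal sketch (toy marginal counts, hypothetical values):

```python
import random

# empirical marginal counts p_Ai per entity type (toy numbers, illustrative only)
marginals = {
    "hour": {3: 10, 16: 5},
    "uid": {4: 8, 8: 2},
    "dst_proc": {"python": 6, "mingetty": 4},
}

def sample_context_independent(rng):
    """Draw every attribute independently from its own marginal (factorized Pn)."""
    event = {}
    for etype, counts in marginals.items():
        values = list(counts)
        weights = [counts[v] for v in values]
        event[etype] = rng.choices(values, weights=weights, k=1)[0]
    return event

e_neg = sample_context_independent(random.Random(0))
# the artificial event has one value per entity type, but the combination
# ignores any correlation present in the observed data
assert set(e_neg) == {"hour", "uid", "dst_proc"}
```

Because each attribute is drawn independently, the resulting combinations are often implausible, which is exactly why the classifier finds them too easy.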
  • 20. Context-dependent noise distribution
    A "context-dependent" noise distribution first samples an observed event e, then randomly samples an entity type A_i, and finally samples a new entity a'_i ∼ p_{A_i}(a'_i) to replace a_i and form a negative sample e'.

    Table 3: Generation of an example artificial event in the process-to-process interaction network under the "context-dependent" noise distribution.

                          day  hour  uid  src. proc.  dst. proc.  src. folder  dst. folder
      observed event e    3    16    8    init        python      /sbin/       /usr/bin/
      generated event e'  3    16    8    init        mingetty    /sbin/       /usr/bin/

    + Generated samples are harder for the classifier.
    − P_n(e') ≈ P_d(e) p_{A_i}(a'_i)/m is intractable.
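A context-dependent negative can be sketched as copying the observed event and resampling a single randomly chosen attribute from its marginal (toy marginals, hypothetical values):

```python
import random

def sample_context_dependent(event, marginals, rng):
    """Copy an observed event, then resample one randomly chosen attribute
    from that type's marginal ('context-dependent' noise distribution)."""
    etype = rng.choice(list(event))
    values = list(marginals[etype])
    weights = [marginals[etype][v] for v in values]
    neg = dict(event)
    neg[etype] = rng.choices(values, weights=weights, k=1)[0]
    return neg

# toy marginal counts, illustrative only
marginals = {"hour": {3: 1, 16: 1}, "dst_proc": {"python": 1, "mingetty": 9}}
obs = {"hour": 16, "dst_proc": "python"}
neg = sample_context_dependent(obs, marginals, random.Random(1))
# the negative differs from the observed event in at most one attribute,
# so it stays close to the data manifold and is harder to classify
assert sum(neg[t] != obs[t] for t in obs) <= 1
```

Keeping all but one attribute of a real event is what makes these negatives informative: the classifier must learn the pairwise correlations, not just the marginals.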
  • 25. Context-dependent noise distribution
    We further approximate the "context-dependent" noise term P_n(e') ≈ P_d(e) p_{A_i}(a'_i)/m as follows:
    - Since P_d(e) is small for most events, we simply set it to some constant l, so log k P_n(e') ≈ log p_{A_i}(a'_i) + z, where z = log(k l / m) is a constant (simply set to 0).
    - To compute P_n(e) for an observed event e, we use the expectation over all entity types: log k P_n(e) ≈ Σ_i (1/m) log p_{A_i}(a_i) + z, and again the constant z is ignored.
  • 26. Experimental settings
    We use two data sets from an enterprise network:
    - P2P: process-to-process event data set.
    - P2I: process-to-Internet-socket event data set.
    We split the two weeks of data into two one-week periods. The events in the first week are used as the training set, and new events that appear only in the second week are used as the test set.
    Artificial anomalies: for each observed event in the test set, we generate a corresponding anomaly by replacing 1-3 entities of different types in the event by random sampling.
  • 27. Data statistics
    Table 4: Entity types in the data sets (arity in parentheses).

      P2P: day (7), hour (24), uid (361), src proc (778), dst proc (1752), src folder (255), dst folder (415)
      P2I: day (7), hour (24), src ip (59), dst ip (184), dst port (283), proc (91), proc folder (70), uid (162), connection type (3)

    Table 5: Statistics of the collected two-week events.

      Data  # week 1   # week 2   # week 2 new
      P2P   95,434     107,619    53,478 (49.69%)
      P2I   1,316,357  1,330,376  498,029 (37.44%)
  • 28. Methods for comparison
    We compare the following models in the experiments:
    - Condition: proposed in [Das and Schneider 2007].
    - CompreX: proposed in [Akoglu et al. 2012].
    - APE: the proposed method. Note that we use the negative of its likelihood output as the abnormality score.
    - APE (no weight): the same as APE, except that instead of learning w_ij we simply set ∀i, j: w_ij = 1, i.e. APE without automatic weight learning on pairwise interactions; all types of interactions are weighted equally.
  • 29. Performance comparison for abnormal event detection
    Table 6: Values left of the slash are AUC of ROC; values on the right are average precision. The last two rows (marked ∗) are averaged over 3 smaller (1%) test samples due to the long runtime of CompreX.

    P2P
      Model            c=1              c=2              c=3
      Condition        0.6296 / 0.6777  0.6795 / 0.7321  0.7137 / 0.7672
      APE (no weight)  0.8797 / 0.8404  0.9377 / 0.9072  0.9688 / 0.9449
      APE              0.8995 / 0.8845  0.9540 / 0.9378  0.9779 / 0.9639
      CompreX∗         0.8230 / 0.7683  0.8208 / 0.7566  0.8390 / 0.7978
      APE∗             0.9003 / 0.8892  0.9589 / 0.9394  0.9732 / 0.9616

    P2I
      Model            c=1              c=2              c=3
      Condition        0.7733 / 0.7127  0.8300 / 0.7688  0.8699 / 0.8165
      APE (no weight)  0.8912 / 0.8784  0.9412 / 0.9398  0.9665 / 0.9671
      APE              0.9267 / 0.9383  0.9669 / 0.9717  0.9838 / 0.9861
      CompreX∗         0.7749 / 0.8391  0.7834 / 0.8525  0.7832 / 0.8497
      APE∗             0.9291 / 0.9411  0.9656 / 0.9729  0.9829 / 0.9854
  • 30. Performance comparison for abnormal event detection
    [Figure 2: Receiver operating characteristic curves and precision-recall curves for abnormal event detection. (a) P2P; (b) P2I. Curves shown for Condition, CompreX, APE (no weight), and APE.]
  • 31. Parameter study
    Table 7: Average precision under different choices of noise distribution.

      Noise distribution                     P2P     P2I
      context-independent                    0.8463  0.7534
      context-dependent, log kPn(e) = 0      0.8176  0.7868
      context-dependent, log kPn(e) = appx.  0.8845  0.9383

    [Figure 3: Average precision versus the number of negative samples drawn per entity type (1-5), for P2P and P2I.]
  • 32. Case study
    Here are some examples of detected anomalies (events with low probability):

    Table 8: Examples of detected abnormal events.

      Data  Abnormal event                                Ab. entity
      P2P   ..., src proc: bash, src folder: /home/, ...  src proc
      P2P   ..., uid: 9 (some main user), hour: 1, ...    uid
      P2I   ..., proc: ssh, dst port: 80, ...             dst port
  • 33. Pairwise weights visualization
    [Figure 4: Learned pairwise weights w_ij, shown as heatmaps over entity-type pairs. (a) P2P events; (b) P2I events.]
  • 34. Embedding visualization
    [Figure 5: 2D visualization of user embeddings. Each color indicates a user type in the system. The left-most and right-most points are Linux and Windows root users, respectively.]
  • 35. Embedding visualization
    [Figure 6: 2D visualization of hour embeddings; working hours and non-working hours form separate clusters.]
  • 36. Q & A
    Thank you!