3. Problem: Given a sequence of graphs,
Q1. Event detection: find the time points at which the graph changes significantly
Q2. Characterization: find the (top-k) nodes / edges / regions that change the most
Shebuti & Leman Event Detection & Characterization in Dynamic Graphs
4. Main framework
Compute graph similarity/distance scores over time
[Figure: sequence of graph snapshots along a time axis]
Find unusual occurrences in the resulting time series
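A minimal Python sketch of this two-step framework follows. The distance (Frobenius norm of the difference between consecutive adjacency matrices) and the outlier rule (robust z-score with a threshold of 3) are illustrative assumptions, not the measures used in the talk. `graph_distance_series` and `flag_unusual` are hypothetical names.

```python
import numpy as np


def graph_distance_series(snapshots):
    """Distance between consecutive adjacency matrices (assumed: Frobenius norm of the difference)."""
    return np.array([np.linalg.norm(b - a) for a, b in zip(snapshots, snapshots[1:])])


def flag_unusual(series, threshold=3.0):
    """Indices whose robust z-score (median / MAD based) exceeds `threshold`."""
    med = np.median(series)
    mad = np.median(np.abs(series - med)) or 1e-12  # guard against a zero MAD
    z = 0.6745 * (series - med) / mad
    return np.flatnonzero(z > threshold)


# Toy sequence: ten identical random graphs, except a dense burst at t = 6.
rng = np.random.default_rng(0)
n = 20
base = (rng.random((n, n)) < 0.1).astype(float)
snapshots = [base.copy() for _ in range(10)]
snapshots[6] = (rng.random((n, n)) < 0.5).astype(float)
dist = graph_distance_series(snapshots)
print(flag_unusual(dist))  # -> [5 6]
```

Index i of the distance series refers to the change from snapshot i to snapshot i + 1, so a one-tick burst at t = 6 is flagged both entering (i = 5) and leaving (i = 6).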
5. Flow of Ensemble Approach
Event Detection in Dynamic Graphs
• Ensemble Algorithms
  ▪ Eigen Behavior based Event Detection (EBED)
  ▪ Probabilistic Approach (PTSAD)
  ▪ SPIRIT
• Consensus Method
  ▪ Rank based
  ▪ Score based
• Results
  ▪ Dataset 1: Challenge Network Flow Data
  ▪ Dataset 2: New York Times News Corpus
6. Event Detection
Consensus Rank Merging
• Rank based
  ▪ Inverse Rank
  ▪ Kemeny-Young
• Score based
  ▪ Unification (avg, max)
  ▪ Mixture Model (avg, max)
• Final Ensemble (Inverse Rank)
Characterization
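The two rank-based merges can be sketched in Python as follows. The inverse-rank score used here (sum of 1/rank across lists) and the brute-force Kemeny-Young search, which enumerates every permutation and is therefore only feasible for a handful of items, are illustrative assumptions. Both functions expect complete rankings of the same items, most anomalous first.

```python
from itertools import permutations


def inverse_rank(rank_lists):
    """Merge rankings by summing 1/rank for each item (rank 1 = most anomalous)."""
    scores = {}
    for ranking in rank_lists:
        for pos, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / pos
    return sorted(scores, key=lambda item: -scores[item])


def kemeny_young(rank_lists):
    """Brute-force Kemeny-Young: the ordering that agrees with the most pairwise
    preferences across all input lists (exponential in the number of items)."""
    positions = [{item: i for i, item in enumerate(r)} for r in rank_lists]

    def agreement(order):
        # Count (list, pair) combinations where both `order` and the list put a before b.
        return sum(
            1
            for i, a in enumerate(order)
            for b in order[i + 1:]
            for p in positions
            if p[a] < p[b]
        )

    return list(max(permutations(rank_lists[0]), key=agreement))


# Three detectors ranking four time ticks.
lists = [[3, 1, 2, 4], [3, 2, 1, 4], [1, 3, 2, 4]]
print(inverse_rank(lists))  # -> [3, 1, 2, 4]
print(kemeny_young(lists))  # -> [3, 1, 2, 4]
```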
7. Numerous algorithms exist for event detection
It is hard to decide which one will work well for a specific data set
Our Goal: design an ensemble approach that might not give the best result,
but is “better” than most base algorithms
Challenges:
Different scores/scales
Different merging approaches
8. Extract the “typical behavior” (eigen-behavior) of nodes/edges
eigen-behavior ≡ principal eigenvector
Compare eigen-behaviors over time
Score each time tick by the amount of change in behavior from the previous time tick
Mark the time ticks with high scores as anomalous
[Figure: N × T matrix of nodes over time ticks; feature: degree]
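9. [Figure: Nodes × Time × Features (egonet) tensor; for the feature degree, the N × T matrix is scanned with windows of W time ticks. The principal right singular vector of the window ending at t is the eigen-behavior at t, and it is compared with the past pattern formed from earlier eigen-behaviors.]
Change-score metric: $Z = 1 - u^{\top} r$, where u is the eigen-behavior at t and r is the past pattern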
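The EBED steps described above can be sketched in Python as follows. Several details are assumptions for illustration: a T × N feature matrix (time ticks × nodes); the same window length W for computing each eigen-behavior and for the past pattern; r taken as the normalized mean of the previous W eigen-behaviors; and a sign convention that makes each singular vector sum to a non-negative value. `eigen_behaviors` and `change_scores` are hypothetical names.

```python
import numpy as np


def eigen_behaviors(F, W):
    """F: T x N array of one node feature (e.g., degree), rows = time ticks.
    Row t of the result is the eigen-behavior u(t): the principal right singular
    vector of the W x N window ending at t (NaN for t < W - 1)."""
    T, N = F.shape
    U = np.full((T, N), np.nan)
    for t in range(W - 1, T):
        _, _, vt = np.linalg.svd(F[t - W + 1 : t + 1], full_matrices=False)
        u = vt[0]
        U[t] = u if u.sum() >= 0 else -u  # fix the SVD sign ambiguity (assumed convention)
    return U


def change_scores(U, W):
    """Z(t) = 1 - u(t)^T r, with r the normalized mean of the W previous
    eigen-behaviors (the past pattern). NaN where history is insufficient."""
    T = U.shape[0]
    Z = np.full(T, np.nan)
    for t in range(W, T):
        past = U[t - W : t]
        if np.isnan(past).any() or np.isnan(U[t]).any():
            continue
        r = past.mean(axis=0)
        r /= np.linalg.norm(r)
        Z[t] = 1.0 - U[t] @ r
    return Z


# Toy data: nodes 0-1 are active until t = 11, then nodes 3-4 take over.
T, W = 20, 3
v1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
v2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
F = np.array([(1 + 0.1 * t) * (v1 if t < 12 else v2) for t in range(T)])
Z = change_scores(eigen_behaviors(F, W), W)
print(int(np.nanargmax(Z)))  # -> 13
```

The score peaks at t = 13, the first tick whose window is dominated by the new pattern while the past pattern still reflects the old one.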
10. Model each node/edge time series with one of these distributions:
Poisson
Zero-inflated Poisson
Hurdle Process
▪ Hurdle Component: Bernoulli & Markov Chain
▪ Count Component: Zero-truncated Poisson
Model Selection:
AIC, log likelihood, Vuong’s test and log gain
Compute the one-sided p-value: the probability of observing a count at least
as extreme as v, P(X ≥ v)
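The tail probability P(X ≥ v) can be computed directly from each model's pmf. In this Python sketch the model parameters (λ, the zero-inflation weight π, and the hurdle probability p0) are assumed to be already fitted; only the Poisson fit (λ = sample mean) is shown. Under the Markov-chain hurdle variant, p0 would be the probability of a zero given the previous tick's state. The function names are hypothetical.

```python
import math


def poisson_sf(v, lam):
    """P(X >= v) for X ~ Poisson(lam), computed as 1 - P(X <= v - 1)."""
    if v <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for k in range(1, v):
        term *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def zip_sf(v, pi, lam):
    """Zero-inflated Poisson: extra mass pi at zero, otherwise Poisson(lam)."""
    return 1.0 if v <= 0 else (1.0 - pi) * poisson_sf(v, lam)


def hurdle_sf(v, p0, lam):
    """Hurdle model: P(X = 0) = p0; positive counts follow a zero-truncated Poisson(lam)."""
    if v <= 0:
        return 1.0
    return (1.0 - p0) * poisson_sf(v, lam) / (1.0 - math.exp(-lam))


history = [0, 2, 1, 3, 0, 2, 1, 2]
lam = sum(history) / len(history)  # Poisson MLE: the sample mean
print(poisson_sf(9, lam))          # a tiny p-value marks a count of 9 as unusual
```

Computing 1 - CDF loses precision for very small tail probabilities, and exp(-λ) underflows for very large λ, so a library survival function is preferable in practice.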
12. Streaming Pattern dIscoveRy in multIple Time-series
(SPIRIT) [Papadimitriou et al. 2005]
Discovers trends: whenever the trend changes, it introduces a new hidden variable, and removes it when it is no longer needed
Detects anomalous points in trends
Node weights change at each step
At a change point, the node with the highest weight is the most anomalous
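The hidden-variable tracking can be sketched in Python as below: a rendition of SPIRIT's per-tick weight update, with energy thresholds for adding or dropping hidden variables. The forgetting factor λ = 0.96 and the energy band 0.95 to 0.98 are assumed defaults. Reporting the node with the largest absolute weight in the hidden variable that was just added or dropped is one hypothetical reading of the rule on this slide.

```python
import numpy as np


def spirit(X, lam=0.96, f_low=0.95, f_high=0.98, eps=1e-3):
    """X: T x n stream (rows = time ticks, columns = per-node time series).
    Returns the number of hidden variables k at each tick, plus (tick, node)
    pairs at the ticks where k changed."""
    T, n = X.shape
    W = np.eye(n)          # column i holds weight vector w_i (starts as a unit vector)
    d = np.full(n, eps)    # per-hidden-variable energy with forgetting factor lam
    k = 1                  # number of active hidden variables
    E_x = 0.0              # running mean of ||x_t||^2
    E_y = np.zeros(n)      # running mean of y_i^2 for each hidden variable
    ks, events = [], []
    for t in range(T):
        x = X[t].astype(float)
        y = np.zeros(n)
        for i in range(k):
            y[i] = W[:, i] @ x
            d[i] = lam * d[i] + y[i] ** 2
            e = x - y[i] * W[:, i]          # reconstruction error for w_i
            W[:, i] += (y[i] / d[i]) * e    # move w_i toward the error direction
            x = x - y[i] * W[:, i]          # deflate the input for w_{i+1}
        E_x = (t * E_x + X[t] @ X[t]) / (t + 1)
        E_y = (t * E_y + y ** 2) / (t + 1)
        retained = E_y[:k].sum()
        old_k = k
        if retained < f_low * E_x and k < n:
            k += 1  # too little energy captured: add a hidden variable
        elif retained > f_high * E_x and k > 1:
            k -= 1  # more than enough energy captured: drop the last one
        if k != old_k:
            changed = max(k, old_k) - 1  # index of the added/dropped hidden variable
            events.append((t, int(np.argmax(np.abs(W[:, changed])))))
        ks.append(k)
    return ks, events


rng = np.random.default_rng(1)
X = np.abs(rng.normal(1.0, 0.1, size=(200, 4)))
ks, events = spirit(X)
print(ks[-1], events[:3])
```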
14. Consensus
Inputs: RankList1/ScoreList1, RankList2/ScoreList2, RankList3/ScoreList3 from the base algorithms
Rank based
• Inverse Rank
• Kemeny-Young [J. Kemeny 1959]
Score based
• Unification [Zimek et al. 2011]: avg & max
• Mixture Model [Jing et al. 2006]: avg & max
Final Ensemble: inverse rank, producing the FinalRankList
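Score-based merging first puts the detectors' scores on a common scale. The Python sketch below uses Gaussian scaling, max(0, erf((s - μ)/(σ√2))), a normalization of the kind used for unifying outlier scores, and then takes the average or the maximum per time tick. The mixture-model calibration is not shown, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def gaussian_unify(scores):
    """Map raw scores to [0, 1]: max(0, erf((s - mu) / (sigma * sqrt(2))))."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2)))) for s in scores]


def combine(score_lists, how="avg"):
    """Unify each detector's scores, then merge per time tick by average or maximum."""
    unified = [gaussian_unify(s) for s in score_lists]
    agg = mean if how == "avg" else max
    return [agg(col) for col in zip(*unified)]


# Two detectors on very different scales, both peaking at tick 3.
s1 = [0.1, 0.2, 0.1, 5.0]
s2 = [10.0, 12.0, 11.0, 300.0]
print(combine([s1, s2], "avg"))
print(combine([s1, s2], "max"))
```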
15. We were given a “Cyber Challenge Network”
from NGAS R&T Space Park
Simulated cyber network traffic
10 days of activity
125 hosts
To-from information with timestamps
Find “suspicious” events and the entities associated with them in the Challenge Network.
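Turning raw to-from records into a per-tick feature, such as the degree used by EBED, might look like the Python sketch below. The tick length (for example, one hour), treating each flow as an undirected contact, and counting distinct peers as the degree are assumptions. `degree_matrix` is a hypothetical helper.

```python
import numpy as np


def degree_matrix(flows, hosts, t0, tick_seconds, num_ticks):
    """flows: iterable of (src, dst, timestamp_seconds).
    Returns a num_ticks x len(hosts) array whose entry [t, h] is the number
    of distinct peers host h exchanged traffic with during tick t."""
    index = {h: i for i, h in enumerate(hosts)}
    peers = [[set() for _ in hosts] for _ in range(num_ticks)]
    for src, dst, ts in flows:
        t = int((ts - t0) // tick_seconds)
        if 0 <= t < num_ticks and src in index and dst in index and src != dst:
            peers[t][index[src]].add(dst)  # undirected: count the contact for both ends
            peers[t][index[dst]].add(src)
    return np.array([[len(p) for p in row] for row in peers])


flows = [("a", "b", 0), ("a", "c", 10), ("b", "c", 3700), ("a", "b", 3800)]
D = degree_matrix(flows, ["a", "b", "c"], t0=0, tick_seconds=3600, num_ticks=2)
print(D.tolist())  # -> [[2, 1, 1], [1, 2, 1]]
```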
21. ~8 years (Jan 2000 - July 2007) of articles published in the New York Times
Graph links: co-mention of named entities
(people, places, organizations)
Sample rate: 1 week
No ground truth
Big events detected:
January 2001 – George W. Bush inaugurated as US president
September 11, 2001 – Terrorist attacks on the WTC
February 1, 2003 – Space Shuttle Columbia Disaster
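A weekly co-mention graph of the kind described on this slide can be built as in this Python sketch. It assumes each article is given as a publication date plus a list of already-extracted entity names; entity extraction and resolution are out of scope, and `weekly_comention_graphs` is a hypothetical helper.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def weekly_comention_graphs(articles, start):
    """articles: iterable of (publication_date, entity_names).
    Returns {week_index: Counter({(e1, e2): co-mention count})} with e1 < e2."""
    graphs = {}
    for day, entities in articles:
        week = (day - start).days // 7  # 1-week sample rate
        g = graphs.setdefault(week, Counter())
        for a, b in combinations(sorted(set(entities)), 2):
            g[(a, b)] += 1  # one co-mention per article per entity pair
    return graphs


articles = [
    (date(2001, 9, 11), ["WTC", "New York", "Bush"]),
    (date(2001, 9, 12), ["Bush", "WTC"]),
    (date(2001, 9, 20), ["Bush"]),
]
graphs = weekly_comention_graphs(articles, start=date(2001, 9, 10))
print(graphs[0][("Bush", "WTC")])  # -> 2
```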
24. Heterogeneous detectors
different scores
different effectiveness (depending on dataset)
Ensemble for event detection on dynamic graphs
Multiple consensus (merging) approaches
two-phase consensus finding
25. Near-future: Robust consensus by automatically
selecting effective base algorithms
Challenge: no ground truth
Near-future: real-time detection
Event detection across diverse data sources
(e.g., news media, social media, the Web, …)
Challenges: different entity types,
different time granularity,
entity resolution
My work focuses on
discovering patterns and detecting anomalies in real-world data,
using graph analytics techniques, and
developing effective and efficient tools to do so.