2. Research Goal
Real-time Video Analytics
[Figure: Video Stream → Stage 1 → Stage 2 → … → Stage N → Analytics]
Real-time Processing
• Event Detection
• Action / Activity Recognition
• Behaviour Recognition
• Behaviour Profiling
Analytics
• Intelligent Video Surveillance
• Automated Alert
• Smart Monitoring
• Context-aware Environments
3. Unexpected Behaviors
Mob violence
Unusual Crowding
Sudden group formation/deformation
Shooting
Public panic
4. Increasing number of surveillance cameras
Deployment of a large number of surveillance cameras in recent years
Modern airports now have several thousand cameras!
10. Background Subtraction: How?
Basic Background Subtraction (BBS)
Current frame − Background = Foreground blob
Dynamic Background Modelling
Current frame − Background model = Foreground blob
Challenges with BBS (not a practical approach):
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
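The BBS scheme above can be sketched as per-pixel differencing against a fixed background frame. A minimal sketch; the threshold value and toy images are illustrative, not from the thesis:

```python
import numpy as np

def basic_background_subtraction(frame, background, threshold=25):
    """Mark pixels whose absolute deviation from a fixed background
    frame exceeds a global threshold (the BBS scheme)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a flat background with one bright "blob" in the frame.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200                        # a 2x2 moving object
mask = basic_background_subtraction(frame, background)
print(mask.sum())                            # 4 foreground pixels
```

A fixed background frame is exactly what makes BBS impractical: any of the challenges listed above (illumination change, camera displacement, shadows) breaks the differencing, which motivates the dynamic background models that follow.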
13. Typical Surveillance Setup
Video Stream → Frame-size reduction → Frame-rate reduction → Background Subtraction → Feature Extraction → Event Detection
Parameter tuning based on operating environment
14. Scenario 1
α = Learning rate, T = Background data proportion
[Figure: first frame, test frame, and ground truth, with detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]
Test Sequence: PETS2001_D1TeC2
15. Scenario 2
α = Learning rate, T = Background data proportion
[Figure: first frame, test frame, and ground truth, with detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]
Test Sequence: VSSN06_camera1
16. Scenario 3
α = Learning rate, T = Background data proportion
[Figure: first frame, test frame, and ground truth, with detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]
Test Sequence: CAVIAR_EnterExitCrossingPaths2cor
17. Observations
• A slow learning rate (α) is not preferable (ghosting or black-out)
• Simple post-processing will not improve the detection quality at a fast learning rate (α)
• The context behaviour needs to be known in advance
18. How can we detect abnormal situations?
“Hey, a mob will be approaching soon, and the background will be visible for only 10% of that duration. Please set T = 0.1”
19. Research Goals
• A new background subtraction technique for unconstrained environments, i.e., requiring no context-related information
• Operational at a fast learning rate (α)
• Acceptable detection quality
• High stability across changing operating environments
20. The New Technique, PMOG
• Perceptual Mixture of Gaussians
• Incorporating perceptual characteristics of
human visual system (HVS) in statistical
background subtraction
– Realistic background value prediction
– Perception based detection threshold
– Perceptual model similarity measure
21. Realistic Background Value Prediction
Models are ordered by ω/σ
[Figure: three Gaussian models (ωk, µk, σk²) for one pixel, e.g. road, car, and shadow observed 65%, 15%, and 20% of the time; new in PMOG: the predicted background value is the most recent matched observation b, not the mean µ]
22. Realistic Background Value Prediction
μ = (1 − α)μ + αXt
[Figure: over time, the model mean μ drifts slowly toward new observations, while the most recent matched observation b follows them immediately]
Using b as the predicted background value:
• Higher agility than using the mean
• Not tied to the learning rate
• Realistic: an actual intensity value, with no artificial value due to the mean
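The agility argument can be illustrated numerically with the mean-update rule μ = (1 − α)μ + αXt. A minimal sketch; the 50-frame intensity-jump scenario and all values are hypothetical:

```python
alpha = 0.01                 # learning rate, as in the slides
mu = 100.0                   # model mean for one background pixel
b = 100.0                    # most recent matched observation

# The true background intensity jumps (e.g. a light switches on)
# and then stays at 140 for 50 frames.
for x_t in [140.0] * 50:
    mu = (1 - alpha) * mu + alpha * x_t   # mean update: mu = (1-a)mu + a*X_t
    b = x_t                               # PMOG keeps the raw observed value

print(round(mu, 1))   # ~115.8: the mean still lags well behind 140
print(b)              # 140.0: b has already adapted
```

The mean needs hundreds of frames at α = 0.01 to converge, while b is correct after a single matched observation, which is why using b is not tied to the learning rate.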
23. Realistic Background Value Prediction
Models are ordered by ω/σ
ω1
σ12
µ1
b1
ω2
σ22
µ2
b2
road
ω3
σ32
µ3
b3
car
shadow
65%
15%
20%
x
x
P(x)
x
P(x)
b
x
x
x
P(x)
b
Te
b
24. Perception Based Detection Threshold
Models are ordered by ω/σ
[Figure: the classical detection threshold around the mean is x = c1·σ; with the most recent observation b as the predicted background value, the threshold x around b is yet to be determined (x = ?)]
25. Our Problem: How is x related to b?
[Figure: the detection band around the predicted background value b can be too narrow (low x) or too wide (high x); x = ?]
26. Weber’s Law
How does the human visual system perceive a noticeable intensity deviation from the background?
Ernst Weber, an experimental psychologist in the 19th century, observed that the just-noticeable increment ΔI is linearly proportional to the background intensity I:
ΔI = c2·I
27. Weber’s Law
[Figure: by Weber's law, the just-noticeable deviation x from the most recent background observation b grows in proportion to b]
28. Another perceptual characteristic of HVS
What is the perceptual tolerance level in distinguishing distorted intensity measures?
[Figure: a reference image distorted by two methods, Method 1 yielding p dB and Method 2 yielding q dB]
|p – q| < 0.5 dB: not perceivable by the human visual system
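The 0.5 dB tolerance can be made concrete with a simple PSNR computation. A minimal sketch; the `psnr` helper and the toy intensity lists are illustrative assumptions:

```python
import math

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two intensity lists."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    return 20 * math.log10(peak) - 10 * math.log10(mse)

ref = [100, 120, 140, 160]
dist1 = [102, 118, 142, 158]      # distorted by method 1
dist2 = [102, 122, 138, 162]      # distorted by method 2 (different pixels)

p, q = psnr(ref, dist1), psnr(ref, dist2)
print(abs(p - q) < 0.5)           # True: within the 0.5 dB tolerance
```

Here the two distorted signals differ pixel by pixel yet their PSNR values fall within 0.5 dB of each other, so by the stated observation a human viewer could not tell them apart.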
29. Our Problem: How is x related to b?
By Weber's law: x = c2·b
The constant c2 follows from the perceptual threshold TP (0.5 dB): the band edges b − x and b + x should be just perceptually distinguishable, i.e.
20 log10(255/(b − x)) − 20 log10(255/(b + x)) = 2·TP
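Under the reading that the band edges b − x and b + x must sit 2·TP apart on the 20·log10(255/·) dB scale (an assumption about the slide's garbled formula), the threshold works out to x = c2·b, a Weber-law proportionality. A sketch based on that reading:

```python
import math

def weber_threshold(b, tp=0.5):
    """Half-width x of the detection band around background value b,
    chosen so the edges b-x and b+x are 2*Tp (dB) apart.
    Solving 20*log10((b+x)/(b-x)) = 2*Tp gives x = c2*b with
    c2 = (10**(Tp/10) - 1) / (10**(Tp/10) + 1)."""
    c2 = (10 ** (tp / 10) - 1) / (10 ** (tp / 10) + 1)
    return c2 * b

def is_foreground(value, b, tp=0.5):
    """A pixel is foreground if it leaves the perceptual band around b."""
    return abs(value - b) > weber_threshold(b, tp)

b = 100
print(round(weber_threshold(b), 2))   # ~5.75 for Tp = 0.5 dB
print(is_foreground(110, b))          # True: deviation 10 exceeds the band
print(is_foreground(104, b))          # False: within perceptual tolerance
```

Note the threshold scales linearly with b: doubling the background intensity doubles the tolerated deviation, exactly the ΔI = c2·I behaviour Weber's law predicts.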
30. Impact of Perceptual Threshold, TP
Human Vision: TP = 0.5 dB
Machine Vision: TP = 1.0 dB (minimal impact of shadow, reflection, noise, etc.)
33. Rod and Cone Cells of Human Eye
• Rods and Cones are two different types of photoreceptor cells in the retina of the human eye
• Rods
– Operate in less intense light
– Responsible for scotopic vision (night vision)
• Cones
– Operate in relatively bright light
– Responsible for photopic vision (colour vision)
48. PMOG: Summary
• Realistic background value prediction: high model agility and superior detection quality at a fast learning rate
• No context-related information: high stability across changing scenarios
• Perception-based detection threshold: superior detection quality in terms of shadow, noise, and reflection
• Perceptual model similarity: optimal number of models throughout the system life cycle
• Parameter-less background subtraction: ideal for real-time video analytics
50. Event Detection
Specific types of events vs. abnormality
• An event persists for a certain duration of time; the duration is variable
• Characteristics of the same event are variable in the same environment and from one scene to another
How to identify the generic characteristics of an event?
52. The Proposed Event Detection Approach
[Figure: frame-level features f1, f2, f3, …, fn extracted over time are transformed into temporal features and passed to a classifier to build an event model]
Event detection as a temporal data classification problem
A distinct set of temporal features can characterise an event
Which frame-level features are extracted, and how?
How are the observed frame-level features transformed into temporal features?
53. The Proposed Event Detection Approach
Motion based approaches
• Key points detection
• Point matching in successive frames
• Flow vectors: position, direction, speed
Tracking based approaches
• Object detection
• Object matching in successive frames
• Trajectories: object paths
Common characteristics (e.g., Hu et al., ICPR 2008; Xiang et al., IJCV 2006)
• Inter-frame association
• Context specific information
• Event models are not generic
Proposed approach
• No inter-frame association
• Foreground blob detection
• Independent frame-level features based on blob statistical analysis, independent of scene characteristics
• Global frame-level descriptor transformed into temporal features considering speed and temporal order
54. The Proposed Event Detection Approach
[Figure: frame-level features f1, f2, f3, …, fn extracted over time are transformed into temporal features and passed to a classifier to build an event model]
Summary
• Background subtraction for foreground blob detection
• Independent frame-level features extracted using blob statistical analysis; no object- or position-specific information, no spatial association
• Frame-level features are transformed into temporal features considering speed and temporal order
• Expected to be more context-invariant
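The transformation step can be sketched as windowed temporal statistics over the per-frame feature vectors. The particular statistics chosen here (mean, standard deviation, and mean absolute frame-to-frame change as a crude speed measure) and the window length are illustrative assumptions, not the thesis's actual descriptor:

```python
import numpy as np

def temporal_features(frame_features, window=5):
    """Slide a window over per-frame feature vectors and summarise each
    window with order-aware statistics: per-feature mean, spread, and
    mean absolute frame-to-frame change (a speed proxy)."""
    f = np.asarray(frame_features, dtype=float)   # shape (n_frames, n_feats)
    out = []
    for start in range(len(f) - window + 1):
        w = f[start:start + window]
        speed = np.abs(np.diff(w, axis=0)).mean(axis=0)
        out.append(np.concatenate([w.mean(axis=0), w.std(axis=0), speed]))
    return np.array(out)

# 8 frames, 2 frame-level features (e.g. blob count, total blob area)
frames = [[1, 50], [1, 52], [2, 90], [2, 95],
          [3, 140], [3, 150], [4, 200], [4, 210]]
tf = temporal_features(frames, window=5)
print(tf.shape)   # (4, 6): 4 windows, 3 statistics per feature
```

Because every window summary depends only on blob statistics and their temporal ordering, no inter-frame object association is needed, which is the context-invariance argument above.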
55. Blob Statistical Analysis
Frame-level features
Blob Area (BA)
Filling Ratio (FR)
Aspect Ratio (AR)
Bounding Box Area (BBA)
Bounding box Width (BBW)
Bounding box Height (BBH)
Blob Count (BC)
Blob Distance (BD)
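A minimal sketch of computing the single-blob features above from a binary foreground mask. The exact definitions (aspect ratio as width over height, filling ratio as blob area over bounding-box area) are plausible assumptions, and the multi-blob features (BC, BD) are omitted:

```python
import numpy as np

def blob_features(mask):
    """Slide-55 style statistics for one foreground blob given as a
    boolean mask (hypothetical helper, single-blob case only)."""
    ys, xs = np.nonzero(mask)
    ba = len(ys)                          # Blob Area (BA)
    bbh = ys.max() - ys.min() + 1         # Bounding box Height (BBH)
    bbw = xs.max() - xs.min() + 1         # Bounding box Width (BBW)
    bba = bbh * bbw                       # Bounding Box Area (BBA)
    return {"BA": ba, "BBA": bba, "BBW": bbw, "BBH": bbh,
            "AR": bbw / bbh,              # Aspect Ratio (AR), width/height
            "FR": ba / bba}               # Filling Ratio (FR)

mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 2:5] = True            # a 3x3 blob ...
mask[1, 2] = False               # ... with one corner pixel missing
print(blob_features(mask)["FR"])  # 8/9 ≈ 0.889
```

These quantities need no object identity or position tracking, only the mask produced by background subtraction, which is what keeps the frame-level features independent of scene characteristics.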
60. Blob Statistical Analysis
Top five features for four different events
Feature ranking using the absolute-value criterion of a two-sample t-test, based on a pooled variance estimate.
61. Experimental Results
Specific Event Detection
• Four different events: meet, split, runaway, and fight
• CAVIAR dataset with labelled frames
• 80% of the test frames for model training
• 100 iterations of 10-fold cross-validation
• Remaining 20% of the test frames for testing
• SVM classifier as event models
• Separate model for each event
64. Experimental Results
Abnormal Event Detection
• University of Minnesota crowd dataset (UMN dataset)
• The Runaway event model
• No additional training or tuning
• Three different sites
69. Experimental Results
Performance Comparison
Method                    AUC
Proposed Method           0.89
Pure Optical Flow [1]     0.84
[1] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Proc. IEEE
Conference on Computer Vision and Pattern Recognition CVPR 2009, 20–25 June 2009, pp. 935–942.
70. URLs of the images used in this presentation
• http://www.fotosearch.com/DGV464/766029/
• http://www.cyprus-trader.com/images/alert.gif
• http://security.polito.it/~lioy/img/einstein8ci.jpg
• http://www.dtsc.ca.gov/PollutionPrevention/images/question.jpg
• http://www.unmikonline.org/civpol/photos/thematic/violence/streetvio2.jpg
• http://www.airports-worldwide.com/img/uk/heathrow00.jpg
• http://www.highprogrammer.com/alan/gaming/cons/trips/genconindy2003/exhibithall-crowd-2.jpg
• http://www.bhopal.org/fcunited/archives/fcu-crowd.jpg
• http://img.dailymail.co.uk/i/pix/2006/08/passaPA_450x300.jpg
• http://www.defenestrator.org/drp/files/surveillance-cameras-400.jpg
• http://www.cityofsound.com/photos/centre_poin/crowd.jpg
• http://www.hindu.com/2007/08/31/images/2007083156401501.jpg
• http://paulaoffutt.com/pics/images/crowd-surfing.jpg
• http://msnbcmedia1.msn.com/j/msnbc/Components/Photos/070225/070225_surveillance_hmed.hmedium.jpg
• http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg