2. Research Goal
Real-time Video Analytics
[Figure: Video Stream → Stage 1 → Stage 2 → … → Stage N → Analytics]
Real-time Processing
• Event Detection
• Action / Activity Recognition
• Behaviour Recognition
• Behaviour Profiling
Analytics
• Intelligent Video Surveillance
• Automated Alert
• Smart Monitoring
• Context-aware Environments
3. Unexpected Behaviors
Mob violence
Unusual Crowding
Sudden group formation/deformation
Shooting
Public panic
4. Increasing number of surveillance cameras
Deployment of a large number of surveillance cameras in recent years
Modern airports now have several thousand cameras!
10. Background Subtraction: How?
Basic Background Subtraction (BBS)
Current frame − Background = Foreground blob
Dynamic Background Modelling
Current frame − Background model = Foreground blob
Challenges with BBS (not a practical approach):
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
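The BBS scheme above can be sketched as per-pixel differencing against a fixed background frame. A minimal sketch; the threshold value and toy images are illustrative, not from the thesis:

```python
import numpy as np

def basic_background_subtraction(frame, background, threshold=25):
    """Mark pixels whose absolute deviation from a fixed background
    frame exceeds a global threshold (the BBS scheme)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a flat background with one bright "blob" in the frame.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200                        # a 2x2 moving object
mask = basic_background_subtraction(frame, background)
print(mask.sum())                            # 4 foreground pixels
```

A fixed background frame is exactly what makes BBS impractical: any of the challenges listed above (illumination change, camera displacement, shadows) breaks the differencing, which motivates the dynamic background models that follow.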
13. Typical Surveillance Setup
Video Stream → Frame-size reduction → Frame-rate reduction → Background Subtraction → Feature Extraction → Event Detection
Parameter tuning based on operating environment
14. Scenario 1
α = Learning rate, T = Background data proportion
[Figure: first frame, test frame, and ground truth, with detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]
Test Sequence: PETS2001_D1TeC2
15. Scenario 2
α = Learning rate, T = Background data proportion
[Figure: first frame, test frame, and ground truth, with detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]
Test Sequence: VSSN06_camera1
16. Scenario 3
α = Learning rate, T = Background data proportion
[Figure: first frame, test frame, and ground truth, with detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]
Test Sequence: CAVIAR_EnterExitCrossingPaths2cor
17. Observations
• A slow learning rate (α) is not preferable (ghosting or black-out)
• Simple post-processing will not improve the detection quality at a fast learning rate (α)
• The context behaviour needs to be known in advance
18. How can we detect abnormal situations?
“Hey, a mob will be approaching soon, and the background will be visible for only 10% of that duration. Please set T = 0.1”
19. Research Goals
• A new background subtraction technique for unconstrained environments, i.e., requiring no context-related information
• Operational at a fast learning rate (α)
• Acceptable detection quality
• High stability across changing operating environments
20. The New Technique, PMOG
• Perceptual Mixture of Gaussians
• Incorporating perceptual characteristics of
human visual system (HVS) in statistical
background subtraction
– Realistic background value prediction
– Perception based detection threshold
– Perceptual model similarity measure
21. Realistic Background Value Prediction
Models are ordered by ω/σ
[Figure: three Gaussian models (ωk, µk, σk²) for one pixel, e.g. road, car, and shadow observed 65%, 15%, and 20% of the time; new in PMOG: the predicted background value is the most recent matched observation b, not the mean µ]
22. Realistic Background Value Prediction
μ = (1 − α)μ + αXt
[Figure: over time, the model mean μ drifts slowly toward new observations, while the most recent matched observation b follows them immediately]
Using b as the predicted background value:
• Higher agility than using the mean
• Not tied to the learning rate
• Realistic: an actual intensity value, with no artificial value due to the mean
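The agility argument can be illustrated numerically with the mean-update rule μ = (1 − α)μ + αXt. A minimal sketch; the 50-frame intensity-jump scenario and all values are hypothetical:

```python
alpha = 0.01                 # learning rate, as in the slides
mu = 100.0                   # model mean for one background pixel
b = 100.0                    # most recent matched observation

# The true background intensity jumps (e.g. a light switches on)
# and then stays at 140 for 50 frames.
for x_t in [140.0] * 50:
    mu = (1 - alpha) * mu + alpha * x_t   # mean update: mu = (1-a)mu + a*X_t
    b = x_t                               # PMOG keeps the raw observed value

print(round(mu, 1))   # ~115.8: the mean still lags well behind 140
print(b)              # 140.0: b has already adapted
```

The mean needs hundreds of frames at α = 0.01 to converge, while b is correct after a single matched observation, which is why using b is not tied to the learning rate.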
23. Realistic Background Value Prediction
Models are ordered by ω/σ
ω1
σ12
µ1
b1
ω2
σ22
µ2
b2
road
ω3
σ32
µ3
b3
car
shadow
65%
15%
20%
x
x
P(x)
x
P(x)
b
x
x
x
P(x)
b
Te
b
24. Perception Based Detection Threshold
Models are ordered by ω/σ
[Figure: the classical detection threshold around the mean is x = c1·σ; with the most recent observation b as the predicted background value, the threshold x around b is yet to be determined (x = ?)]
25. Our Problem: How is x related to b?
[Figure: the detection band around the predicted background value b can be too narrow (low x) or too wide (high x); x = ?]
26. Weber’s Law
How does the human visual system perceive a noticeable intensity deviation from the background?
Ernst Weber, an experimental psychologist in the 19th century, observed that the just-noticeable increment ΔI is linearly proportional to the background intensity I:
ΔI = c2·I
27. Weber’s Law
[Figure: by Weber's law, the just-noticeable deviation x from the most recent background observation b grows in proportion to b]
28. Another perceptual characteristic of HVS
What is the perceptual tolerance level in distinguishing distorted intensity measures?
[Figure: a reference image distorted by two methods, Method 1 yielding p dB and Method 2 yielding q dB]
|p – q| < 0.5 dB: not perceivable by the human visual system
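The 0.5 dB tolerance can be made concrete with a simple PSNR computation. A minimal sketch; the `psnr` helper and the toy intensity lists are illustrative assumptions:

```python
import math

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two intensity lists."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    return 20 * math.log10(peak) - 10 * math.log10(mse)

ref = [100, 120, 140, 160]
dist1 = [102, 118, 142, 158]      # distorted by method 1
dist2 = [102, 122, 138, 162]      # distorted by method 2 (different pixels)

p, q = psnr(ref, dist1), psnr(ref, dist2)
print(abs(p - q) < 0.5)           # True: within the 0.5 dB tolerance
```

Here the two distorted signals differ pixel by pixel yet their PSNR values fall within 0.5 dB of each other, so by the stated observation a human viewer could not tell them apart.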
29. Our Problem: How is x related to b?
By Weber's law: x = c2·b
The constant c2 follows from the perceptual threshold TP (0.5 dB): the band edges b − x and b + x should be just perceptually distinguishable, i.e.
20 log10(255/(b − x)) − 20 log10(255/(b + x)) = 2·TP
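Under the reading that the band edges b − x and b + x must sit 2·TP apart on the 20·log10(255/·) dB scale (an assumption about the slide's garbled formula), the threshold works out to x = c2·b, a Weber-law proportionality. A sketch based on that reading:

```python
import math

def weber_threshold(b, tp=0.5):
    """Half-width x of the detection band around background value b,
    chosen so the edges b-x and b+x are 2*Tp (dB) apart.
    Solving 20*log10((b+x)/(b-x)) = 2*Tp gives x = c2*b with
    c2 = (10**(Tp/10) - 1) / (10**(Tp/10) + 1)."""
    c2 = (10 ** (tp / 10) - 1) / (10 ** (tp / 10) + 1)
    return c2 * b

def is_foreground(value, b, tp=0.5):
    """A pixel is foreground if it leaves the perceptual band around b."""
    return abs(value - b) > weber_threshold(b, tp)

b = 100
print(round(weber_threshold(b), 2))   # ~5.75 for Tp = 0.5 dB
print(is_foreground(110, b))          # True: deviation 10 exceeds the band
print(is_foreground(104, b))          # False: within perceptual tolerance
```

Note the threshold scales linearly with b: doubling the background intensity doubles the tolerated deviation, exactly the ΔI = c2·I behaviour Weber's law predicts.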
30. Impact of Perceptual Threshold, TP
Human Vision: TP = 0.5 dB
Machine Vision: TP = 1.0 dB (minimal impact of shadow, reflection, noise, etc.)
33. Rod and Cone Cells of Human Eye
• Rods and Cones are two different types of photoreceptor cells in the retina of the human eye
• Rods
– Operate in less intense light
– Responsible for scotopic vision (night vision)
• Cones
– Operate in relatively bright light
– Responsible for photopic vision (colour vision)
48. PMOG: Summary
• Realistic background value prediction: high model agility and superior detection quality at a fast learning rate
• No context-related information: high stability across changing scenarios
• Perception-based detection threshold: superior detection quality in terms of shadow, noise, and reflection
• Perceptual model similarity: optimal number of models throughout the system life cycle
• Parameter-less background subtraction: ideal for real-time video analytics
50. Event Detection
Specific types of events vs. abnormality
• An event persists for a certain duration of time; the duration is variable
• Characteristics of the same event are variable in the same environment and from one scene to another
How to identify the generic characteristics of an event?
52. The Proposed Event Detection Approach
[Figure: frame-level features f1, f2, f3, …, fn extracted over time are transformed into temporal features and passed to a classifier to build an event model]
Event detection as a temporal data classification problem
A distinct set of temporal features can characterise an event
Which frame-level features are extracted, and how?
How are the observed frame-level features transformed into temporal features?
53. The Proposed Event Detection Approach
Motion based approaches
• Key points detection
• Point matching in successive frames
• Flow vectors: position, direction, speed
Tracking based approaches
• Object detection
• Object matching in successive frames
• Trajectories: object paths
Common characteristics (e.g., Hu et al., ICPR 2008; Xiang et al., IJCV 2006)
• Inter-frame association
• Context specific information
• Event models are not generic
Proposed approach
• No inter-frame association
• Foreground blob detection
• Independent frame-level features based on blob statistical analysis, independent of scene characteristics
• Global frame-level descriptor transformed into temporal features considering speed and temporal order
54. The Proposed Event Detection Approach
[Figure: frame-level features f1, f2, f3, …, fn extracted over time are transformed into temporal features and passed to a classifier to build an event model]
Summary
• Background subtraction for foreground blob detection
• Independent frame-level features extracted using blob statistical analysis; no object- or position-specific information, no spatial association
• Frame-level features are transformed into temporal features considering speed and temporal order
• Expected to be more context-invariant
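The transformation step can be sketched as windowed temporal statistics over the per-frame feature vectors. The particular statistics chosen here (mean, standard deviation, and mean absolute frame-to-frame change as a crude speed measure) and the window length are illustrative assumptions, not the thesis's actual descriptor:

```python
import numpy as np

def temporal_features(frame_features, window=5):
    """Slide a window over per-frame feature vectors and summarise each
    window with order-aware statistics: per-feature mean, spread, and
    mean absolute frame-to-frame change (a speed proxy)."""
    f = np.asarray(frame_features, dtype=float)   # shape (n_frames, n_feats)
    out = []
    for start in range(len(f) - window + 1):
        w = f[start:start + window]
        speed = np.abs(np.diff(w, axis=0)).mean(axis=0)
        out.append(np.concatenate([w.mean(axis=0), w.std(axis=0), speed]))
    return np.array(out)

# 8 frames, 2 frame-level features (e.g. blob count, total blob area)
frames = [[1, 50], [1, 52], [2, 90], [2, 95],
          [3, 140], [3, 150], [4, 200], [4, 210]]
tf = temporal_features(frames, window=5)
print(tf.shape)   # (4, 6): 4 windows, 3 statistics per feature
```

Because every window summary depends only on blob statistics and their temporal ordering, no inter-frame object association is needed, which is the context-invariance argument above.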
55. Blob Statistical Analysis
Frame-level features
Blob Area (BA)
Filling Ratio (FR)
Aspect Ratio (AR)
Bounding Box Area (BBA)
Bounding box Width (BBW)
Bounding box Height (BBH)
Blob Count (BC)
Blob Distance (BD)
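A minimal sketch of computing the single-blob features above from a binary foreground mask. The exact definitions (aspect ratio as width over height, filling ratio as blob area over bounding-box area) are plausible assumptions, and the multi-blob features (BC, BD) are omitted:

```python
import numpy as np

def blob_features(mask):
    """Slide-55 style statistics for one foreground blob given as a
    boolean mask (hypothetical helper, single-blob case only)."""
    ys, xs = np.nonzero(mask)
    ba = len(ys)                          # Blob Area (BA)
    bbh = ys.max() - ys.min() + 1         # Bounding box Height (BBH)
    bbw = xs.max() - xs.min() + 1         # Bounding box Width (BBW)
    bba = bbh * bbw                       # Bounding Box Area (BBA)
    return {"BA": ba, "BBA": bba, "BBW": bbw, "BBH": bbh,
            "AR": bbw / bbh,              # Aspect Ratio (AR), width/height
            "FR": ba / bba}               # Filling Ratio (FR)

mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 2:5] = True            # a 3x3 blob ...
mask[1, 2] = False               # ... with one corner pixel missing
print(blob_features(mask)["FR"])  # 8/9 ≈ 0.889
```

These quantities need no object identity or position tracking, only the mask produced by background subtraction, which is what keeps the frame-level features independent of scene characteristics.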
60. Blob Statistical Analysis
Top five features for four different events
Feature ranking using the absolute-value criterion of a two-sample t-test, based on a pooled variance estimate.
61. Experimental Results
Specific Event Detection
• Four different events: meet, split, runaway, and fight
• CAVIAR dataset with labelled frames
• 80% of the test frames for model training
• 100 iterations of 10-fold cross-validation
• Remaining 20% of the test frames for testing
• SVM classifier as event models
• Separate model for each event
64. Experimental Results
Abnormal Event Detection
• University of Minnesota crowd dataset (UMN dataset)
• The Runaway event model
• No additional training or tuning
• Three different sites
69. Experimental Results
Performance Comparison
Method                    AUC
Proposed Method           0.89
Pure Optical Flow [1]     0.84
[1] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Proc. IEEE
Conference on Computer Vision and Pattern Recognition CVPR 2009, 20–25 June 2009, pp. 935–942.
70. URLs of the images used in this presentation
• http://www.fotosearch.com/DGV464/766029/
• http://www.cyprus-trader.com/images/alert.gif
• http://security.polito.it/~lioy/img/einstein8ci.jpg
• http://www.dtsc.ca.gov/PollutionPrevention/images/question.jpg
• http://www.unmikonline.org/civpol/photos/thematic/violence/streetvio2.jpg
• http://www.airports-worldwide.com/img/uk/heathrow00.jpg
• http://www.highprogrammer.com/alan/gaming/cons/trips/genconindy2003/exhibithall-crowd-2.jpg
• http://www.bhopal.org/fcunited/archives/fcu-crowd.jpg
• http://img.dailymail.co.uk/i/pix/2006/08/passaPA_450x300.jpg
• http://www.defenestrator.org/drp/files/surveillance-cameras-400.jpg
• http://www.cityofsound.com/photos/centre_poin/crowd.jpg
• http://www.hindu.com/2007/08/31/images/2007083156401501.jpg
• http://paulaoffutt.com/pics/images/crowd-surfing.jpg
• http://msnbcmedia1.msn.com/j/msnbc/Components/Photos/070225/070225_surveillance_hmed.hmedium.jpg
• http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg