CVGIP 2010 Part 2
Robust Abandoned Object Detection based on Life-cycle State Measurement

Wei-Hsin Hsu (徐維忻), Hung-I Pai (白宏益), Shen-Zheng Wang (王舜正), San-Lung Zhao (趙善隆), Kung-Ming Lan (藍坤銘)
Identification and Security Technology Center, Systems Development and Solutions Division,
Industrial Technology Research Institute, Hsin-Chu, Taiwan
E-mail: hsuweihsin@itri.org.tw, HIPai@itri.org.tw, st@itri.org.tw, slzhao@itri.org.tw, blueriver@itri.org.tw
ABSTRACT

In public areas, objects may be abandoned through careless forgetting or for the purposes of a terrorist attack. If those abandoned objects can be detected automatically by a video surveillance system, forgotten objects can be returned to their owners and terrorist attacks can be stopped. In this paper, we propose an automatic abandoned object detection algorithm to satisfy this requirement. The algorithm includes a double-background framework, which generates two background models according to different sampling rates. The framework extracts un-moving object candidates rapidly, and the abandoned objects are then extracted from those candidates according to their appearance features. Finally, this paper proposes a finite state machine, the Life-cycle State Measurement (LCSM) module, to prevent missed detections caused by occlusion or illumination changes. When an abandoned event happens, the LCSM module can launch or stop an alarm at the event's starting or ending time. To evaluate the performance, we test the algorithm on 10 videos. The experimental results show that the algorithm is feasible, since the false alarm rate and the missing rate are both very low.

Keywords: Pure background; Instant background; un-moving object aggregated map; LCSM

1. INTRODUCTION

Video surveillance has become a very important issue. With the progress of technology, cameras have been set up for surveillance in many places, including public areas, inside buildings, and even on public roadways. However, because of insufficient human resources, we cannot have a person monitor every camera at all times. In this study, we propose a new abandoned object detection technology that receives the video stream from a camera and detects abandoned objects within a few minutes. The technology can be used in applications such as dangerous abandoned object detection, abandoned luggage detection for passengers, and so on. Moreover, such a system also lowers personnel costs.

[1] provides a framework of two backgrounds (long-term and short-term). The framework uses a pair of backgrounds with different characteristics to segment the related foregrounds and extract abandoned objects. Our background-modeling module follows [1] and tries to improve it. The advantage of the method in [1] is that it does not track all objects, so it saves the computing cost of tracking. However, the method is not perfect: the temporal rate of background construction is quite important. In [1], the long-term and short-term backgrounds are updated periodically, but if the long-term background is updated before the abandoned object is detected, the detection fails and the result is affected. In this study, we try to find the optimal update timing to reduce the risk of detection failure caused by long-term background updating. The details are described in Section 2.

[2] provides a framework of two background levels combined linearly, which belongs to the single-background building methods, and it tracks each moving object. The framework uses optical flow to detect moving objects, because the optical flow of an object changes while the object is moving; it can therefore easily distinguish moving objects from static objects. Static objects and staying humans are then separated by a human-pattern recognition method. However, this method still has limits for filtering. For example, it is hard to cover every possible shape of a still human, so enough human-pattern templates are required. As the number of templates grows, the recognition rate increases, but so does the computing cost; the method is therefore unsuitable when performance has priority. In Section 2, we propose a method that filters objects by object-feature filtering, which avoids the performance problem caused by the human-pattern method.

[3] provides a two-background framework (current and buffered background) that tracks all objects and records each object's information to determine whether it is occluded. Its advantage is that it stays locked on a target even when the target is occluded. In Section 2, we refine the idea from [3] and present the theory of the Life-cycle State Measurement (LCSM), which makes the detection of each abandoned object in different environments more convincing and the detection results more reasonable.

2. ABANDONED OBJECT DETECTION SYSTEM

An abandoned object detection method based on object tracking could be feasible. However, tracking-based methods have low efficiency when too many objects are tracked, since they track not only the abandoned objects but all other objects as well, and the computation is heavy. Therefore, we use a new method based on different sampling rates instead of tracking-based technology to avoid this problem. We divide the system framework into three technological processes. First, we obtain the pure background BP(t) and the instant background BI(t) according to different sampling rates at time t; BP(t) is the original background without any moving objects, and BI(t) is the background obtained from the video sequence over a short period. By computing the frame difference between the current frame BC(t) and BP(t), and between BC(t) and BI(t), we obtain the pure foreground FP(t) and the instant foreground FI(t). Following the rules given later in this section, we extract the un-moving objects in the current frame according to FP(t) and FI(t), obtaining the un-moving object map St(t). Second, the processing aggregates a value for each pixel over the maps St(t) to produce the un-moving object aggregated map S; this step discards objects that remain for only a short period. When the values of objects in S aggregate over the threshold, we draw those objects out of S and save them. Moreover, those objects are filtered by feature filtering, including Shape, Compactness, and Area. The final process is the most important issue in this paper: the Life-cycle State Measurement (LCSM). The concept of LCSM comes from software engineering, where it originally describes the life-cycle of software; this paper applies the idea to abandoned object detection, so that an abandoned object takes a different state in each situation. The states are the growing state, the stable state, the aging state, and the dead state. We assign the proper state to each abandoned object in situations such as occlusion or removal; in this way, abandoned object processing becomes more reasonable. This issue is discussed below, and Fig. 1 shows the definitions of all the symbols and the relationships among them.

Fig. 1. Symbol definitions and flow

Fig. 2. System overview

In this paper, the system consists of three modules, as shown in Fig. 2. The first module, un-moving object detection, is composed of foreground detection, un-moving object decision, and real un-moving object extraction by aggregation. The second module, abandoned object decision, clusters the image pixels of un-moving objects from the un-moving object aggregated map received from module 1; those un-moving objects are then filtered by the object-feature filtering method, and the abandoned objects are decided. The final module, the life cycle of the abandoned object, decides the persistent time of an abandoned object event according to a finite state machine.

According to the definition of the modules, the three main corresponding technologies are presented in the following.
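The double-background differencing described above can be sketched as follows. This is a minimal illustration, assuming grayscale frames stored as NumPy arrays; the difference threshold `tau` and the toy 4x4 frames are our own assumptions rather than values from the paper, and the FP = 0 / FI = 1 convention for a stopped pixel follows the text.

```python
import numpy as np

def foreground_mask(frame, background, tau=25):
    """Binary foreground map: 1 where |frame - background| exceeds tau."""
    return (np.abs(frame.astype(int) - background.astype(int)) > tau).astype(np.uint8)

def unmoving_map(fp, fi):
    """Un-moving object map St: pixels where the pure/instant foreground
    pair matches the paper's convention (FP = 0 and FI = 1)."""
    return ((fp == 0) & (fi == 1)).astype(np.uint8)

# toy example: one pixel differs from the instant background BI(t)
# while the current frame matches the pure background BP(t)
bp = np.full((4, 4), 100, dtype=np.uint8)   # pure background
bi = bp.copy(); bi[2, 2] = 180              # instant background changed here
bc = bp.copy()                              # current frame
fp = foreground_mask(bc, bp)
fi = foreground_mask(bc, bi)
st = unmoving_map(fp, fi)
print(st[2, 2])  # 1: candidate un-moving pixel
```

In a real system `bp` and `bi` would be refreshed at their respective sampling rates (about every 25 and 15 frames, per Section 2.1.1) rather than held fixed as here.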
2.1 The un-moving object decision

According to the discussion in the Introduction, abandoned object detection by object tracking can cause performance problems, because most of the computation is spent tracking moving objects. In [1] and [3], double backgrounds are proposed to remove moving objects and retain un-moving objects. With this method, the performance is higher than with tracking methods.

2.1.1. Background updating

In the video sequence, we extract frames as the update sources of the two background models BP(t) and BI(t) at different intervals, as shown in Fig. 3.

Fig. 3. Pure and instant sampling rate illustration

BP(t) is kept as a pure image without any moving objects over a long period, and BI(t) is an image captured at a sampling rate with a short period. Objects are expected to appear in BI(t) when an abandoned object event happens. The current frame at time t is denoted by BC(t). After the background model BP(t) is estimated, we can easily obtain a foreground map FP(t), including both moving and un-moving objects, by computing the frame difference between BP(t) and BC(t). To extract the un-moving objects from the foreground map FP(t), we extract another foreground map FI(t) that includes only moving objects, as shown in Fig. 4. The foreground map FI(t) is obtained by computing the frame difference between BI(t) and BC(t). If an un-moving object stays at a position, the value at the object's position in the map FP(t) should be 0 and the value in the map FI(t) should be 1. Therefore, this processing can easily extract the currently un-moving objects. According to our experiments, the sampling period of BP(t) is about 25 frames, and that of BI(t) is about 15 frames.

Because of illumination variance, it is difficult to update BP(t). In general, BP(t) is updated from BC(t). However, it is hard to keep the background pure using BC(t), because BC(t) may contain moving or un-moving objects, which are easily updated into BP(t). For that reason, to keep BP(t) pure, we select the foreground map FP(t) as a mask and then use linear combination to blend the masked pixels of BP(t) and BC(t). This avoids updating objects into BP(t) while remaining suitable for illumination changes. In practice, we cannot guarantee the accuracy of FP(t) and cannot prevent noise from being updated into BP(t); such noise increases quickly when the update frequency is raised. On the other hand, when the update frequency of BP(t) is lower, the ability to adapt to illumination changes declines. Therefore, in this paper, we propose three updating rules that adapt the background model BP(t) at different timings. The background is not updated at every frame; the update rate of BP(t) is as defined in the previous paragraph, and the updating rules are defined as follows:

a) The foreground map FP(t) is used as a mask to select the pixels that need to be updated. The selected pixels in BP(t) are linearly combined with BC(t).

b) If the number of pixels in the un-moving object aggregated map S exceeds an assigned threshold, it means that noise is accumulating or the lighting is changing. In that condition, BP(t) is replaced by BC(t).

c) If there are no moving objects in BC(t) for a long duration and no abandoned objects are detected, BP(t) is replaced by BC(t).

In the second rule, the un-moving object aggregated map S is a map that accumulates the possibility of each pixel belonging to an abandoned object. The details of the map S are described in the next sub-section (Sec. 2.1.2).

2.1.2. Un-moving object decision

Following the last section, we first compute the frame difference between BC(t) and BP(t), and between BC(t) and BI(t). The frame difference method is image subtraction between two images. The difference results FP(t) and FI(t) are shown in Fig. 4.

Fig. 4. Pure and instant foreground illustration
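The three updating rules for BP(t) can be sketched as follows. This is a minimal sketch: the blend factor `alpha`, the noise threshold `noise_thresh`, and the choice to blend only the pixels outside FP(t) (so objects are not absorbed into the pure background) are our own assumptions; the paper does not give these values.

```python
import numpy as np

def update_pure_background(bp, bc, fp, s_map, has_motion, has_abandoned,
                           alpha=0.05, noise_thresh=1000):
    """Sketch of the three BP(t) updating rules of Sec. 2.1.1.

    bp, bc : pure background and current frame (uint8 arrays)
    fp     : pure foreground map FP(t) (1 = foreground)
    s_map  : un-moving object aggregated map S
    """
    # Rule (b): too many aggregated pixels suggests accumulated noise or a
    # lighting change, so BP(t) is replaced by the current frame outright.
    if int((s_map > 0).sum()) > noise_thresh:
        return bc.copy()
    # Rule (c): no moving objects for a long duration and no abandoned
    # objects detected, so BP(t) can also safely be replaced by BC(t).
    if not has_motion and not has_abandoned:
        return bc.copy()
    # Rule (a): otherwise, blend BP(t) toward BC(t) by linear combination,
    # masked so that pixels flagged as foreground are left untouched and
    # objects are not updated into the pure background.
    out = bp.astype(float)
    mask = (fp == 0)
    out[mask] = (1 - alpha) * out[mask] + alpha * bc.astype(float)[mask]
    return out.astype(np.uint8)

# toy usage: a calm scene triggers rule (c) and adopts the current frame
bp = np.zeros((8, 8), dtype=np.uint8)
bc = np.full((8, 8), 50, dtype=np.uint8)
fp = np.zeros((8, 8), dtype=np.uint8)
s_map = np.zeros((8, 8), dtype=int)
new_bp = update_pure_background(bp, bc, fp, s_map,
                                has_motion=False, has_abandoned=False)
print(new_bp[0, 0])  # 50
```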
If an object stays in the environment, the values at the same position (x, y) are FP(t)(x, y) = 0 and FI(t)(x, y) = 1. The algorithm is shown in Algo. 1.

Input: FP(t), FI(t)  Output: St(t)
For each position (x, y) inside the maps FP(t) and FI(t)
  If (FP(t)(x, y) = 0 & FI(t)(x, y) = 1)
    Set St(t)(x, y) = 1
  Else
    Set St(t)(x, y) = 0

Algo. 1. The algorithm for obtaining the un-moving object map

In the algorithm, FP(t) and FI(t) are binary maps whose pixel values are 0 or 1. When the pixel value FP(t)(x, y) equals 0 and the pixel value FI(t)(x, y) equals 1, the pixel has just stopped and stays in the current frame. Therefore, we can extract the un-moving object map St, as shown in Fig. 5.

Fig. 5. Un-moving object map illustration: only the un-moving object is present. The right map is St.

2.1.3. Real un-moving object extraction

St includes the currently un-moving objects. However, it cannot indicate whether they are real un-moving objects, because those objects could just be a person staying for a few seconds or an object placed temporarily. Therefore, St must be accumulated continuously over the frames to form an un-moving object aggregated map S, from which the real un-moving objects are extracted. The map S is defined in Algo. 2.

Input: St(t), S, Iv, Dv  Output: S
For each position (x, y) inside St(t) & S
  If (St(t)(x, y) = 1)
    S(x, y) = S(x, y) + Iv
  Else
    S(x, y) = S(x, y) - Dv

Algo. 2. The algorithm of S

For each pixel, if its value in St(t) equals 1, the value at the same position (x, y) of S is increased by a constant value (Iv); otherwise, it is decreased by a constant value (Dv). Therefore, if an object stays for a long period, the values at the object's positions in map S increase continuously until enough pixel values exceed the thresholds we assign; at that time, those pixels can be regarded as parts of un-moving objects. The advantage of this method is that it prevents temporary objects, which stay for only a few seconds, from being regarded as abandoned objects. Moreover, when the number of flagged pixels is too large, the situation may imply too much noise in BP(t) after updating, or too large a lighting change; this information is fed back as one of the update timing conditions of BP(t).

Fig. 6. Un-moving object aggregated map illustration

2.2 Abandoned object decision

The method of [2] uses human patterns to recognize humans or abandoned objects, but human patterns alone are not enough. In a normal situation, an abandoned object is usually luggage or a briefcase, and a bomb is usually packaged in a box with a regular shape. Therefore, we can focus only on objects with regular shapes, discarding the rest by object-feature filtering.

2.2.1. Abandoned object clustering

From the un-moving object aggregated map S obtained in Section 2.1, we extract each object using a clustering method. Those un-moving objects may include noise, staying humans, luggage, and so on. Generally, we care about objects such as baggage with a regular shape, briefcases, and boxes; therefore, we focus on those kinds of objects based on three special features.

2.2.2. Object feature filtering

Following Subsection 2.2.1, we filter each object based on object features. In this paper, we use three object features. The first feature is Area. The goal of the Area constraint is to filter out objects that are too large or too small. The Area feature is given by (1):

Area = size(Object)   (1)
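The aggregation of Algo. 2 can be sketched as follows. The increment Iv, the decrement Dv, and the clamp at zero (so that brief stops never accumulate a negative debt) are illustrative choices; the paper leaves these parameters to the user.

```python
import numpy as np

def aggregate(s, st, iv=2, dv=1):
    """One step of the un-moving object aggregated map S (Algo. 2, sketch):
    raise S where the un-moving map St flags a pixel, decay it elsewhere,
    clamping at zero so short-lived stops fall back quickly."""
    s = s + np.where(st == 1, iv, -dv)
    return np.maximum(s, 0)

# a pixel flagged for 10 consecutive frames accumulates a high score;
# one flagged for only 3 frames decays back to zero
s = np.zeros(2, dtype=int)
for t in range(10):
    st = np.array([1, 1 if t < 3 else 0])
    s = aggregate(s, st)
print(s)  # [20  0] with iv=2, dv=1
```

Thresholding the resulting map (e.g. `s > 15`) then yields the pixels that are treated as parts of real un-moving objects.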
where size(Object) is the number of pixels of the object.

The second feature is Shape, describing the state of the object's appearance. The function is given by (2):

Shape = 4 × size(Object) / (Perimeter)²   (2)

where Perimeter is the total edge length of the object, reflecting its shape. When the shape of the object is not regular, Shape is smaller; when the shape is regular, Shape is larger. Generally speaking, this feature is good for filtering out human shapes.

The last feature is Compactness. If the object is more dispersed, the value of Compactness is smaller. The function is given by (3):

Compactness = ( Σ over i ∈ Object of n_i ) / ( size(Object) × 8 )   (3)

where n_i is, for each pixel i, the number of its eight neighboring pixels that also belong to the object.

Objects extracted from S must satisfy all three features before they are finally considered abandoned objects. Therefore, most objects can easily be filtered out by these features.

2.3 The Life-cycle State Measurement

The advantage of our method is that we do not track all objects, so the efficiency is better. However, the object features are easily affected by illumination changes and similar factors. Moreover, occlusion is a harsh issue: when an object is occluded, the abandoned object could be discarded because its Area feature is reduced. When an object is occluded for a short period, the system should keep it instead of discarding it. Therefore, the set that saves abandoned objects should have a temporal register. In other words, when we obtain objects from S in each frame, those objects are run through a relationship algorithm against the abandoned object set detected before, including discarded abandoned objects. The relationship is defined as whether an object in the current set (objecti(t)) connects to the object set detected before (objectj(t-n), 1<n<t-1). This processing ensures that each abandoned object has stayed for a period of time.

The relationship is computed as follows: for each objecti(t), its center position is compared with the abandoned object set we already have (objectj(t-n), 1<n<t-1). If the center position of objecti(t) is located inside the bounding box of one of the abandoned objects in Set(objectj(t-n), 1<n<t-1), then objectj(t-n) and objecti(t) are related, and objectj(t-n) is said to have relationship satisfaction. Otherwise, it could be a new object, a removed object, or an occlusion, which is called relationship dissatisfaction; if it is a new object, objecti(t) is added into Set(objectj(t-n), 1<n<t-1). Through this processing, we can decide the relation and state of each object in Set(objectj(t-n), 1<n<t-1). When an abandoned object is detected, we do not launch an alarm to the user immediately; instead, we make a decision based on the state of the current abandoned object. Following the definition of the software life-cycle from software engineering, we bring the idea into our study. The Life-cycle State Measurement (LCSM) includes four states: the growing state, the stable state, the aging state, and the dead state. Fig. 7 shows the finite state machine of the LCSM.

Fig. 7. The finite state machine of LCSM

The beginning state of an abandoned object recorded in Set(objectj(t-n), 1<n<t-1) is the growing state. When the growing period finishes, the state changes into the stable state or the aging state; when the aging period elapses, the state can change into the dead state.

The following describes the algorithm of each state. The symbols are defined in Table 1 to support the description.

Table 1. The definitions of symbols in LCSM
ObjState: the life-cycle state of the current object
Newobj: Boolean; indicates whether the object is a new abandoned object or not
GrowingT: Boolean; indicates whether the growing time (GrowingTime) has finished or not. GrowingTime is the duration from the Growing State to the Stable State
AgingT: Boolean; indicates whether the aging time (AgingTime) has finished or not. AgingTime is the duration from the Aging State to the Dead State
ObjFeature: Boolean; indicates whether the features of the abandoned object satisfy the conditions assigned by the user or not
ObjRelation: Boolean; indicates whether relationship satisfaction holds after the relationship computation or not

Growing State: When a new abandoned object is created, we give it a time buffer (GrowingTime) to grow, to avoid a false alarm caused by erroneous detection under occlusion or an incomplete area. This state is used at the beginning, until GrowingTime is finished.

ObjState = Growing State
if Newobj = True & GrowingT = False & ObjFeature = True & ObjRelation = True   (4)

Stable State: The stable state means GrowingTime is finished, the object's features are satisfied, and relationship satisfaction holds. Relationship satisfaction means the object has not been removed or occluded. If the state of an abandoned object changes to the stable state, the system launches an alarm to the user, and the object is boxed and fed back to the user.

ObjState = Stable State
if GrowingT = True & ObjFeature = True & ObjRelation = True   (5)

Aging State: The aging state happens when the relationship is not satisfied or the feature conditions are not satisfied, which usually stands for occlusion or removal of the object. At that time, the state changes to the aging state. We also give the object a time buffer (AgingTime) to age; if the conditions of the stable state are met during AgingTime, the state returns to the stable state. Otherwise, the state finally changes to the dead state.

ObjState = Aging State
if GrowingT = True & AgingT = False & (ObjFeature = False | ObjRelation = False)   (6)

Dead State: When the state changes into the dead state, the object is ignored and deleted from Set(objectj(t-n), 1<n<t-1) within a few minutes.

ObjState = Dead State
if AgingT = True   (7)

If the state is the Growing State but the conditions of the Growing State are not all satisfied, GrowingT is stopped and the state changes into a special situation called the unstable state. When the unstable state continues for a long period of time, the object is finally killed by itself. Therefore, through the LCSM processing, we can avoid some unstable situations and make the detection more reasonable.

3. EXPERIMENT RESULTS

We test ten videos with dimensions 352 × 288 or 320 × 240, divided into four types. Each type has the same background, so the parameters within each type are kept the same, to show that the same parameters can be used for the same background. When abandoned objects are detected, the system draws a bounding box on them even under occlusion; when objects are removed, the alarm is still retained for a period of time.

Fig. 8. Abandoned object detection results: when the abandoned object is occluded, the bounding box still keeps it; when the object is removed, the bounding box is also kept for a duration.

The top pictures in Fig. 8 show that the system can still select the object for a while when the object is occluded; the others show that the system can still select the object for a while after the object is picked up.

In testing, we use Sensitivity and Specificity to verify our results. Every test video contains at least one abandoned object; objects that stay over 3 seconds without being abandoned are counted as non-abandoned object events. The test videos include 10 abandoned objects in total. Table 2 gives the definitions of True Positive (TP),
False Positive (FP), False Negative (FN), and True Negative (TN), and Table 3 gives the results over the 10 test videos.

Table 2. Definitions of TP, FP, FN, TN
TP: abandoned objects detected correctly
FP: non-abandoned objects detected as abandoned objects
TN: non-abandoned objects detected as non-abandoned objects
FN: abandoned objects detected as non-abandoned objects

The 10 test videos come from popular databases, and our results show that our methods for the abandoned object detection problem are effective on two points: high accuracy and low computing cost. The sensitivity is 90% and the specificity is 92.6%, which shows the high accuracy of our methods. The average frame rate is around 30 fps, and a real-time test using an IP camera runs at about 25 fps, which shows the low computing cost and that the methods can work in real time.

Table 3. Results of Sensitivity and Specificity
            Positive    Negative
Positive    TP = 9      FP = 4
Negative    FN = 1      TN = 50
Sensitivity = TP / (TP + FN) = 90.0%
Specificity = TN / (FP + TN) = 92.6%

4. CONCLUSION

In this paper, reasonable results are obtained by applying foreground analysis, feature filtering, and the LCSM mechanism. However, the techniques are not flawless. For example, the updated BP(t) still contains noise over a long period of time, even though a mechanism is proposed to replace BP(t), so missed abandoned object detections cannot be entirely avoided. The next problem concerns feature filtering. In normal situations, feature filtering can separate humans from objects, but it can make a false decision due to incomplete foreground detection, or due to people whose foregrounds look like rectangular, static objects. In the future, we will make this abandoned object detection technology more reliable and useful in video surveillance.

REFERENCES

[1] F. Porikli, "Detection of Temporarily Static Regions by Processing Video at Different Frame Rates," IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS 2007), 5-7 Sept. 2007.
[2] W.-Y. Chen, M.-F. Ho, C.-L. Huang, S. T. Lee, and C. Cheng, "Detecting Abandoned Objects in Video-Surveillance System," The 21st IPPR Conference on Computer Vision, Graphics, and Image Processing (CVGIP 2008).
[3] A. Singh, S. Sawan, M. Hanmandlu, V. K. Madasu, and B. C. Lovell, "An Abandoned Object Detection System Based on Dual Background Segmentation," Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 352-357.
[4] J. Wang and W. Ooi, "Detecting Static Objects in Busy Scenes," Technical Report TR99-1730, Department of Computer Science, Cornell University, February 1999.
[5] M. Bhargava, C.-C. Chen, M. S. Ryoo, and J. K. Aggarwal, "Detection of Abandoned Objects in Crowded Environments," Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2007, pp. 271-276.
[6] R. Mathew, Z. Yu, and J. Zhang, "Detecting New Stable Objects in Surveillance Video," Proceedings of the IEEE 7th Workshop on Multimedia Signal Processing, 2005, pp. 1-4.
[7] F. Porikli, Y. Ivanov, and T. Haga, "Robust Abandoned Object Detection Using Dual Foregrounds," EURASIP Journal on Advances in Signal Processing, vol. 2008, 2008.
HIERARCHICAL METHOD FOR FOREGROUND DETECTION USING CODEBOOK MODEL

Jing-Ming Guo (郭景明), Member, IEEE, and Chih-Sheng Hsu (徐誌笙)
Department of Electrical Engineering
National Taiwan University of Science and Technology
Taipei, Taiwan
E-mail: jmguo@seed.net.tw, seraph1220@gmail.com

ABSTRACT

This paper presents a hierarchical scheme with block-based and pixel-based codebooks for foreground detection. The codebook is mainly used to compress information to achieve a high processing speed. In the block-based stage, 12 intensity values are employed to represent a block. The algorithm extends the concept of Block Truncation Coding (BTC), and thus it can further improve processing efficiency by enjoying its low-complexity advantage. In detail, the block-based stage can remove most of the noise without reducing the True Positive (TP) rate, yet it has low precision. To overcome this problem, the pixel-based stage is adopted to enhance the precision, which also reduces the False Positive (FP) rate. Moreover, short-term information is employed to improve background updating for adaptive environments. As documented in the experimental results, the proposed algorithm provides performance superior to that of the former related approaches.

Keywords- Background subtraction; foreground detection; shadow detection; visual surveillance; BTC

1. INTRODUCTION

In visual surveillance, background subtraction is an important issue for extracting foreground objects for further analysis, such as human motion analysis. A challenging problem for background subtraction is that backgrounds are usually non-stationary in practice, for example waving trees, rippling water, changing light, etc. Another difficult problem is that the foreground generally suffers from shadow interference, which leads to wrong analysis of foreground objects. Hence, the background model is highly demanded to be adaptively manipulated via background maintenance. In [1], some of the well-known issues in background maintenance are introduced.

To overcome shadows, some well-known methods can be adopted, such as the RGB model, the HSV model, gradient information, and ratio edges. In particular, Horprasert et al. [2] proposed employing a statistical RGB color model to remove shadows. However, it suffers from some drawbacks: 1) more processing time is required to compute thresholds, 2) the non-stationary background problem cannot be solved, and 3) a fixed threshold near the origin is used, which offers less flexibility. Another RGB color model, proposed by Carmona et al. [18], can solve the third problem of [2], yet it needs too many parameters for its color model. In [3] and [4], the HSV color model is employed to detect shadows; shadows are defined by a diminution of the luminance and saturation values when the hue variation is smaller than a predefined threshold parameter. In [5] and [6], gradient information is employed to detect shadows, which achieves good results; yet multiple steps are required for removing shadows, and thus the complexity increases. Zhang et al. [24] proposed a ratio-edge method to detect shadows, with geometric heuristics used to improve the performance; however, the main problem of this scheme is its high complexity.

Most foreground detection methods are pixel-based, and one of the popular methods is the MOG. Stauffer and Grimson [7], [8] proposed the MOG, using multiple Gaussian distributions to represent each pixel in background modeling. The advantage is overcoming non-stationary backgrounds, which provides better adaptation for background modeling. Yet it has some drawbacks. One concerns the standard deviation (SD): if the SD is too small, a pixel may easily be judged as foreground, and vice versa. Another drawback is that it cannot remove shadows, since the matching criterion simply classifies a pixel as background when it is within 2.5 times the SD. Chen et al. [9] proposed a hierarchical method with MOG; the method also employs a block- and pixel-based strategy, yet shadows cannot be removed with their method. Martel-Brisson and Zaccarin [10] presented a novel pixel-based statistical approach to model moving cast shadows of non-uniform and intensity-varying objects. This approach employs the MOG learning strategy to build statistical models describing moving cast shadows, yet the model requires more time for learning. Benedek and Sziranyi [23] chose the CIE L*u*v* space to detect foregrounds and shadows by MOG, with texture features employed to enhance the segmentation results; the main problem of this scheme is its low processing speed.

Kim et al. [11] presented a real-time algorithm for foreground detection which samples background pixel values and then quantizes them into codebooks. This approach improves the processing speed by compressing background information. Moreover, two features, layered modeling/detection and adaptive codebook updating, are presented for further improving the algorithm. In [12] and [13], the concepts of Kohonen networks and Self-Organizing Maps (SOMs) [14] were proposed to build the background model; the background model can automatically adapt in a self-organizing manner and without prior knowledge. Patwardhan et al. [15] proposed robust foreground detection by propagating layers using the maximum-likelihood assignment; pixels that share similar statistics are clustered into "layers" and modeled as unions of such nonparametric layer-models. The pixel-layer manner of foreground detection requires more time for processing, at around 10 frames per second on a standard laptop
computer. In our observation, classifying each pixel to represent various types of features after a background training period is a good manner for building an adaptive background model. Also, it can overcome the non-stationary problem for background classification. Another foreground detection method can be classified as texture-based, in which Heikkila and Pietikainen [16] presented an efficient texture-based method by using adaptive local binary pattern (LBP) histograms to model the background of each pixel. The LBP method employs circular neighboring pixels to label the thresholded difference between the neighboring pixels and the center pixel. The results are considered as a binary number which can fully represent the texture of a pattern.

In this study, a hierarchical method is proposed for background subtraction by using both block-based and pixel-based stages to model the background. The block-based strategy is from the traditional compression scheme, BTC [17], which divides an image into non-overlapped blocks; each pixel in a block is substituted by a high mean or low mean. The BTC algorithm simply employs two distinct intensity values to represent a block. Yet, in this paper, four intensity values are employed to represent a block, and each pixel in a block is substituted by the high-top mean, high-bottom mean, low-top mean or low-bottom mean. The block-based background modeling can efficiently detect foreground without reducing TP, yet the precision is rather low. To overcome this problem, the pixel-based codebook strategy is involved to compress background information, to simultaneously maintain the high speed advantage and enhance the accuracy. Moreover, a modified color model from the former approach [18] is used to distinguish shadow, highlight, background, and foreground. The modified structure can simplify the used parameters and thus improve the efficiency. As documented in the experimental results, the proposed method can effectively solve the non-stationary background problem. One specific problem for background subtraction is that a moving object becomes stationary foreground when it stands still for a while during the period of background construction. Consequently, this object shall become a part of the background model. For this, the short term information is employed to solve this problem in background model construction.

The paper is organized as below. Section 2 presents the initial background model in the background training period, which includes the block-based and pixel-based codebooks. Section 3 reports background subtraction by the proposed hierarchical scheme. Section 4 introduces the short term information with the background model. Section 5 documents experimental results, in terms of accuracy and efficiency, and compares with the former MOG [7], Rita's method [4], CB [11], Chen's method [9] and Chiu's method [22] schemes. Section 6 draws conclusions.

2. INITIAL BACKGROUND MODEL
In this study, two types of codebooks are constructed for block-based and pixel-based background modeling. The proposed background modeling is similar to CB [11]. The advantage of CB is its high efficiency in background model building. In our observation, the CB employs more information to build the background, yet the proposed method employs the concept of MOG [7] by simply using weights to classify foreground and background, and thus can provide an even higher efficiency advantage; the precision is also higher than that of CB. Another difference between the proposed method and CB is that two stages, namely the block-based and pixel-based stages, are involved in background model construction, while simply one stage is used in CB. In the block-based stage, multiple neighboring pixels are classified as a unit, while a pixel is the basic unit in the pixel-based stage. Figure 1 shows the structure of the background model, which is composed of block-based and pixel-based stages. The details are introduced in the following sub-sections.

Fig. 1. Structure of background construction model (the background model comprises a block-based stage followed by a pixel-based stage).

2.1 Block feature in block-based stage
The block feature used in this study is extended from the BTC algorithm, which maintains the first and second moments in a block. Although BTC is a highly efficient coding scheme, we further reduce its complexity by modifying the corresponding high mean and low mean. Moreover, we extend the BTC algorithm by using four intensity values to represent a block to increase the recognition confidence: each pixel in a block is substituted by the High-top mean (Ht), High-bottom (Hb), Low-top (Lt) or Low-bottom (Lb) mean. Suppose an image is divided into non-overlapped blocks, and each block is of size M x N. Let x1, x2, ..., xm be the pixel values in a block, where m = M x N. The average value of a block is

    x̄ = (1/m) Σ_{i=1}^{m} x_i    (1)

The high mean Hm and low mean Lm are defined as

    Hm = [ Σ_{i=1}^{m} (x_i | x_i ≥ x̄) ] / q ,   Lm = [ Σ_{i=1}^{m} (x_i | x_i < x̄) ] / (m − q)    (2)

where q denotes the number of pixels equal to or greater than x̄. Notably, if q is equal to m or 0, then all the values in a block are forced to be identical to x̄. In this case, Ht, Hb, Lt and Lb are assigned with x̄. Otherwise, three thresholds (x̄, Hm and Lm) are employed to distinguish the four intensity values Ht, Hb, Lt and Lb, as defined below:

    Ht = [ Σ_{i=1}^{m} (x_i | x_i ≥ Hm) ] / p ,   Hb = [ Σ_{i=1}^{m} (x_i | x̄ ≤ x_i < Hm) ] / (q − p)    (3)

    Lt = [ Σ_{i=1}^{m} (x_i | Lm ≤ x_i < x̄) ] / (m − q − k) ,   Lb = [ Σ_{i=1}^{m} (x_i | x_i < Lm) ] / k    (4)

where p denotes the number of the pixels equal to or greater
than Hm. If p is equal to q or 0, then both Ht and Hb are assigned with a value equal to Hm. The variable k denotes the number of the pixels which are smaller than Lm. If k is equal to (m − q) or 0, then both Lt and Lb are assigned with a value equal to Lm. In RGB color spaces, a divided block of a specific color space is transformed to yield a set of Ht, Hb, Lt, and Lb. Thus, a block is represented by Vblock = (RHt, GHt, BHt, RHb, GHb, BHb, RLt, GLt, BLt, RLb, GLb, BLb).

The reason that the proposed block feature can provide superior performance over the former schemes is that, unlike the traditional BTC, the codeword size for a block is increased from six to twelve to better characterize the texture of the block for the block-based background reconstruction. Moreover, the BTC-based strategy can significantly reduce the complexity to adapt to a real-time application. Compared with the former Chen's hierarchical method [9], in which texture information is employed to form a 48-dimension feature, the proposed method can effectively classify foreground and background by simply using 12 dimensions. Moreover, the processing speed is superior to Chen's method.

2.2 Initial background model for block-based codebook
In the block-based stage, an image is divided into non-overlapped blocks, and each block constructs its own codebook. Using a training sequence of N frames to build the block-based codebook, each codebook of a block has N block vectors for training the background model. Let X be a training sequence for a block consisting of N block vectors: X = {xblock_1, xblock_2, …, xblock_N}. Let C = (c1, c2, …, cL) represent the codebook for a block consisting of L codewords. Each block has a different codebook size based on the codewords' weights. Each codeword ci, i = 1, …, L, consists of a block vector vblock_i = (RHt_i, GHt_i, BHt_i, RHb_i, GHb_i, BHb_i, RLt_i, GLt_i, BLt_i, RLb_i, GLb_i, BLb_i) and a weight wi.

In the training phase, an input block vector xblock is compared with each codeword in the codebook. If no match is found or there is no codeword in the codebook, the input codeword is created in the codebook. Otherwise, the matched codeword is updated and its weight is increased. To determine which codeword is the best matched candidate, the match function introduced in sub-section 2.4 is employed for measuring. The detailed algorithm is given below.

Algorithm for block-based codebook construction
Step 1: L ← 0, C ← ∅ (empty set)
Step 2: for t = 1 to N do
    I. xblock_t = (RHt_t, GHt_t, BHt_t, RHb_t, GHb_t, BHb_t, RLt_t, GLt_t, BLt_t, RLb_t, GLb_t, BLb_t)
    II. find the codeword cm in C = { ci | 1 ≤ i ≤ L } matching to xblock_t based on: match_function(xblock_t, vblock_m) = true
    III. If C = ∅ or there is no match, then L ← L + 1. Create a new codeword cL by setting:
        vblock_L = xblock_t
        wL = 1/N
    IV. Otherwise, update the matched codeword cm, consisting of vblock_m and wm, by setting:
        vblock_m = (1 − α)·vblock_m + α·xblock_t    (5)
        wm = wm + 1/N
    end for
Step 3: select the background codewords in the codebook:
    I. Sort the codewords in descending order according to their weights
    II. B = argmin_b { Σ_{k=1}^{b} wk > T }    (6)

where α denotes the learning rate, which is empirically set at 0.05 in this study. Step 3 demarcates the background in the same way as MOG [7]. A codeword with a bigger weight has a higher likelihood of being a background codeword in the background codebook. The codewords are sorted in descending order according to their weights, and then the codewords meeting Eq. (6) are selected as the background codebook, where T denotes an empirical threshold with value 0.8.

2.3 Initial background model for pixel-based codebook
The algorithm for codebook construction in the pixel-based stage is similar to the block-based stage, with the basic unit changed from a block to a pixel. Let X be a training sequence for a pixel consisting of N RGB vectors: X = (xpixel_1, xpixel_2, …, xpixel_N). Let F = (f1, f2, …, fL) be the codebook for a pixel consisting of L codewords. Each pixel has a different codebook size based on the codewords' weights. Each codeword fi, i = 1, …, L, consists of a pixel vector vpixel_i = (Ri, Gi, Bi) and a weight wi.

In Step 2(II), find the codeword fm matching to xpixel_t based on match_function(xpixel_t, vpixel_m), which will be introduced in Section 2.4. In Step 2(III), if F = ∅ or there is no match, then create a new codeword fL by assigning xpixel_t to vpixel_L. Otherwise, update fm by assigning (1 − α)·vpixel_m + α·xpixel_t to vpixel_m. In Step 3, the parameters α and T are identical to those of the block-based stage.

The proposed block-based and pixel-based procedures are used to establish the background model, which is similar to CB [11]. The main difference is that CB employs more information to build the background, yet the proposed method employs the concept of MOG [7] by simply using weights to classify foreground and background, and thus can provide a higher efficiency advantage; the precision is also higher than that of CB [11].

2.4 Match function
The match function for n dimensions employed in this study, in terms of squared distance, is given as below:

    dᵀd / N ≤ λ²    (7)
where d = (σI)⁻¹(x − v), and the empirical value of the standard deviation σ is between 2.5 and 5, with 2.5 as a tight bound and 5 as a loose bound. The identity matrix I is of size N x N, where N = 12 and 3 in the block-based and pixel-based stages, respectively. The match function can be applied for n dimensions; the proposed block vector vblock in the block-based stage is of 12 dimensions, and the pixel vector vpixel is of 3 dimensions. A match is found when a sample falls within λ = 2.5 standard deviations of the mean of one of the codewords. The output of the match function is as below:

    match_function(x, v) = { true, if dᵀd / N ≤ λ² ;  false, otherwise }    (8)

In the pixel-based phase, the color model is exploited to classify a pixel only when no match is found. This strategy can significantly improve the efficiency.

3. FOREGROUND DETECTION
The proposed foreground detection stage can also be divided into block-based and pixel-based stages. In the block-based stage, the match function introduced in Section 2.4 is employed to distinguish background from foreground. If a block is classified as background, then it is fed to pixel-based background model updating for adapting to the current environment conditions. Yet, this raises a disadvantage by increasing the processing time for foreground detection. For this, the threshold T_update is used to enable the updating phase, which means the updating is conducted every T_update frames. Empirically, T_update is set at 2~5 to guarantee the adaptation of the background model. Using the color model function, which will be introduced in Section 3.4, and the match function, the current frame can be distinguished into four states: background, foreground, highlight and shadow. Figure 2 shows the proposed foreground detection flow chart, which is detailed in the following sub-sections.

3.1 Foreground detection with block-based stage
The block-based stage is employed to separate background and foreground. Although the block-based stage has low precision, it can ensure the detected foreground without reducing the TP rate when σ is set with a small value as a tight bound. However, a small σ increases the FP rate as well. Therefore, there is a trade-off in choosing the value of σ. Herein, the empirical value is set at 2.5 in this work.

Algorithm for background subtraction using block-based codebook
Step 1: xblock = (RHt, GHt, BHt, RHb, GHb, BHb, RLt, GLt, BLt, RLb, GLb, BLb)
Step 2: for all codewords in B in Eq. (6), find the codeword cm matching to xblock based on: match_function(xblock, vblock_m) = true. Update the matched codeword as in Eq. (5)
Step 3: BS(xblock) = { Foreground, if there is no match; Background, otherwise. }

Fig. 2. Flow chart for foreground detection. (An input sequence passes through the block-based and then the pixel-based stage; background triggers pixel-based model updating, while foreground builds short term information for both stages, and codewords whose weight exceeds T_add are inserted into the corresponding background model.)

3.2 Pixel-based background updating model
To adapt to the current environment conditions, when a block is classified as background in the block-based stage, the corresponding pixel-based background model needs to be updated. Yet this raises a disadvantage by increasing the processing time for foreground detection. For this, the threshold T_update is used to enable the updating phase, which means the updating is conducted every T_update frames. Empirically, T_update is set at 2~5 to guarantee the adaptation of the background model. Meanwhile, the match function is used to find the matched codeword for updating. The details of the algorithm are organized as below.

Algorithm for pixel-based background model updating
Step 1: xpixel = (R, G, B)
Step 2: if the accumulated time is equal to T_update, then do
    1) for all codewords in B in Eq. (6), find the codeword fm matching to xpixel based on: match_function(xpixel, vpixel_m) = true. Update the matched codeword as vpixel_m = (1 − α)·vpixel_m + α·xpixel

3.3 Foreground detection with pixel-based stage
If a pixel is classified as foreground in the block-based stage, then the input pixel xpixel = (R, G, B) proceeds to the pixel-based stage to determine the state of the pixel. The algorithm for pixel-based background subtraction is similar to the block-based one; the only difference is in the match function. Herein, the color model and the match function are used to determine whether a pixel vector belongs to shadow, highlight, background, or
foreground. The detailed algorithm is organized below.

Algorithm for background subtraction using pixel-based codebook
Step 1: xpixel = (R, G, B)
Step 2: for all codewords in B in Eq. (6), find the codeword fm matching to xpixel based on: s = color_model_function(xpixel, vpixel_m). If s is classified as background, then do vpixel_m = (1 − α)·vpixel_m + α·xpixel

3.4 Color model
In [18], the proposed color model can classify an RGB color pixel into shadow, highlight, background, and foreground. However, many parameters are employed in this model, which leads to a disadvantage by increasing the computational complexity. In this work, the number of parameters is reduced to three, namely θ, β, and γ, to reduce the complexity. Figure 3 shows the modified color model.

Fig. 3. Modified color model. (In RGB space, the input pixel Xi is decomposed along the background vector Vi into comp_I and the orthogonal residual proj_I; I_max and I_min bound the highlight and shadow ranges, and θ bounds the angular deviation.)

Given an input pixel vector x, the match function is employed to measure whether it is in the background state. If the vector x is classified as foreground by the match function, then we compare the angle via tanθ. If proj_I/comp_I is greater than tanθ, the vector x must be foreground. Otherwise, the input vector x may fall within the color model bound. Subsequently, the variables I_max and I_min are calculated; if comp_I falls between ||v̄|| and I_max, the pixel is classified as highlight; if the value is not between I_min and I_max, the pixel is classified as foreground.

Given an input pixel vector x = (R, G, B) and background vector v̄ = (R̄, Ḡ, B̄),

    ||x|| = √(R² + G² + B²),   ||v̄|| = √(R̄² + Ḡ² + B̄²)
    ⟨x, v̄⟩ = R·R̄ + G·Ḡ + B·B̄

    comp_I = ||x||·cosθ = ⟨x, v̄⟩ / ||v̄||    (9)

    proj_I = √( ||x||² − comp_I² )    (10)

where comp_I is used to determine whether a pixel vector belongs to shadow or highlight, and proj_I is used to measure the nearest distance to the background vector v̄. If proj_I/comp_I is greater than tanθ, then the pixel is classified as foreground. Herein, θ is empirically set between 2 and 3.5.

    I_max = β·||v̄||,   I_min = γ·||v̄||    (11)

where β > 1 and γ < 1. In our experiments, β is set between 1.1 and 1.25, and γ between 0.7 and 0.85. The range [I_min, I_max] is used for measuring comp_I; if comp_I is not in this range, the pixel is classified as foreground. The overall color model is organized as below:

    color_model_function(x, v̄) =
        Background, if match_function(x, v̄) = true;
        Highlight, else if proj_I/comp_I ≤ tanθ and ||v̄|| ≤ comp_I ≤ I_max;
        Shadow, else if proj_I/comp_I ≤ tanθ and I_min ≤ comp_I ≤ ||v̄||;
        Foreground, otherwise.    (12)

4. BACKGROUND MODEL UPDATING WITH SHORT TERM INFORMATION
As indicated in Fig. 2, the background model updating with the short term information is divided into two stages: first, construct the short term information model with the foreground region; second, if a codeword in the short term information model accumulates enough weight, this codeword is inserted into the background model for foreground detection. This strategy yields an advantage: a user can control the lasting period after which a stationary foreground is inserted into the background model. However, a non-stationary foreground region will lead to too many unnecessary codewords in the short term information model. For this, time information is added to a codeword, and a threshold is used to decide whether a codeword is reserved or deleted. In addition, an identical strategy is applied to the background model as well.

The procedures of the short term information construction for the block-based and pixel-based phases are identical. The main concept is to add an additional model S called the short term information model. S records foreground regions after foreground detection. The model construction is similar to that of Section 2; yet, herein the time information (S_time) is added for a codeword. In addition, three additional thresholds, S_delete, T_add, and B_delete, are employed. S_delete is used to determine whether a codeword is reserved or deleted: if the current time minus the last update time of a codeword is greater than S_delete, then the codeword is unnecessary in the codebook, and thus it is deleted from the codebook. T_add is used to decide whether a codeword is inserted into the background model: if a codeword accumulates enough weight, then it can become a part of the background model. B_delete is used to determine whether a codeword is reserved or deleted in the background model when short term information is inserted into the background model; the parameter B_delete is set equal to T_add times S_delete (the worst case) to ensure that the last updated time for a codeword in the background model is reserved. The overall procedure of the algorithm is organized as below.
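As a complement to the listing that follows, the per-codeword bookkeeping can be sketched in code. This is a simplified, hypothetical rendering rather than the paper's implementation: the one-dimensional codewords, the match tolerance `tol`, and the numeric threshold values are illustrative assumptions.

```python
def prune(model, now, max_age):
    """Drop codewords not updated within max_age time units
    (the S_delete / B_delete pruning of Steps 4 and 5.I)."""
    return [cw for cw in model if now - cw["time"] < max_age]

def update_short_term(S, x, now, alpha=0.05, tol=10.0):
    """Record a foreground sample x in the short term model S (Step 3.II)."""
    for cw in S:
        if abs(x - cw["v"]) < tol:           # stand-in for the match function
            cw["v"] = (1 - alpha) * cw["v"] + alpha * x
            cw["w"] += 1
            cw["time"] = now
            return
    S.append({"v": x, "w": 1, "time": now})  # no match: create a new codeword

def promote(S, B, now, T_add=30, S_delete=100):
    """Move codewords that accumulated enough weight into the
    background model B (Step 5), pruning both models first."""
    S[:] = prune(S, now, S_delete)
    B[:] = prune(B, now, T_add * S_delete)   # B_delete = T_add * S_delete
    for cw in list(S):
        if cw["w"] > T_add:
            B.insert(0, cw)                  # insert at the head of B
            S.remove(cw)
```

With these pieces, a pixel that stays foreground long enough (weight above T_add) migrates into B, while transient foreground codewords age out after S_delete time units.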
Algorithm for short term information model construction
Step 1: Given a background model B from the initial background modeling, create a new model S for recording foreground regions.
Step 2: Add a time information parameter (B_time) to every codeword in B for recording the current time (C_time). S is assigned an empty set.
Step 3:
    I. Find a matched codeword in B for an input image. A "match" is determined when a codeword is found during the codeword updating of Eq. (5) and B_time is equal to C_time.
    II. If no matched codeword is found in B, then search for the matched codeword in S for the foreground region, and do the following steps:
        i. find the codeword sm in S = { si | 1 ≤ i ≤ L } matching to x (the input vector) based on the match function.
        ii. If S = ∅ or there is no match, then L ← L + 1. Create a new codeword sL by setting:
            vL = x
            wL = 1
            S_timeL = C_time
        iii. Otherwise, update the matched codeword sm, consisting of vm, wm and S_timem, by setting:
            vm = (1 − α)·vm + α·x
            wm = wm + 1
            S_timem = C_time
Step 4: S ← { sm | (C_time − S_timem) < S_delete }
Step 5: Check the weight of every codeword in S. If the weight of a codeword is greater than T_add, then do the following steps:
    I. B ← { cm | (C_time − B_timem) < B_delete }
    II. Add the codeword as short term information at the head of B.
Step 6: Repeat the algorithm from Step 3 to Step 5.

5. EXPERIMENTAL RESULTS
For measuring the accuracy of the results, the criteria FP rate, TP rate, Precision, and Similarity [12] are employed, as defined below:

    FP rate = fp / (fp + tn),   TP rate = tp / (tp + fn),
    Precision = tp / (tp + fp),   Similarity = tp / (tp + fp + fn),

where tp, tn, fp, and fn denote the numbers of true positives, true negatives, false positives, and false negatives, respectively; (tp + fn) indicates the total number of pixels presented in the foreground, and (fp + tn) indicates the total number of pixels presented in the background. Our experimental results are obtained without any post-processing and without short term information, for measuring the accuracy of the results.

Figure 4 shows the test sequences [19] of size 320x240 with IR (row 1), Campus (row 2) and Highway_I (row 3). To provide a better understanding of the detected results, three colors, red, green and blue, are employed to represent shadows, highlight and foreground, respectively. Figure 4(b) shows the detected results using the block-based stage with blocks of size 10x10, in which most of the noise can be removed. Figure 4(c) shows the results obtained by the hierarchical block-based and pixel-based stages. Apparently, the pixel-based stage can significantly enhance the detection precision. Yet, we would like to point out a weakness of the proposed method. As can be seen in the third row of Fig. 4 (Highway_I), when the color of the shadow is dark, it will be classified as foreground. Since a lower threshold is set for the color model of the proposed method, when the value exceeds the threshold it will be classified as shadow. The problem can be eased by increasing the threshold; yet, as can be seen in Fig. 4, some of the foregrounds are then classified as shadows. In summary, the proposed method performs well for shadows of small intensity, yet it cannot provide perfect performance for shadows of greater intensity.

Fig. 4. Classified results of sequence [19] for IR (row 1), Campus (row 2) and Highway_I (row 3) with shadow (red), highlight (green), and foreground (blue). (a) Original image, (b) block-based stage only with block of size 10x10, and (c) proposed method.

Figure 5 shows the test sequence WT [21] of a non-stationary background with a waving tree, containing 287 frames of size 160x120. Compared with the five former methods, MOG [7], Rita's method [4], CB [11], Chen's method [9] and Chiu's method [22], the proposed method can provide better performance in handling non-stationary backgrounds. Moreover, we show the detected results with different block sizes using simply the block-based codebook. Apparently, most noise is removed without reducing the TP rate. Most importantly, the processing speed is highly efficient with the block-based strategy. Yet, a low precision is its drawback. To overcome this problem, the pixel-based stage is involved to enhance the precision, which can also reduce the FP rate. We also show the detected results using the proposed hierarchical scheme (block-based stage and pixel-based stage) with various block sizes. Figures 6(a)-(d) show the accuracy values, FP rate, TP rate, Precision, and Similarity,