2. TraVis: Web-Based
Vehicle Counter with
Traffic Congestion
Estimation Using
Computer Vision
Presented by:
Aguirre, Byron Franco
Alcantara, Jan Andre
Trinidad, John Ferdinand
Thesis adviser:
Dr. Joel P. Ilao
5. Problem Statement
In the Philippines, many traffic cameras have been
installed, and the recorded traffic videos are
monitored by traffic personnel. However, while these
cameras may record 24/7, the personnel cannot
watch all of the footage all of the time. The
“unmonitored” videos then go unused, since by the
time anyone reviews them, the traffic situation they
captured has already passed. A traffic monitoring
system that makes use of all recorded videos can
therefore provide the public with information about
the traffic situation in an area.
7. General Objective
• The aim of this study is to develop a
vision-based system for counting
vehicles and estimating traffic
congestion levels in road sections
installed with traffic surveillance cameras,
accessible through a web interface.
8. Specific Objectives
● gather traffic videos taken by roadside
traffic surveillance cameras;
● develop machine vision algorithms
for processing traffic videos that can:
o classify vehicles according to type;
o track and count the number of vehicles
seen;
9. Specific Objectives (cont.)
● generate graphs of traffic congestion
levels based on vehicle count statistics
estimated from traffic videos;
● design a suitable database for efficiently
storing traffic surveillance videos and
corresponding traffic statistics; and
● develop an interactive web interface for
accessing relevant data and information from
traffic surveillance videos;
10. Project Scope and Limitation
This study counts and classifies vehicles
seen in a traffic surveillance camera’s
field of view by applying computer
vision techniques. The system can:
● identify and classify vehicles
● distinguish adjacent vehicles
11. Project Scope and Limitation
(cont.)
● estimate traffic congestion levels
based on vehicle counts
● allow users to view results using an
intuitive web interface
12. Project Scope and Limitation
(cont.)
● no hardware implementation of video
acquisition
● use traffic videos from Archer’s Eye
13. Project Scope and Limitation
(cont.)
● Factors affecting the quality of counting:
o low quality videos
o slow frame rates
o variations in lighting
o occlusion
14. Project Scope and Limitation
(cont.)
● Factors that are not considered
o Traffic at night
o Swerving vehicles
15. Project Scope and Limitation
(cont.)
● IP Cameras were used
● Videos obtained through ITS
● Video file: .mp4 at 6 FPS
16. Project Scope and Limitation
(cont.)
● Can detect:
o Small vehicles: Car, Sedan
o Medium vehicles: SUV, Jeep, Van
o Large vehicles: Truck, Bus
17. Project Scope and Limitation
(cont.)
● Performance assessed by:
o Accuracy
● Against:
o Occlusion
o Number of vehicles present
24. Architectural Design
● Video Acquisition
o Video Input
o Frame Extraction
● Vehicle Detection
● Statistics Generation
25. Video Input
The videos are stored locally and accessed
by the server via the URL passed when
the user has chosen a video.
26. [Screens of the web application during video selection]
27. Invoke Matlab process
After the user chooses a video to process, a Java
servlet invokes a Matlab instance to process the
video, passing the video’s directory path (taken
from the URL) as input.
28. Architectural Design
● Video Acquisition
o Video Input
o Frame Extraction
● Vehicle Detection
● Statistics Generation
29. Frame Extraction
The input video is converted into frames to prepare them
for processing in the Vehicle Detection module.
30. Architectural Design
● Video Acquisition
● Vehicle Detection
o Object Detection
o Tracking
o Classification
o Counting
o Congestion Estimate
● Statistics Generation
33. Architectural Design
● Video Acquisition
● Vehicle Detection
o Object Detection
o Tracking
o Classification
o Counting
o Congestion Estimate
● Statistics Generation
41. Architectural Design
● Video Acquisition
● Vehicle Detection
o Object Detection
o Tracking
o Classification
o Counting
o Congestion Estimate
● Statistics Generation
42. Vehicle Tracking
Vehicle tracking is
performed by using the
Kalman Filter to estimate
the next locations of the
vehicles. Once a vehicle
is tracked, classification
and counting follows.
Green box = detected
Red box = predicted
43. • Why Kalman filter?
– Because of its consistent performance in the
previous research referenced in this
project.
– The future states of the vehicles, represented
by their locations in the image, are estimated
by applying the Kalman filter to their centroids.
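The tracking step above can be sketched in code. The thesis used Matlab's Kalman filter tooling; the following is an illustrative Python sketch of a per-axis constant-velocity Kalman filter applied to a centroid coordinate (the class name and the noise parameters `q` and `r` are our own assumptions, not from the thesis).

```python
class ConstantVelocityKF1D:
    """Minimal 1-D constant-velocity Kalman filter, applied per axis
    to a tracked centroid coordinate. Illustrative only: the thesis
    used Matlab's Kalman filter tools, and q/r here are guesses."""

    def __init__(self, pos, dt=1.0, q=1e-2, r=1.0):
        self.x = [pos, 0.0]                 # state: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.dt, self.q, self.r = dt, q, r  # step, process/measurement noise

    def predict(self):
        """Predict the next state with F = [[1, dt], [0, 1]] (red box)."""
        dt, p = self.dt, self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        self.P = [
            [p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + self.q,
             p[0][1] + dt * p[1][1]],
            [p[1][0] + dt * p[1][1],
             p[1][1] + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct with a position-only measurement, H = [1, 0] (green box)."""
        y = z - self.x[0]               # innovation
        s = self.P[0][0] + self.r       # innovation covariance
        k0 = self.P[0][0] / s           # Kalman gain for position
        k1 = self.P[1][0] / s           # Kalman gain for velocity
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p = self.P
        self.P = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
        return self.x[0]

# Track the x-coordinate of a centroid moving one pixel per frame:
kf = ConstantVelocityKF1D(pos=0.0)
for z in [1.0, 2.0, 3.0, 4.0, 5.0]:
    predicted = kf.predict()   # where the vehicle should be next
    corrected = kf.update(z)   # fold in the actual detection
```

Running one filter for x and one for y gives the predicted centroid for the next frame, which the tracker then matches against new detections.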
44. “An object at rest will remain at rest unless an
external force acts upon it. An object in motion
will not change its velocity unless an external
force acts upon it.”
45. • Why Kalman filter?
– Because it makes use of Newton’s Law of
Motion, which is applicable to the project’s
objectives and scope.
49. Architectural Design
● Video Acquisition
● Vehicle Detection
o Object Detection
o Tracking
o Classification
o Counting
o Congestion Estimate
● Statistics Generation
51. Vehicle Classification
Once potential vehicles
are tracked, they will be
classified into different
types such as small,
medium and large
vehicles.
*The size ranges (in pixels) for each
vehicle type were obtained by taking
the area property of a vehicle’s
binary mask via the Matlab function
regionprops().
Types Sizes
carAreas.smallAreaMin 300
carAreas.smallAreaMax 800
carAreas.mediumAreaMin 801
carAreas.mediumAreaMax 5000
carAreas.largeAreaMin 5001
carAreas.largeAreaMax 45000
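The size thresholds in the table translate directly into a classification rule. A minimal Python sketch (the thesis computed blob areas in Matlab via regionprops(); the function and variable names here are illustrative):

```python
# Pixel-area thresholds from the table above (obtained in the thesis
# via Matlab's regionprops()).
AREA_RANGES = {
    "small":  (300, 800),
    "medium": (801, 5000),
    "large":  (5001, 45000),
}

def classify_by_area(blob_area):
    """Classify a blob's pixel area into a vehicle type, or None
    if it falls outside every range (i.e. not a vehicle)."""
    for vtype, (lo, hi) in AREA_RANGES.items():
        if lo <= blob_area <= hi:
            return vtype
    return None

classify_by_area(650)    # → "small"
classify_by_area(10000)  # → "large"
```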
52. Architectural Design
● Video Acquisition
● Vehicle Detection
o Object Detection
o Tracking
o Classification
o Counting
o Congestion Estimate
● Statistics Generation
53. Vehicle Counting
Once the detected vehicles are classified into their types,
the system counts them in preparation for
the traffic congestion estimation.
*A tracked vehicle is counted once, on its first detection in the scene.
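The first-detection counting rule can be sketched as follows (Python for illustration; the thesis implementation is in Matlab, and the class name here is our own):

```python
from collections import Counter

class VehicleCounter:
    """Count each tracked vehicle exactly once, on its first
    appearance in the scene. Track IDs come from the tracking stage."""

    def __init__(self):
        self.seen = set()       # track IDs already counted
        self.counts = Counter() # per-type vehicle counts

    def observe(self, track_id, vtype):
        """Called once per frame per tracked vehicle; only the first
        observation of a track ID increments the count."""
        if track_id not in self.seen:
            self.seen.add(track_id)
            self.counts[vtype] += 1
```

A vehicle seen across many frames therefore contributes one count, regardless of how long it stays in view.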
54. Counts: Manual vs System
August 26, 2014
            Manual Count          System Count
            Small  Medium  Large  Small  Medium  Large
0700-0800   130    191     14     267    354     21
0800-0900   167    210     17     280    372     19
0900-1000   365    287     22     252    289     23
1000-1100   309    241     21     268    227     18
1100-1200   330    288     17     277    271     15

August 25, 2014
            Manual Count          System Count
            Small  Medium  Large  Small  Medium  Large
0700-0800   147    248     11     190    130     5
0800-0900   179    273     18     143    290     6
0900-1000   239    304     15     312    258     14
1000-1100   209    366     14     408    669     29
1100-1200   320    379     12     552    638     22
55. Counts: Manual vs System (cont.)
            One Time BG Model  Frequent BG Model  Manual Count
0600-0630   308                153                190
0630-0700   387                223                198
0700-0730   73                 173                214
0730-0800   206                152                192
0800-0830   67                 167                222
0830-0900   101                272                248
0900-0930   30                 349                294
0930-1000   276                235                264

May 23, 2014, North Gate, applying multiplying factor values
            Manual Count          Applying Mult Factor
            Small  Medium  Large  Small  Medium  Large
0600-0700   71     70      24     109    108     46
0700-0800   131    142     18     202    219     35
0800-0900   131    121     35     202    189     67
0900-1000   133    134     45     205    206     86
1200-1300   142    145     47     219    223     90
56. Architectural Design
● Video Acquisition
● Vehicle Detection
o Object Detection
o Tracking
o Classification
o Counting
o Congestion Estimate
● Statistics Generation
57. Congestion Estimation
● Traffic congestion is estimated from the number of
vehicles of each type present in the scene and the
number of vehicles that have passed through the
area within 5 minutes.
● Since traffic congestion varies over time, the vehicle
counts and types per 5-minute window of the video are
used to estimate the congestion.
58. Congestion Estimation (cont.)
Summarized statistics, such as the counts per
vehicle type, the number of vehicles in the scene over
five minutes, and the traffic congestion estimate, are
inserted into the database after every five minutes of
video has elapsed.
64. Data Insertion
Insertion of data happens inside the Matlab process.
● An ODBC/JDBC connection between Matlab and the
database must be established so they can communicate.
● Once connected, the data is inserted into the database
every 5 minutes of video (1,800 frames at 6 FPS).
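At 6 FPS, five minutes of video is 6 × 300 = 1,800 frames, which gives the flush condition a simple form (illustrative Python; the names are ours, not the thesis's):

```python
FPS = 6                    # the source videos run at 6 frames per second
INTERVAL_SECONDS = 5 * 60  # statistics are flushed every 5 minutes of video
FRAMES_PER_INTERVAL = FPS * INTERVAL_SECONDS  # = 1800 frames

def should_flush(frame_index):
    """True on every frame that completes a 5-minute window
    (frame indices counted from 1)."""
    return frame_index % FRAMES_PER_INTERVAL == 0
```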
66. Data Fetching and Displaying of Results
● Data produced by the system is stored in the
database and then shown in TraVis’ web user
interface.
● The data is fetched from the MySQL database through
query statements and passed to FusionCharts to
generate the graphs.
69. Experiments
● Actual program implementation
● Bounding Box Aspect Ratio Checking
● Vehicle Detection on Background Modeling
Scheme
● Accuracy of Vehicle Counting
70. Bounding Box Aspect Ratio Checking
System Output without Aspect Ratio Checking
74. Vehicle Detection on Background
Modeling Scheme
[Bar chart: per-interval counts for the One Time BG Model,
Frequent BG Model, and Manual Count]
Pearson’s, One Time BG Model to Manual Count: -0.51302
Pearson’s, Frequent BG Model to Manual Count: 0.87499
75. Accuracy of Vehicle Counting
• Comparing of system and manual counts
• Use of multiplying factor
• Multiplying Factor = System Count / Manual Count
• Use of Pearson’s Correlation Coefficient
• Percentage Relative Error
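The multiplying factor defined above is a plain ratio; a sketch using the 0700-0800 small-vehicle counts from the August 26 results table (Python for illustration; the thesis computed this in Matlab):

```python
def multiplying_factor(system_count, manual_count):
    """Multiplying Factor = System Count / Manual Count,
    as defined on the slide, computed per vehicle type."""
    return system_count / manual_count

# Aug 26, 0700-0800, small vehicles: system 267 vs manual 130.
factor = multiplying_factor(267, 130)
```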
89. Pearson’s Correlation Coefficient
It measures how strongly two sets of
data are related.
● High correlation: .5 to 1.0
● Medium correlation: .3 to .5
● Low correlation: .1 to .3
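Pearson's r and the bucketing above can be sketched as follows (illustrative Python; the slide does not name a bucket for |r| below .1, so this sketch folds it into "low"):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_strength(r):
    """Bucket |r| using the thresholds on this slide
    (values below .1 are treated as low here)."""
    a = abs(r)
    if a >= 0.5:
        return "high"
    if a >= 0.3:
        return "medium"
    return "low"

pearson_r([1, 2, 3], [2, 4, 6])  # → 1.0 (perfectly linear)
```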
90. Pearson's Correlation Coefficient
System Count to
Manual Count
System Count with Multiplying
Factors to Manual Count
Aug 25 2014 0.532330547 0.665698685
Aug 26 2014 0.532337468 0.712235061
May 23 2014 0.962154466 0.955987229
Average 0.675607493 0.777973658
91. Percentage Relative Error
• Relative error is the ratio of the mean
absolute error to the mean value of the
measured dataset
• Percentage Relative Error =
(Absolute Error / True Value) × 100
• Percentage Relative Error =
(|Manual Count − System Count| / Manual Count) × 100
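Using the absolute-error form of the formula, a sketch (illustrative Python):

```python
def percentage_relative_error(manual_count, system_count):
    """Percentage Relative Error = |Manual - System| / Manual * 100,
    following the absolute-error form of the slide's formula."""
    return abs(manual_count - system_count) / manual_count * 100.0

percentage_relative_error(200, 190)  # → 5.0
```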
95. Percentage Relative Error
Date RelError %RelError
25-Aug 0.002315195 0.231519
26-Aug 0.00647172 0.647172
23-May 0.006049424 0.604942
Average 0.004945446 0.494545
96. Conclusion
• We developed a vision-based system for
counting vehicles and estimating traffic
congestion levels in road sections installed
with traffic surveillance cameras,
accessible through a web interface.
97. Conclusion
• Performance was affected by the
following:
– Quality of the video
– Background model
– Position of the camera relative to the road
– Detection of unwanted or segmented
blobs
98. Conclusion
• The system counts have an average
correlation coefficient of 0.78
• They also have an average Percentage
Relative Error of 0.49%
99. Conclusion
• Gathered traffic videos from the Archer’s
Eye
• Developed algorithms to classify vehicles
and to track and count the number of
vehicles
• Generated historical graphs of traffic
congestion levels
100. Conclusion
• Designed a database for storing videos
and traffic statistics
• Developed an interactive web interface
101. TraVis: Web-Based Vehicle Counter with
Traffic Congestion Estimation Using
Computer Vision
Presented by:
Aguirre, Byron Franco
Alcantara, Jan Andre
Trinidad, John Ferdinand
Adviser:
Dr. Joel P. Ilao
Editor's Notes
Hello, good morning, panelists.
We are the group Travis and our Thesis is a Web-based vehicle counter with traffic congestion estimation using computer vision.
I am,(names)
And we are advised by Dr. Joel Ilao.
This is the flow of the presentation. (wait few seconds)
So for Problem that we would like to tackle,
Nowadays, many traffic cameras are being installed to monitor traffic. However, not all of them have people watching all the time. These unmonitored videos go unused, since the traffic situation changes over time. A traffic monitoring system that makes use of all the recorded videos can therefore provide the public with information about the traffic situation.
Here are our objectives to guide us in this project…
For our general objective,
we aim to develop a vision-based system for counting vehicles and estimating traffic congestion levels that would be accessible through a web interface.
To develop our system, we came up with these specific objectives. First, we have to gather traffic videos. Next is the development of the machine vision algorithms for processing the videos. These algorithms classify vehicles according to type, count the number of vehicles, and also track them.
After that, we generate graphs of traffic congestion levels based on the data made from the algorithms. To store these traffic statistics, we will design a suitable database and to show these results, we will develop an interactive web interface.
The scope of TraVis includes:
Identification and classification of vehicles
Distinguishing adjacent vehicles in traffic
Estimating traffic congestion levels based on vehicle counts
Showing the results to the users through a web application
A limitation of our study is that we have no hardware implementation, such as for video acquisition. Our traffic videos were acquired from DLSU’s Archer’s Eye.
We also considered the following that may affect the performance of our system.
Traffic at night and swerving vehicles were not considered
The IP cameras recorded videos at only 6 FPS, which adds challenges when processing them for vehicle detection.
<HIDDEN> The following is the training data used for the purpose of testing the system.
In TraVis, vehicle classification is done through the following categories.
(The types of vehicles are divided to the following. Small, Medium and Large. You can see in the table the distribution of the types of vehicles.)
The performance of the system is assessed by the accuracy against occlusion and the number of vehicles present
This is a diagram of our system overview. The user selects a video to process through the web application. The system then selects the videos to process and has their status inserted into the database. After Matlab instances generate the traffic statistics, the data is stored in the database again. This data is continually uploaded to the web application to give the feel of semi-real time.
Our system implementation is divided into three modules: Video Acquisition, Vehicle Detection and Statistics Generation.
Our system implementation is divided into three modules: Video Acquisition, Vehicle Detection and Statistics Generation.
First is Video Acquisition..
This shows the process of the Video Acquisition module. It starts with an input traffic video, followed by getting the frames before they are processed.
(For our Video Acquisition module, this is a summary of the process of it.)
In the video acquisition module, the user selects a video. TraVis then checks whether the video is available; if not, it flags in the database whether the video was already processed in the past. If not yet processed, it invokes a Matlab instance in the system’s backend, and the selected video is processed. During processing, it periodically checks whether the video is finished. If it is, the results are displayed; if not, it checks whether 5 minutes of video time have passed and, if so, inserts the data into the database, then returns to processing the video until it is finished.
This will cover further discussions of the sub-modules in the video acquisition. Starting with the video input sub-module…
In TraVis, the videos used were stored locally and accessed by the server through their file locations.
These are the screens of the web application during the selection of videos.
When the process button is clicked, a Matlab instance would be invoked.
The next sub-module is Frame extraction…
In this sub-module, video frames are extracted and processed one by one.
The next module that we would discuss is the Vehicle Detection Module. This module covers majority, if not all, of the processing in TraVis.
We begin with the frames as inputs. Then the object detection sub-module prepares the frames for vehicle tracking. Once the vehicles are tracked, they are classified according to the types mentioned earlier. Vehicle counts are done when vehicles have been classified. The estimation of traffic congestion level follows when necessary data are obtained such as the vehicle counts and the classifications. Once these data are gathered, they are inserted into the database to be displayed to the user.
(We start with the flow of the vehicle detection. First, a sequence of images would undergo object detection algorithms. These potential vehicles are then tracked, classified and counted. After getting the counts, estimation of the congestion would be done and then all of the data would be inputted in the Travis Database.)
This is a more detailed flow of the Vehicle detection module. When processing has started, an initial background model estimate is used to segment the foreground objects. We would then get object properties and assign vehicle IDs to these objects.
To track these vehicles, Kalman Filter is used, which will be explained later.
They are then classified according to their types and counted.
Next is the estimation of traffic congestion.
After that, the system checks whether the video is done. If it is, the option to view the results becomes available; if not, the previous values are stored in the database again and the loop repeats.
The sub-modules in vehicle detection start with object detection….
In this sub module, an initial background model is estimated.
Background subtraction, one of the most widely used vehicle detection techniques, is then performed throughout the frames of the video using the initial background model.
In TraVis, succeeding background models are built simultaneously as the video frames are traversed.
The background is updated every five minutes of video.
<HIDDEN>
For object detection, background subtraction and area thresholding techniques were used.
In background subtraction, a static background image is taken from the video to serve as the background model, which is used when extracting the foreground objects.
Area thresholding is used to remove unwanted blobs from the subtracted image, leaving the potential vehicles.
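The background subtraction and area-thresholding steps described here can be sketched on plain grayscale arrays (illustrative Python; the thesis used Matlab, and blob_areas below stands in for regionprops()'s Area property):

```python
def subtract_background(frame, background, diff_threshold=30):
    """Pixel-wise background subtraction on grayscale frames stored
    as lists of rows: 1 = foreground, 0 = background. Illustrative
    only; the thesis did this in Matlab, and the threshold is a guess."""
    return [[1 if abs(p - b) > diff_threshold else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def blob_areas(mask):
    """Connected-component areas (4-connectivity) of a binary mask,
    a stand-in for Matlab regionprops()'s 'Area' property."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                stack, area = [(i, j)], 0
                seen[i][j] = True
                while stack:           # flood fill one blob
                    y, x = stack.pop()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                areas.append(area)
    return areas

def keep_potential_vehicles(areas, min_area=300, max_area=45000):
    """Area thresholding: discard blobs outside the vehicle size range
    (the bounds reuse the classification table's extremes)."""
    return [a for a in areas if min_area <= a <= max_area]
```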
The next slides would show the steps in Object detection.
First, the initial background image is on the left, and on the right is the ROI, or Region of Interest: a road mask that limits detection to the part of the road where vehicles pass through.
The next slides will show pairs of images coming from two different frames.
The images above are the gray-scale frames from the video.
The images below show the effect of background subtraction.
Thresholds are then applied to the foreground to limit the blobs.
Below is the foreground filtered with the road mask; compared with the images above, blobs appear only in that region.
In the next set of images we can see the effect of morphological operations and edge detection.
In the first set, erosion and dilation were used to highlight the blobs to be considered as vehicles.
In the second set, edge detection is also used to remove extra blobs that intersect with the background mask.
(it may not be highlighted here)
Here is an application of edge detection in this sub-module.
Basically, the edges of vehicles are obtained, then dilation is performed to somehow connect the edges that have small gaps in them, and then filling in the object.
(Here would be a better example of edge detection.)
If you recall the previous pair of images, there are small remaining objects in the scene. In this slide, these small objects are filtered via area thresholding. The remaining objects are now the potential vehicles that will be tracked in the next sub module.
(Here the application of threshold is used to…(explain jan))
Vehicle tracking. In this sub module, potential vehicles will have their aspect ratios checked – only the ones with horizontal aspect ratios will be tracked.
This is because in the current configuration of the cameras, vehicles appear as horizontal objects.
In the image on the right you could see two boxes.
The two boxes represent the two states of the kalman filter: the predicted (red) and the corrected (green).
(The predicted one is the box for kalman filter’s prediction of the object while the green box is the detected (explain better jan)
So why did we use the Kalman filter? Not only because of its consistent performance in the papers referenced in TraVis, but also because TraVis targets vehicles, whose motion is roughly linear, and the Kalman filter makes use of the laws of motion, making it well suited to this project.
Include:
Since vehicles, we follow Newtonian law of motion.
This is the vehicle tracking flowchart
It starts with the segmented objects that we could refer to as blobs.
These blobs have been applied with the morphological operations as mentioned in the previous sub module
These blobs are limited via the region of interest also known as the road masks and area thresholding
When the vehicles are inside the region of interest, and filtered according to their areas, properties such as the bounding box and centroid coordinates, and the areas are obtained
The centroids are used by the kalman filter object to predict the next location of the vehicle
To keep track of the detected vehicles, the Hungarian algorithm is used together with the Kalman filter.
The Hungarian algorithm finds the minimal-cost assignment between the detected locations and the predicted locations.
To be further explained by jan….
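The Hungarian algorithm's role, matching detections to Kalman predictions at minimal total cost, can be illustrated with a brute-force equivalent for the small numbers of vehicles in a frame (illustrative Python; a real implementation would use the O(n³) Hungarian algorithm instead):

```python
from itertools import permutations

def best_assignment(cost):
    """Minimal-total-cost matching between detections (rows) and
    predictions (columns) of a square cost matrix, e.g. centroid
    distances. Brute force over permutations: for the handful of
    vehicles in a frame this finds the same optimum the Hungarian
    algorithm computes in O(n^3)."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return list(best_perm), best

# Detection 0 is close to prediction 0, detection 1 to prediction 1:
best_assignment([[1, 10], [10, 1]])  # → ([0, 1], 2)
```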
The next set of images shows the frame by frame tracking of the system in black and white and rgb images.
You might wonder why, in the top images, there is a box even though no vehicle is there anymore.
This is due to edge detection. We used edge detection to limit the counting of vehicles: there are times when a vehicle is at the edge of the image and its bounding box merges with the whole image, becoming as large as the image itself, which is an error.
Vehicle classification is the classification of the vehicles according to their types.
These types are separated by the minimum and maximum sizes of the pixel areas of the blobs.
Include traffic congestion formula
Change the images – must be same frame numbers and same video used.
Remove window of mac
CHANGE IMAGES
Here is the graph comparing the counts obtained manually, with a one-time background model, and with a frequently changing background model.
Below is the comparison of the Pearson’s correlation coefficients of the one-time background model and the frequently changing background model.
It shows how related each count is to the manual counts. As you can see, it improved a lot, from a negative value to a coefficient near 1, indicating that the frequently changing background model is the better choice.
Relative error is the measure of mean absolute error to the mean value of the measured dataset
Absolute error is used to identify the exact error in the dataset
This shows us that there is a low percentage of error or uncertainty in our system.
We were able to achieve our main goal of developing a vision-based system for counting vehicles and estimating traffic congestion levels in road sections installed with traffic surveillance cameras, that is accessible through a web interface.
Our performance, however, was affected by the following.
First is the quality of the video: we think the Archer’s Eye videos have low quality because of the cameras used and the frame rate of the videos.
Next are our problems with the background models. We used a background modelling scheme that updates every 5 minutes because of changes in the camera’s settings, and also because the camera’s position relative to the road sometimes shifts. This becomes a problem when traffic is very heavy and the background updates while many vehicles are in the scene.
Unwanted and segmented blobs also greatly affect performance. Unwanted blobs include detected people: even though this is controlled with background masking (a region of interest) and aspect ratio checks, there are still some instances where these blobs are detected. This may be because a group of people walking together is bounded as a single object, making the group a potential vehicle blob.
There are also problems with segmented blobs, which happen because of the morphological operations used: some pixels of a car are removed, so the car is segmented into two or more blobs that are either detected as two vehicles or not detected at all.
Amidst those problems, our system counts, with the multiplying factors applied, have an average correlation coefficient of 0.78, which is close to 1 and indicates that the values are highly correlated.
They also have an average Percentage Relative Error of 0.49%, which shows that the relative error is low, indicating the system’s accuracy amidst the aforementioned factors affecting performance.
We were able to gather traffic videos from the Archer’s Eye which covers our first specific objective.
We were also able to develop machine vision algorithms for processing the videos: classifying vehicles according to type, and tracking and counting the number of vehicles.
We were also able to design a suitable database for efficiently storing traffic surveillance videos and corresponding traffic statistics
And lastly, we developed an interactive web interface for accessing relevant data and information from the traffic surveillance videos.