       Department of Computer Science and Information Engineering
         College of Electrical Engineering and Computer Science
                     National Taiwan University
                            Master Thesis




 Monocular Simultaneous Localization and Generalized Object
           Mapping with Undelayed Initialization




                          Chen-Han Hsiao


               Advisor: Chieh-Chih Wang, Ph.D.


                            July, 2010
ACKNOWLEDGMENTS
    After two years of research life, completing this thesis is truly a happy thing.
During these two years I came into contact with many capable and experienced
professors in the department, with companions doing research in the department
building and the laboratory, and with engineers from industry who collaborated
with us on projects. Along the way I completed several projects and submitted a
journal paper of good quality. For me, this has been a period of great growth.
    The first person to thank is my advisor, Chieh-Chih Wang. From the first time
I heard him lecture about the lab's interesting research, I could feel his passion
for robotics, and I decided on the spot that after being admitted to graduate
school I would join the Robot Perception and Learning Lab. In these two years
with him, beyond the useful knowledge I learned, what mattered even more to me
was his command of the details of research and his methodical way of handling
matters, which changed the way I approach my work. While I was writing this
thesis, his prodding and encouragement made me do my best to overcome the
obstacles of thesis writing and finish it smoothly. The other four members of my
committee, Professors Li-Chen Fu, Yung-Yu Chuang, Han-Pang Huang, and Ta-Te
Lin, gave me very useful suggestions that opened up more room for my research
to grow.
    In the lab, those who helped me most were Casey and 昆翰. From the moment
I entered the lab in my first year, Casey guided me through all of its affairs and
helped me gradually become familiar with the lab's research. Over these two
years I worked with Casey on almost every project, and we worked together for
nearly a year on the VSLAM journal paper. Casey's conscientious yet easy-going
character is something I admire. 昆翰, who sat beside me, gave me many new
ideas during our joint research on VSLAM; discussing research problems with
him often clarified thoughts that had been unclear. 崇瀚, my classmate, shared
with me the drudgery and the fun of research life, and we helped each other along
the way; I hope we will find time to keep discussing the music and films we love.
I also thank Nicole, who always brought a cheerful atmosphere to the lab; the
steady 國輝; the diligent and talented Alan, 紹丞, and 顥學; Jimmy, who designed
a Texas Hold'em algorithm I greatly admire; 維均 and 懿柳, who went to study
in the United States; 郁君, who loves hiking; and Andi, Any, 俊甫, 宗哲, and all
the other lab seniors. Thanks to all of you, I could learn from the experience and
rich knowledge you accumulated, and I am grateful for the help you gave me.
    My high school friends of nine years, 段佳宏 and 林昭辰, often encouraged me
when I hit low points; I thank them for helping me get through life's setbacks. My
college friends 徐國鐘 and 譚立暉 have been companions since our undergraduate
days; we traveled to many places around Taiwan together and both changed
tracks from mathematics, to the computer science and finance graduate institutes
respectively. I am glad everyone is doing well; we will be working in Taipei
together, and I believe we will all keep doing well. I also thank my band friends
and cycling club companions who share my interests; the memories of playing
music and traveling far together make life feel wonderful.
    Finally I thank my family. My grandparents raised me and taught me, giving
me sound values. My parents worked hard to give me a good environment and
space, letting me develop my strengths and complete this master's degree.
    For now I will leave school and set aside the identity of a student, but I am
glad to have spent six years at NTU: four years as an undergraduate in mathematics
and two years as a graduate student in computer science. Every time I walked
along the broad Royal Palm Boulevard, I felt open and at ease. In these six years
NTU gave me a free environment in which to learn and to perform, and pushed
me to search for the direction I truly want. Not everything went smoothly, but
through that process I became ever more certain of my own convictions and of
the ideals I wish to hold on to. I love that this campus brings together people of
all kinds of backgrounds and goals, who spark greater energy in one another. To
everyone who has influenced me: thank you for making me better.


               2010/8/16, Chen-Han Hsiao, R407, Department of CSIE, NTU
ABSTRACT (IN CHINESE)
    Many Kalman-filter-based results have demonstrated the feasibility of simultaneous
localization and mapping (SLAM) using a single camera. However, few studies have
examined the feasibility of SLAM in dynamic environments. To build maps of both
static and moving objects in dynamic environments, we propose a Kalman-filter-based
framework and a new parametrization that integrates moving objects. With the new
parametrization, our algorithm estimates both the static and the moving objects in the
environment, achieving SLAM with generalized objects. The parametrization inherits
the advantages of the inverse depth parametrization, such as depth estimation over a
large range and better linearity. Existing work on SLAM in dynamic environments
requires several measurements to verify that an object is stationary and therefore
initializes objects with a delay, whereas our parametrization allows undelayed object
initialization, so our algorithm can exploit every measurement and obtain better
estimates. We also propose a low-computation algorithm for classifying static and
moving objects. Simulations show the accuracy of our algorithm, and real-world
experiments show that it successfully performs SLAM with generalized objects in
indoor dynamic environments.
MONOCULAR SIMULTANEOUS
   LOCALIZATION AND
  GENERALIZED OBJECT
MAPPING WITH UNDELAYED
     INITIALIZATION

                Chen-Han Hsiao


 Department of Computer Science and Information Engineering
                 National Taiwan University
                       Taipei, Taiwan


                           July 2010



                Submitted in partial fulfilment of
                the requirements for the degree of
                        Master of Science


                Advisor: Chieh-Chih Wang

                     Thesis Committee:
                  Chieh-Chih Wang (Chair)
                        Li-Chen Fu
                      Yung-Yu Chuang
                     Han-Pang Huang
                         Ta-Te Lin
                     © Chen-Han Hsiao, 2010
ABSTRACT




Recent works have shown the feasibility of the extended Kalman
filtering (EKF) approach to simultaneous localization and mapping
(SLAM) with a single camera. However, few approaches have addressed
the insufficiency of SLAM in dealing with dynamic environments. To
accomplish SLAM in dynamic environments, we propose a unified
framework based on a new parametrization for both static and non-static
point features. By applying the new parametrization, the algorithm is
able to integrate moving features and thus achieve monocular SLAM
with generalized objects. The new parametrization inherits the good
properties of the inverse depth parametrization, such as the ability to
represent a large range of depths and better linearity. In addition, the
new parametrization allows undelayed feature initialization. Contrary
to existing SLAM algorithms with delayed initialization, which spend
several measurements solely on classification, our SLAM with generalized
objects algorithm with undelayed initialization utilizes every measurement
of the point features for filtering and obtains a better estimate of the
environment. A low-computation classification algorithm to distinguish
static and moving features is also presented. Simulations show the high
accuracy of our classification algorithm and of the feature estimates. We
also demonstrate the success of our algorithm on a real image sequence
captured in an indoor environment.




TABLE OF CONTENTS



ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   ii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  vi
CHAPTER 1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . .    1
  1.1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . .    1
CHAPTER 2. STATE VECTOR DEFINITION IN SLAM WITH GENERALIZED OBJECT . . .     4
  2.1. STATE VECTOR DEFINITION IN SLAM WITH GENERALIZED OBJECT . . . . .     4
     2.1.1. State Vector Definition . . . . . . . . . . . . . . . . . . .    4
     2.1.2. Dynamic Inverse Depth Parametrization . . . . . . . . . . . .    5
     2.1.3. Undelayed Feature Initialization . . . . . . . . . . . . . .     7
CHAPTER 3. STATIC AND MOVING OBJECT CLASSIFICATION . . . . . . . . . . .     8
  3.1. STATIC AND MOVING OBJECT CLASSIFICATION . . . . . . . . . . . . .     8
     3.1.1. Velocity Convergence . . . . . . . . . . . . . . . . . . . .     8
     3.1.2. Define Score Function for Classification . . . . . . . . . .     9
     3.1.3. Classification State . . . . . . . . . . . . . . . . . . . .    11
     3.1.4. Issue on unobservable situations . . . . . . . . . . . . . .    12
CHAPTER 4. EXPERIMENTAL RESULTS . . . . . . . . . . . . . . . . . . . . .   16
  4.1. EXPERIMENTAL RESULTS . . . . . . . . . . . . . . . . . . . . . . .   16
     4.1.1. Simulation . . . . . . . . . . . . . . . . . . . . . . . . .    16
     4.1.2. Real Experiments . . . . . . . . . . . . . . . . . . . . . .    22
CHAPTER 5. CONCLUSION AND FUTURE WORK . . . . . . . . . . . . . . . . . .   30
  5.1. CONCLUSION AND FUTURE WORK . . . . . . . . . . . . . . . . . . . .   30
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  32


LIST OF FIGURES



2.1 rWC denotes the camera position, and qWC denotes the quaternion
    defining the orientation of the camera. A moving object is coded
    with the dynamic inverse depth parametrization. . . . . . . . . .        6

3.1 Velocity convergence of 3 target features under the observable
    condition . . . . . . . . . . . . . . . . . . . . . . . . . . . .       10
3.2 Velocity convergence of 3 target features under the unobservable
    condition . . . . . . . . . . . . . . . . . . . . . . . . . . . .       14

4.1 Effect of different classification thresholds on the classification
    result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      17
4.2 Convergence of our SLAM algorithm shown with boxplots. The
    lower quartile, median, and upper quartile of each box show the
    distribution of the estimation error of all the objects in each
    observed frame. (a) The estimation error of the camera increases
    during exploration (frames 1 to 450) and decreases during loop
    closing (frames 450 to 600). (b) The estimation error of the static
    features decreases with the number of observed frames. (c) The
    estimation error of the moving features decreases with the number
    of observed frames. . . . . . . . . . . . . . . . . . . . . . . .       21
4.3 The NTU PAL7 robot, the real-data experiment platform of monocular
    SLAM with generalized objects. . . . . . . . . . . . . . . . . . .      22
4.4 The basement of the CSIE department at NTU. Green/grey dots
    show the map built using the laser scanner. . . . . . . . . . . .       23
4.5 The image sequence collected in the basement and the corresponding
    monocular SLAMMOT results. Figures 4.5(a), 4.5(c), 4.5(e), 4.5(g),
    4.5(i), 4.5(k), 4.5(m), and 4.5(o) show the results of feature
    extraction and association. Figures 4.5(b), 4.5(d), 4.5(f), 4.5(h),
    4.5(j), 4.5(l), 4.5(n), and 4.5(p) show the monocular SLAM with
    generalized objects results, in which black and grey triangles and
    lines indicate the camera poses and trajectories from monocular
    SLAM with generalized objects and from LIDAR-based SLAMMOT. Gray
    points show the occupancy grid map from LIDAR-based SLAM. All
    estimates of the visual features lie inside a reasonable cube. . .      27
4.6 The result of the SLAM part of monocular SLAM with generalized
    objects. The definitions of the symbols are the same as in Figure
    4.5. There are 107 stationary features in the state vector of
    monocular SLAM with generalized objects. . . . . . . . . . . . . .      29




LIST OF TABLES



4.1 Total classification result of 50 Monte Carlo simulations in the
    observable condition with the threshold ts = 1.6 . . . . . . . . .      19
4.2 Total classification result of the real experiment . . . . . . . .      23








 CHAPTER 1

                      INTRODUCTION


1.1. INTRODUCTION
    Recently, SLAM using a monocular or stereo camera as the only sensor
has been proven feasible and has thus become popular in robotics (Davison
et al., 2007; Lemaire et al., 2007). To overcome the weakness of the XYZ
encoding in Davison et al.'s approach, Montiel et al. proposed an inverse
depth parametrization (Montiel et al., 2006; Civera et al., 2008). Their
approach shows a better Gaussian property for the EKF algorithm, and its
undelayed initialization procedure increases the speed of convergence. The
inverse depth parametrization also makes it feasible to estimate features at
potentially infinite depth. However, the inverse depth parametrization is
only defined for positive depth: the inverse depth of a feature may converge
to a negative value and therefore cause a catastrophic failure (Parsley &
Julier, 2008).
    Several attempts have been made to solve the SLAM problem in dynamic
environments. Sola discussed the observability issue of bearing-only
tracking and proposed using two cameras to solve SLAM and moving object
tracking, with some heuristics for detecting moving objects in specific
scenarios (Sola, 2007). Wangsiripitak and Murray (Wangsiripitak & Murray,
2009) presented an approach that recovers the geometry of known 3D moving
objects and avoids the effect of wrongly deleting occluded features; in
their approach, manual operations such as deleting features on non-static
objects are needed. Migliore et al. (Migliore et al., 2009) demonstrated a
monocular SLAMMOT system with a separate SLAM filter and a moving
object tracking filter, in which moving objects are classified using
uncertain projective geometry (Hartley & Zisserman, 2004). Vidal-Calleja
et al. analyzed the observability of bearing-only SLAM systems and
identified the motions that maximize the number of observable states
(Vidal-Calleja et al., 2007). However, the current SLAMMOT approaches
decouple the tracking part from the SLAM part, and classifying objects as
moving or static takes several steps. Existing approaches adopt delayed
initialization, so the SLAM estimate cannot benefit from the observations
made during the classification steps.
    In this thesis, we propose a framework for monocular simultaneous
localization and generalized object mapping. We present an undelayed
initialization approach and a better classification method. Both simulation
and real experiments are demonstrated and evaluated. Our approach does
not need the static environment assumption.
    In Chapter 2, we define the state vector for EKF SLAM with generalized
objects, in particular the proposed parametrization for landmarks in
dynamic environments; the undelayed initialization method is also
illustrated. Chapter 3 details our classification algorithm for
distinguishing static features from moving features, and discusses the
observability issue for classification. In Chapter 4, both the simulation






and real experimental results are provided to show the performance of our
approach.








 CHAPTER 2

STATE VECTOR DEFINITION IN SLAM
   WITH GENERALIZED OBJECT


2.1. STATE VECTOR DEFINITION IN SLAM WITH GENERALIZED OBJECT
2.1.1. State Vector Definition

    To build a feature-based map, we apply the Extended Kalman Filter
(EKF) based Simultaneous Localization and Mapping (SLAM) algorithm.
Following standard EKF SLAM, we maintain a state vector containing the
pose of the camera and the locations of the features.
$\chi_k = \big( x^\top_k, \; o^{1\top}_k, \; o^{2\top}_k, \; \ldots, \; o^{n\top}_k \big)^\top$   (2.1)

The variable $x_k$ is composed of the camera position $r^W$, the quaternion
$q^W$ defining the orientation, the velocity $v^W$, and the angular velocity $\omega^C$.


$x_k = \begin{pmatrix} r^W \\ q^W \\ v^W \\ \omega^C \end{pmatrix}$   (2.2)
    The constant velocity and constant angular velocity motion model derived
from Montiel et al.'s approach is applied in our monocular system (Montiel
et al., 2006). For a generalized object $o^i_k$, the encoding parametrization


could be the inverse depth parametrization for static features or the dy-
namic inverse depth parametrization for both static and moving features,
which is composed of the position and the velocity. The proposed parame-
trization will be introduced in section 2.1.2.
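
To make this layout concrete, the following minimal Python sketch assembles the state vector of Equations (2.1) and (2.2); the helper names are ours, not the thesis implementation.

    import numpy as np

    # A minimal sketch of the state vector layout of Eqs. (2.1) and (2.2).
    # Helper names are illustrative, not the thesis implementation.

    def camera_state(r_w, q_w, v_w, omega_c):
        """Camera state x_k: position (3), orientation quaternion (4),
        velocity (3), and angular velocity (3) -> 13-dim vector."""
        return np.concatenate([r_w, q_w, v_w, omega_c])

    def full_state(x_k, features):
        """chi_k = (x_k^T, o_k^1T, ..., o_k^nT)^T: the camera state followed
        by the state of every generalized object o_k^i."""
        return np.concatenate([x_k] + [np.asarray(o, dtype=float) for o in features])

    # Example: camera at the origin with identity orientation, one feature
    # coded in the 9-dim dynamic inverse depth parametrization (Sec. 2.1.2).
    x_k = camera_state(np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]),
                       np.zeros(3), np.zeros(3))
    o_1 = np.array([0, 0, 0, 0.1, -0.2, 0.5, 0, 0, 0])  # x y z theta phi rho vx vy vz
    assert full_state(x_k, [o_1]).shape == (13 + 9,)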



2.1.2. Dynamic Inverse Depth Parametrization

    Landmarks in dynamic environments may not be stationary. Thus, para-
metrization containing only position is not enough for the non-static land-
marks. To represent a dynamic environment, we come up with dynamic
inverse parametrization combining the inverse depth parametrization and
the 3-axis velocities to model each landmark. Each landmark is coded with
the 9-dimension state vector.


$o^i_k = \big( o^{i\top}_k \;\; v^{i\top}_k \big)^\top = \big( x_k \;\; y_k \;\; z_k \;\; \theta_k \;\; \phi_k \;\; \rho_k \;\; v^x_k \;\; v^y_k \;\; v^z_k \big)^\top$   (2.3)

$o^i_k$ is the 3D location of the i-th landmark in the inverse depth
parametrization, and $v^i_k = (v^x_k \; v^y_k \; v^z_k)^\top$ denotes the 3-axis velocity in
the world coordinate system. The 3D location of the feature with respect to
the XYZ coordinates is:

                                   
$\begin{pmatrix} X_i \\ Y_i \\ Z_i \end{pmatrix} = \mathrm{loc}(o^i_k) = \begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix} + \frac{1}{\rho_k}\, G(\theta_k, \phi_k)$   (2.4)

    In the prediction stage of the EKF algorithm, the features are predicted
by applying the constant velocity model, as illustrated in Figure 2.1. The
predicted state of the features can be calculated in closed form:

                                                                          5
2.1   STATE VECTOR DEFINITION IN SLAM WITH GENERALIZED OBJECT




$\mathrm{loc}(o^i_{k+1}) = \mathrm{loc}(o^i_k) + v^i_k \, \Delta t = r_i + \frac{1}{\rho_k} G_k + v^i_k \, \Delta t = r_i + \frac{1}{\rho_{k+1}} G_{k+1}$   (2.5)

    where $G_k$ and $G_{k+1}$ are the directional vectors.
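
A minimal sketch of this prediction step, reusing G and loc from the sketch after Eq. (2.4): the anchor $r_i$ stays fixed while $(\theta, \phi, \rho)$ are re-derived so the encoded location moves by $v \Delta t$. This re-derivation is our reading of Eq. (2.5), so treat it as an assumption.

    import numpy as np
    # Reuses G(theta, phi) and loc(o) from the sketch after Eq. (2.4).

    def predict_feature(o, dt):
        """Constant-velocity prediction of Eq. (2.5): the anchor r_i stays
        fixed while (theta, phi, rho) are re-derived so that the encoded
        XYZ location moves by v * dt."""
        o = np.asarray(o, dtype=float)
        r_i, v = o[:3], o[6:9]
        new_xyz = loc(o) + v * dt           # loc(o_k) + v_k * dt
        ray = new_xyz - r_i                 # equals (1 / rho_{k+1}) * G_{k+1}
        rho_new = 1.0 / np.linalg.norm(ray)
        d = ray * rho_new                   # unit direction G_{k+1}
        theta_new = np.arctan2(d[0], d[2])  # inverts the azimuth of G
        phi_new = np.arcsin(-d[1])          # inverts the elevation of G
        return np.concatenate([r_i, [theta_new, phi_new, rho_new], v])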




      Figure 2.1. rWC denotes the camera position, and qWC denotes the
                 quaternion defining the orientation of the camera. A
                 moving object is coded with the dynamic inverse depth
                 parametrization.



    In the update stage of the EKF algorithm, the measurement model for the
features is also derived from Montiel et al.'s approach. In our approach,
each feature is coded with either the inverse depth parametrization or the
dynamic inverse depth parametrization. In both cases the position is
represented in inverse depth form, so we follow the measurement model
proposed in Montiel et al.'s approach.
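
To make the measurement model concrete, here is a sketch of the projection of an inverse-depth-coded feature, following Civera et al. (2008). The pinhole calibration parameters (fx, fy, cx, cy) and the world-to-camera rotation R_cw are assumptions of this sketch; the actual implementation may differ, e.g., in how lens distortion is handled.

    import numpy as np
    # Reuses G(theta, phi) from the sketch after Eq. (2.4).

    def predict_measurement(o, r_wc, R_cw, fx, fy, cx, cy):
        """Pinhole projection of an inverse-depth-coded feature: the ray
        rho * (r_i - r_wc) + G(theta, phi) is rotated into the camera frame
        and projected. Calibration (fx, fy, cx, cy) and the rotation R_cw
        are assumed inputs; distortion is ignored here."""
        r_i, theta, phi, rho = o[:3], o[3], o[4], o[5]
        h_c = R_cw @ (rho * (np.asarray(r_i) - r_wc) + G(theta, phi))
        u = cx + fx * h_c[0] / h_c[2]
        v = cy + fy * h_c[1] / h_c[2]
        return np.array([u, v])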

2.1.3. Undelayed Feature Initialization

    Since the dynamic inverse depth parametrization is an extension of the
inverse depth parametrization, the initial values for the position of a new
feature can be calculated from $\hat{r}^{WC}$, $\hat{q}^{WC}$, $h = (u \; v)^\top$, and $\rho_0$, as in
Montiel et al.'s approach. The initial velocity $v_0$ is set to 0, and the
velocity standard deviation $\sigma_v$ is designed to cover the 95% acceptance
region $[-|v|_{max}, |v|_{max}]$. So:


$\sigma_v = \frac{|v|_{max}}{2}$   (2.6)
    The initial state of an observed feature is

$y(\hat{r}^{WC}, \hat{q}^{WC}, h, \rho_0, v_0) = (\hat{x}_i, \; \hat{y}_i, \; \hat{z}_i, \; \hat{\theta}_i, \; \hat{\phi}_i, \; \hat{\rho}_i, \; v_0)^\top$   (2.7)

After adding the feature into the state vector, the state covariance $\hat{P}_{k|k}$ becomes

$\hat{P}^{new}_{k|k} = J \begin{pmatrix} \hat{P}_{k|k} & 0 & 0 & 0 \\ 0 & R_j & 0 & 0 \\ 0 & 0 & \sigma^2_\rho & 0 \\ 0 & 0 & 0 & \sigma^2_v \end{pmatrix} J^\top, \qquad J = \begin{pmatrix} I & 0 \\ \frac{\partial y}{\partial r^{WC}}, \frac{\partial y}{\partial q^{WC}}, 0, \ldots, 0, & \frac{\partial y}{\partial h}, \frac{\partial y}{\partial \rho}, \frac{\partial y}{\partial v} \end{pmatrix}$
    By using the dynamic inverse depth parametrization, we are able to add
a new feature into the state vector at its first observed frame. Through
undelayed feature initialization, our monocular system uses every
measurement of the point features to estimate both the camera position and
the feature locations, and obtains better estimates.







 CHAPTER 3

       STATIC AND MOVING OBJECT
             CLASSIFICATION


3.1. STATIC AND MOVING OBJECT CLASSIFICATION
    Retaining stationary features in the map is needed for better estimation
and for loop closing; hence, classification is necessary. In this chapter we
propose a low-computation classification method based on the estimated
velocity states of the features.


3.1.1. Velocity Convergence

    We ran simulation experiments with the dynamic inverse depth
parametrization and the undelayed feature initialization technique
discussed in Section 2.1 to verify the velocity convergence of features in
dynamic environments. In the simulated environment there were 40 static
landmarks and 2 moving landmarks. 39 static landmarks were added to the
state vector as known features using the inverse depth parametrization.
One static landmark (target 1) and two moving landmarks (target 2 and
target 3) were initialized at their first observed frame with the dynamic
inverse depth parametrization and added to the state vector. The camera
trajectory was designed as a helix. We checked the velocity distributions
of these 3 landmarks (target 1, target 2,



and target 3), coded in the dynamic inverse depth parametrization, after 150
EKF steps. Figure 3.1 shows the velocity convergence.
    In this simulation example, we found that the velocity distributions of
the features converged, providing useful information for classifying the
type of each feature. Hence we developed a classification algorithm based
on the estimated velocity distribution.


3.1.2. Define Score Function for Classification

    3.1.2.1. Score function for classifying static objects.    To classify
features as static or moving, we define a score function that maps a
velocity distribution to a score value, and use the score to decide whether
a feature is static or moving. Given a 3-dimensional velocity distribution
$X = N(\mu, \Sigma)$, the score function is defined as:
$C_s(X) = f_X(0) = \frac{1}{(2\pi)^{3/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(0-\mu)^\top \Sigma^{-1} (0-\mu)}$   (3.1)

$f_X$ is the probability density function of the Gaussian distribution X.
That is, the score function evaluates the probability density of the
velocity distribution at $(0, 0, 0)^\top$; the score reveals the relative
likelihood of the velocity occurring at $(0, 0, 0)^\top$.
    For a static feature $o^i_k$, the velocity $v^i_k$ is expected to converge close
to $(0, 0, 0)^\top$. The score thus increases, exceeds the threshold $t_s$, and lets
the monocular system classify the feature as static.

    3.1.2.2. Score function for classifying moving objects.    To classify
each object as either static or moving, we further define a score function
for classifying moving objects. Given a 3-dimensional velocity distribution
$X = N(\mu, \Sigma)$, the score function is defined as:

$C_m(X) = D_X(0) = \sqrt{(0-\mu)^\top \Sigma^{-1} (0-\mu)}$   (3.2)





(a) Target 1 (static object marked with green circle) under observable condition. Ground-
truth velocity of the target v = (0, 0, 0)




(b) Target 2 (moving object marked with green circle) under observable condition. Ground-
truth velocity of the target v = (1, 0, 0)




(c) Target 3 (moving object marked with green circle) under observable condition. Ground-
truth velocity of the target v = (0, 0, 0.5)

       Figure 3.1. Velocity convergence of 3 target features under the
                   observable condition


$D_X$ is the Mahalanobis distance function under the distribution X. The
Mahalanobis distance can be used to detect outliers; here, we check whether
the point $(0, 0, 0)^\top$ is an outlier of the distribution.
    For a moving feature $o^i_k$, the velocity $v^i_k$ is expected to converge away
from $(0, 0, 0)^\top$. The score thus increases, exceeds the threshold $t_m$, and
lets the monocular system classify the feature as moving.
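
The two scores of Eqs. (3.1) and (3.2) can be computed directly from the mean and covariance of a feature's velocity estimate. A minimal sketch:

    import numpy as np

    def static_score(mu, sigma):
        """C_s(X) = f_X(0), Eq. (3.1): Gaussian density of the velocity
        estimate N(mu, sigma) evaluated at zero velocity."""
        d2 = mu @ np.linalg.solve(sigma, mu)   # (0-mu)^T Sigma^-1 (0-mu)
        norm = (2.0 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(sigma))
        return np.exp(-0.5 * d2) / norm

    def moving_score(mu, sigma):
        """C_m(X) = D_X(0), Eq. (3.2): Mahalanobis distance of zero
        velocity from the velocity estimate."""
        return np.sqrt(mu @ np.linalg.solve(sigma, mu))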


3.1.3. Classification State

    With the dynamic inverse depth parametrization and the proposed
classification algorithm, SLAM with generalized objects can be implemented
as follows. Each feature is initialized at its first observed frame with the
dynamic inverse depth parametrization and labeled with the unknown state.
In each of the following observed frames, we examine the estimated
distribution of the feature using the two score functions.

    3.1.3.1. From unknown state to static state.    If the score value
$C_s(X)$ of an unknown-state feature exceeds the threshold $t_s$ at a certain
frame, we immediately classify the feature as a static object and label it
static. The velocity distribution of the feature is also adjusted to satisfy
the property of a static object: the velocity is set to $(0, 0, 0)^\top$ and the
corresponding covariance is set to 0. After the feature is classified as
static, we assume it remains static and make no prediction for it at the
prediction stage, ensuring that its velocity stays fixed at $(0, 0, 0)^\top$.
The transition can be expressed as a function:



$f\big( (x_k \;\; y_k \;\; z_k \;\; \theta_k \;\; \phi_k \;\; \rho_k \;\; v^x_k \;\; v^y_k \;\; v^z_k)^\top \big) = (x_k \;\; y_k \;\; z_k \;\; \theta_k \;\; \phi_k \;\; \rho_k \;\; 0 \;\; 0 \;\; 0)^\top$   (3.3)



    In fact, a feature coded in the 9-dimensional dynamic inverse depth
parametrization with zero velocity contributes to the estimation process
exactly as a static feature coded in the 6-dimensional inverse depth
parametrization does. Moreover, the transition from the unknown state to
the static state makes the covariance matrix sparser and thus reduces the
computation cost of implementing SLAM with generalized objects: the
computational complexity for a static feature coded in the dynamic inverse
depth parametrization with zero velocity is the same as that for a static
feature coded in the inverse depth parametrization.

    3.1.3.2. From unknown state to moving state.    If the score value
$C_m(X)$ of an unknown-state feature exceeds the threshold $t_m$ at a certain
frame, we immediately classify the feature as a moving object and label it
moving. Since the feature was initialized with the dynamic inverse depth
parametrization, both its position and velocity are already being estimated,
so there is no need to adjust the distribution or the motion model.
    Finally, the state vector $\chi_k = (x^\top_k, o^{1\top}_k, o^{2\top}_k, \ldots, o^{n\top}_k)^\top$ is composed of
three types of features (unknown, static, moving), each with its own motion
model: unknown and moving features are propagated with the constant-velocity
model with acceleration noise, while static features are kept stationary.
Thus a generalized object mapping approach is achieved.
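
Putting the pieces together, the per-frame classification step of this section can be sketched as below, reusing static_score and moving_score from the sketch after Eq. (3.2). The Feature container and the value of t_m are hypothetical; t_s = 1.6 is the value chosen in Chapter 4. In the full EKF, the zeroed covariance entries are the rows and columns of the joint covariance that correspond to the feature's velocity.

    # Per-frame classification of Section 3.1.3, reusing static_score and
    # moving_score from the sketch after Eq. (3.2). The Feature container
    # and t_m are hypothetical; t_s = 1.6 is the value used in Chapter 4.

    def classify_feature(feature, t_s=1.6, t_m=3.0):
        if feature.state != "unknown":
            return                                   # already classified
        mu, sigma = feature.velocity_mean, feature.velocity_cov
        if static_score(mu, sigma) > t_s:
            feature.state = "static"
            feature.x[6:9] = 0.0                     # zero velocity, Eq. (3.3)
            feature.P[6:9, :] = 0.0                  # in the full EKF these are
            feature.P[:, 6:9] = 0.0                  # rows/cols of the joint P
        elif moving_score(mu, sigma) > t_m:
            feature.state = "moving"                 # keep estimating velocity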


3.1.4. Issue on unobservable situations

    3.1.4.1. Non-converged velocity distribution under unobservable
situations.    Under unobservable situations, the monocular system cannot
accurately estimate the location of a moving feature, and therefore cannot
accurately estimate its velocity either. We check the effect of unobservable
situations on the proposed classification algorithm in simulation.
    The simulation scenario is the same as that of Figure 3.1: the same
moving objects with the same motion patterns, except that the camera moves
at constant speed. Note that under such a camera trajectory, the projection
of target 1 (static object) is the same as the projection of target 3
(moving object). This matches the observability issue: a monocular system
cannot find a unique trajectory of an object under the constant-velocity
assumption on moving objects. We checked the velocity distributions of the
3 landmarks coded in the dynamic inverse depth parametrization after 150
EKF steps.
    From Figure 3.2, we can see that the velocity distributions of these
three target objects do not converge. All three velocity distributions cover
a large area, so we cannot determine the velocities of these objects.




    3.1.4.2. Ambiguity between a static object and a parallel-moving
object.    The location distribution and velocity distribution of target 1
and target 3 are the same. Since target 1 is static and target 3 is moving,
we cannot distinguish them based on the velocity distribution. In fact, the
projected points of target 1 and target 3 are identical during these 150
frames, so the estimates of these two objects must be identical, and it is
impossible for the monocular system to classify target 1 and target 3 as
static or moving. This ambiguity extends to any static object, since under
unobservable situations we can construct a corresponding moving object
whose projections coincide with those of the static object. Note that such
a moving object must move parallel to the camera. From the




(a) Target 1 (static object marked with green circle) under unobservable condition. Ground-
truth velocity of the target v = (0, 0, 0)




(b) Target 2 (moving object marked with green circle) under unobservable condition.
Ground-truth velocity of the target v = (1, 0, 0)




(c) Target 3 (moving object marked with green circle) under unobservable condition.
Ground-truth velocity of the target v = (0, 0, 0.5)

        Figure 3.2. Velocity convergence of 3 target features under the
                   unobservable condition


simulation, we can see the inability to classify under unobservable
situations.

    3.1.4.3. Non-parallel-moving objects.    However, the velocity
distribution of target 2 reveals another fact: the 95% confidence region of
its velocity estimate does not cover the origin $(0, 0, 0)^\top$. Zero velocity was
filtered out by our SLAM with generalized objects algorithm, so the moving
object is classified as moving even under unobservable situations. In fact,
no static object has the same projection as a non-parallel-moving object,
which means there is no ambiguity between static objects and
non-parallel-moving objects. Thus we can find the possible range of the
velocity distribution by filtering and classify non-parallel-moving objects
as moving even under unobservable situations.

    3.1.4.4. Assumptions for solving the unobservable issue.    Although
classification is impossible under unobservable situations, in particular
because a static object and a parallel-moving object produce the same
projection in a monocular system, assumptions may still help us classify
objects in such cases. For example, in an environment with few objects
moving parallel to the camera, the ambiguity between a static object and a
parallel-moving object disappears, and classification can be done with our
algorithm.








 CHAPTER 4

             EXPERIMENTAL RESULTS


4.1. EXPERIMENTAL RESULTS
4.1.1. Simulation

    4.1.1.1. Effect of different thresholds on the classification result.
To evaluate the effect of different thresholds on classification, simulation
experiments were conducted and the classification accuracy was compared
under different thresholds. Since an estimated feature can be in one of
three states (unknown, static, moving), the wrongly classified error and
the misclassified error are defined:
      wrongly classified error: a feature is added with the unknown state;
        if the feature is finally classified as a different type than it
        should be, we say the feature is wrongly classified. For example,
        a static feature is classified as moving, or a moving feature is
        finally classified as static.
      misclassified error: a feature is added with the unknown state; if
        the feature is wrongly classified or is never classified as either
        static or moving, we say the feature is misclassified.
    The simulated scenarios are shown in Figures 4.1(a) and 4.1(b). In
Figure 4.1(a), the camera moves at non-constant speed on a circle to avoid
the unobservable situation. In Figure 4.1(b), the camera moves at constant
speed on four connected lines to test the performance under an unobservable
situation. 300 static landmarks and 288 moving landmarks were randomly
located in a 3D cube with a width of 30 meters in each scenario. 50 Monte
Carlo simulations of each scenario were run and evaluated.

       (a) Observable scenario                 (b) Unobservable scenario
       (c) Misclassified ratio in the          (d) Misclassified ratio in the
           observable scenario                     unobservable scenario
       (Panels (a) and (b) are 3D views with axes X[m], Y[m], Z[m]; panels
       (c) and (d) plot, against the threshold ts, the misclassified ratio
       of moving objects, the misclassified ratio of static objects, and
       the wrongly classified ratio of static objects.)

       Figure 4.1. Effect of different classification thresholds on the
                   classification result



    Instead of using a ROC curve to present the performance, the relations
between the threshold and the classification ratios are shown directly in
Figure 4.1(c)


and 4.1(d). In both scenarios, the misclassified ratio of static features
increases as the threshold ts increases, while the misclassified ratio of
moving features decreases. Under observable situations, the misclassified
ratio of static features increases from 0 to 0.5 as ts increases from 1 to
3.5; under unobservable situations it increases from 0.1 to 1 over the same
range. Meanwhile, the misclassified ratio of moving features decreases from
0.03 to 0.01 as ts increases from 1 to 3.5, under both observable and
unobservable situations.
    This finding matches our expectation that a larger threshold ts results
in fewer features classified as static: when a larger ts is chosen, the
misclassified ratio of static features increases and the misclassified
ratio of moving features decreases. The trade-off between these two ratios
should be set according to the intended use of the monocular system.
    Further, the classification performance is better under observable
situations. Comparing Figures 4.1(c) and 4.1(d), the misclassified ratio of
static features is smaller under observable situations than under
unobservable situations. The results match the discussion in Section 3.1.
Thus, an observable situation is necessary for better classification
performance.
    However, we should note that only a small portion of the misclassified
features are wrongly classified. This means the classification algorithm
rarely provides incorrect information: it retains the unknown state






of a feature when the information is insufficient, while still letting
unknown-state features contribute to our monocular system.
    Choosing the threshold ts = 1.6 gives good classification performance.
Table 4.1 shows the classification result under ts = 1.6 in the observable
condition. We therefore use ts = 1.6 in the following experiments for
checking the convergence of our SLAM algorithm and in the real experiment.
                               classification state
                           static    moving    unknown
                 Static      9480        16          1
                 Moving        71      6734         30

                   Table 4.1. Total classification result of
                              50 Monte Carlo simulations in
                              the observable condition with
                              the threshold ts = 1.6
    4.1.1.2. Convergence of our SLAM algorithm.    We further checked the
convergence of our SLAM algorithm in the observable scenario. The
estimation errors of the camera, the static features, and the moving
features over 50 Monte Carlo simulations are shown as boxplots in Figure
4.2. The estimation error of the camera increases while the robot explores
the environment from frame 1 to frame 450; the camera starts to close the
loop at frame 450, and the error then decreases, as the mean errors in
Figure 4.2(a) show. During the SLAM procedure the estimation errors do not
diverge. As presented in Figures 4.2(b) and 4.2(c), the estimation errors
of both static and moving features decrease as the number of estimated
frames increases.






       (a) Estimation error of the camera [m], frames 60 to 600
       (b) Estimation error of the static objects [m], frames 5 to 60
       (c) Estimation error of the moving objects [m], frames 5 to 60

Figure 4.2. Convergence of our SLAM algorithm shown with boxplots.
            The lower quartile, median, and upper quartile of each box
            show the distribution of the estimation error of all the
            objects in each observed frame. (a) The estimation error of
            the camera increases during exploration (frames 1 to 450)
            and decreases during loop closing (frames 450 to 600).
            (b) The estimation error of the static features decreases
            with the number of observed frames. (c) The estimation
            error of the moving features decreases with the number of
            observed frames.






4.1.2. Real Experiments




        Figure 4.3. The NTU PAL7 robot, the real-data experiment platform
                  of monocular SLAM with generalized objects.


    A real experiment with a 1793-frame loop-closing image sequence was run
and evaluated. Figure 4.3 shows the robotic platform, NTU-PAL7, on which a
Point Grey Dragonfly2 wide-angle camera collected image data at 13 frames
per second and a SICK LMS-100 laser scanner was used for ground truthing.
The field of view of the camera is 79.48 degrees, and the resolution of the
images is 640 × 480. The experiment was conducted in the Department of
Computer Science and Information Engineering (CSIE), National Taiwan
University (NTU). Figure 4.4 shows the basement (15.2 × 11.3 meters) in
which we verified the overall performance of SLAM with generalized objects,
such as loop closing, classification, and tracking. During the experiment,
a person moved around the environment and appeared 3 times in front of the
camera.
    In the experiment there are 107 static features and 12 moving features.
Each time the moving person appeared, 4 features on the person were
generated and initialized. Table 4.2 shows the performance of our
classification







      Figure 4.4. The basement of the CSIE department at NTU.
                         Green/grey dots show the map built using the laser
                         scanner.



algorithm. None of the features was wrongly classified: all 107 static
features in the environment were correctly classified as static, and all 12
moving features were correctly classified as moving.
                               classification state
                           static    moving    unknown
                 Static       107         0          0
                 Moving         0        12          0

                      Table 4.2. Total classification result of
                                 the real experiment



    The estimation over time is shown in Figure 4.5. Using the proposed
algorithm, we estimated the trajectory of the camera, built the map, and
estimated the moving object successfully. Figures 4.5(f), (j), and (n) show
the estimation of the moving person. Clearly, the algorithm did not fail
even though the image sequence was captured in a dynamic environment.




(a) Frame 10. At the beginning of our SLAM with generalized objects
algorithm, each feature is assumed static and added into the state vector.
The ellipses show the projected 2σ bounds of the features.
(b) Top view: Frame 10. Squares indicate the stationary features, and blue
shadows indicate the 95% acceptance regions of the estimates. The possible
locations of the features have not converged.




(c) Frame 220. The robot starts to explore the environment.
(d) Top view: Frame 220. The possible locations of the features start to
converge.








(e) Frame 330. The person appears in front of the camera for the first
time. 4 features are located on the person and initialized in the state
vector. Red ellipses show the projected 2σ bounds of the moving features.
Note that some static features are occluded by the person. Cyan ellipses
show the projected 2σ bounds of the non-associated features.
(f) Top view: Frame 330. Red shadows indicate the 95% acceptance regions
of the moving features.




               (g) Frame 730                           (h) Top view: Frame 730








(i) Frame 950. The person appears in front of the camera for the second
time. As the robot explores the environment, new features are added with
the unknown state. Green ellipses show the projected 2σ bounds of those
newly initialized unknown-state features.
(j) Top view: Frame 950. Green shadows indicate the 95% acceptance regions
of the newly initialized features with unknown state.




              (k) Frame 1260                            (l) Top view: Frame 1260








(m) Frame 1350. The person appears in front of the camera for the third
time.
(n) Top view: Frame 1350.




            (o) Frame 1560                           (p) Top view: Frame 1560

      Figure 4.5. The image sequence collected in the basement and the
                 corresponding monocular SLAMMOT results. Figures
                 4.5(a), 4.5(c), 4.5(e), 4.5(g), 4.5(i), 4.5(k), 4.5(m), and 4.5(o)
                 show the results of feature extraction and association.
                 Figures 4.5(b), 4.5(d), 4.5(f), 4.5(h), 4.5(j), 4.5(l), 4.5(n),
                 and 4.5(p) show the monocular SLAM with generalized
                 objects results, in which black and grey triangles and
                 lines indicate the camera poses and trajectories from
                 monocular SLAM with generalized objects and from
                 LIDAR-based SLAMMOT. Gray points show the occupancy
                 grid map from LIDAR-based SLAM. All estimates of the
                 visual features lie inside a reasonable cube.






    The final map is shown in Figure 4.6 in both top view and side view.
Compared with the gray map built by the laser-based SLAMMOT algorithm, the
map built by our monocular SLAM with generalized objects algorithm is close
to the true layout of the environment. All estimated features are located
within a reasonable cube; no feature estimate falls outside the reasonable
range.








                    (a) Top view: Frame 1690




                    (b) Side view: Frame 1690

Figure 4.6. The result of the SLAM part of monocular SLAM with
          generalized objects. The definitions of the symbols are the
          same as in Figure 4.5. There are 107 stationary features
          in the state vector of monocular SLAM with generalized
          objects.








 CHAPTER 5

  CONCLUSION AND FUTURE WORK


5.1. CONCLUSION AND FUTURE WORK
    We have illustrated the procedures of our proposed algorithm, including
the state vector definition, the motion model for moving objects, feature
initialization, and the classification algorithm. The proposed dynamic
inverse depth parametrization achieves undelayed feature initialization and
is able to encode both static and moving features. The algorithm benefits
from the undelayed initialization and thus has better estimates of the
camera and of both static and moving features. The parametrization also
provides a way to track a moving feature, since both its location and its
velocity are estimated inside the state vector. Further, the low-computation
classification algorithm makes real-time SLAM in dynamic environments
feasible.
    For more dynamic environments, applying a constant acceleration model
would be a possible way to handle objects with higher-order motion
patterns. We plan to investigate the tracking performance for moving
objects with more complicated motion patterns in the future. Also,
approaches for dealing with move-stop-move objects will be a


                                                                            30
5.1    CONCLUSION AND FUTURE WORK


further research interest. In addition, SLAM with generalized object using
a stereo camera could be further studied.
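    Purely as an illustration of that direction, a constant acceleration pre-
diction could take the following shape; the 3-axis acceleration term and the
resulting 12-dimensional feature state are our assumptions, not part of this
thesis.

    import numpy as np

    def predict_constant_acceleration(loc, v, a, dt):
        """Constant acceleration prediction for a moving feature: the
        location and velocity are propagated with a 3-axis acceleration,
        which would extend the 9-dimensional dynamic inverse depth
        state by three entries."""
        loc_new = loc + v * dt + 0.5 * a * dt ** 2
        v_new = v + a * dt
        return loc_new, v_new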








BIBLIOGRAPHY




Civera, J., Davison, A. J., & Montiel, J. M. M. (2008). Inverse depth para-
  metrization for monocular SLAM. IEEE Transactions on Robotics, 24(5),
  932–945.
Davison, A. J., Reid, I. D., Molton, N. D., & Stasse, O. (2007). MonoSLAM:
  Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and
  Machine Intelligence, 29(6), 1052–1067.
Hartley, R. & Zisserman, A. (2004). Multiple View Geometry in Computer Vi-
  sion. Cambridge University Press.
Lemaire, T., Berger, C., Jung, I.-K., & Lacroix, S. (2007). Vision-based SLAM:
  Stereo and monocular approaches. International Journal of Computer Vision,
  74(3), 343–364.
Migliore, D., Rigamonti, R., Marzorati, D., Matteucci, M., & Sorrenti, D. G.
  (2009). Use a single camera for simultaneous localization and mapping
  with mobile object tracking in dynamic environments. In ICRA Work-
  shop on Safe navigation in open and dynamic environments: Application to au-
  tonomous vehicles.
Montiel, J. M. M., Civera, J., & Davison, A. J. (2006). Unified inverse depth
  parametrization for monocular SLAM. In Robotics: Science and Systems,
  Philadelphia, USA.





Parsley, M. P. & Julier, S. J. (2008). Avoiding negative depth in inverse depth
  bearing-only SLAM. In IEEE/RSJ International Conference on Intelligent
  Robots and Systems (IROS), (pp. 2066–2071), Nice, France.
Sola, J. (2007). Towards Visual Localization, Mapping and Moving Objects Track-
  ing by a Mobile Robot: a Geometric and Probabilistic Approach. PhD thesis,
  Institut National Polytechnique de Toulouse.
Vidal-Calleja, T., Bryson, M., Sukkarieh, S., Sanfeliu, A., & Andrade-Cetto,
  J. (2007). On the observability of bearing-only SLAM. In IEEE International
  Conference on Robotics and Automation (ICRA), (pp. 4114–4119), Roma,
  Italy.
Wangsiripitak, S. & Murray, D. W. (2009). Avoiding moving outliers in vi-
  sual SLAM by tracking moving objects. In IEEE International Conference on
  Robotics and Automation (ICRA), (pp. 375–380), Kobe, Japan.








Document Log:


                       Manuscript Version 1 — 19 July 2010
                      Typeset by AMS-LaTeX — 19 August 2010




                                 Chen-Han Hsiao




   The Robot Perception and Learning Lab., Department of Computer Science and
Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd.,
Da-an District, Taipei City, 106, Taiwan. Tel.: (+886) 2-3366-4888 ext. 407
     E-mail address: r97922120@ntu.edu.tw



