DeepVO - Towards Visual Odometry with Deep Learning


Authors: Sen Wang (1, 2), Ronald Clark (2), Hongkai Wen (2) and Niki Trigoni (2). Affiliations: (1) Edinburgh Centre for Robotics, Heriot-Watt University, UK; (2) University of Oxford, UK. Download this paper: http://senwang.gitlab.io/DeepVO/#paper Watch video: http://senwang.gitlab.io/DeepVO/#video


DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks
National Chung Cheng University, Taiwan
Robot Vision Laboratory
2017/11/08
Jacky Liu

About this work
DeepVO : Towards Visual Odometry with Deep Learning
Sen Wang (1, 2), Ronald Clark (2), Hongkai Wen (2) and Niki Trigoni (2)
1. Edinburgh Centre for Robotics, Heriot-Watt University, UK
2. University of Oxford, UK
Download this paper: http://senwang.gitlab.io/DeepVO/#paper
Watch video: http://senwang.gitlab.io/DeepVO/#video

Contributions
1. Demonstrates that monocular VO can be built by end-to-end training
2. The RCNN architecture can generalize to unseen environments
3. Complex motion can be modeled by the RCNN

Related works
Visual odometry
• Geometric: sparse (feature-based) and direct methods
• Learning-based

Related works
Sparse
• PTAM
• ORB-SLAM
Direct
• DTAM
Network
• CNN
• RNN
• LSTM

Network design
1. Traditional computer vision tasks learn from appearance and image context.
2. Visual odometry should instead learn from geometry. This is what the RCNN tries to address.


Preprocessing
• Normalize the inputs (speeds up training) by subtracting the mean RGB values of the training set
• Resize each image to a size that is a multiple of 64
• Stack two consecutive images to form one input tensor
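The first two preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the helper names (`round_to_multiple`, `resize_nearest`, `preprocess`) are hypothetical, nearest-neighbour resizing is swapped in only to keep the example dependency-free, and rounding each image dimension to the nearest multiple of 64 is an assumption about what "multiple of 64" means here.

```python
import numpy as np


def round_to_multiple(n: int, base: int = 64) -> int:
    """Round n to the nearest positive multiple of base (assumed policy)."""
    return max(base, int(round(n / base)) * base)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image via index lookup."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]


def preprocess(img: np.ndarray, mean_rgb: np.ndarray) -> np.ndarray:
    """Resize to multiples of 64, then subtract the training-set mean RGB."""
    out_h = round_to_multiple(img.shape[0])
    out_w = round_to_multiple(img.shape[1])
    resized = resize_nearest(img.astype(np.float32), out_h, out_w)
    return resized - mean_rgb.astype(np.float32)
```

In practice `mean_rgb` would be computed once over the training images and reused unchanged at test time.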

CNN
• What does the paper mean by learning “geometric” features?
  Two consecutive monocular RGB images are stacked and fed into the CNN, which is expected to extract features from the concatenated pair rather than from a single image.
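The stacking step can be sketched like this, assuming frames are stored channels-last as a `(T, H, W, 3)` array. The function name `stack_consecutive` and the memory layout are illustrative choices, not details given in the slides.

```python
import numpy as np


def stack_consecutive(frames: np.ndarray) -> np.ndarray:
    """Turn (T, H, W, 3) frames into (T-1, H, W, 6) consecutive-frame pairs."""
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("need at least two frames shaped (T, H, W, C)")
    # Pair frame k with frame k+1 and concatenate along the channel axis.
    return np.concatenate([frames[:-1], frames[1:]], axis=-1)
```

Each 6-channel pair is one CNN input, so a sequence of T frames yields T-1 inputs.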

RNN
• An RNN is not well suited to learning sequential representations directly from high-dimensional raw data such as images, so it operates on the features extracted by the CNN.
• Hidden state: $h_k = \mathcal{H}(W_{xh} x_k + W_{hh} h_{k-1} + b_h)$
• Output: $y_k = W_{hy} h_k + b_y$
  where $W$: weight matrix, $b$: bias vector, $k$: time index, $\mathcal{H}$: activation function
• Limitation: the vanishing gradient problem
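The two RNN equations translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, and `np.tanh` is an assumed choice for the activation $\mathcal{H}$, which the slide leaves unspecified.

```python
import numpy as np


def rnn_step(x_k, h_prev, W_xh, W_hh, b_h, W_hy, b_y, activation=np.tanh):
    """One step: h_k = H(W_xh x_k + W_hh h_{k-1} + b_h), y_k = W_hy h_k + b_y."""
    h_k = activation(W_xh @ x_k + W_hh @ h_prev + b_h)
    y_k = W_hy @ h_k + b_y
    return h_k, y_k


def rnn_forward(xs, h0, params):
    """Run the recurrence over a sequence xs of shape (t, input_dim)."""
    h, ys = h0, []
    for x_k in xs:
        h, y = rnn_step(x_k, h, *params)  # hidden state carries over in time
        ys.append(y)
    return np.stack(ys), h
```

Because `h` is multiplied by `W_hh` and squashed at every step, gradients propagated back through many steps tend to shrink, which is the vanishing gradient problem noted above.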

LSTM (Long short-term memory)
• Depth is needed to learn high-level representations, so LSTM layers are stacked
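As a rough sketch of what stacking means, the code below implements a standard LSTM cell without peephole connections and feeds each layer's hidden state into the next layer. The gate ordering inside `W`, the function names, and the layer sizes are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_k, h_prev, c_prev, W, b):
    """One LSTM step. W: (4n, d + n), b: (4n,); gates ordered i, f, o, g."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([x_k, h_prev]) + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])       # candidate cell update
    c_k = f * c_prev + i * g     # forget part of the old memory, add new content
    h_k = o * np.tanh(c_k)       # expose a gated view of the cell state
    return h_k, c_k


def stacked_lstm_step(x_k, states, layers):
    """Pass x_k through stacked layers; each hidden state feeds the next layer."""
    new_states, inp = [], x_k
    for (h, c), (W, b) in zip(states, layers):
        h, c = lstm_step(inp, h, c, W, b)
        new_states.append((h, c))
        inp = h
    return inp, new_states
```

The additive update of `c_k` is what lets the cell keep information over longer horizons than the plain RNN recurrence.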

Cost function
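Conditional probability of the poses given the image sequence:
$p(Y_t \mid X_t) = p(y_1, \dots, y_t \mid x_1, \dots, x_t)$
Optimal parameters maximize this probability:
$\theta^* = \arg\max_{\theta} \; p(Y_t \mid X_t; \theta)$
which is implemented by minimizing the mean squared error over $N$ samples:
$\theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{t} \left\| \hat{p}_k - p_k \right\|_2^2 + \kappa \left\| \hat{\varphi}_k - \varphi_k \right\|_2^2$
where $(p_k, \varphi_k)$ is the ground-truth pose (position, orientation), $(\hat{p}_k, \hat{\varphi}_k)$ is the estimated pose, and $\kappa$ is a scale factor that balances the position and orientation terms.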
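The cost can be evaluated numerically as in the sketch below, assuming positions and orientations are stored as arrays of shape `(N, t, 3)`. The function name `pose_mse_loss` and the default `kappa=100.0` are illustrative assumptions, not values given in the slides.

```python
import numpy as np


def pose_mse_loss(p_hat, p, phi_hat, phi, kappa=100.0):
    """(1/N) * sum_i sum_k ||p_hat - p||^2 + kappa * ||phi_hat - phi||^2.

    All inputs have shape (N, t, 3): N sequences, t time steps,
    3-D position or 3-D orientation (e.g. Euler angles).
    """
    pos_err = np.sum((p_hat - p) ** 2, axis=-1)      # squared L2 norm per step
    ori_err = np.sum((phi_hat - phi) ** 2, axis=-1)  # squared L2 norm per step
    return float(np.sum(pos_err + kappa * ori_err) / p.shape[0])
```

A larger `kappa` makes orientation errors count more, which matters because rotations between consecutive frames are usually much smaller in magnitude than translations.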

Training & testing
1. Dataset: KITTI VO/SLAM benchmark
(22 image sequences, 10 fps, scenes with dynamic objects)
2. 7410 training samples (image and trajectory pairs)
3. Implemented in Theano
4. Hardware: NVIDIA Tesla K40 GPU
5. 200 epochs
6. Learning rate: 0.001
7. Regularization: dropout and early stopping
8. CNN: transfer learning from a pretrained FlowNet model
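Early stopping, one of the two regularizers listed above, can be sketched as a small helper that scans per-epoch validation losses. The function name, the `patience` and `min_delta` parameters, and their defaults are hypothetical; the slides do not say which stopping criterion was used.

```python
MAX_EPOCHS = 200  # epoch budget stated on the slide


def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; if that never happens,
    stop_epoch is the last epoch in the list.
    """
    best, best_epoch, stale = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses[:MAX_EPOCHS]):
        if loss < best - min_delta:
            best, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch
    return min(len(val_losses), MAX_EPOCHS) - 1, best_epoch
```

The weights saved at `best_epoch` would normally be the ones kept for testing.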

Overfitting
• Orientation is more prone to overfitting than position

Comparison with traditional VO
• Baseline: the open-source VO library LIBVISO2
• Both its monocular and stereo variants are used

Trajectory (1/2)

Trajectory (2/2)
• No ground truth: Seq. 11 to 19


Dynamic objects
• The paper does not explain how to deal with dynamic objects in the scene
• Traditional VO uses RANSAC to remove outliers
• Possible remedy: get more training data
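To illustrate how RANSAC rejects outliers, here is a minimal sketch on a toy problem: fitting a 2-D line. Real VO pipelines apply the same idea to feature correspondences when estimating camera motion; line fitting is swapped in here only to keep the example short. The function name and all parameter values are hypothetical.

```python
import random


def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Fit y = a*x + b with RANSAC; return (a, b, inlier_indices)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_iters):
        # Hypothesize a model from a minimal sample (two points for a line).
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Count the points that agree with the hypothesis within the threshold.
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best
```

Points that never agree with the best model, such as features on a moving car in the VO setting, are left out of the inlier set and do not influence the estimate.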

Conclusion
• End-to-end monocular VO based on deep learning
• Deep RCNN
• No need to carefully tune the parameters of the VO system
• Not intended as a replacement for the classic geometry-based approach


Experimental results: DeepVO vs. VISO2
