PRM-RL: Long-range Robotics Navigation
Tasks by Combining Reinforcement
Learning and Sampling-based Planning
IEEE International Conference on Robotics and Automation (ICRA), 2018
Best Paper Award in Service Robotics
Aleksandra Faust et al.
Google Brain Robotics
Presented by Dongmin Lee
December 1, 2019
Outline
• Abstract
• Introduction
• Reinforcement Learning
• Methods
• Results
Abstract
PRM-RL (Probabilistic Roadmap-Reinforcement Learning):
• A hierarchical method for long-range navigation tasks
• Combines sampling-based path planning with RL
• Uses feature-based and deep neural network policies (DDPG) in continuous state and action spaces
Experiments: in simulation and on a robot, on two navigation tasks (end-to-end)
• Indoor (drive) navigation in office environments (covered in this talk)
• Aerial cargo delivery in urban environments
Introduction
PRM-RL YouTube video
• https://bit.ly/34zCTmd
Traditional Motion Planning (or Path Planning)
• CS287 Advanced Robotics (Fall 2019), Lecture 9: Motion Planning
• https://people.eecs.berkeley.edu/~pabbeel/cs287-fa19/slides/Lec10-motion-planning.pdf
Probabilistic Roadmap (PRM) YouTube video
• https://bit.ly/34rRKz0
• https://bit.ly/35Nb61Q
Rapidly-exploring Random Tree* (RRT*) YouTube video
• https://bit.ly/2OXiocb
• https://bit.ly/2OQbUvM
Reinforcement Learning
RL provides a formalism for behaviors
• The problem of a goal-directed agent interacting with an uncertain environment
• Interaction → adaptation (feedback and decision)
What are the challenges of RL?
• Huge number of samples: millions
• Fast, stable learning
• Hyperparameter tuning
• Exploration
• Sparse reward signals due to long-range navigation → solved with hierarchical waypoints
• Safety / reliability
• Simulator
Introduction
So, what is the advantage of PRM-RL over traditional methods?
• In PRM-RL, an RL agent is trained to execute a local point-to-point task without knowledge of the large-scale topology, learning the task constraints.
• PRM-RL builds the roadmap using the RL agent instead of the traditional collision-free straight-line planner.
• The resulting long-range navigation planner thus combines the planning efficiency of a PRM with the robustness of an RL agent.
Introduction
Experiment: environments used for the indoor navigation tasks
Methods
Three stages:
1. RL agent training
2. PRM construction (roadmap creation)
3. PRM-RL querying (roadmap querying)
Methods
1. RL agent training
Definitions
• 𝑆: the robot's state space
• 𝑠: start state in state space 𝑆
• 𝑔: goal state in state space 𝑆
• C-space: the space of all possible robot configurations
(the state space 𝑆 is a superset of the C-space)
• C-free: the partition of the C-space consisting of only collision-free configurations
• 𝐿(𝑠): a task predicate that must hold for the task constraints to be satisfied
• 𝑝(𝑠): the projection of a state space point onto the C-space, which must lie in C-free
The task is completed when the system is sufficiently close to the goal state:
‖𝑝(𝑠) − 𝑝(𝑔)‖ ≤ 𝜖
Our goal is to find a transition function:
𝑠′ = 𝑓(𝑠, 𝑎)
1. RL agent training
Markov Decision Process (MDP):
• 𝑆 ⊂ ℝ^{d_s}: the state (observation) space of the robot
• 𝑠 = (𝑔, 𝑜): an observation, consisting of the goal 𝑔 in polar coordinates and LIDAR observations 𝑜
• 𝐴 ⊂ ℝ^{d_a}: the space of all possible actions that the robot can perform
• 𝑎 = (𝑣_L, 𝑣_R) ∈ ℝ²: a two-dimensional vector of wheel speeds
• 𝑃: 𝑆 × 𝐴 → ℝ: a probability distribution over states and actions. We assume access to a simplified black-box simulator, without knowing the full non-linear system dynamics.
• 𝑅: 𝑆 → ℝ: a scalar reward. We reward the agent for staying away from obstacles.
Our goal is to find a policy 𝜋 : 𝑆 → 𝐴, 𝜋(𝑠) = 𝑎, which, given an observed state 𝑠, returns an action 𝑎 that the agent should perform to maximize the long-term return:
𝜋*(𝑠) = argmax_{𝑎∈𝐴} 𝔼[ ∑_{t=0}^{∞} 𝛾^t 𝑅(𝑠_t) ]
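To make the objective concrete, here is a small, hedged Monte Carlo sketch of the discounted return that the optimal policy maximizes; `simulator_step`, the horizon, and the discount factor are illustrative placeholders, not the paper's setup.

```python
def rollout_return(policy, simulator_step, s0, gamma=0.99, horizon=500):
    """Estimate E[sum_t gamma^t R(s_t)] over one rollout.
    `simulator_step(s, a) -> (s_next, reward, done)` stands in for the
    black-box simulator; `policy(s) -> (v_L, v_R)` is the RL agent."""
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        s, r, done = simulator_step(s, a)
        total += discount * r
        discount *= gamma
        if done:
            break
    return total
```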
Methods
1. RL agent training
Training with the DDPG algorithm for the indoor navigation tasks; a hedged sketch of the core DDPG update follows.
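Below is a minimal, illustrative PyTorch sketch of the core DDPG update: critic regression to a bootstrapped target, a deterministic policy-gradient step, and Polyak-averaged target networks. The observation size (a 2-D polar goal plus an assumed 64-ray LIDAR), network widths, and learning rates are assumptions, not the paper's reported configuration.

```python
import copy
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 2 + 64, 2  # polar goal + assumed 64 LIDAR rays; action = (v_L, v_R)

def mlp(sizes, out_act=nn.Identity):
    layers = []
    for i in range(len(sizes) - 1):
        act = nn.ReLU if i < len(sizes) - 2 else out_act
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    return nn.Sequential(*layers)

actor = mlp([OBS_DIM, 256, 256, ACT_DIM], out_act=nn.Tanh)  # pi(s) -> normalized wheel speeds
critic = mlp([OBS_DIM + ACT_DIM, 256, 256, 1])              # Q(s, a)
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done, gamma=0.99, tau=0.005):
    """One DDPG step on a replay-buffer minibatch (float tensors)."""
    with torch.no_grad():  # bootstrapped critic target from target networks
        q2 = critic_tgt(torch.cat([s2, actor_tgt(s2)], dim=-1)).squeeze(-1)
        target = r + gamma * (1.0 - done) * q2
    q = critic(torch.cat([s, a], dim=-1)).squeeze(-1)
    critic_loss = ((q - target) ** 2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    for net, tgt in ((actor, actor_tgt), (critic, critic_tgt)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)  # Polyak averaging
```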
Methods
2. PRM construction (roadmap creation)
Algorithm 1: connect two nodes using PRM-RL; an illustrative paraphrase of the idea follows.
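The gist of the construction, paraphrased loosely in Python (not the paper's exact Algorithm 1): a candidate edge between two sampled nodes is kept only if the RL agent can reliably drive between them in simulation. The attempt count and success threshold below are assumptions for illustration.

```python
def try_connect(roadmap, src, dst, rl_policy, run_episode,
                attempts=20, min_success_rate=0.9):
    """Edge test for PRM-RL roadmap construction (illustrative sketch).
    `run_episode(policy, start, goal) -> bool` rolls the RL agent out in
    the simulator and reports whether it reached the goal region."""
    successes = sum(run_episode(rl_policy, src, dst) for _ in range(attempts))
    if successes / attempts >= min_success_rate:
        roadmap.add_edge(src, dst)  # e.g., a networkx.Graph roadmap
        return True
    return False
```

This replaces the straight-line collision check of a classical PRM, so edges encode "the agent can actually get there" rather than mere geometric visibility.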
Methods
3. PRM-RL querying (roadmap querying)
Generate long-range trajectories
• We query the roadmap, which returns a list of waypoints to a higher-level planner.
• The higher-level planner then invokes the RL agent to produce a trajectory to the next waypoint.
• When the robot is within the waypoint's goal range, the higher-level planner replaces the goal with the next waypoint in the list (see the sketch below).
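A hedged sketch of query execution: the roadmap yields waypoints and the higher-level planner hands them to the RL agent one at a time. `robot`, `shortest_path`, and the goal range below are placeholders, not the paper's API.

```python
import numpy as np

def execute_query(roadmap, start, goal, rl_policy, robot,
                  shortest_path, goal_range=0.3, max_steps=10_000):
    """Follow roadmap waypoints with the RL agent (illustrative)."""
    waypoints = shortest_path(roadmap, start, goal)  # e.g., Dijkstra over roadmap edges
    steps = 0
    for wp in waypoints:
        while np.linalg.norm(robot.position() - np.asarray(wp)) > goal_range:
            obs = robot.observe(goal=wp)  # polar goal + LIDAR observation
            robot.apply(rl_policy(obs))   # wheel speeds (v_L, v_R)
            steps += 1
            if steps >= max_steps:
                return False              # budget exhausted -> query fails
    return True
```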
Results
Indoor navigation
1. Roadmap construction evaluation
2. Expected trajectory characteristics
3. Actual trajectory characteristics
4. Physical robot experiments
→ Each roadmap is evaluated on 100 queries randomly generated from C-free.
1. Roadmap construction evaluation
• Higher sampling density produces larger maps and more successful queries.
• The number of nodes in the map does not depend on the local planner, but the numbers of edges and collision checks do.
• Roadmaps built with the RL local planner are more densely connected, with 15% and 50% more edges.
• The RL agent can go around corners and small obstacles.
2. Expected trajectory characteristics
• The RL agent does not require the robot to come to rest at the goal region, so the robot carries some inertia when the waypoint is switched; this causes some of the failures.
• The PRM-RL paths contain more waypoints, except in Building 3.
• Expected trajectory length and duration are longer for the RL agent.
3. Actual trajectory characteristics
• We examine query characteristics for successful versus unsuccessful queries.
• The RL agent produces a higher success rate than PRM-SL (PRM with a straight-line local planner).
• Successful trajectories have fewer waypoints than expected, which means that shorter queries are more likely to succeed.
4. Physical robot experiments
• To test the transfer of our approach to a real robot, we created a simple slalom-like environment with four obstacles.
PRM-RL YouTube video
• https://bit.ly/34zCTmd
Thank You!
Any Questions?