I reviewed the PRM-RL paper.
PRM-RL (Probabilistic Roadmap-Reinforcement Learning) is a hierarchical method that combines sampling-based path planning with RL. It uses feature-based and deep neural net policies (DDPG) in continuous state and action spaces. In the experiments, the authors evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments.
Outline
- Abstract
- Introduction
- Reinforcement Learning
- Methods
- Results
Thank you.
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
1. PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
IEEE International Conference on Robotics and Automation (ICRA), 2018
Best Paper Award in Service Robotics
Aleksandra Faust et al.
Google Brain Robotics
Presented by Dongmin Lee
December 1, 2019
3. Abstract
PRM-RL (Probabilistic Roadmap-Reinforcement Learning):
• A hierarchical method for long-range navigation tasks
• Combines sampling-based path planning with RL
• Uses feature-based and deep neural net policies (DDPG) in continuous state and action spaces
Experiments: in simulation and on-robot, on two end-to-end navigation tasks
• Indoor (differential drive) navigation in office environments (selected for this talk)
• Aerial cargo delivery in urban environments
5. Introduction
PRM-RL YouTube video
• https://bit.ly/34zCTmd
Traditional Motion Planning (or Path Planning)
• CS287 Advanced Robotics (Fall 2019), Lecture 9: Motion Planning
• https://people.eecs.berkeley.edu/~pabbeel/cs287-fa19/slides/Lec10-motion-planning.pdf
Probabilistic Roadmap (PRM) YouTube videos
• https://bit.ly/34rRKz0
• https://bit.ly/35Nb61Q
Rapidly-exploring Random Tree* (RRT*) YouTube videos
• https://bit.ly/2OXiocb
• https://bit.ly/2OQbUvM
6. Reinforcement Learning
RL provides a formalism for behaviors
• The problem of a goal-directed agent interacting with an uncertain environment
• Interaction (feedback & decision) → adaptation
9. Reinforcement Learning
What are the challenges of RL?
• Huge # of samples: millions
• Fast, stable learning
• Hyperparameter tuning
• Exploration
• Sparse reward signals due to long-range navigation
→ Solved with hierarchical waypoints
• Safety / reliability
• Simulator
10. Introduction
So, what is the advantage of PRM-RL over traditional methods?
• In PRM-RL, an RL agent is trained to execute a local point-to-point task without knowledge of the topology, learning the task constraints.
• PRM-RL builds the roadmap using the RL agent instead of the traditional collision-free straight-line planner.
• Thus, the resulting long-range navigation planner combines the planning efficiency of a PRM with the robustness of an RL agent.
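The key mechanical change is in how roadmap edges are accepted. A minimal sketch of the idea (all names, the trial count, and the success threshold are illustrative assumptions, not the paper's exact settings): a candidate edge is kept only if the trained RL agent can reliably drive between its two endpoints in simulation.

def rl_edge_check(policy, simulator, start, goal,
                  n_trials=20, success_threshold=0.9,
                  goal_radius=0.3, max_steps=1000):
    # Accept a PRM edge only if the RL agent reliably drives start -> goal.
    # `policy` and `simulator` are assumed interfaces, not the paper's code.
    successes = 0
    for _ in range(n_trials):
        s = simulator.reset(start, goal)  # state = (goal in polar coords, LIDAR)
        for _ in range(max_steps):
            a = policy(s)                 # wheel speeds (v_L, v_R)
            s, collided = simulator.step(a)
            if collided:
                break                     # a collision ends this trial
            if simulator.distance_to_goal() <= goal_radius:
                successes += 1            # reached the node's goal range
                break
    return successes / n_trials >= success_threshold

A straight-line planner would accept any edge whose segment is collision-free; this check instead bakes the agent's actual closed-loop behavior (including going around corners) into the roadmap.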
12. Methods
Three stages:
1. RL agent training
2. PRM construction (roadmap creation)
3. PRM-RL querying (roadmap querying)
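At the top level, the three stages might compose as in the following outline (function names are hypothetical; the stages are sketched in more detail under the slides below):

def prm_rl_pipeline(training_env, deployment_map, simulator, start, goal):
    # Stage 1: train a point-to-point RL agent (e.g., with DDPG),
    # independently of the deployment environment's topology.
    policy = train_rl_agent(training_env)

    # Stage 2: build a roadmap over the deployment map, connecting
    # sampled nodes only where the RL agent can reliably navigate.
    roadmap = build_roadmap(deployment_map, local_planner=policy)

    # Stage 3: query the roadmap for waypoints, then let the RL agent
    # drive waypoint to waypoint at execution time.
    waypoints = roadmap.shortest_path(start, goal)
    return execute_waypoints(policy, simulator, waypoints)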
13. Methods
1. RL agent training
Definitions
• 𝑆: robot's state space
• 𝑠: start state in state space 𝑆
• 𝑔: goal state in state space 𝑆
• C-space: the space of all possible robot configurations
(e.g., state space 𝑆 is a superset of the C-space)
• C-free: the partition of C-space consisting of only collision-free configurations
• 𝐿(𝑠): a task predicate that must satisfy the task constraints
• 𝑝(𝑠): the projection of a state space point onto the C-space, which must lie in C-free
The task is completed when the system is sufficiently close to the goal state:
‖𝑝(𝑠) − 𝑝(𝑔)‖ ≤ 𝜖
Our goal is to find a transfer function:
𝑠′ = 𝑓(𝑠, 𝑎)
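Read concretely, the completion test is just a norm check on the C-space projections (a trivial sketch; 𝑝 and 𝜖 come from the definitions above):

import numpy as np

def task_completed(p_s, p_g, eps):
    # True when the projection of the state is within eps of the goal's projection.
    return np.linalg.norm(np.asarray(p_s) - np.asarray(p_g)) <= eps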
14. Methods
1. RL agent training
Markov Decision Process (MDP):
• 𝑆: 𝑆 ⊂ ℝ^{d_s} is the state or observation space of the robot
• 𝑠: 𝑠 = (𝑔, 𝑜) ∈ ℝ^{66}
(the goal 𝑔 in polar coordinates and LIDAR observations 𝑜)
• 𝐴: 𝐴 ⊂ ℝ^{d_a} is the space of all possible actions that the robot can perform
• 𝑎: 𝑎 = (𝑣_L, 𝑣_R) ∈ ℝ^{2}
(two-dimensional vector of wheel speeds)
• 𝑃: 𝑆 × 𝐴 → ℝ is a probability distribution over states and actions. We assume the presence of a simplified black-box simulator, without knowledge of the full non-linear system dynamics.
• 𝑅: 𝑆 → ℝ is a scalar reward. We reward the agent for staying away from obstacles.
Our goal is to find a policy 𝜋 : 𝑆 → 𝐴:
𝜋(𝑠) = 𝑎
Given an observed state 𝑠, the policy returns an action 𝑎 that the agent should perform to maximize the long-term return:
𝜋*(𝑠) = argmax_{𝑎∈𝐴} 𝔼[ Σ_{i=0}^{∞} 𝛾^{i} 𝑅(𝑠_{i}) ]
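This MDP could be wrapped as a small environment class, roughly as below. The observation and action layout follows the slide (2 goal dims + 64 LIDAR rays → ℝ^66, two wheel speeds); the simulator interface and the clearance-based reward are illustrative assumptions standing in for the paper's exact shaping.

import numpy as np

class IndoorNavEnv:
    # Sketch of the slide's MDP: s = (goal in polar coords, LIDAR), a = (v_L, v_R).
    def __init__(self, simulator, n_lidar_rays=64, goal_radius=0.3):
        self.sim = simulator              # assumed black-box simulator interface
        self.n_lidar_rays = n_lidar_rays
        self.goal_radius = goal_radius

    def observe(self):
        goal_polar = self.sim.goal_in_polar()      # (distance, bearing) to goal
        lidar = self.sim.lidar(self.n_lidar_rays)  # range readings
        return np.concatenate([goal_polar, lidar]) # s in R^66

    def step(self, action):
        v_l, v_r = action                          # a in R^2: wheel speeds
        self.sim.apply_wheel_speeds(v_l, v_r)
        s = self.observe()
        # Illustrative reward: higher clearance from obstacles is better
        # ("we reward the agent for staying away from obstacles").
        reward = min(float(np.min(s[2:])), 1.0)
        done = self.sim.collided() or self.sim.distance_to_goal() <= self.goal_radius
        return s, reward, done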
16. Methods
1. RL agent training
Training with the DDPG algorithm for the indoor navigation task
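For reference, a single DDPG update step looks roughly like this in PyTorch (a generic sketch of the algorithm, assuming actor: s → a and critic: (s, a) → Q are torch modules; network sizes and hyperparameters are not the paper's):

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch  # tensors sampled from a replay buffer

    # Critic: regress Q(s, a) toward the bootstrapped one-step target.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's value of the actor's own actions.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the live networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)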
18. Methods
3. PRM-RL querying (roadmap querying)
Generate long-range trajectories
• We query the roadmap, which returns a list of waypoints to a higher-level planner.
• The higher-level planner then invokes the RL agent to produce a trajectory to the next waypoint.
• When the robot is within the waypoint's goal range, the higher-level planner replaces the goal with the next waypoint in the list.
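That querying loop can be written directly (hypothetical names; the goal range check mirrors the completion test from the Methods slides):

def execute_waypoints(policy, simulator, waypoints, goal_radius=0.3, max_steps=2000):
    # Drive the RL agent waypoint to waypoint along the roadmap path.
    for waypoint in waypoints:
        simulator.set_goal(waypoint)           # higher-level planner swaps the goal
        for _ in range(max_steps):
            if simulator.distance_to_goal() <= goal_radius:
                break                          # within goal range: next waypoint
            s = simulator.observe()
            simulator.apply_action(policy(s))  # RL agent handles local navigation
        else:
            return False                       # timed out before reaching waypoint
    return True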
19. Results
Indoor navigation
1. Roadmap construction evaluation
2. Expected trajectory characteristics
3. Actual trajectory characteristics
4. Physical robot experiments
→ Each roadmap is evaluated on 100 randomly generated queries from the C-free.
20. Results
1. Roadmap construction evaluation
• Higher sampling density produces larger maps and more successful queries.
• The number of nodes in the map does not depend on the local planner, but the number of edges and collision checks do.
• Roadmaps built with the RL local planner are more densely connected, with 15% and 50% more edges.
• The RL agent can go around corners and small obstacles.
21. Results
2. Expected trajectory characteristics
• The RL agent does not require the robot to come to rest at the goal region; therefore the robot carries some inertia when the waypoint is switched. This causes some of the failures.
• The PRM-RL paths contain more waypoints, except in Building 3.
• Expected trajectory length and duration are longer for the RL agent.
22. Results
3. Actual trajectory characteristics
• We look at the query characteristics for successful versus unsuccessful queries.
• The RL agent produces a higher success rate than the PRM-SL.
• The successful trajectories have fewer waypoints than expected, which means that shorter queries are more likely to succeed.
23. Results
4. Physical robot experiments
• To transfer our approach to a real robot, we created a simple slalom-like environment with four obstacles.