[Title] Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (2018)
[Authors] Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine
[Link] https://arxiv.org/abs/1805.12114
* This paper was accepted as a spotlight at NIPS 2018
This presentation also includes some content from a related paper, "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning", Nagabandi et al. (ICRA 2018).
1. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (2018)
Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering
3. REINFORCEMENT LEARNING IS HOT
(Pictures from Karpathy’s blog)
• Baselines (https://www.cs.ubc.ca/~gberseth/blog/demystifying-the-many-deep-reinforcement-learning-algorithms.html)
4. WHAT IS THE PROBLEM?
Gu and Holly et al., “Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates”, 2016.
• RL requires a lot of data
- Rewards in RL give more indirect information than labels in supervised learning
• RL does not generalize well to new tasks / environments
- Meta-learning aims to address this
• RL was used in robotics before the era of deep RL
- e.g., RL with Gaussian process models
5. MODEL-FREE VS. MODEL-BASED
Performance: model-free RL > model-based RL
Data efficiency: model-free RL < model-based RL
(MLSS 2017, Jan Peters)
6. GP MODEL VS. NN MODEL
Learning speed: GP model > NN model
For small data: GP model > NN model
Capacity: GP model < NN model
For large data: GP model < NN model
Q) How can we build NN-model-based RL with fewer weaknesses? In other words, how can we make NN-model-based RL that also works well with small data?
7. Nagabandi et al., “Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning”, ICRA 2018
https://sites.google.com/view/mbmf
8. NN-MODEL-BASED RL
• How can we choose optimal actions with a learned model?
• What is model predictive control (MPC)? (see the sketch below)
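As a concrete illustration of MPC with a learned dynamics model, here is a minimal random-shooting sketch in NumPy. The names `dynamics_model` and `reward_fn`, and the action range [-1, 1], are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def mpc_random_shooting(state, dynamics_model, reward_fn,
                        horizon=15, n_candidates=500, action_dim=2):
    """MPC via random shooting: sample candidate action sequences,
    roll them out through the learned model, execute only the first
    action of the best sequence, then replan at the next step."""
    # Candidate action sequences, uniform in an assumed range [-1, 1].
    actions = np.random.uniform(-1.0, 1.0,
                                size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    states = np.tile(state, (n_candidates, 1))
    for t in range(horizon):
        next_states = dynamics_model(states, actions[:, t])  # s_{t+1} = f(s_t, a_t)
        returns += reward_fn(states, actions[:, t], next_states)
        states = next_states
    return actions[np.argmax(returns), 0]  # best first action
```

Replanning after every executed action is what makes MPC robust to model error: the plan is constantly corrected with the true observed state.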
14. ALEATORIC: PROBABILISTIC NN (P)
• Probabilistic NN (P): outputs a distribution (e.g., a Gaussian mean and variance) over the next state, capturing aleatoric (inherent) noise
• Deterministic NN (D): outputs only a point estimate of the next state
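A minimal PyTorch sketch of the P model, assuming a Gaussian output head; the layer sizes and names are illustrative, not the paper's exact architecture. The network predicts a mean and log-variance and is trained with the Gaussian negative log-likelihood; a D model would predict only the mean and train with MSE:

```python
import torch
import torch.nn as nn

class ProbabilisticNN(nn.Module):
    """P model: predicts a Gaussian over the next state (mean, log-variance)."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim))  # concatenated [mean, log_var]

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        mean, log_var = out.chunk(2, dim=-1)
        return mean, log_var

def gaussian_nll(mean, log_var, target):
    """Training loss for the P model: negative log-likelihood of the
    observed next state under N(mean, exp(log_var)), up to constants."""
    return ((target - mean) ** 2 / log_var.exp() + log_var).mean()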
15. EPISTEMIC: ENSEMBLE (E)
• Ensemble: train several models (e.g., on bootstrapped data) and look at the variance of their predictions; disagreement between members indicates epistemic (model) uncertainty
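A sketch of how ensemble disagreement yields epistemic uncertainty, assuming each member exposes the hypothetical `ProbabilisticNN` interface from the previous sketch. Members trained on different bootstrapped datasets agree where data is plentiful and disagree where it is scarce:

```python
import torch

def ensemble_predict(models, state, action):
    """Epistemic uncertainty as disagreement between ensemble members."""
    means = torch.stack([m(state, action)[0] for m in models])  # (B, ..., state_dim)
    prediction = means.mean(dim=0)       # ensemble mean prediction
    epistemic_var = means.var(dim=0)     # variance across members
    return prediction, epistemic_var
```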
16. HOW DO WE USE THESE UNCERTAINTIES?
Nagabandi et al. (ICRA 2018)
• Action selection: random shooting or the cross-entropy method (CEM), which samples actions closer to the action samples that yielded high reward (see the CEM sketch below)
• Computing the expected trajectory reward using recursive state prediction: the closed form is generally intractable, so particle-based state propagation is used instead
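A minimal CEM sketch for the action-selection step; `evaluate_returns` (model-based rollout returns for a batch of action sequences, e.g. via the random-shooting rollout above), the population sizes, and the [-1, 1] action range are assumptions. Unlike random shooting's fixed uniform distribution, CEM iteratively refits its sampling distribution to the elite samples:

```python
import numpy as np

def cem_plan(evaluate_returns, horizon, action_dim,
             n_iters=5, population=400, n_elites=40):
    """CEM: iteratively refit a Gaussian over action sequences
    toward the highest-return (elite) samples."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        noise = np.random.randn(population, horizon, action_dim)
        samples = np.clip(mean + std * noise, -1.0, 1.0)  # assumed action range
        returns = evaluate_returns(samples)               # shape: (population,)
        elites = samples[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean[0]  # execute the first action of the refined plan
```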
17. STATE PROPAGATION METHODS
• Expectation (E): deterministic approach
• Moment matching (MM)
• Distribution sampling (DS)
• Trajectory sampling (TS), sketched below
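A sketch of particle-based trajectory sampling, reusing the hypothetical ensemble and `ProbabilisticNN` interfaces from the earlier sketches. Each particle is assigned one ensemble member for the whole rollout (the TS-∞ variant) and samples its next state from that member's predicted Gaussian, so the particle cloud reflects both epistemic and aleatoric uncertainty:

```python
import torch

def ts_rollout(models, reward_fn, init_state, action_seq, n_particles=20):
    """Trajectory sampling: Monte-Carlo estimate of an action sequence's
    return by propagating particles through sampled ensemble members."""
    P = n_particles
    states = init_state.repeat(P, 1)                 # (P, state_dim)
    assign = torch.randint(len(models), (P,))        # fixed member per particle (TS-inf)
    total_reward = torch.zeros(P)
    for action in action_seq:                        # iterate over the horizon
        acts = action.expand(P, -1)                  # same action for all particles
        next_states = torch.empty_like(states)
        for b, model in enumerate(models):
            idx = assign == b
            if idx.any():
                mean, log_var = model(states[idx], acts[idx])
                std = (0.5 * log_var).exp()
                # Aleatoric: sample from the member's predicted Gaussian.
                next_states[idx] = mean + std * torch.randn_like(mean)
        total_reward += reward_fn(states, acts, next_states)
        states = next_states
    return total_reward.mean()                       # expected trajectory reward
```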
22. CONCLUSION
• Probabilistic NNs, ensemble-based uncertainty estimation, MPC, and trajectory-sampling methods are combined in the proposed model-based approach
• It is more data-efficient than model-free approaches and achieves comparable performance
• The probabilistic model plays the most important role in achieving good performance in model-based RL
• [Idea] A state propagation method that considers the kinematics of the body?