Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (2018)

Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering

NIPS 2018

REINFORCEMENT LEARNING IS HOT
(Pictures from Karpathy’s blog)
• Baselines
(https://www.cs.ubc.ca/~gberseth/blog/demystifying-the-many-deep-reinforcement-learning-algorithms.html)

WHAT IS THE PROBLEM?
Gu, Holly, et al., “Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates”, 2016.
• RL requires a lot of data
- Rewards in RL give more indirect information than labels in supervised learning
• RL does not generalize well to new tasks / environments
- Meta-learning
• RL was used for robotics before the era of deep RL
- RL with Gaussian processes

MODEL-FREE VS. MODEL-BASED
Performance : Model-free RL > Model-based RL
Data efficiency : Model-free RL < Model-based RL
(MLSS2017, Jan Peters)

GP MODEL VS NN MODEL
Learning speed : GP model > NN model
For small data : GP model > NN model
Capacity : GP model < NN model
For large data : GP model < NN model
Q) How can we build NN-model-based RL with fewer weaknesses?
In other words, how can we build NN-model-based RL that also works well with small data?

ICML 2018
https://sites.google.com/view/mbmf

NN-MODEL-BASED RL
• How can we choose the optimal actions with a learned model?
• What is model predictive control (MPC)?
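As a rough sketch of how MPC picks actions with a learned model: at every step, sample candidate action sequences, roll each one out through the model, execute only the first action of the best sequence, then replan. The point-mass `model`, the quadratic `reward`, and every name below are illustrative assumptions, not taken from the slides or from either paper's code.

```python
import numpy as np

def model(states, actions):
    """Hypothetical stand-in for a learned dynamics model: a 1-D point mass."""
    pos, vel = states[..., 0], states[..., 1]
    vel = vel + 0.1 * actions
    pos = pos + 0.1 * vel
    return np.stack([pos, vel], axis=-1)

def reward(states, actions):
    """Assumed reward: stay near the origin with small velocity and effort."""
    return -(states[..., 0] ** 2 + 0.1 * states[..., 1] ** 2 + 0.01 * actions ** 2)

def random_shooting_mpc(state, horizon=15, n_candidates=500, rng=None):
    """Return the first action of the best randomly sampled action sequence."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    states = np.tile(state, (n_candidates, 1))   # one rollout per candidate
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        states = model(states, candidates[:, t])
        returns += reward(states, candidates[:, t])
    return candidates[np.argmax(returns), 0]

def run_episode(steps=60, seed=0):
    """Replan at every step; here the 'real' system equals the model for simplicity."""
    rng = np.random.default_rng(seed)
    state = np.array([1.0, 0.0])
    for _ in range(steps):
        action = random_shooting_mpc(state, rng=rng)
        state = model(state, action)
    return state
```

Executing only the first action and replanning is what lets MPC tolerate model errors that accumulate over long rollouts.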

TRAINING
• Training the model
• Choosing the optimal policy
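A minimal sketch of the "training the model" step, assuming the model is fit by regression on logged transitions to predict the state change s' - s. A linear least-squares model is swapped in for the neural network to keep the example short; the point-mass data generator and all names are hypothetical.

```python
import numpy as np

def collect_transitions(n=200, seed=0):
    """Drive a hypothetical 1-D point mass with random actions and log (s, a, s')."""
    rng = np.random.default_rng(seed)
    s = np.array([1.0, 0.0])
    data = []
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        vel = s[1] + 0.1 * a
        s_next = np.array([s[0] + 0.1 * vel, vel]) + rng.normal(0.0, 1e-3, size=2)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_delta_model(data):
    """Least-squares fit of s' - s = [s, a, 1] @ W (linear stand-in for the NN)."""
    X = np.array([np.concatenate([s, [a, 1.0]]) for s, a, _ in data])
    Y = np.array([s_next - s for s, _, s_next in data])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict(W, s, a):
    """Predict the next state as the current state plus the predicted change."""
    return s + np.concatenate([s, [a, 1.0]]) @ W
```

In an MBRL loop, data collected by the current controller would be appended to the dataset and the model refit before the next trial.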

NN-MODEL-BASED RL
Initialize the policy with MBRL
and fine-tune it with MFRL

NIPS 2018
ICML 2018

UNCERTAINTY IN DL
• Two types of uncertainty:
- Aleatoric: noise inherent in the data; it remains even with more data
- Epistemic: uncertainty from lack of data; it shrinks with more data
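One common way to make this split concrete is the law of total variance. The notation is introduced here and is not on the slide: assume each model with parameters θ predicts a mean μ_θ(x) and a variance σ²_θ(x).

```latex
\operatorname{Var}[y \mid x]
  = \underbrace{\mathbb{E}_{\theta}\!\left[\sigma_{\theta}^{2}(x)\right]}_{\text{aleatoric}}
  + \underbrace{\operatorname{Var}_{\theta}\!\left[\mu_{\theta}(x)\right]}_{\text{epistemic}}
```

The first term is the average predicted noise; the second is the disagreement between models about the mean.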


ALEATORIC: PROBABILISTIC NN (P)
• Probabilistic NN (P)
• Deterministic NN (D)
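A sketch of the difference in training objectives, assuming the probabilistic NN outputs a Gaussian mean and log-variance and is trained with the negative log-likelihood, while the deterministic NN outputs only a mean and is trained with squared error. The function names are illustrative.

```python
import numpy as np

def gaussian_nll(mean, log_var, target):
    """Per-sample negative log-likelihood of target under N(mean, exp(log_var))."""
    var = np.exp(log_var)
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (target - mean) ** 2 / var)

def squared_error(mean, target):
    """Loss for a deterministic model: it has no notion of predicted noise."""
    return (target - mean) ** 2

def compare_variance_guesses(noise_std=2.0, n=2000, seed=0):
    """Average NLL for a variance that matches the noise vs. an overconfident one."""
    rng = np.random.default_rng(seed)
    targets = rng.normal(0.0, noise_std, size=n)
    matched = gaussian_nll(0.0, np.log(noise_std ** 2), targets).mean()
    overconfident = gaussian_nll(0.0, 0.0, targets).mean()
    return matched, overconfident
```

The NLL rewards a predicted variance that matches the actual noise level, which is how the model learns aleatoric uncertainty.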

EPISTEMIC: ENSEMBLE (E)
• Ensemble: look at the variance of the predictions across ensemble members
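A small sketch of ensemble disagreement as an epistemic-uncertainty signal. Cubic polynomials fit on bootstrap resamples stand in for the neural networks; the data, degree, and names are all assumptions made for illustration.

```python
import numpy as np

def make_data(n=40, seed=0):
    """Hypothetical training data: noisy sine observed only on [-1, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(n)
    return x, y

def fit_bootstrap_ensemble(x, y, n_models=5, degree=3, seed=0):
    """Fit each member on its own resample (with replacement) of the data."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))
        coefs.append(np.polyfit(x[idx], y[idx], degree))
    return coefs

def ensemble_predict(coefs, x_query):
    """Return the mean prediction and the variance across ensemble members."""
    preds = np.array([np.polyval(c, x_query) for c in coefs])
    return preds.mean(axis=0), preds.var(axis=0)

x, y = make_data()
coefs = fit_bootstrap_ensemble(x, y)
mean, var = ensemble_predict(coefs, np.array([0.0, 3.0]))  # inside vs. outside the data
```

Members agree where training data is dense and diverge where it is absent, so the variance at the out-of-range query is the larger of the two.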

HOW DO WE USE THESE UNCERTAINTIES?
• Action selection: random shooting (Nagabandi et al., ICML 2018) → CEM
- CEM samples actions close to earlier action samples that yielded high reward
• Computing the expected trajectory reward using recursive state prediction
→ A closed form is generally intractable
→ Use particle-based state propagation
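The cross-entropy method (CEM) mentioned above can be sketched as follows: sample action sequences from a Gaussian, keep the highest-scoring "elite" samples, and refit the Gaussian to them. The `toy_score` function, the action bounds, and the hyperparameters are assumptions; in a planner, the score would be the expected trajectory reward under the learned model.

```python
import numpy as np

def cem_plan(score_fn, horizon, n_samples=200, n_elite=20, n_iters=8, seed=0):
    """Iteratively refit a diagonal Gaussian over action sequences to the elites."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)
    std = np.ones(horizon)
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, horizon))
        samples = np.clip(samples, -1.0, 1.0)            # assumed action bounds
        scores = np.array([score_fn(seq) for seq in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]   # highest scores
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6                   # avoid a zero-width Gaussian
    return mean

# Toy objective with a known optimum, used only to exercise the planner.
TARGET = np.linspace(-0.5, 0.5, 5)

def toy_score(action_seq):
    return -np.sum((action_seq - TARGET) ** 2)

plan = cem_plan(toy_score, horizon=5)
```

Unlike random shooting, each iteration concentrates samples around sequences that already scored well, which tends to matter more as the action dimension and horizon grow.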

STATE PROPAGATION METHODS
• Expectation (E) : deterministic approach
• Moment matching (MM)
• Distribution sampling (DS)
• Trajectory sampling (TS)
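A sketch of trajectory sampling with particles, assuming an ensemble whose members each return a Gaussian over the next state. Two variants are shown: `"TS1"` re-draws each particle's ensemble member at every step, while `"TSinf"` keeps it fixed for the whole rollout. The scalar linear ensemble and the reward are hypothetical.

```python
import numpy as np

def make_ensemble(n_models=5, seed=0):
    """Hypothetical ensemble: member b predicts s' ~ N(gain_b * s + a, sigma^2)."""
    rng = np.random.default_rng(seed)
    gains = 0.9 + 0.05 * rng.standard_normal(n_models)
    sigma = 0.02
    def predict(members, states, action):
        return gains[members] * states + action, sigma
    return predict, n_models

def propagate_particles(predict, n_models, s0, actions,
                        n_particles=20, mode="TSinf", seed=0):
    """Return sampled state trajectories of shape (horizon + 1, n_particles)."""
    rng = np.random.default_rng(seed)
    states = np.full(n_particles, s0, dtype=float)
    members = rng.integers(0, n_models, size=n_particles)
    traj = [states.copy()]
    for a in actions:
        if mode == "TS1":  # re-draw the ensemble member at every step
            members = rng.integers(0, n_models, size=n_particles)
        mean, std = predict(members, states, a)
        states = mean + std * rng.standard_normal(n_particles)  # sample next state
        traj.append(states.copy())
    return np.array(traj)

def expected_return(traj):
    """Assumed reward -s^2 per step, averaged over particles."""
    return -(traj[1:] ** 2).sum(axis=0).mean()
```

Sampling from each member's predictive distribution carries aleatoric noise into the rollout, and spreading particles over members carries epistemic disagreement, so the averaged return reflects both.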

ALGORITHM SUMMARY

EXPERIMENTS
https://sites.google.com/view/drl-in-a-handful-of-trials/home


CONCLUSION
• Probabilistic NNs, ensemble-based uncertainty estimation, MPC, and trajectory sampling are combined in the proposed model-based approach
• It is more data-efficient than model-free approaches and achieves comparable performance
• The probabilistic model plays the most important role in achieving good performance in model-based RL
• [Idea] A state propagation method that considers the kinematics of the body?
