The talk addresses the full workflow for Deep Reinforcement Learning: choosing an adequate environment, crafting a reward function, choosing a policy function, training, and deployment. Using Model-Based Design, the talk demonstrates how to build and control a virtual biped humanoid robot in Simulink and uses Deep Reinforcement Learning in MATLAB, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, to train the agent. Finally, we discuss how to deploy the trained policies to the target hardware using C/C++ or CUDA.
9. A walking robot – the traditional way
[Diagram: Observations (Camera Data, Sensors) → Feature Extraction → State Estimation → Control System (Balance, Leg & Trunk Trajectories, Motor Control) → Motor Commands]
10. A walking robot – the alternative approach
[Diagram: the traditional pipeline (Feature Extraction → State Estimation → Control System) is replaced by a single Black Box Controller that maps Observations (Camera Data, Sensors) to Motor Commands]
11. What is Reinforcement Learning?
Reinforcement learning is learning what to do—how to map
situations to actions—so as to maximize a numerical reward signal.
The learner is not told which actions to take, but instead must
discover which actions yield the most reward by trying them.
(Sutton and Barto, Reinforcement Learning: An Introduction)
20. Reward
A function that outputs a scalar number that represents the "goodness" of
an agent being in a particular state and taking a particular action.
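To make the definition concrete, here is a minimal sketch of a reward function for a walking robot. It is an illustration only, not the reward used in the talk: the function name, the input signals, and all weights are hypothetical. It rewards forward speed and penalizes sideways drift, deviation from the desired height, and actuator effort, then collapses everything into one scalar.

```python
def walking_reward(forward_velocity, lateral_offset, height_error, joint_torques,
                   w_velocity=1.0, w_lateral=0.5, w_height=2.0, w_effort=0.01):
    """Return a scalar reward for one time step of a hypothetical walking robot.

    All signal names and weights are assumptions chosen for illustration.
    """
    # Penalize actuator effort as the sum of squared torques.
    effort = sum(t * t for t in joint_torques)
    return (w_velocity * forward_velocity      # encourage moving forward
            - w_lateral * lateral_offset ** 2  # discourage drifting sideways
            - w_height * height_error ** 2     # discourage falling or crouching
            - w_effort * effort)               # discourage wasteful motor commands
```

The relative size of the weights decides which behavior the agent trades off against which, so in practice they are tuned by trial and error.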
25. The Agent
Policy: a function that maps observations to actions
Reinforcement Learning Algorithm: an optimization method used to find the optimal policy
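The split between policy and learning algorithm can be sketched on a toy problem. The example below is not the DDPG setup from the talk: it swaps in tabular Q-learning on a hypothetical five-cell corridor, because that fits in a few lines. The `policy` function maps a state to an action, and `train` is the learning algorithm that adjusts the table the policy reads from. The environment, hyperparameters, and names are all assumptions.

```python
import random

N_STATES = 5         # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)   # step left or step right


def step(state, action):
    """Environment: move along the corridor; reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def policy(q_table, state):
    """Policy: map an observation (the state) to the index of the greedy action."""
    values = q_table[state]
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def train(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Learning algorithm: tabular Q-learning that improves the policy's table."""
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with uniformly random actions; Q-learning is off-policy,
            # so the greedy policy can still be learned from this data.
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q_table[next_state]))
            q_table[state][a] += alpha * (target - q_table[state][a])
            state = next_state
            if done:
                break
    return q_table


q = train()
greedy_moves = [ACTIONS[policy(q, s)] for s in range(N_STATES - 1)]
```

After training, `greedy_moves` holds the action the policy picks in each non-goal cell. In deep reinforcement learning the table is replaced by a neural network, but the division of labor stays the same.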
26. The Policy
Tells the agent which actions to take given the current state
Reward: the instantaneous benefit of being in a state and taking a specific action
Value: the total reward an agent expects to receive from a state and onwards into the future
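The difference between reward and value can be shown with a short calculation. The sketch below sums the rewards collected from one step onward along a single recorded episode. The discount factor `gamma`, which weights later rewards less, is a common convention assumed here and is not stated on the slide. The value of a state is the expectation of this quantity over many episodes; the function computes it for one sample only.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum rewards from the current step onward, discounting later ones by gamma."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, with rewards `[1, 1, 1]` and `gamma=0.5` the result is 1 + 0.5 * (1 + 0.5 * 1) = 1.75, whereas the instantaneous reward at the first step is just 1.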
32. Training our Deep Reinforcement Learning Agent
Accelerate training by running simulations in parallel on multicore computers, clusters or the cloud
Train on the GPU when using Deep Neural Networks for Actor or Critic representations
37. Key takeaways
▪ Reinforcement Learning can solve complicated problems
▪ Deep Neural Networks can handle continuous or high-dimensional state and action spaces
▪ MATLAB and Simulink provide a complete workflow for Deep Reinforcement Learning
Can’t wait to play with it? Visit our booth!
Code: github.com/mathworks/msra-walking-robot
Download MATLAB: mathworks.com/matlab-bigth19