Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering
DEEP REINFORCEMENT LEARNING IN A HANDFUL OF TRIALS USING PROBABILISTIC DYNAMICS MODELS
NIPS 2018
REINFORCEMENT LEARNING IS HOT
(Pictures from Karpathy’s blog)
• Baselines
(https://www.cs.ubc.ca/~gberseth/blog/demystifying-the-many-deep-reinforcement-learning-algorithms.html)
WHAT IS THE PROBLEM?
Gu and Holly et al., “Deep Reinforcement Learning for Robotic
Manipulation with Asynchronous Off-Policy Updates”, 2016.
• RL requires a lot of data
- Rewards in RL give more indirect information than labels in supervised learning
• RL does not generalize well to new tasks / environments
- Meta learning
• RL was used for robotics before the era of deep RL
- RL with Gaussian processes
MODEL-FREE VS. MODEL-BASED
Performance : Model-free RL > Model-based RL
Data efficiency : Model-free RL < Model-based RL
(MLSS2017, Jan Peters)
GP MODEL VS NN MODEL
Learning speed : GP model > NN model
For small data : GP model > NN model
Capacity : GP model < NN model
For large data : GP model < NN model
Q) How can we make NN-model-based RL with fewer weaknesses?
In other words, how can we make NN-model-based RL that is also good with small data?
ICML 2018
https://sites.google.com/view/mbmf
NN-MODEL-BASED RL
• How can we choose the optimal actions with a learned model?
• What is model predictive control (MPC)?
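One common way to answer both questions, sketched here as a minimal illustration rather than the exact method on the slide: at every step, sample many random action sequences, roll each one out through the learned model, execute only the first action of the best sequence, then replan. The 1-D `model` and `reward` functions are hypothetical stand-ins for a learned dynamics network and a task reward.

```python
import random

def model(state, action):
    # Hypothetical stand-in for a learned dynamics model: a 1-D point moved by `action`.
    return state + 0.1 * action

def reward(state, action):
    # Hypothetical task reward: stay close to the target position 1.0.
    return -(state - 1.0) ** 2

def random_shooting_mpc(state, horizon=10, n_candidates=200, rng=random):
    """Return the first action of the best of `n_candidates` random action sequences."""
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = model(s, a)          # recursive state prediction with the model
            total += reward(s, a)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

def run_episode(steps=50, seed=0):
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        action = random_shooting_mpc(state, rng=rng)  # replan at every step
        state = model(state, action)  # in practice: step the real environment here
    return state
```

Replanning at every step is what makes this model predictive control: errors in the model's long-horizon predictions are corrected by the next observation.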
TRAINING
• Train the model
• Choose the optimal policy
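A minimal sketch of the model-training step, assuming the model predicts the state change s' - s and is fit by minimizing squared one-step prediction error on collected transitions. A linear model stands in for the neural network, and `true_step` is a hypothetical environment used only to generate data.

```python
import random

def true_step(s, a):
    # Hypothetical environment, used only to generate training transitions.
    return s + 0.1 * a

def collect(n, rng):
    """Gather (state, action, next_state) tuples with random actions."""
    data = []
    for _ in range(n):
        s, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((s, a, true_step(s, a)))
    return data

def train(data, epochs=200, lr=0.1):
    # Linear stand-in for the network: predicted delta = w_s * s + w_a * a + b.
    w_s = w_a = b = 0.0
    for _ in range(epochs):
        for s, a, s_next in data:
            err = (w_s * s + w_a * a + b) - (s_next - s)
            # Stochastic gradient step on the squared error 0.5 * err ** 2.
            w_s -= lr * err * s
            w_a -= lr * err * a
            b -= lr * err
    return w_s, w_a, b

w_s, w_a, b = train(collect(100, random.Random(0)))
```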
NN-MODEL-BASED RL
Initialize the policy with MBRL
and fine-tune it with MFRL
NIPS 2018
ICML 2018
UNCERTAINTY IN DL
• Two types of uncertainty:
aleatoric (noise inherent in the data) & epistemic (model uncertainty due to lack of data)
ALEATORIC: PROBABILISTIC NN (P)
• Probabilistic NN (P)
• Deterministic NN (D)
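The difference between the two model types can be sketched through their losses. This is a minimal illustration, assuming the probabilistic model outputs a Gaussian mean and log-variance and is trained with the negative log-likelihood, while the deterministic model outputs only a mean and is trained with squared error. The sample values in `ys` are made up.

```python
import math

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under a Gaussian N(mean, exp(log_var))."""
    return 0.5 * (math.log(2 * math.pi) + log_var + (y - mean) ** 2 / math.exp(log_var))

def squared_error(y, mean):
    """Loss for a deterministic model: only the mean is predicted."""
    return (y - mean) ** 2

def mean_nll(ys, mean, log_var):
    return sum(gaussian_nll(y, mean, log_var) for y in ys) / len(ys)

# Hypothetical noisy observations of the same next state.
ys = [0.8, 1.1, 0.9, 1.3, 0.9]
mean = sum(ys) / len(ys)
empirical_var = sum((y - mean) ** 2 for y in ys) / len(ys)
best_log_var = math.log(empirical_var)
```

With the mean fixed, the average NLL is lowest when the predicted variance matches the spread of the data, which is how a probabilistic network can learn to report aleatoric noise. The squared error carries no such term.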
EPISTEMIC: ENSEMBLE (E)
• Ensemble : Look at the variance of the predictions
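A minimal sketch of the ensemble idea, assuming bootstrap resampling and using 1-D least-squares lines as stand-ins for neural networks. The data, the ensemble size of 5, and the query points are illustrative choices.

```python
import random
import statistics

def fit_line(points):
    # Ordinary least squares for y = w * x + b.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    w = sxy / sxx
    return w, my - w * mx

def bootstrap_ensemble(points, n_models, rng):
    """Fit each ensemble member on its own resample (with replacement) of the data."""
    models = []
    for _ in range(n_models):
        resample = [rng.choice(points) for _ in points]
        models.append(fit_line(resample))
    return models

def predictive_variance(models, x):
    """Disagreement between ensemble members at input x."""
    return statistics.pvariance([w * x + b for w, b in models])

rng = random.Random(0)
data = []
for _ in range(30):
    x = rng.uniform(0.0, 1.0)          # training inputs cover only [0, 1]
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

models = bootstrap_ensemble(data, n_models=5, rng=rng)
var_inside = predictive_variance(models, 0.5)   # inside the training range
var_outside = predictive_variance(models, 5.0)  # far outside the training range
```

The members agree where data is available and disagree when extrapolating, so the variance of their predictions serves as an estimate of epistemic uncertainty.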
HOW DO WE USE THESE UNCERTAINTIES?
Nagabandi et al. (ICML2018)
• Action selection
Random shooting → CEM
(CEM samples new actions close to the action samples that yielded high reward)
• Computing the expected trajectory reward using recursive state prediction
→ closed-form is generally intractable
→ particle-based state propagation
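The cross-entropy method (CEM) mentioned above can be sketched as follows. This is a generic illustration, not the exact procedure from the paper: the population size, elite count, and the toy `score` function are assumptions. In the planning setting, `score` would be the expected trajectory reward of an action sequence under the learned model.

```python
import random
import statistics

def cem(score, dim, n_iters=10, pop_size=100, n_elite=10, rng=random):
    """Maximize `score` over vectors of length `dim` with the cross-entropy method."""
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(n_iters):
        # Sample candidates from the current Gaussian sampling distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop_size)]
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the distribution to the elites, so the next round samples near them.
        mean = [statistics.fmean([e[i] for e in elites]) for i in range(dim)]
        std = [statistics.pstdev([e[i] for e in elites]) + 1e-6 for i in range(dim)]
    return mean

def score(actions):
    # Hypothetical objective: the best action sequence is 0.5 at every step.
    return -sum((a - 0.5) ** 2 for a in actions)

best_actions = cem(score, dim=5, rng=random.Random(0))
```

Unlike random shooting, which draws every candidate from a fixed distribution, CEM concentrates later samples around the candidates that scored well.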
STATE PROPAGATION METHODS
• Expectation (E) : deterministic approach
• Moment matching (MM)
• Distribution sampling (DS)
• Trajectory sampling (TS)
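A minimal sketch of trajectory sampling, assuming each particle is tied to one ensemble member for the whole rollout and draws its next state from that member's Gaussian prediction. The ensemble members, `reward`, and all numeric values are hypothetical.

```python
import random

def make_model(gain, noise_std):
    # Hypothetical probabilistic ensemble member: returns (mean, std) of the next state.
    def predict(state, action):
        return state + gain * action, noise_std
    return predict

def reward(state):
    # Hypothetical task reward: stay close to the target position 1.0.
    return -(state - 1.0) ** 2

def propagate_particles(models, state, actions, n_particles, rng):
    """Estimate the expected trajectory reward of `actions` by propagating particles."""
    # Each particle keeps the same ensemble member for the whole trajectory.
    assignments = [rng.randrange(len(models)) for _ in range(n_particles)]
    particles = [state] * n_particles
    total_reward = 0.0
    for a in actions:
        next_particles = []
        for p, k in zip(particles, assignments):
            mean, std = models[k](p, a)
            next_particles.append(rng.gauss(mean, std))  # sample aleatoric noise
        particles = next_particles
        total_reward += sum(reward(p) for p in particles) / n_particles
    return total_reward, particles

rng = random.Random(0)
ensemble = [make_model(g, 0.01) for g in (0.09, 0.10, 0.11)]
expected_return, final_particles = propagate_particles(ensemble, 0.0, [1.0] * 10, 20, rng)
```

The spread between particles assigned to different members reflects epistemic uncertainty, while the per-step sampling noise reflects aleatoric uncertainty.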
ALGORITHM SUMMARY
EXPERIMENTS
https://sites.google.com/view/drl-in-a-handful-of-trials/home
CONCLUSION
• Probabilistic NNs, ensemble-based uncertainty estimation, MPC, and trajectory sampling are combined in the proposed model-based approach
• It is more data-efficient than model-free approaches and achieves comparable performance
• The probabilistic model plays the most important role in achieving good performance in model-based RL
• [Idea] A state propagation method that considers the kinematics of the body?
