Learning To Run
Deep Learning Course
Emanuele Ghelfi Leonardo Arcari Emiliano Gagliardi
https://github.com/MultiBeerBandits/learning-to-run
March 31, 2019
Politecnico di Milano
Our Goal
The goal of this project is to replicate the results of the Reason8 team in the NIPS 2017 Learning To Run competition¹.
• Given a human musculoskeletal model and a physics-based
simulation environment
• Develop a controller that runs as fast as possible
¹ https://www.crowdai.org/challenges/nips-2017-learning-to-run
Background
Reinforcement Learning
Reinforcement Learning (RL) deals with sequential decision-making problems. At each timestep the agent observes the world state, selects an action, and receives a reward.
[Diagram: agent-environment loop — the agent observes state s, selects action a = π(s), receives reward r, and the environment transitions to s′ ∼ P(· ∣ s, a).]
Goal: maximize the expected discounted sum of rewards:
$J_\pi = \mathbb{E}\left[\sum_{t=0}^{H} \gamma^t \, r(s_t, a_t)\right]$.
Deep Reinforcement Learning
The policy πθ is encoded in a neural network with weights θ.
[Diagram: agent-environment loop — the agent observes state s, samples action a ∼ πθ(· ∣ s), receives reward r, and the environment transitions to s′ ∼ P(· ∣ s, a).]
How? Gradient ascent over the policy parameters: $\theta' = \theta + \eta \nabla_\theta J_\pi$ (policy gradient theorem).
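To make the update concrete, below is a minimal, illustrative REINFORCE-style policy gradient step in Python. It is not the project's code: the `policy` network, the optimizer, and the trajectory format are assumptions, and the policy is assumed to return a `torch.distributions` object.

```python
import torch

def policy_gradient_step(policy, optimizer, trajectory, gamma=0.99):
    """One gradient-ascent step on J from a single sampled trajectory.

    `trajectory` is assumed to be a list of (state, action, reward) tuples.
    """
    # Discounted returns G_t, computed backwards along the trajectory.
    returns, g = [], 0.0
    for _, _, reward in reversed(trajectory):
        g = reward + gamma * g
        returns.insert(0, g)

    # Ascending J is descending -sum_t log pi_theta(a_t | s_t) * G_t.
    loss = torch.zeros(())
    for (state, action, _), g in zip(trajectory, returns):
        dist = policy(torch.as_tensor(state, dtype=torch.float32))
        logp = dist.log_prob(torch.as_tensor(action, dtype=torch.float32)).sum()
        loss = loss - logp * g

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```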
Learning To Run
[Diagram: s ∈ ℝ³⁴ → πθ(s) → a ∈ [0, 1]¹⁸, with s′ ∼ P(· ∣ s, a).]
• The state space represents kinematic quantities of joints and links.
• Actions represent muscle activations.
• The reward is proportional to the forward speed of the body. A penalty is given when the pelvis height falls below a threshold, and the episode restarts (a minimal interaction loop is sketched below).
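The sketch below shows what interacting with the challenge environment looks like. It follows the osim-rl starter code for the 2017 challenge as we recall it; the exact `RunEnv` API (e.g. the `difficulty` argument) should be checked against the osim-rl documentation.

```python
import numpy as np
from osim.env import RunEnv  # osim-rl package used by the 2017 challenge

env = RunEnv(visualize=False)
observation = env.reset(difficulty=0)  # state s in R^34

total_reward, done = 0.0, False
while not done:
    # Random muscle activations in [0, 1]^18; a trained policy goes here.
    action = np.random.uniform(0.0, 1.0, size=18).tolist()
    observation, reward, done, info = env.step(action)
    total_reward += reward

print("Episode return:", total_reward)
```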
Deep Deterministic Policy Gradient - DDPG
• State-of-the-art algorithm in Deep Reinforcement Learning.
• Off-policy.
• Actor-critic method.
• Effectively combines Deterministic Policy Gradient (DPG) and Deep Q-Network (DQN) (see the update sketch below).
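The sketch below shows how the two ideas fit together in a single DDPG update: the critic is trained with a DQN-style bootstrapped target computed from the target networks, and the actor follows the deterministic policy gradient through the critic. It is an illustrative PyTorch sketch, not the project's implementation; all names and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99):
    state, action, reward, next_state, done = batch  # float tensors

    # Critic: regress Q(s, a) towards the TD target, computed with the
    # *target* networks for stability (DQN idea).
    with torch.no_grad():
        next_q = target_critic(next_state, target_actor(next_state))
        target_q = reward + gamma * (1.0 - done) * next_q
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, pi(s)) (DPG idea).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```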
Deep Deterministic Policy Gradient - DDPG
Main characteristics of DDPG:
• Deterministic actor π : S → A.
• Replay buffer to break the correlation between consecutive samples during training.
• Separate target networks with soft updates to improve convergence stability (both sketched below).
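A minimal sketch of the last two ingredients, the replay buffer and the soft target update, follows. It assumes PyTorch-style parameter tensors and made-up defaults (capacity, batch size, τ); the project's actual settings may differ.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform replay buffer: sampling old transitions at random breaks the
    temporal correlation between consecutive samples."""

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, online_params, tau=1e-3):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    for t, o in zip(target_params, online_params):
        t.data.copy_(tau * o.data + (1.0 - tau) * t.data)
```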
DDPG Improvements
We implemented several improvements over vanilla DDPG:
• Parameter noise (with layer normalization) and action noise to improve exploration.
• State and action flip (data augmentation; see the sketch below).
• Relative positions (feature engineering).
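The flip augmentation exploits the left/right symmetry of the runner: mirroring the leg-related components of a (state, action) pair yields another valid transition, doubling the data collected per episode. The sketch below is illustrative only; the index slices are hypothetical placeholders, not the real osim-rl state layout.

```python
import numpy as np

# Hypothetical (left-leg, right-leg) index slices; the real layout must be
# taken from the osim-rl state and muscle ordering.
STATE_SWAPS = [(slice(6, 12), slice(12, 18))]
ACTION_SWAPS = [(slice(0, 9), slice(9, 18))]

def flip(vector, swaps):
    """Return a copy of `vector` with each (left, right) slice pair swapped."""
    vector = np.asarray(vector, dtype=float)
    flipped = vector.copy()
    for left, right in swaps:
        flipped[left] = vector[right]
        flipped[right] = vector[left]
    return flipped

def augment(state, action, reward, next_state, done):
    """Return the original transition plus its mirrored counterpart."""
    mirrored = (flip(state, STATE_SWAPS), flip(action, ACTION_SWAPS),
                reward, flip(next_state, STATE_SWAPS), done)
    return [(state, action, reward, next_state, done), mirrored]
```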
DDPG Improvements
[Flowchart: distributed training loop — dispatch sampling jobs to the sampling workers; when samples are ready, store them in the replay buffer and train; dispatch an evaluation job to the testing workers; when the evaluation is ready, display statistics; repeat until the time budget expires.]
DDPG Improvements
[Flowchart: worker architecture — each sampling/testing worker i runs its own copy of the actor πθᵢ, stepping the environment with a = πθᵢ(s); the collected transitions are dispatched to the shared replay buffer.]
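In code, the orchestration suggested by the two flowcharts might look roughly like the loop below. This is a sketch under assumptions: `dispatch_sampling`, `dispatch_evaluation`, `train_step`, and `display_statistics` are hypothetical helpers standing in for the real worker/dispatch machinery, and the job handles are assumed to behave like `multiprocessing` async results (`ready()` / `get()`).

```python
import time

def orchestrate(dispatch_sampling, dispatch_evaluation, replay_buffer,
                train_step, display_statistics, time_budget_s):
    sampling_job = dispatch_sampling()      # ship current actor weights to sampling workers
    evaluation_job = dispatch_evaluation()  # ship current actor weights to testing workers
    start = time.time()

    while time.time() - start < time_budget_s:    # "time expired?" check
        if sampling_job.ready():                  # "samples ready?" check
            replay_buffer.store_many(sampling_job.get())
            train_step(replay_buffer)
            sampling_job = dispatch_sampling()
        if evaluation_job.ready():                # "evaluation ready?" check
            display_statistics(evaluation_job.get())
            evaluation_job = dispatch_evaluation()
```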
Results
Results - Thread number impact
[Plot: distance (m) vs. training step (×10⁵), comparing runs with 20 threads and 10 threads.]
Results - Ablation study
[Plot: distance (m) vs. training step (×10⁵) for four configurations — Flip/PN, Flip/No PN, No Flip/PN, No Flip/No PN — with the corresponding training times (h) on a secondary axis.]
Thank you all!
Backup slides
Results - Full state vs Reduced State
[Plot: distance (m) vs. training step (×10⁵), comparing the reduced state representation with the full state.]
Actor-Critic networks
[Diagram: Actor — s ∈ ℝ³⁴ → 64 (ELU) → 64 (ELU) → σ → a ∈ [0, 1]¹⁸. Critic — (s ∈ ℝ³⁴, a ∈ [0, 1]¹⁸) → 64 (Tanh) → 32 (Tanh) → Linear → Q(s, a) ∈ ℝ.]
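The layer sizes and activations on this slide translate to networks along the lines of the PyTorch sketch below. This is an illustrative reconstruction, not the project's code; in particular, where the action enters the critic is an assumption.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim=34, action_dim=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ELU(),
            nn.Linear(64, 64), nn.ELU(),
            nn.Linear(64, action_dim), nn.Sigmoid(),  # muscle activations in [0, 1]^18
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim=34, action_dim=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
            nn.Linear(64, 32), nn.Tanh(),
            nn.Linear(32, 1),  # linear output: scalar Q(s, a)
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```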