We present our approach to the NIPS 2017 "Learning To Run" challenge. The goal of the challenge is to develop a controller able to run in a complex environment, trained with Deep Reinforcement Learning methods.
We follow the approach of the Reason8 team (3rd place), starting from the algorithm that performed best on the task, DDPG. We implement and benchmark several improvements over vanilla DDPG, including parallel sampling, parameter noise, layer normalization, and domain-specific changes. We were able to reproduce the results of the Reason8 team, obtaining a model able to run for more than 30 meters.
1. Learning To Run
Deep Learning Course
Emanuele Ghelfi Leonardo Arcari Emiliano Gagliardi
https://github.com/MultiBeerBandits/learning-to-run
March 31, 2019
Politecnico di Milano
3. Our Goal
The goal of this project is to replicate the results of the Reason8 team in the NIPS 2017 Learning To Run competition¹.
• Given a human musculoskeletal model and a physics-based simulation environment
• Develop a controller that runs as fast as possible
¹ https://www.crowdai.org/challenges/nips-2017-learning-to-run
5. Reinforcement Learning
Reinforcement Learning (RL) deals with sequential decision-making problems. At each timestep, the agent observes the world state, selects an action, and receives a reward.
[Diagram: agent-environment loop. The agent observes state s, samples action a ∼ π(⋅ ∣ s), and the environment returns reward r and next state s′ ∼ p(⋅ ∣ s, a).]
Goal: Maximize the expected discounted sum of rewards: $J_\pi = \mathbb{E}\left[\sum_{t=0}^{H} \gamma^t r(s_t, a_t)\right]$.
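As a minimal illustration (not part of the slides), a generic agent-environment loop that estimates this discounted return could look as follows; `env` and `agent` are hypothetical stand-ins for a Gym-style environment and a policy:

```python
# Minimal sketch of an agent-environment loop accumulating the
# discounted return J_pi. `env` and `agent` are hypothetical
# stand-ins for a Gym-style environment and a policy.
def rollout_return(env, agent, horizon=1000, gamma=0.99):
    state = env.reset()
    ret = 0.0
    for t in range(horizon):
        action = agent.act(state)                  # a ~ pi(. | s)
        state, reward, done, _ = env.step(action)  # r, s' ~ p(. | s, a)
        ret += (gamma ** t) * reward               # accumulate gamma^t * r_t
        if done:
            break
    return ret
```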
6. Deep Reinforcement Learning
The policy πθ is encoded in a neural network with weights θ.
[Diagram: agent-environment loop with a parametric policy. The agent samples a ∼ πθ(⋅ ∣ s); the environment returns r and s′ ∼ p(⋅ ∣ s, a).]
How? Gradient ascent over policy parameters: $\theta' = \theta + \eta \nabla_\theta J_\pi$ (policy gradient theorem).
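As a hedged sketch (not the project's actual training code), one gradient-ascent step in PyTorch can be expressed as descent on the negated objective; `policy_loss` is assumed to be an estimate of −Jπ:

```python
import torch
import torch.nn as nn

# Sketch: the policy pi_theta is a small neural network mapping
# states to actions (dimensions match the task described below).
policy = nn.Sequential(nn.Linear(34, 64), nn.Tanh(),
                       nn.Linear(64, 18), nn.Sigmoid())
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def ascent_step(policy_loss):
    """One step theta' = theta + eta * grad J, written as descent on -J."""
    optimizer.zero_grad()
    policy_loss.backward()  # computes grad_theta of the negated objective
    optimizer.step()
```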
8. Learning To Run
$s \in \mathbb{R}^{34}$, $a = \pi_\theta(s) \in [0, 1]^{18}$, $s' \sim p(\cdot \mid s, a)$
• State space represents kinematic quantities of joints and links.
• Actions represent muscle activations.
• Reward is proportional to the speed of the body. A penalty is applied when the pelvis height falls below a threshold, and the episode restarts.
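A minimal interaction sketch, assuming the osim-rl `RunEnv` API from the 2017 challenge starter kit (the random-action policy here is purely illustrative):

```python
from osim.env import RunEnv  # NIPS 2017 challenge environment (osim-rl)

env = RunEnv(visualize=False)
observation = env.reset(difficulty=0)      # state vector described above

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()     # 18 muscle activations in [0, 1]
    observation, reward, done, info = env.step(action)
    total_reward += reward                 # proportional to forward speed
```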
9. Deep Deterministic Policy Gradient - DDPG
• State-of-the-art algorithm in Deep Reinforcement Learning.
• Off-policy.
• Actor-critic method.
• Effectively combines Deterministic Policy Gradient (DPG) and Deep Q-Network (DQN).
10. Deep Deterministic Policy Gradient - DDPG
Main characteristics of DDPG:
• Deterministic actor π : S → A.
• Replay buffer to decorrelate samples during training.
• Separate target networks with soft updates to improve convergence stability (see the sketch after this list).
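As an illustration of the soft update (a minimal sketch, not the project's exact code), the target weights track the learned weights with a small mixing factor τ, i.e. θ_target ← τθ + (1 − τ)θ_target:

```python
import torch

def soft_update(target_net, source_net, tau=1e-3):
    """Soft target update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    with torch.no_grad():
        for t_param, s_param in zip(target_net.parameters(),
                                    source_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)
```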
11. DDPG Improvements
We implemented several improvements over vanilla DDPG:
• Parameter noise (with layer normalization) and action noise to improve exploration (see the sketch after this list).
• State and action flip (data augmentation).
• Relative Positions (feature engineering).
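As a hedged sketch of the two exploration mechanisms (assumed implementations, not the project's exact code; `actor` is any deterministic policy network): action noise perturbs the action output, while parameter noise perturbs a copy of the actor's weights before acting:

```python
import copy
import numpy as np
import torch

def noisy_action(actor, state, sigma=0.1):
    """Action noise: perturb the deterministic action, clipped to [0, 1]."""
    with torch.no_grad():
        a = actor(state).numpy()
    return np.clip(a + sigma * np.random.randn(*a.shape), 0.0, 1.0)

def perturbed_actor(actor, sigma=0.05):
    """Parameter noise: act with a weight-perturbed copy of the actor."""
    noisy = copy.deepcopy(actor)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))
    return noisy
```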