Can increasing input dimensionality improve deep reinforcement learning?
- 1. Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
Hokkaido University, Graduate School of Information Science and Technology
Laboratory of Harmonious Systems Engineering
First-year master's student, 大江 弘峻
- 2. Paper information
• Kei Ota¹, Tomoaki Oiki¹, Devesh K. Jha², Toshisada Mariyama¹, Daniel Nikovski²
– ¹Mitsubishi Electric Corporation
– ²Mitsubishi Electric Research Laboratories
• International Conference on Machine Learning (ICML 2020)
• Paper: https://arxiv.org/abs/2003.01629
• Presentation slides (SlidesLive): https://slideslive.com/38928117/can-increasing-input-dimensionality-improve-deep-reinforcement-learning
• Code: https://www.merl.com/research/license/OFENet
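- 5. Related work: ML-DDPG
• Adds to DDPG a network that learns a representation of the observed state
• The internal representation $z_{o_t}$ is used as the input to DDPG
• The network predicts the next internal representation $z_{o_{t+1}}$ and the reward $r_{t+1}$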
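$$L_m = \left\| z_{o_{t+1}} - \hat{z}_{o_{t+1}} \right\|^2 + \lambda_m \left\| r_{t+1} - \hat{r}_{t+1} \right\|^2$$
(hats denote the network's predictions)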
⢠ãã ãð ð ð
ã®å€§ãã㯠ð ð ã®1/3ãšãªã£ãŠããïŒå§çž®ïŒ
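[Figure: ML-DDPG model network. $o_t$ → FC → $z_{o_t}$; $z_{o_t}$ is concatenated with $a_t$ and passed through FC layers that output $z_{o_{t+1}}$ and $r_{t+1}$.]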
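The ML-DDPG model loss on this slide can be sketched in a few lines of plain Python. The function names, the argument names, and the default weight `lam=1.0` are illustrative assumptions, not taken from the ML-DDPG or OFENet code; the sketch only shows how the representation error and the weighted reward error are combined for a single transition.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ml_ddpg_model_loss(z_next, z_next_pred, r_next, r_next_pred, lam=1.0):
    """L_m = ||z - z_hat||^2 + lam * (r - r_hat)^2 for one transition.

    lam plays the role of lambda_m; the default of 1.0 is an assumption.
    """
    return squared_l2(z_next, z_next_pred) + lam * (r_next - r_next_pred) ** 2


# Toy values: a 2-dimensional representation and a scalar reward.
loss = ml_ddpg_model_loss([1.0, 2.0], [0.5, 2.5],
                          r_next=1.0, r_next_pred=0.0, lam=0.5)
print(loss)  # 0.25 + 0.25 + 0.5 * 1.0 = 1.0
```

In practice the loss would be averaged over a minibatch and minimized by gradient descent; that part is omitted here.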
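- 7. Learning the auxiliary task
• Add a module $f_{\mathrm{pred}}$ that predicts the next state
• Optimize the parameters $\theta_{\mathrm{aux}} = \{\theta_{\phi_o}, \theta_{\phi_{o,a}}, \theta_{\mathrm{pred}}\}$ with the following loss function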
$$L_{\mathrm{aux}} = \mathbb{E}_{(o_t, a_t) \sim p, \pi}\left[ \left\| f_{\mathrm{pred}}(z_{o_t, a_t}) - o_{t+1} \right\|^2 \right]$$
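[Figure: OFENet architecture. The state feature extractor $\phi_o$ (parameters $\theta_{\phi_o}$) maps $o_t$ to $z_{o_t}$; the state-action feature extractor $\phi_{o,a}$ (parameters $\theta_{\phi_{o,a}}$) maps $z_{o_t}$ and $a_t$ to $z_{o_t,a_t}$; the linear network $f_{\mathrm{pred}}$ (parameters $\theta_{\mathrm{pred}}$) predicts $o_{t+1}$.]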
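As a rough illustration of the auxiliary loss, the sketch below estimates $L_{\mathrm{aux}}$ as a batch mean of squared prediction errors, with $f_{\mathrm{pred}}$ modeled as a single linear map. The helper names `linear` and `aux_loss`, the toy weights, and the vector sizes are assumptions made for this example; the feature extractors are not modeled, so `z_oa` simply stands in for $z_{o_t,a_t}$.

```python
def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def aux_loss(batch, weights, bias):
    """Batch estimate of L_aux: mean of ||f_pred(z) - o_next||^2.

    batch is a list of (z_oa, o_next) pairs.
    """
    total = 0.0
    for z_oa, o_next in batch:
        pred = linear(weights, bias, z_oa)
        total += sum((p - o) ** 2 for p, o in zip(pred, o_next))
    return total / len(batch)


# Toy setup (assumed sizes): 3-dimensional features, 2-dimensional observations.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
batch = [
    ([1.0, 2.0, 3.0], [1.0, 2.0]),  # predicted exactly, squared error 0
    ([0.0, 0.0, 1.0], [1.0, 1.0]),  # prediction [0, 0], squared error 2
]
print(aux_loss(batch, W, b))  # (0 + 2) / 2 = 1.0
```

A real implementation would backpropagate this loss through $f_{\mathrm{pred}}$ and both feature extractors, which is what makes it a training signal for the representation.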
- 9. Experiment 1: Investigating the optimal architecture
• Use the auxiliary task and the actual task (reward maximization) to find the best OFENet architecture
– Layer connection scheme: MLP, MLP ResNet, MLP DenseNet
– Number of layers: $n_{\mathrm{layers}} \in \{1, 2, 3, 4\}$ for MLP, $n_{\mathrm{layers}} \in \{2, 4, 6, 8\}$ otherwise
– Activation function: ReLU, tanh, Leaky ReLU, swish, SELU
• Auxiliary score: 100k randomly collected transitions are used for training and 20k for evaluation
• Actual score: the reward obtained by SAC after 500k training steps
[Figure: the three connection schemes, each mapping $o_t$ to $z_{o_t}$ through FC layers: MLP Net, MLP ResNet, and MLP DenseNet (which concatenates each layer's input with its output).]
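To make the DenseNet-style connection concrete, here is a minimal sketch in plain Python. The weights are random and untrained, and the layer count, unit count, and helper names are hypothetical. Batch Normalization and biases are left out. The point is only that concatenating each layer's output to its input makes the representation larger than the observation while keeping the raw observation inside it.

```python
import random


def fc_relu(weights, x):
    """Fully connected layer without bias, followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def make_weights(rng, n_out, n_in):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense_mlp(x, n_layers, n_units, rng):
    """DenseNet-style MLP: each layer's output is concatenated to its input."""
    h = list(x)
    for _ in range(n_layers):
        weights = make_weights(rng, n_units, len(h))
        h = h + fc_relu(weights, h)  # concat keeps the raw input reachable
    return h


rng = random.Random(0)
o_t = [0.3, -1.2, 0.7]  # toy 3-dimensional observation
z_o = dense_mlp(o_t, n_layers=4, n_units=5, rng=rng)
print(len(z_o))  # 3 + 4 * 5 = 23
```

A plain MLP would instead replace `h` with the layer output at every step, and a ResNet-style block would add the output to `h`, which requires matching sizes.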
- 13. Ablation study: with and without OFENet
• SAC is trained on Ant-v2
• Simply increasing the number of SAC parameters does not improve the score much
- 14. Ablation study: Batch Normalization
• SAC is trained on Ant-v2
• Batch Normalization suppresses the effect of the input distribution shifting during online learning
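The effect described here can be illustrated with a simplified batch normalization in plain Python. This sketch uses training-mode batch statistics only and omits the learnable scale and shift; the data and the shift of 10.0 are made up for the example. It shows that a constant shift in the inputs, a crude stand-in for the distribution drifting during online learning, leaves the normalized outputs unchanged.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the batch.

    Training-mode statistics only; learnable scale and shift are omitted.
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(dims)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dims)]
            for row in batch]


early = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
# Simulate a later stage of online learning: same pattern, shifted inputs.
late = [[v + 10.0 for v in row] for row in early]

out_early = batch_norm(early)
out_late = batch_norm(late)
max_diff = max(abs(a - b)
               for r1, r2 in zip(out_early, out_late)
               for a, b in zip(r1, r2))
print(max_diff)  # effectively zero: the shift is normalized away
```

Real drift is more complex than a constant offset, so this is only an intuition for why the slide reports Batch Normalization as helpful.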
- 15. Ablation study: auxiliary task and online learning
• SAC is trained on Ant-v2
• The actual task (reward maximization) alone cannot learn the high-dimensional representation
• Online learning lets the representation adapt to newly collected transitions