SlideShare una empresa de Scribd logo
1 de 48
Descargar para leer sin conexión
1
off-policy
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Guided Meta-Policy Search
Presenter:Tatsuya Matsushima @__tmats__ , Matsuo Lab
• off-policy arXiv
• Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
[Rakelly+ 2019] (2019/3/19)
• Guided Meta-Policy Search [Mendonca+ 2019] (2019/4/1)
• MAML meta-training
off-policy
2
3
 (meta learning)
• : Wiki http://ibisforest.org/index.php?%E3%83%A1%E3%82%BF%E5%AD%A6%E7%BF%92
• [DL ]Meta-Learning Probabilistic Inference for Prediction ( )
• https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-
prediction-126167192
4
MAML
MAML (Model Agnostic Meta-Learning) [Finn+ 2017]
•
• adapt
• MAML
• 2
• 1 [Nichol+2018]
• meta-test 

5
min
θ ∑
𝒯
ℒ (θ − α∇θℒ (θ, 𝒟tr
𝒯), 𝒟val
𝒯 ) = min
θ ∑
𝒯
ℒ (ϕ 𝒯, 𝒟val
𝒯 )
θ ϕ 𝒯
ϕ 𝒯test
= θ − α∇θℒ (θ, 𝒟tr
𝒯test)
MAML
• loss loss( )
• MAML model-based [Nagabandi+ 2018] [Gupta+ 2018]
• [DL ]Meta Reinforcement Learning ( )
• https://www.slideshare.net/DeepLearningJP2016/dl-130067084
6
ℒRL (ϕ, 𝒟 𝒯i) = −
1
𝒟 𝒯i
∑
st,at∈𝒟
ri (st, at)
= − 𝔼st,at∼πϕ,q 𝒯i [
1
H
H
∑
t=1
ri (st, at)
]
( ) On-policy v.s. Off-policy
On-policy ( )
• ( )
•
• ) ε-greedy
Off-policy ( )
•
•
※ MAML train
test (= off-policy )
7
Efficient Off-Policy Meta-Reinforcement

Learning via Probabilistic Context Variables
8
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context
Variables
• https://arxiv.org/abs/1903.08254 (Submitted on 19 Mar 2019)
• Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine
• UC Berkeley (BAIR)
• Deep RL ”UC Berkeley”
•
• https://github.com/katerakelly/oyster
• (BAIR )PyTorch rlkit
9
TL; DR
• meta learning off-policy (PEARL)
• (context)
• permutation invariant
• 20-100
10
( MAML )
• meta-training adaptation on-policy
• MAML meta-train meta-test off-policy
• adapt
•
11
12
• off-policy RL (soft actor-critic, SAC [Haarnoja+ 2018]) 

context (PEARL)
• Meta-training adapt
• meta-train policy
context
• meta-test context policy
adapt
• policy off-policy meta-train meta-
test on-policy
13
MDP
•
•
•
• :
• :
• 1
•
•
14
p(𝒯)
𝒯 𝒯 = {p (s0), p (st+1 |st, at), r (st, at)}
𝒯 c 𝒯
n = (sn, an, rn, s′n)
c = c 𝒯
1:N
p(𝒯)
context
• adapt
•
• (Inference network)
•
• prior Gaussian
• meta-train meta-test
15
z
z
qϕ(z|c)
𝔼 𝒯
[
𝔼z∼qϕ(z|c 𝒯
) [
R(𝒯, z) + βDKL (qϕ (z|c 𝒯
) ∥p(z))]]
p(z)
qϕ(z|c) ϕ zz
context
• MDP
• permutation invariant
• Inference network
• Gaussian
16
{si, ai, s′i, ri}
qϕ (z|c1:N) ∝ ΠN
n=1Ψϕ (z|cn)
Ψϕ (z|cn) = 𝒩 (fμ
ϕ (cn), fσ
ϕ (cn))
off-policy
• policy 

• actor ciritic 

• 

• on-policy 

on-policy test
17
qϕ(z|c)
ℬ
𝒮c
off-policy
• Soft Actor-Critic (SAC) [Haarnoja+ 2018] context
• SAC maxEntRL( ) off-policy actor-critic
• actor critic reparameterization trick
• critic loss: 

• actor loss:
18
ℒcritic = 𝔼(s, a, r, s′
) ∼ ℬ
z ∼ qϕ(z|c)
[Qθ(s, a, z) − (r + V (s′, z))]
2
z
ℒactor = 𝔼s∼ℬ,a∼πθ
DKL
(
πθ(a|s, z)∥
exp (Qθ(s, a, z))
𝒵θ(s) )
19
• MuJoCo 6
• Half-Cheetah, Humanoid, Ant, Walker (Half-Cheetah Ant 2 )
•
• adapt
• 20-100 

• : meta-training
• :
20
• on-policy (MAESN[Gupta+ 2018])
• sparse navigation
• meta-test 

• 

• context
• MAESN
21
Ablation Study
•
• Half-Cheetah-Vel
• RNN
• RNN-tran: de-correlated
• RNN-traj:
• permutation invariant 

22
Ablation Study
•
• Half-Cheetah-Vel
•
• off-policy: off-policy( )
• off-policy RL-batch: policy
• 

(PEARL)
23
Ablation Study
• context
• sparse navigation
• context
• 

24
25
• off-policy (PEARL)
• context policy context
off-policy
• meta-training
26
Guided Meta-Policy Search
27
Guided Meta-Policy Search
• https://arxiv.org/abs/1904.00956 (Submitted on 1 Apr 2019)
• Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea
Finn
• UC Berkeley (BAIR)
• …
•
• https://github.com/RussellM2020/GMPS
• Website
• https://sites.google.com/berkeley.edu/guided-metapolicy-search
28
TL; DR
• meta learning off-policy (GMPS)
• meta-train RL
• meta-train meta-objective( ) imitation learning (behaviour cloning)
• meta-training task learning meta-learning 2
29
( MAML )
• meta-training adaptation on-policy
• [Rakelly+ 2019]
• meta-training meta-test
30
31
• meta-train meta-objective( ) (behaviour cloning)
• meta-training 2
• task learning: meta-training policy
• policy meta-test expert
• meta-learning: policy meta-level supervised
32
[Rakelly+ 2019]
•
•
•
33
p(𝒯)
𝒯 𝒯 = {p (s0), p (st+1 |st, at), r (st, at)}
p(𝒯)
task learning
• meta-training 

/ policy
•
meta-learning
• MAML
• adapt
• MAML
• (behaviour cloning)
34
𝒯i
{π*i
}
ℒRL (ϕi, 𝒟i)
ϕi 𝒯i
ℒBC (ϕi, 𝒟i) ≜ −
∑
(st,at)∈𝒟
log πϕ (at |st)
meta-learning
• meta-training 



• policy 

meta-objective
• 



behaviour cloning compounding error 

35
𝒯i
π*i
D*i
min
θ ∑
𝒯i
∑
𝒟val
i ∼𝒟*i
𝔼 𝒟tr
i ∼πθ [
ℒBC (θ − α∇θℒRL (θ, 𝒟tr
i ), 𝒟val
i )]
θ
𝒯i
ϕi
D*i
• meta-learning task learning meta-learning
• policy
•
• meta-training
• ) reward shaping
• MAML
36
policy
• policy 

contextual policy
• ( ID )
• meta-training
• meta-test meta-training
• soft actor-critic(SAC) [Haarnoja+ 2018]
37
πθ (at |st, ω)
ω
• Behaviour cloning meta-objective 

• 

• 



•
• Behaviour cloning
38
θ
ϕi
πθ
ϕi = θ + α𝔼τ∼πθ
[
πθ(τ)
πθinit
(τ)
∇θlog πθ(τ)Ai(τ)
]
Ai
θ ← θ − β∇θℒBC (ϕi, 𝒟val
i )
39
•
• Pushing (full state)
•
•
• Pushing (vision)
•
• Door opening
•
•
• (Ant)
•
https://sites.google.com/berkeley.edu/guided-metapolicy-search 40
•
• meta-training task context( )
• SAC
• : meta-training :
41
•
• Door Opening Ant
•
• pushing
•
42
43
• off-policy (GMPS)
• meta-training task learning meta-learning 2
(behaviour cloning)
• meta-training
44
45
• 2
• one-step update adapt (BAIR )
• ) MAML[Finn+ 2017]
• adapt (DeepMind )
• ) Neural Processes[Garnelo+ 2018], GQN[Eslami+ 2018]
•
•
• [DL ]Meta-Learning Probabilistic Inference for Prediction
• https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-
prediction-126167192
• pro-con
46
Appendix
47
References
[Eslami+ 2018] Eslami, S. M. Ali, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman,
Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C.
Rabinowitz, Helen King, Chloe Hillier, Matthew M Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis. “Neural scene
representation and rendering.” Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204
{Finn+ 2017] Chelsea Finn, Pieter Abbeel and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,”
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1126-1135, 2017. http://proceedings.mlr.press/v70/
finn17a.html
[Garnelo+ 2018] Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola and Danilo J. Rezende, S.M. Ali Eslami and Yee Whye
Teh. “Neural Processes”. https://arxiv.org/abs/1807.01622.
[Gupta+ 2018] Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel and Sergey Levine. ”Meta-Reinforcement Learning of
Structured Exploration Strategies”. In Advances in Neural Information Processing Systems, 2018. https://nips.cc/Conferences/2018/
Schedule?showEvent=12658
[Haarnoja+ 2018] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine. “Soft Actor-Critic: Off-Policy Maximum Entropy Deep
Reinforcement Learning with a Stochastic Actor”. Proceedings of the 35th International Conference on Machine Learning, PMLR
80:1861-1870, 2018. http://proceedings.mlr.press/v80/haarnoja18b.html
[Mendonca+ 2019] Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine and Chelsea Finn. “Guided Meta-
Policy Search”. https://arxiv.org/abs/1904.00956
[Nagabandi+ 2018] Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine and Chelsea Finn.
“Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning”. https://arxiv.org/abs/1803.11347
[Nichol+2018] Alex Nichol, Joshua Achiam and John Schulman. “On First-Order Meta-Learning Algorithms”. https://arxiv.org/abs/1803.02999
[Rakelly+ 2019] Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn ands Sergey Levine. “Efficient Off-Policy Meta-Reinforcement
Learning via Probabilistic Context Variables”. https://arxiv.org/abs/1903.08254
48

Más contenido relacionado

La actualidad más candente

NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用
NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用
NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用Eiji Uchibe
 
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learningゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement LearningPreferred Networks
 
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...Deep Learning JP
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷Eiji Sekiya
 
多様な強化学習の概念と課題認識
多様な強化学習の概念と課題認識多様な強化学習の概念と課題認識
多様な強化学習の概念と課題認識佑 甲野
 
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...Deep Learning JP
 
深層強化学習でマルチエージェント学習(前篇)
深層強化学習でマルチエージェント学習(前篇)深層強化学習でマルチエージェント学習(前篇)
深層強化学習でマルチエージェント学習(前篇)Junichiro Katsuta
 
【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization
【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization
【DL輪読会】Prompting Decision Transformer for Few-Shot Policy GeneralizationDeep Learning JP
 
RL_chapter1_to_chapter4
RL_chapter1_to_chapter4RL_chapter1_to_chapter4
RL_chapter1_to_chapter4hiroki yamaoka
 
【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展
【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展
【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展Deep Learning JP
 
報酬設計と逆強化学習
報酬設計と逆強化学習報酬設計と逆強化学習
報酬設計と逆強化学習Yusuke Nakata
 
DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜Jun Okumura
 
強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験克海 納谷
 
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling ProblemDeep Learning JP
 
[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANsDeep Learning JP
 
強化学習 DQNからPPOまで
強化学習 DQNからPPOまで強化学習 DQNからPPOまで
強化学習 DQNからPPOまでharmonylab
 
「これからの強化学習」勉強会#1
「これからの強化学習」勉強会#1「これからの強化学習」勉強会#1
「これからの強化学習」勉強会#1Chihiro Kusunoki
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised LearningまとめDeep Learning JP
 
[DL輪読会]Meta Reinforcement Learning
[DL輪読会]Meta Reinforcement Learning[DL輪読会]Meta Reinforcement Learning
[DL輪読会]Meta Reinforcement LearningDeep Learning JP
 
【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning
【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning
【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement LearningDeep Learning JP
 

La actualidad más candente (20)

NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用
NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用
NIPS KANSAI Reading Group #7: 逆強化学習の行動解析への応用
 
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learningゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
 
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷
 
多様な強化学習の概念と課題認識
多様な強化学習の概念と課題認識多様な強化学習の概念と課題認識
多様な強化学習の概念と課題認識
 
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
 
深層強化学習でマルチエージェント学習(前篇)
深層強化学習でマルチエージェント学習(前篇)深層強化学習でマルチエージェント学習(前篇)
深層強化学習でマルチエージェント学習(前篇)
 
【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization
【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization
【DL輪読会】Prompting Decision Transformer for Few-Shot Policy Generalization
 
RL_chapter1_to_chapter4
RL_chapter1_to_chapter4RL_chapter1_to_chapter4
RL_chapter1_to_chapter4
 
【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展
【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展
【DL輪読会】マルチエージェント強化学習における近年の 協調的方策学習アルゴリズムの発展
 
報酬設計と逆強化学習
報酬設計と逆強化学習報酬設計と逆強化学習
報酬設計と逆強化学習
 
DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜
 
強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験
 
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
 
[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs
 
強化学習 DQNからPPOまで
強化学習 DQNからPPOまで強化学習 DQNからPPOまで
強化学習 DQNからPPOまで
 
「これからの強化学習」勉強会#1
「これからの強化学習」勉強会#1「これからの強化学習」勉強会#1
「これからの強化学習」勉強会#1
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ
 
[DL輪読会]Meta Reinforcement Learning
[DL輪読会]Meta Reinforcement Learning[DL輪読会]Meta Reinforcement Learning
[DL輪読会]Meta Reinforcement Learning
 
【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning
【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning
【DL輪読会】Contrastive Learning as Goal-Conditioned Reinforcement Learning
 

Similar a [DL輪読会] off-policyなメタ強化学習

Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald
 
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
 [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se... [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...Deep Learning JP
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS DatasetKan Yuenyong
 
[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning
[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning
[DL輪読会]Recent Advances in Autoencoder-Based Representation LearningDeep Learning JP
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureNecip Oguz Serbetci
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesAndreas Metzger
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 
Semantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsSemantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsGiulio Carducci
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning SyllabusAndres Mendez-Vazquez
 
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using EnsemblesStrata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using EnsemblesIntuit Inc.
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...MOVING Project
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptxShree Shree
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Ingo Frommholz
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...Marcel Schmitz
 

Similar a [DL輪読会] off-policyなメタ強化学習 (20)

Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender Systems
 
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
 [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se... [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS Dataset
 
[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning
[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning
[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
 
Deep Meta Learning
Deep Meta Learning Deep Meta Learning
Deep Meta Learning
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Semantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsSemantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media Posts
 
From data lakes to actionable data (adventures in data curation)
From data lakes to actionable data (adventures in data curation)From data lakes to actionable data (adventures in data curation)
From data lakes to actionable data (adventures in data curation)
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning Syllabus
 
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using EnsemblesStrata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
 
Course outline
Course outlineCourse outline
Course outline
 
Licentiate Defense Slide
Licentiate Defense SlideLicentiate Defense Slide
Licentiate Defense Slide
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
 

Más de Deep Learning JP

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving PlannersDeep Learning JP
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについてDeep Learning JP
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...Deep Learning JP
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-ResolutionDeep Learning JP
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxivDeep Learning JP
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLMDeep Learning JP
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...Deep Learning JP
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place RecognitionDeep Learning JP
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?Deep Learning JP
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究についてDeep Learning JP
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )Deep Learning JP
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...Deep Learning JP
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"Deep Learning JP
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "Deep Learning JP
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat ModelsDeep Learning JP
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"Deep Learning JP
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...Deep Learning JP
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...Deep Learning JP
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...Deep Learning JP
 

Más de Deep Learning JP (20)

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
 

Último

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Último (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

[DL輪読会] off-policyなメタ強化学習

  • 1. 1 off-policy Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables Guided Meta-Policy Search Presenter:Tatsuya Matsushima @__tmats__ , Matsuo Lab
  • 2. • off-policy arXiv • Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [Rakelly+ 2019] (2019/3/19) • Guided Meta-Policy Search [Mendonca+ 2019] (2019/4/1) • MAML meta-training off-policy 2
  • 3. 3
  • 4.  (meta learning) • : Wiki http://ibisforest.org/index.php?%E3%83%A1%E3%82%BF%E5%AD%A6%E7%BF%92 • [DL ]Meta-Learning Probabilistic Inference for Prediction ( ) • https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for- prediction-126167192 4
  • 5. MAML MAML (Model Agnostic Meta-Learning) [Finn+ 2017] • • adapt • MAML • 2 • 1 [Nichol+2018] • meta-test 
 5 min θ ∑ 𝒯 ℒ (θ − α∇θℒ (θ, 𝒟tr 𝒯), 𝒟val 𝒯 ) = min θ ∑ 𝒯 ℒ (ϕ 𝒯, 𝒟val 𝒯 ) θ ϕ 𝒯 ϕ 𝒯test = θ − α∇θℒ (θ, 𝒟tr 𝒯test)
  • 6. MAML • loss loss( ) • MAML model-based [Nagabandi+ 2018] [Gupta+ 2018] • [DL ]Meta Reinforcement Learning ( ) • https://www.slideshare.net/DeepLearningJP2016/dl-130067084 6 ℒRL (ϕ, 𝒟 𝒯i) = − 1 𝒟 𝒯i ∑ st,at∈𝒟 ri (st, at) = − 𝔼st,at∼πϕ,q 𝒯i [ 1 H H ∑ t=1 ri (st, at) ]
  • 7. ( ) On-policy v.s. Off-policy On-policy ( ) • ( ) • • ) ε-greedy Off-policy ( ) • • ※ MAML train test (= off-policy ) 7
  • 8. Efficient Off-Policy Meta-Reinforcement
 Learning via Probabilistic Context Variables 8
  • 9. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables • https://arxiv.org/abs/1903.08254 (Submitted on 19 Mar 2019) • Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine • UC Berkeley (BAIR) • Deep RL ”UC Berkeley” • • https://github.com/katerakelly/oyster • (BAIR )PyTorch rlkit 9
  • 10. TL; DR • meta learning off-policy (PEARL) • (context) • permutation invariant • 20-100 10
  • 11. ( MAML ) • meta-training adaptation on-policy • MAML meta-train meta-test off-policy • adapt • 11
  • 12. 12
  • 13. • off-policy RL (soft actor-critic, SAC [Haarnoja+ 2018]) 
 context (PEARL) • Meta-training adapt • meta-train policy context • meta-test context policy adapt • policy off-policy meta-train meta- test on-policy 13
  • 14. MDP • • • • : • : • 1 • • 14 p(𝒯) 𝒯 𝒯 = {p (s0), p (st+1 |st, at), r (st, at)} 𝒯 c 𝒯 n = (sn, an, rn, s′n) c = c 𝒯 1:N p(𝒯)
  • 15. context • adapt • • (Inference network) • • prior Gaussian • meta-train meta-test 15 z z qϕ(z|c) 𝔼 𝒯 [ 𝔼z∼qϕ(z|c 𝒯 ) [ R(𝒯, z) + βDKL (qϕ (z|c 𝒯 ) ∥p(z))]] p(z) qϕ(z|c) ϕ zz
  • 16. context • MDP • permutation invariant • Inference network • Gaussian 16 {si, ai, s′i, ri} qϕ (z|c1:N) ∝ ΠN n=1Ψϕ (z|cn) Ψϕ (z|cn) = 𝒩 (fμ ϕ (cn), fσ ϕ (cn))
  • 17. off-policy • policy 
 • actor ciritic 
 • 
 • on-policy 
 on-policy test 17 qϕ(z|c) ℬ 𝒮c
  • 18. off-policy • Soft Actor-Critic (SAC) [Haarnoja+ 2018] context • SAC maxEntRL( ) off-policy actor-critic • actor critic reparameterization trick • critic loss: 
 • actor loss: 18 ℒcritic = 𝔼(s, a, r, s′ ) ∼ ℬ z ∼ qϕ(z|c) [Qθ(s, a, z) − (r + V (s′, z))] 2 z ℒactor = 𝔼s∼ℬ,a∼πθ DKL ( πθ(a|s, z)∥ exp (Qθ(s, a, z)) 𝒵θ(s) )
  • 19. 19
  • 20. • MuJoCo 6 • Half-Cheetah, Humanoid, Ant, Walker (Half-Cheetah Ant 2 ) • • adapt • 20-100 
 • : meta-training • : 20
  • 21. • on-policy (MAESN[Gupta+ 2018]) • sparse navigation • meta-test 
 • 
 • context • MAESN 21
  • 22. Ablation Study • • Half-Cheetah-Vel • RNN • RNN-tran: de-correlated • RNN-traj: • permutation invariant 
 22
  • 23. Ablation Study • • Half-Cheetah-Vel • • off-policy: off-policy( ) • off-policy RL-batch: policy • 
 (PEARL) 23
  • 24. Ablation Study • context • sparse navigation • context • 
 24
  • 25. 25
  • 26. • off-policy (PEARL) • context policy context off-policy • meta-training 26
  • 28. Guided Meta-Policy Search • https://arxiv.org/abs/1904.00956 (Submitted on 1 Apr 2019) • Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn • UC Berkeley (BAIR) • … • • https://github.com/RussellM2020/GMPS • Website • https://sites.google.com/berkeley.edu/guided-metapolicy-search 28
  • 29. TL; DR • meta learning off-policy (GMPS) • meta-train RL • meta-train meta-objective( ) imitation learning (behaviour cloning) • meta-training task learning meta-learning 2 29
  • 30. ( MAML ) • meta-training adaptation on-policy • [Rakelly+ 2019] • meta-training meta-test 30
  • 31. 31
  • 32. • meta-train meta-objective( ) (behaviour cloning) • meta-training 2 • task learning: meta-training policy • policy meta-test expert • meta-learning: policy meta-level supervised 32
  • 33. [Rakelly+ 2019] • • • 33 p(𝒯) 𝒯 𝒯 = {p (s0), p (st+1 |st, at), r (st, at)} p(𝒯)
  • 34. task learning • meta-training 
 / policy • meta-learning • MAML • adapt • MAML • (behaviour cloning) 34 𝒯i {π*i } ℒRL (ϕi, 𝒟i) ϕi 𝒯i ℒBC (ϕi, 𝒟i) ≜ − ∑ (st,at)∈𝒟 log πϕ (at |st)
  • 35. meta-learning • meta-training 
 
 • policy 
 meta-objective • 
 
 behaviour cloning compounding error 
 35 𝒯i π*i D*i min θ ∑ 𝒯i ∑ 𝒟val i ∼𝒟*i 𝔼 𝒟tr i ∼πθ [ ℒBC (θ − α∇θℒRL (θ, 𝒟tr i ), 𝒟val i )] θ 𝒯i ϕi D*i
  • 36. • meta-learning task learning meta-learning • policy • • meta-training • ) reward shaping • MAML 36
  • 37. policy • policy 
 contextual policy • ( ID ) • meta-training • meta-test meta-training • soft actor-critic(SAC) [Haarnoja+ 2018] 37 πθ (at |st, ω) ω
  • 38. • Behaviour cloning meta-objective 
 • 
 • 
 
 • • Behaviour cloning 38 θ ϕi πθ ϕi = θ + α𝔼τ∼πθ [ πθ(τ) πθinit (τ) ∇θlog πθ(τ)Ai(τ) ] Ai θ ← θ − β∇θℒBC (ϕi, 𝒟val i )
  • 39. 39
  • 40. • • Pushing (full state) • • • Pushing (vision) • • Door opening • • • (Ant) • https://sites.google.com/berkeley.edu/guided-metapolicy-search 40
  • 41. • • meta-training task context( ) • SAC • : meta-training : 41
  • 42. • • Door Opening Ant • • pushing • 42
  • 43. 43
  • 44. • off-policy (GMPS) • meta-training task learning meta-learning 2 (behaviour cloning) • meta-training 44
  • 45. 45
  • 46. • 2 • one-step update adapt (BAIR ) • ) MAML[Finn+ 2017] • adapt (DeepMind ) • ) Neural Processes[Garnelo+ 2018], GQN[Eslami+ 2018] • • • [DL ]Meta-Learning Probabilistic Inference for Prediction • https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for- prediction-126167192 • pro-con 46
  • 48. References [Eslami+ 2018] Eslami, S. M. Ali, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis. “Neural scene representation and rendering.” Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204 {Finn+ 2017] Chelsea Finn, Pieter Abbeel and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,” Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1126-1135, 2017. http://proceedings.mlr.press/v70/ finn17a.html [Garnelo+ 2018] Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola and Danilo J. Rezende, S.M. Ali Eslami and Yee Whye Teh. “Neural Processes”. https://arxiv.org/abs/1807.01622. [Gupta+ 2018] Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel and Sergey Levine. ”Meta-Reinforcement Learning of Structured Exploration Strategies”. In Advances in Neural Information Processing Systems, 2018. https://nips.cc/Conferences/2018/ Schedule?showEvent=12658 [Haarnoja+ 2018] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine. “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor”. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1861-1870, 2018. http://proceedings.mlr.press/v80/haarnoja18b.html [Mendonca+ 2019] Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine and Chelsea Finn. “Guided Meta- Policy Search”. https://arxiv.org/abs/1904.00956 [Nagabandi+ 2018] Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine and Chelsea Finn. “Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning”. https://arxiv.org/abs/1803.11347 [Nichol+2018] Alex Nichol, Joshua Achiam and John Schulman. “On First-Order Meta-Learning Algorithms”. https://arxiv.org/abs/1803.02999 [Rakelly+ 2019] Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn ands Sergey Levine. “Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables”. https://arxiv.org/abs/1903.08254 48