Probabilistic Inference and Action Selection (確率的推論と行動選択)
Masahiro Suzuki
2020/11/02
! Control as inference and active inference
! Christopher L Buckley
! On the Relationship Between Active Inference and Control as Inference [Millidge+ 20]: control as inference and active inference
! Active inference: demystified and compared [Sajid+ 20]: active inference
! Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review [Levine 18]: control as inference
! Reinforcement Learning as Iterative and Amortised Inference [Millidge+ 20]: control as inference, amortized
! What does the free energy principle tell us about the brain? [Gershman 19]: active inference
! Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning [Tang+ 20]: control as inference, Variational RL
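MDP
! MDP: at time $t$ the agent is in state $s_t \in \mathcal{S}$ and takes action $a_t \in \mathcal{A}$.
! State transition probability: the state at time $t + 1$ is drawn as $s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)$.
[Figure: graphical model of the MDP over $s_{t-1}, s_t, s_{t+1}$ and $a_{t-1}, a_t, a_{t+1}$]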
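POMDP
! An MDP extended with an observation.
! POMDP: the state $s$ generates the observation $o$ according to $p(o \mid s)$.
[Figure: graphical model of the POMDP over $s_{t-1}, s_t, s_{t+1}$, $a_{t-1}, a_t, a_{t+1}$ and $o_{t-1}, o_t, o_{t+1}$]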
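! MDP policy: $p(a \mid s)$
! Trajectory of length $T$: $\tau = (s_1, a_1, \ldots, s_T, a_T)$, with distribution
\[ p(\tau) = p(s_{1:T}, a_{1:T}) = \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) \]
! Reward: $r(s_t, a_t)$. The optimal policy $p_{\mathrm{opt}}(a \mid s)$ maximizes the expected cumulative reward
\[ \mathbb{E}_{p(\tau)}\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] \]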
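The trajectory distribution and the expected cumulative reward can be checked by brute force on a tiny tabular MDP. The sketch below is only an illustration and is not taken from the slides: the two-state, two-action MDP, the tables `P0`, `P` and `R`, the uniform policy, and all function names are hypothetical. The first factor $p(s_1 \mid s_0, a_0)$ is treated as an initial state distribution `P0`, which is an assumption.

```python
import itertools

# Hypothetical 2-state, 2-action MDP (all numbers are illustrative assumptions).
STATES, ACTIONS, T = (0, 1), (0, 1), 3
P0 = {0: 1.0, 1: 0.0}                                      # initial distribution p(s_1)
P = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.2, 1: 0.8},
     (1, 0): {0: 0.7, 1: 0.3}, (1, 1): {0: 0.1, 1: 0.9}}   # p(s_t | s_{t-1}, a_{t-1})
R = {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 1.0, (1, 1): 2.0}  # r(s_t, a_t)


def policy(a, s):
    """Uniform policy p(a | s), used only as an example."""
    return 1.0 / len(ACTIONS)


def trajectory_prob(states, actions):
    """p(tau) = prod_t p(a_t | s_t) p(s_t | s_{t-1}, a_{t-1})."""
    prob = P0[states[0]]
    for t in range(T):
        prob *= policy(actions[t], states[t])
        if t + 1 < T:
            prob *= P[(states[t], actions[t])][states[t + 1]]
    return prob


def all_trajectories():
    """Yield every (states, actions) pair of length T."""
    for states in itertools.product(STATES, repeat=T):
        for actions in itertools.product(ACTIONS, repeat=T):
            yield states, actions


def total_probability():
    """Sanity check: p(tau) must sum to one over all trajectories."""
    return sum(trajectory_prob(s, a) for s, a in all_trajectories())


def expected_return():
    """E_{p(tau)}[sum_t r(s_t, a_t)] by enumerating every trajectory."""
    return sum(trajectory_prob(s, a) * sum(R[(si, ai)] for si, ai in zip(s, a))
               for s, a in all_trajectories())
```

Enumeration is only feasible for very small problems, since the number of trajectories grows exponentially in $T$; it is used here purely to make the definition concrete.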
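! Plan: a sequence of actions $\pi = [a_1, \ldots, a_T]$, used in active inference.
! A trajectory of length $T$ is then $\tau = (s_{1:T}, \pi)$, with distribution
\[ p(\tau) = p(\pi)\, p(s_{1:T} \mid \pi) = p(\pi) \prod_{t=1}^{T} p(s_t \mid s_{t-1}, \pi) \]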
! preference
1.
! Control as inference (RL as inference, Planning as inference)
! Variational RL
2.
! active inference
Control as Inference and Variational RL
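! Optimality variable $\mathcal{O}_t \in \{0, 1\}$: $\mathcal{O}_t = 1$ means that the state $s_t$ and action $a_t$ at time $t$ are optimal.
=> Its distribution is defined from the reward $r$:
\[ p(\mathcal{O}_t = 1 \mid s_t, a_t) := \exp(r(s_t, a_t)) \]
[Figure: graphical model with nodes $s$, $a$ and $\mathcal{O}$ at times $t-1$, $t$, $t+1$]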
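! Optimal trajectory distribution: the posterior over trajectories given optimality, obtained from the likelihood $p(\mathcal{O}_{1:T} \mid \tau)$ by Bayes' rule.
\[ p(\mathcal{O}_{1:T} \mid \tau) = \prod_{t=1}^{T} p(\mathcal{O}_t \mid s_t, a_t) = \prod_{t=1}^{T} \exp(r(s_t, a_t)) \]
\[ p(\tau \mid \mathcal{O}_{1:T}) = \frac{p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)}{p(\mathcal{O}_{1:T})} \]
\[ p_{\mathrm{opt}}(\tau) = p(\tau \mid \mathcal{O}_{1:T}) \]
※ $p(\mathcal{O}_{1:T} = 1)$ is abbreviated as $p(\mathcal{O}_{1:T})$.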
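For a finite set of trajectories the optimal trajectory distribution can be computed exactly by reweighting the prior with $\exp(\sum_t r)$ and normalizing. The sketch below is illustrative only: the three trajectory labels, the uniform prior, and the returns are hypothetical, and the returns are assumed to be non-positive so that $\exp(r)$ is a valid probability.

```python
import math


def optimal_trajectory_posterior(prior, returns):
    """Return p(tau | O_{1:T}) and the evidence p(O_{1:T}).

    prior:   dict mapping a trajectory label to p(tau)
    returns: dict mapping the same labels to sum_t r(s_t, a_t), assumed <= 0
             so that exp(r) can be read as a probability
    """
    joint = {tau: math.exp(returns[tau]) * p for tau, p in prior.items()}
    evidence = sum(joint.values())          # p(O) = sum_tau p(O | tau) p(tau)
    posterior = {tau: w / evidence for tau, w in joint.items()}
    return posterior, evidence


# Three hypothetical trajectories with a uniform prior.
prior = {"tau_a": 1 / 3, "tau_b": 1 / 3, "tau_c": 1 / 3}
returns = {"tau_a": -3.0, "tau_b": -1.0, "tau_c": 0.0}
posterior, evidence = optimal_trajectory_posterior(prior, returns)
```

With a uniform prior, the posterior ratio between two trajectories is the exponential of their return difference, so higher-return trajectories receive exponentially more mass.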
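! The posterior $p(\tau \mid \mathcal{O}_{1:T}) \propto p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)$ over $\tau$ given $\mathcal{O}_{1:T}$ is approximated by a variational distribution $q(\tau)$:
\[ \hat{q} = \arg\min_{q} D_{\mathrm{KL}}\left[ q(\tau) \,\|\, p(\tau \mid \mathcal{O}_{1:T}) \right] \]
[Figure: graphical model of $\tau$ and $\mathcal{O}_{1:T}$ with prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$ and approximate posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$]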
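ELBO
! The ELBO is a lower bound on the log evidence, written in terms of $q(\tau)$ and $p(\tau)$:
\[
\begin{aligned}
\log p(\mathcal{O}_{1:T}) &= \log \int p(\mathcal{O}_{1:T}, \tau)\, d\tau \\
&= \log \mathbb{E}_{q(\tau)}\left[ \frac{p(\mathcal{O}_{1:T}, \tau)}{q(\tau)} \right] \\
&\geq \mathbb{E}_{q(\tau)}\left[ \log p(\mathcal{O}_{1:T} \mid \tau) + \log p(\tau) - \log q(\tau) \right] \\
&= \mathbb{E}_{q(\tau)}\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] - D_{\mathrm{KL}}\left[ q(\tau) \,\|\, p(\tau) \right] =: L(q)
\end{aligned}
\]
[Figure: graphical model of $\tau$ and $\mathcal{O}_{1:T}$ with prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$ and approximate posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$]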
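The bound can be verified numerically on a finite set of trajectories: $L(q)$ never exceeds $\log p(\mathcal{O}_{1:T})$ and equals it when $q$ is the exact posterior. This is a minimal sketch with hypothetical trajectory labels, prior and returns; none of the numbers come from the slides.

```python
import math


def elbo(q, prior, returns):
    """L(q) = E_q[sum_t r] - KL[q || p] over a finite set of trajectories."""
    expected_return = sum(q[tau] * returns[tau] for tau in q)
    kl = sum(q[tau] * math.log(q[tau] / prior[tau]) for tau in q if q[tau] > 0)
    return expected_return - kl


prior = {"tau_a": 0.5, "tau_b": 0.3, "tau_c": 0.2}       # hypothetical p(tau)
returns = {"tau_a": -2.0, "tau_b": -0.5, "tau_c": 0.0}   # hypothetical sum_t r

# Exact log evidence and exact posterior, for comparison.
evidence = sum(math.exp(returns[tau]) * prior[tau] for tau in prior)
log_evidence = math.log(evidence)
posterior = {tau: math.exp(returns[tau]) * prior[tau] / evidence for tau in prior}

loose = elbo(prior, prior, returns)       # q = prior: the KL term is zero, bound is loose
tight = elbo(posterior, prior, returns)   # q = posterior: the bound is attained
```

The gap between `log_evidence` and `elbo(q, ...)` is exactly the KL divergence from `q` to the posterior, which is why minimizing that divergence and maximizing the ELBO are the same problem.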
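1.
! control as inference (CAI)
! The action prior is uniform: $p(a_t \mid s_t) = \frac{1}{|\mathcal{A}|}$.
! The variational policy $q_\phi(a_t \mid s_t)$ has parameters $\phi$, and the variational dynamics are set equal to the true dynamics:
\[ q_\phi(\tau) := \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, q(s_t \mid s_{t-1}, a_{t-1}) = \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) \]
\[ p(\tau) := \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) = \frac{1}{|\mathcal{A}|^{T}} \prod_{t=1}^{T} p(s_t \mid s_{t-1}, a_{t-1}) \]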
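1.
! With a uniform action prior and shared dynamics, the ELBO becomes
\[
\begin{aligned}
L(\phi) &= \mathbb{E}_{q_\phi(\tau)}\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] - D_{\mathrm{KL}}\left[ q_\phi(\tau) \,\|\, p(\tau) \right] \\
&= \mathbb{E}_{q_\phi(\tau)}\left[ \sum_{t=1}^{T} \big( r(s_t, a_t) - \log q_\phi(a_t \mid s_t) \big) \right] - T \log |\mathcal{A}| \\
&= \mathbb{E}_{q_\phi(\tau)}\left[ \sum_{t=1}^{T} \big( r(s_t, a_t) + \mathcal{H}(q_\phi(a_t \mid s_t)) \big) \right] - T \log |\mathcal{A}|
\end{aligned}
\]
! The term $T \log |\mathcal{A}|$ does not depend on $\phi$, so maximizing the ELBO is equivalent to maximizing the entropy-regularized objective
\[ J(\phi) := \mathbb{E}_{q_\phi(\tau)}\left[ \sum_{t=1}^{T} \big( r(s_t, a_t) + \mathcal{H}(q_\phi(a_t \mid s_t)) \big) \right] \]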
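For a single state and a single time step, the reward-plus-entropy objective has a closed-form maximizer: $q(a) \propto \exp(r(a))$, with maximum value $\log \sum_a \exp(r(a))$. The sketch below checks this numerically; the three reward values and the comparison policies are hypothetical, and restricting to one step is a simplifying assumption.

```python
import math


def soft_objective(q, rewards):
    """J(q) = E_q[r(a)] + H(q) for one state with discrete actions."""
    expected_reward = sum(p * r for p, r in zip(q, rewards))
    entropy = -sum(p * math.log(p) for p in q if p > 0)
    return expected_reward + entropy


def softmax(rewards):
    """q(a) proportional to exp(r(a)), computed with the usual max shift."""
    m = max(rewards)
    weights = [math.exp(r - m) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]


rewards = [1.0, 0.0, -2.0]                # hypothetical r(s, a) for three actions
q_star = softmax(rewards)                 # maximizer of the soft objective
log_sum_exp = math.log(sum(math.exp(r) for r in rewards))
greedy = [1.0, 0.0, 0.0]                  # deterministic argmax policy
uniform = [1 / 3, 1 / 3, 1 / 3]           # maximum-entropy policy
```

Both the greedy policy (no entropy) and the uniform policy (ignores reward) score lower than `q_star`, which shows how the entropy term trades off against reward.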
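Soft Actor-Critic
! Soft Actor-Critic (SAC) [Haarnoja+ 17, 18]
! Optimizes the ELBO off-policy.
! Introduces a Q-function: $Q_\theta(s_t, a_t) = r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}[V(s_{t+1})]$
! The Q-function $Q_\theta(s_t, a_t)$ is the critic and the policy $q_\phi(a_t \mid s_t)$ is the actor.
! Control as Inference https://deeplearning.jp/reinforcement_cource-2020s/
! Control as Inference https://www.slideshare.net/DeepLearningJP2016/dlcontrol-as-inference-201266247
\[ J^{q}_{t}(\phi) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[ \log q_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t) \right] \]
\[ J^{Q}_{t}(\theta) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[ \left( r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}\left[ V_{\bar{\theta}}(s_{t+1}) \right] - Q_\theta(s_t, a_t) \right)^{2} \right] \]
\[ V_\theta(s_{t+1}) = \mathbb{E}_{q_\phi(a_{t+1} \mid s_{t+1})}\left[ Q_\theta(s_{t+1}, a_{t+1}) - \log q_\phi(a_{t+1} \mid s_{t+1}) \right] \]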
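The actor and critic updates can be illustrated in a tabular setting where every expectation is computed exactly. The sketch below is a tabular soft Q-iteration, not the SAC algorithm itself: it has no neural networks, no replay buffer and no target network, and the discount factor `GAMMA`, the two-state MDP, and all names are assumptions added for illustration.

```python
import math

# Hypothetical 2-state, 2-action MDP. GAMMA is an added assumption: the
# finite-horizon objective in the text has no discount factor.
STATES, ACTIONS, GAMMA = (0, 1), (0, 1), 0.9
P = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.2, 1: 0.8},
     (1, 0): {0: 0.7, 1: 0.3}, (1, 1): {0: 0.1, 1: 0.9}}   # p(s' | s, a)
R = {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 1.0, (1, 1): 2.0}  # r(s, a)


def improve(Q):
    """Actor step: E_q[log q - Q] is minimized by q(a|s) proportional to exp(Q(s, a))."""
    policy = {}
    for s in STATES:
        m = max(Q[(s, a)] for a in ACTIONS)
        w = {a: math.exp(Q[(s, a)] - m) for a in ACTIONS}
        z = sum(w.values())
        policy[s] = {a: w[a] / z for a in ACTIONS}
    return policy


def soft_value(Q, policy, s):
    """V(s) = E_{q(a|s)}[Q(s, a) - log q(a|s)]."""
    return sum(policy[s][a] * (Q[(s, a)] - math.log(policy[s][a])) for a in ACTIONS)


def soft_q_iteration(n_iters=500):
    """Alternate actor and critic steps until the soft Q-function stops changing."""
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_iters):
        policy = improve(Q)
        V = {s: soft_value(Q, policy, s) for s in STATES}
        # Critic step: Q(s, a) = r(s, a) + gamma * E_{p(s'|s,a)}[V(s')].
        Q = {(s, a): R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())
             for s in STATES for a in ACTIONS}
    return Q, improve(Q)


Q, policy = soft_q_iteration()
```

In SAC proper, the same two steps are carried out approximately by gradient descent on $J^{q}$ and $J^{Q}$ with sampled transitions.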
POMDP
! Control as inference extended to POMDPs
! VAE
! SLAC [Lee+ 19]
! RNN
! [Han+ 19]
! RNN, VRNN [Chung+ 16]
! variational recurrent model (VRM)
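CAI
! CAI
! Mirror descent [Bubeck, 14]
=> Variational Inference Model Predictive Control (VI-MPC) [Okada+ 19]
! Weight of a plan $\pi$:
\[ \mathcal{W}(\pi) = \mathbb{E}_{q(\tau)}\left[ p(\mathcal{O}_{1:T} \mid \tau) \right], \qquad p(\mathcal{O}_{1:T} \mid \tau) := f(r(\tau)) \]
! Update rule [Okada+ 19]:
\[ q^{(i+1)}(\pi) \leftarrow \frac{q^{(i)}(\pi) \cdot \mathcal{W}(\pi) \cdot q^{(i)}(\pi)}{\mathbb{E}_{q^{(i)}(\pi)}\left[ \mathcal{W}(\pi) \cdot q^{(i)}(\pi) \right]} \]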
Control as inference
! CAI
! SAC, VI-MPC
! amortized [Kingma+ 13]
! [Millidge+ 20]
! amortized
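2.
! CAI
! ELBO, with the action prior $p(a_t \mid s_t)$ parameterized by $\theta$ and optimized together with $q$
=> Variational RL
\[ p_\theta(\tau) := \prod_{t=1}^{T} p_\theta(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) \]
\[ L(\theta, q) = \mathbb{E}_{q(\tau)}\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] - D_{\mathrm{KL}}\left[ q(\tau) \,\|\, p_\theta(\tau) \right] \]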
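EM
! E-step: with $\theta$ fixed at $\theta = \theta_{\mathrm{old}}$, set $q$ to the posterior, which maximizes the ELBO with respect to $q$:
\[ q(\tau) = p_{\theta_{\mathrm{old}}}(\tau \mid \mathcal{O}_{1:T}) = \frac{p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}{\sum_{\tau} p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)} \]
! M-step: with $q$ from the E-step fixed, update $\theta$:
\[ \hat{\theta} = \arg\max_{\theta} \mathbb{E}_{q(\tau)}\left[ \log p_\theta(\tau) \right] = \arg\max_{\theta} \mathbb{E}_{q(\tau)}\left[ \sum_{t=1}^{T} \log p_\theta(a_t \mid s_t) \right] \]
! MPO [Abdolmaleki+ 18], V-MPO [Song+ 19]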
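MPO E-step
! Maximum a posteriori Policy Optimization (MPO) [Abdolmaleki+ 18]
! The E-step uses a Q-function $\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)$ and the previous policy $p_{\theta_{\mathrm{old}}}(a_t \mid s_t)$.
! The Q-function is learned off-policy.
! MPO, DL:
! https://www.slideshare.net/DeepLearningJP2016/dlhyper-parameter-agnostic-methods-in-reinforcement-learning
\[ q(\tau) = \prod_{t=1}^{T} q(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) \]
\[ q(a_t \mid s_t) \propto p_{\theta_{\mathrm{old}}}(a_t \mid s_t) \exp\left( \frac{\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)}{\eta} \right), \qquad \eta > 0 \]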
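For discrete actions, the E-step reweighting is a temperature-scaled softmax applied on top of the previous policy. The sketch below illustrates this for a single state. The prior probabilities and Q-values are hypothetical, and treating $\eta$ as a fixed hyperparameter is a simplifying assumption (how $\eta$ is chosen is not specified here).

```python
import math


def mpo_e_step(prior_probs, q_values, eta):
    """q(a|s) proportional to p_old(a|s) * exp(Q_old(s, a) / eta) for one state."""
    if eta <= 0:
        raise ValueError("eta must be positive")
    m = max(q_values)                       # shift for numerical stability
    weights = [p * math.exp((q - m) / eta) for p, q in zip(prior_probs, q_values)]
    z = sum(weights)
    return [w / z for w in weights]


prior_probs = [0.5, 0.3, 0.2]               # hypothetical p_old(a|s)
q_values = [0.0, 1.0, 2.0]                  # hypothetical Q_old(s, a)
sharp = mpo_e_step(prior_probs, q_values, eta=0.1)   # small eta: close to greedy
flat = mpo_e_step(prior_probs, q_values, eta=1e6)    # large eta: close to the prior
```

A small $\eta$ concentrates `q` on the highest-valued action, while a large $\eta$ leaves the previous policy almost unchanged.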
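Control as inference and Variational RL
! Control as inference: $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$, with prior $p(\tau)$ and likelihood $p(\mathcal{O}_{1:T} \mid \tau)$
! Variational RL: $p_\theta(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$, with prior $p_\theta(\tau)$ parameterized by $\theta$ and likelihood $p(\mathcal{O}_{1:T} \mid \tau)$
[Figure: graphical models of $\tau$ and $\mathcal{O}_{1:T}$ for control as inference and Variational RL]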
active inference
! Friston
※ ver.3
https://www.slideshare.net/masatoshiyoshida/ss-238982118
! unconscious inference
[Figure: cause and result; inference (perception)]
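! Generative model of the observation $o$ and the state $s$, and the posterior over $s$ given $o$:
\[ p(o, s) = p(o \mid s)\, p(s) \]
\[ p(s \mid o) = \frac{p(s)\, p(o \mid s)}{\sum_{s} p(s)\, p(o \mid s)} \]
[Figure: the environment generates the observation from the state; the internal model (world model) infers the state from the observation]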
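! Bayesian surprise: how much the observation $o$ obtained after action $a$ changes the belief over $s$:
\[ u(o) = D_{\mathrm{KL}}\left[ p(s \mid o, a) \,\|\, p(s \mid a) \right] \]
! Active learning: choose the action $a$ with the largest $I(a)$, the expected Bayesian surprise, which is the mutual information between $s$ and $o$ under $a$:
\[ I(a) := \sum_{o} p(o \mid a)\, D_{\mathrm{KL}}\left[ p(s \mid o, a) \,\|\, p(s \mid a) \right] = \mathbb{E}_{p(o \mid a)}\left[ u(o) \right] \]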
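For discrete states and observations, $I(a)$ can be computed directly from a prior over states and an observation likelihood. The sketch below does this for one fixed action. The two likelihood tables and the prior are hypothetical, and conditioning on $a$ is represented simply by passing in the distributions that hold under that action.

```python
import math


def kl(p, q):
    """KL[p || q] for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_information_gain(prior_s, likelihood):
    """I(a) = sum_o p(o | a) KL[p(s | o, a) || p(s | a)].

    prior_s:    p(s | a) as a list over states
    likelihood: p(o | s) as a list of rows, one row per state
    """
    total = 0.0
    for o in range(len(likelihood[0])):
        joint = [prior_s[s] * likelihood[s][o] for s in range(len(prior_s))]
        p_o = sum(joint)                      # p(o | a)
        if p_o == 0:
            continue                          # impossible observation contributes nothing
        post = [j / p_o for j in joint]       # p(s | o, a) by Bayes' rule
        total += p_o * kl(post, prior_s)      # p(o | a) * u(o)
    return total


prior_s = [0.5, 0.5]                          # hypothetical p(s | a)
informative = [[0.9, 0.1], [0.1, 0.9]]        # observation mostly reveals the state
uninformative = [[0.5, 0.5], [0.5, 0.5]]      # observation is independent of the state

gain_informative = expected_information_gain(prior_s, informative)
gain_uninformative = expected_information_gain(prior_s, uninformative)
```

An action whose observations are independent of the state has zero expected gain, so an active learner would prefer the informative one.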
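! For a sequence of observations $o_{1:T}$ under a plan $\pi = [a_1, \ldots, a_T]$, the Bayesian surprise is summed over time:
\[ U(o_{1:T}) = \sum_{t=1}^{T} u(o_t) \]
\[ I(\pi) = \mathbb{E}_{p(o_{1:T} \mid \pi)}\left[ U(o_{1:T}) \right] = \sum_{o_{1:T}} p(o_{1:T} \mid \pi)\, U(o_{1:T}) \]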
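! ELBO of the log evidence with a variational distribution $q(s)$:
\[ \log p(o) \geq \mathbb{E}_{q(s)}\left[ \log \frac{p(o, s)}{q(s)} \right] \]
! The negative ELBO is the variational free energy, an upper bound on $-\log p(o)$:
\[ F(o, q) := -\mathbb{E}_{q(s)}\left[ \log \frac{p(o, s)}{q(s)} \right] \]
! free energy principle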
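! The free energy decomposes into two terms:
\[ F(o, q) = -\log p(o) + D_{\mathrm{KL}}\left[ q(s) \,\|\, p(s \mid o) \right] \]
! 1: the first term, $-\log p(o)$, depends only on the observation $o$.
! 2: the second term depends on $q$ and vanishes when $q(s) = p(s \mid o)$.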
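The decomposition of the free energy into surprise plus a KL term can be confirmed numerically for a discrete model. The sketch below uses a hypothetical two-state, two-observation model; the prior, likelihood and the choice of `q` are illustrative assumptions.

```python
import math


def free_energy(q, prior_s, likelihood, o):
    """F(o, q) = -E_q[log p(o, s) - log q(s)] for discrete states."""
    return -sum(q[s] * (math.log(prior_s[s] * likelihood[s][o]) - math.log(q[s]))
                for s in range(len(q)) if q[s] > 0)


prior_s = [0.7, 0.3]                     # hypothetical p(s)
likelihood = [[0.8, 0.2], [0.3, 0.7]]    # hypothetical p(o | s), one row per state
o = 1                                    # the observed outcome

p_o = sum(prior_s[s] * likelihood[s][o] for s in range(2))
surprise = -math.log(p_o)                                      # -log p(o)
post = [prior_s[s] * likelihood[s][o] / p_o for s in range(2)]  # p(s | o)

q = [0.5, 0.5]                           # an arbitrary variational distribution
kl_q_post = sum(q[s] * math.log(q[s] / post[s]) for s in range(2))

f_arbitrary = free_energy(q, prior_s, likelihood, o)
f_posterior = free_energy(post, prior_s, likelihood, o)
```

Here `f_arbitrary` exceeds the surprise by exactly `kl_q_post`, and `f_posterior` equals the surprise, matching the two terms of the decomposition.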
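! POMDP with a plan $\pi = [a_1, \ldots, a_T]$:
\[ p(o_{1:T}, s_{1:T} \mid \pi) = \prod_{t=1}^{T} p(o_t \mid s_t)\, p(s_t \mid s_{t-1}, \pi) \]
\[ q(s_{1:T} \mid \pi) = \prod_{t=1}^{T} q(s_t \mid \pi) \]
\[ F(o_{1:T}, \pi) = -\mathbb{E}_{q(s_{1:T} \mid \pi)}\left[ \log \frac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)} \right] \]
[Figure: graphical model of the POMDP over $s$, $a$ and $o$ at times $t-1$, $t$, $t+1$, with the plan $\pi$]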
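! expected free energy
\[
\begin{aligned}
G(\pi) &:= \mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}\left[ F(o_{1:T}, \pi) \right] \\
&= -\mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}\, \mathbb{E}_{q(s_{1:T} \mid \pi)}\left[ \log \frac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)} \right] \\
&= -\mathbb{E}_{q(o_{1:T}, s_{1:T} \mid \pi)}\left[ \log \frac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)} \right]
\end{aligned}
\]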
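Active inference
! active inference (AIF)
! Expected free energy $G_t$ at time $t$, using the approximation $q(s_t \mid o_t, \pi) \approx p(s_t \mid o_t, \pi)$:
\[
\begin{aligned}
G_t(\pi) &= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[ \log \frac{p(o_t, s_t \mid \pi)}{q(s_t \mid \pi)} \right] \\
&\approx -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[ \log \frac{p(o_t \mid \pi)\, q(s_t \mid o_t, \pi)}{q(s_t \mid \pi)} \right] \\
&= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[ \log p(o_t \mid \pi) \right] - \mathbb{E}_{q(o_t \mid \pi)}\left[ D_{\mathrm{KL}}\left[ q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi) \right] \right]
\end{aligned}
\]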
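Active inference
! When $q = p$:
\[
\begin{aligned}
G_t(\pi) &= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[ \log p(o_t \mid \pi) \right] - \mathbb{E}_{q(o_t \mid \pi)}\left[ D_{\mathrm{KL}}\left[ q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi) \right] \right] \\
&= -\mathbb{E}_{p(o_t, s_t \mid \pi)}\left[ \log p(o_t \mid \pi) \right] - \mathbb{E}_{p(o_t \mid \pi)}\left[ D_{\mathrm{KL}}\left[ p(s_t \mid o_t, \pi) \,\|\, p(s_t \mid \pi) \right] \right] \\
&= \mathbb{E}_{p(s_t \mid \pi)}\left[ \mathcal{H}(p(o_t \mid \pi)) \right] - I(\pi)
\end{aligned}
\]
※ $p(s_t \mid s_{t-1}, \pi)$ is abbreviated as $p(s_t \mid \pi)$.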
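Active inference
! Term 1: extrinsic value
! Term 2: Bayesian surprise, intrinsic value
\[ -G_t(\pi) = \mathbb{E}_{q(o_t, s_t \mid \pi)}\left[ \log p(o_t \mid \pi) \right] + \mathbb{E}_{q(o_t \mid \pi)}\left[ D_{\mathrm{KL}}\left[ q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi) \right] \right] \]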
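The two terms of $-G_t(\pi)$ can be evaluated separately for a discrete model, which makes the trade-off between extrinsic and intrinsic value visible. The sketch below is illustrative only. It assumes $q(o_t, s_t \mid \pi) = p(o_t \mid s_t)\, q(s_t \mid \pi)$, reads $\log p(o_t \mid \pi)$ as a fixed table of log-probabilities over outcomes, and uses hypothetical numbers and names throughout.

```python
import math


def negative_expected_free_energy(q_s, likelihood, log_p_o):
    """Return the (extrinsic, intrinsic) terms of -G_t for discrete s_t and o_t.

    q_s:        q(s_t | pi), predicted state distribution under the plan
    likelihood: p(o_t | s_t), one row per state
    log_p_o:    log p(o_t | pi), log-probability assigned to each outcome
    """
    n_s, n_o = len(q_s), len(likelihood[0])
    extrinsic, intrinsic = 0.0, 0.0
    for o in range(n_o):
        joint = [q_s[s] * likelihood[s][o] for s in range(n_s)]  # q(o_t, s_t | pi)
        q_o = sum(joint)                                         # q(o_t | pi)
        if q_o == 0:
            continue
        extrinsic += q_o * log_p_o[o]
        post = [j / q_o for j in joint]                          # q(s_t | o_t, pi)
        intrinsic += q_o * sum(p * math.log(p / q_s[s])
                               for s, p in enumerate(post) if p > 0)
    return extrinsic, intrinsic


likelihood = [[0.9, 0.1], [0.1, 0.9]]        # hypothetical p(o | s)
log_p_o = [math.log(0.2), math.log(0.8)]     # hypothetical: outcome 1 is more probable
plan_certain = [0.0, 1.0]                    # plan that surely reaches state 1
plan_uncertain = [0.5, 0.5]                  # plan whose resulting state is uncertain

ext_c, int_c = negative_expected_free_energy(plan_certain, likelihood, log_p_o)
ext_u, int_u = negative_expected_free_energy(plan_uncertain, likelihood, log_p_o)
```

In this toy setting the certain plan scores higher on the extrinsic term, while only the uncertain plan has a positive intrinsic term, since an observation can only be informative when the state is not already known.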
Active inference
! [Gershman+ 19]
\[ \tilde{p}(o_{1:T}) = \exp(r(o_{1:T})) \]
※ $\tilde{p}$
Control as inference and active inference
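active inference
! Active inference (AIF) [Millidge+ 20]
! Objective at time $t$: $-G_t(\phi)$
\[ \tilde{p}(s_t, o_t, a_t) = p(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t) \approx q(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t) \]
\[ q_\phi(s_t, a_t) = q_\phi(a_t \mid s_t)\, q(s_t) \]
\[
\begin{aligned}
-G_t(\phi) &= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log \frac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)} \right] \\
&\approx \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log \tilde{p}(o_t \mid a_t) + \log p(a_t \mid s_t) + \log q(s_t \mid o_t, a_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t) \right] \\
&= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log \tilde{p}(o_t \mid a_t) \right] - \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log q_\phi(a_t \mid s_t) - \log p(a_t \mid s_t) \right] + \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log q(s_t \mid o_t, a_t) - \log q(s_t) \right] \\
&\approx \mathbb{E}_{q(o_t \mid a_t)}\left[ \log \tilde{p}(o_t \mid a_t) \right] - \mathbb{E}_{q(s_t)}\left[ D_{\mathrm{KL}}\left( q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t) \right) \right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[ D_{\mathrm{KL}}\left( q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t) \right) \right] \\
&= \mathbb{E}_{q(o_t \mid a_t)}\left[ \log \tilde{p}(o_t \mid a_t) \right] + \mathbb{E}_{q(s_t)}\left[ \mathcal{H}(q_\phi(a_t \mid s_t)) \right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[ D_{\mathrm{KL}}\left( q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t) \right) \right]
\end{aligned}
\]
! The last line uses the uniform action prior $p(a_t \mid s_t) = \frac{1}{|\mathcal{A}|}$ and holds up to the additive constant $-\log |\mathcal{A}|$.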
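AIF and CAI
! CAI:
\[ \mathbb{E}_{q(s_t, a_t)}\left[ \log p(\mathcal{O}_t \mid s_t, a_t) \right] + \mathbb{E}_{q(s_t)}\left[ \mathcal{H}(q_\phi(a_t \mid s_t)) \right] \]
! AIF:
\[ \mathbb{E}_{q(o_t \mid a_t)}\left[ \log \tilde{p}(o_t \mid a_t) \right] + \mathbb{E}_{q(s_t)}\left[ \mathcal{H}(q_\phi(a_t \mid s_t)) \right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[ D_{\mathrm{KL}}\left( q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t) \right) \right] \]
! Terms 1 and 2 appear in both objectives.
! Term 3 appears only in AIF.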
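Likelihood-AIF
! Likelihood-AIF: a variant of AIF that is closer to CAI
! The preference $\tilde{p}(o_t)$ is replaced by $\tilde{p}(o_t \mid s_t)$.
! $-G_t$ with $q(s_t) = p(s_t)$ and $p(a_t \mid s_t) = \frac{1}{|\mathcal{A}|}$:
\[
\begin{aligned}
-G_t(\phi) &= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log \frac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)} \right] \\
&= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[ \log \tilde{p}(o_t \mid s_t) + \log p(s_t) + \log p(a_t \mid s_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t) \right] \\
&= \mathbb{E}_{q_\phi(s_t, a_t)}\left[ \log \tilde{p}(o_t \mid s_t) \right] - D_{\mathrm{KL}}\left( q(s_t) \,\|\, p(s_t) \right) - \mathbb{E}_{q(s_t)}\left[ D_{\mathrm{KL}}\left( q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t) \right) \right]
\end{aligned}
\]
! With these two assumptions the state KL term vanishes and, up to the additive constant $-\log |\mathcal{A}|$,
\[ -G_t(\phi) = \mathbb{E}_{q_\phi(s_t, a_t)}\left[ \log \tilde{p}(o_t \mid s_t) \right] + \mathbb{E}_{q(s_t)}\left[ \mathcal{H}(q_\phi(a_t \mid s_t)) \right] \]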
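Likelihood-AIF and CAI
! CAI:
\[ \mathbb{E}_{q_\phi(s_t, a_t)}\left[ \log p(\mathcal{O}_t \mid s_t, a_t) \right] + \mathbb{E}_{q(s_t)}\left[ \mathcal{H}(q_\phi(a_t \mid s_t)) \right] \]
! Likelihood-AIF:
\[ \mathbb{E}_{q_\phi(s_t, a_t)}\left[ \log \tilde{p}(o_t \mid s_t) \right] + \mathbb{E}_{q(s_t)}\left[ \mathcal{H}(q_\phi(a_t \mid s_t)) \right] \]
! Term 2 is the same in both objectives.
! Term 1: AIF is formulated on a POMDP and CAI on an MDP; the two objectives coincide when $\log \tilde{p}(o_t \mid s_t) = \log p(\mathcal{O}_t \mid s_t, a_t)$.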
CAI and AIF
! CAI
! AIF
1.
! Control as inference
! Amortized
! Variational RL
2.
! active inference

 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Velocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.pptVelocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.ppt
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 

Probabilistic Inference and Action Selection

  • 2. Topics: control as inference, active inference. Christopher L Buckley. References (with the topic each covers):
      - On the Relationship Between Active Inference and Control as Inference [Millidge+ 20]: control as inference, active inference
      - Active inference: demystified and compared [Sajid+ 20]: active inference
      - Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review [Levine 18]: control as inference
      - Reinforcement Learning as Iterative and Amortised Inference [Millidge+ 20]: control as inference, amortized inference
      - What does the free energy principle tell us about the brain? [Gershman 19]: active inference
      - Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning [Tang+ 20]: control as inference, variational RL
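  • 3. MDP
      - State $s_t \in \mathcal{S}$ and action $a_t \in \mathcal{A}$ at time $t$.
      - State transition probability to the state $s_{t+1}$ at time $t+1$: $p(s_{t+1} \mid s_t, a_t)$.
      - Graphical model: states $s_{t-1}, s_t, s_{t+1}$ and actions $a_{t-1}, a_t, a_{t+1}$.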
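  • 4. POMDP
      - An MDP extended with observations: the observation $o$ is generated from the state $s$ by $p(o \mid s)$.
      - Graphical model: states $s_{t-1}, s_t, s_{t+1}$, actions $a_{t-1}, a_t, a_{t+1}$, and observations $o_{t-1}, o_t, o_{t+1}$.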
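  • 5. Policy, trajectory, and reward in an MDP
      - Policy: $p(a \mid s)$.
      - Trajectory of length $T$: $\tau = (s_1, a_1, \ldots, s_T, a_T)$, with $p(\tau) = p(s_{1:T}, a_{1:T}) = \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$.
      - Reward: $r(s_t, a_t)$.
      - Optimal policy $p_{\mathrm{opt}}(a \mid s)$, defined through the expected return $\mathbb{E}_{p(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right]$.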
  • 6. Plan (used in active inference)
      - Plan: $\pi = [a_1, \ldots, a_T]$.
      - Trajectory of length $T$: $\tau = (s_{1:T}, \pi)$.
      - $p(\tau) = p(\pi)\, p(s_{1:T} \mid \pi) = p(\pi) \prod_{t=1}^{T} p(s_t \mid s_{t-1}, \pi)$.
  • 7. Outline (keyword: preference?)
      1. Control as inference / RL as inference / planning as inference; variational RL
      2. Active inference
  • 8. Control as Inference and Variational RL
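  • 9. Optimality variable
      - $\mathcal{O}_t \in \{0, 1\}$; $\mathcal{O}_t = 1$ indicates that the state $s_t$ and action $a_t$ at time $t$ are optimal.
      - It is tied to the reward $r$ by $p(\mathcal{O}_t = 1 \mid s_t, a_t) := \exp\left(r(s_t, a_t)\right)$.
      - Graphical model: each $\mathcal{O}_t$ depends on $s_t$ and $a_t$ (shown for $t-1$, $t$, $t+1$).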
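  • 10. Optimal trajectory distribution
      - Likelihood of optimality: $p(\mathcal{O}_{1:T} \mid \tau) = \prod_{t=1}^{T} p(\mathcal{O}_t \mid s_t, a_t) = \prod_{t=1}^{T} \exp\left(r(s_t, a_t)\right)$.
      - Posterior: $p(\tau \mid \mathcal{O}_{1:T}) = \dfrac{p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)}{p(\mathcal{O}_{1:T})}$.
      - Optimal trajectory distribution: $p_{\mathrm{opt}}(\tau) = p(\tau \mid \mathcal{O}_{1:T})$.
      - Note: $p(\mathcal{O}_{1:T} = 1)$ is abbreviated as $p(\mathcal{O}_{1:T})$.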
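  • 11. Variational approximation of the posterior
      - $p(\tau \mid \mathcal{O}_{1:T}) \propto p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)$.
      - Approximate the posterior with $q(\tau)$: $\hat{q} = \arg\min_{q} D_{\mathrm{KL}}\left[q(\tau) \,\|\, p(\tau \mid \mathcal{O}_{1:T})\right]$.
      - Graphical model: $\tau \to \mathcal{O}_{1:T}$ with prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, and posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$.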
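  • 12. ELBO
      $\log p(\mathcal{O}_{1:T}) = \log \int p(\mathcal{O}_{1:T}, \tau)\, d\tau$
      $= \log \mathbb{E}_{q(\tau)}\left[\dfrac{p(\mathcal{O}_{1:T}, \tau)}{q(\tau)}\right]$
      $\geq \mathbb{E}_{q(\tau)}\left[\log p(\mathcal{O}_{1:T} \mid \tau) + \log p(\tau) - \log q(\tau)\right]$
      $= \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}\left[q(\tau) \,\|\, p(\tau)\right] =: L(q)$
      - The ELBO depends on both $q(\tau)$ and $p(\tau)$.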
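  • 13. Approach 1: control as inference (CAI)
      - Prior policy: uniform, $p(a_t \mid s_t) = \dfrac{1}{|\mathcal{A}|}$.
      - Variational policy: $q_\phi(a_t \mid s_t)$ with parameters $\phi$.
      - $q_\phi(\tau) := \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, q(s_t \mid s_{t-1}, a_{t-1}) = \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$
      - $p(\tau) := \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) = \prod_{t=1}^{T} \dfrac{1}{|\mathcal{A}|}\, p(s_t \mid s_{t-1}, a_{t-1})$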
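  • 14. Approach 1: ELBO
      $L(\phi) = \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}\left[q_\phi(\tau) \,\|\, p(\tau)\right]$
      $= \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t) - \log q_\phi(a_t \mid s_t)\right] - T \log |\mathcal{A}|$
      - The transition terms cancel inside the KL divergence, and the uniform prior contributes only the constant $-T \log |\mathcal{A}|$, which does not depend on $\phi$.
      - Dropping that constant gives the maximum-entropy objective:
      $J(\phi) := \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t) + \mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]$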
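  • 15. Soft Actor-Critic
      - Soft Actor-Critic (SAC) [Haarnoja+ 17, 18]: optimizes the ELBO off-policy.
      - Soft Q-function: $Q_\theta(s_t, a_t) = r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}\left[V(s_{t+1})\right]$.
      - $Q_\theta(s_t, a_t)$ is the critic and $q_\phi(a_t \mid s_t)$ is the actor.
      - Actor loss: $J^{q}_{t}(\phi) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[\log q_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t)\right]$
      - Critic loss: $J^{Q}_{t}(\theta) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[\left(r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}\left[V_{\bar{\theta}}(s_{t+1})\right] - Q_\theta(s_t, a_t)\right)^2\right]$
      - Soft value: $V_\theta(s_{t+1}) = \mathbb{E}_{q_\phi(a_{t+1} \mid s_{t+1})}\left[Q_\theta(s_{t+1}, a_{t+1}) - \log q_\phi(a_{t+1} \mid s_{t+1})\right]$
      - Related material on control as inference: https://deeplearning.jp/reinforcement_cource-2020s/ and https://www.slideshare.net/DeepLearningJP2016/dlcontrol-as-inference-201266247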
  • 16. Extension to POMDPs
      - Control as inference in the POMDP setting, using a VAE.
      - SLAC [Lee+ 19]; RNN.
      - [Han+ 19]: RNN, VRNN [Chung+ 16], variational recurrent model (VRM).
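  • 17. Planning with CAI
      - Mirror descent [Bubeck, 14] => Variational Inference Model Predictive Control (VI-MPC) [Okada+ 19].
      - For a plan $\pi$: $\mathcal{W}(\pi) = \mathbb{E}_{q(\tau)}\left[p(\mathcal{O}_{1:T} \mid \tau)\right]$, with $p(\mathcal{O}_{1:T} \mid \tau) := f(r(\tau))$.
      - Update rule [Okada+ 19]: $q^{(i+1)}(\pi) \leftarrow \dfrac{q^{(i)}(\pi) \cdot \mathcal{W}(\pi) \cdot q^{(i)}(\pi)}{\mathbb{E}_{q^{(i)}(\pi)}\left[\mathcal{W}(\pi) \cdot q^{(i)}(\pi)\right]}$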
  • 18. Control as inference: summary
      - CAI methods: SAC, VI-MPC.
      - Amortized inference [Kingma+ 13].
      - [Millidge+ 20]: amortized inference.
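  • 19. Approach 2: variational RL
      - In contrast to CAI, the prior policy $p_\theta(a_t \mid s_t)$ has parameters $\theta$; the ELBO depends on both $q$ and $\theta$.
      - $p_\theta(\tau) := \prod_{t=1}^{T} p_\theta(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$
      - $L(\theta, q) = \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}\left[q(\tau) \,\|\, p_\theta(\tau)\right]$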
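  • 20. EM algorithm
      - E-step: with $\theta = \theta_{\mathrm{old}}$ fixed, maximize the ELBO with respect to $q$:
        $q(\tau) = p_{\theta_{\mathrm{old}}}(\tau \mid \mathcal{O}_{1:T}) = \dfrac{p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}{\sum_{\tau} p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}$
      - M-step: with $q$ from the E-step fixed, maximize with respect to $\theta$:
        $\hat{\theta} = \arg\max_{\theta} \mathbb{E}_{q(\tau)}\left[\log p_\theta(\tau)\right] = \arg\max_{\theta} \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} \log p_\theta(a_t \mid s_t)\right]$
      - Examples: MPO [Abdolmaleki+ 18], V-MPO [Song+ 19].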
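  • 21. MPO: E-step
      - Maximum a posteriori Policy Optimization (MPO) [Abdolmaleki+ 18].
      - The E-step uses the Q-function $\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)$ of the old policy $p_{\theta_{\mathrm{old}}}(a_t \mid s_t)$; the Q-function is learned off-policy.
      - $q(\tau) = \prod_{t=1}^{T} q(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$
      - $q(a_t \mid s_t) \propto p_{\theta_{\mathrm{old}}}(a_t \mid s_t) \exp\left(\dfrac{\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)}{\eta}\right)$, with $\eta > 0$.
      - MPO in the DL seminar: https://www.slideshare.net/DeepLearningJP2016/dlhyper-parameter-agnostic-methods-in-reinforcement-learning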
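  • 22. Control as inference vs. variational RL
      - Control as inference: prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$.
      - Variational RL: prior $p_\theta(\tau)$ with parameters $\theta$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, posterior $p_\theta(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$.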
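
The one-step versions of the objectives on slides 14 and 21 can be checked numerically. The following is only a minimal sketch, not the implementation behind SAC or MPO: it assumes a single state with a small discrete action set, and the names `reweight`, `entropy`, and `max_entropy_objective` are hypothetical helpers introduced for illustration. It shows that the reweighting $q(a) \propto p(a) \exp(Q(a)/\eta)$ with a uniform prior, $\eta = 1$, and $Q = r$ maximizes $\mathbb{E}_q[r] + \mathcal{H}(q)$, and that the maximum equals $\log \sum_a \exp r(a)$.

```python
import math

def reweight(prior, q_values, eta=1.0):
    """Return q(a) proportional to prior(a) * exp(Q(a) / eta) for one state.

    Assumes every prior probability is strictly positive.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    logits = [math.log(p) + q / eta for p, q in zip(prior, q_values)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(q):
    """Shannon entropy H(q) in nats."""
    return -sum(p * math.log(p) for p in q if p > 0)

def max_entropy_objective(q, rewards):
    """One-step objective J(q) = E_q[r] + H(q)."""
    return sum(p * r for p, r in zip(q, rewards)) + entropy(q)

# Illustrative numbers (assumptions, not taken from the slides).
rewards = [1.0, 0.0, -1.0]
uniform = [1.0 / 3.0] * 3

q_star = reweight(uniform, rewards, eta=1.0)
log_sum_exp = math.log(sum(math.exp(r) for r in rewards))

print(q_star)
print(max_entropy_objective(q_star, rewards), log_sum_exp)
```

A larger temperature `eta` keeps the reweighted policy closer to the prior, while a smaller one concentrates it on the highest-valued action.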
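  • 26. Generative model and inference
      - Generative model of the observation $o$ and state $s$: $p(o, s) = p(o \mid s)\, p(s)$.
      - Inference of the state from the observation: $p(s \mid o) = \dfrac{p(s)\, p(o \mid s)}{\sum_{s} p(s)\, p(o \mid s)}$.
      - Diagram: the internal model (world model) infers the state $s$; the environment generates the observation $o$.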
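  • 27. Bayesian surprise and active learning
      - Bayesian surprise of an observation $o$ after action $a$: $u(o) = D_{\mathrm{KL}}\left[p(s \mid o, a) \,\|\, p(s \mid a)\right]$.
      - Information gain of action $a$: $I(a) := \sum_{o} p(o \mid a)\, D_{\mathrm{KL}}\left[p(s \mid o, a) \,\|\, p(s \mid a)\right] = \mathbb{E}_{p(o \mid a)}\left[u(o)\right]$.
      - Active learning: actions $a$ are evaluated by $I(a)$.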
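  • 28. Information gain of a plan
      - Observation sequence $o_{1:T}$ and plan $\pi = [a_1, \ldots, a_T]$.
      - $U(o_{1:T}) = \sum_{t=1}^{T} u(o_t)$
      - $I(\pi) = \mathbb{E}_{p(o_{1:T} \mid \pi)}\left[U(o_{1:T})\right] = \sum_{o_{1:T}} p(o_{1:T} \mid \pi)\, U(o_{1:T})$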
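  • 29. Variational free energy
      - ELBO with an approximate posterior $q(s)$: $\log p(o) \geq \mathbb{E}_{q(s)}\left[\log \dfrac{p(o, s)}{q(s)}\right]$.
      - The negative ELBO is the variational free energy: $F(o, q) := -\mathbb{E}_{q(s)}\left[\log \dfrac{p(o, s)}{q(s)}\right]$, an upper bound on $-\log p(o)$.
      - Free energy principle.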
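  • 30. Decomposition of the free energy
      - $F(o, q) = -\log p(o) + D_{\mathrm{KL}}\left[q(s) \,\|\, p(s \mid o)\right]$
      - Term 1, $-\log p(o)$, depends on the observation $o$.
      - Term 2 depends on the approximate posterior $q(s)$.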
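  • 31. Free energy in a POMDP with a plan
      - Plan: $\pi = [a_1, \ldots, a_T]$.
      - $p(o_{1:T}, s_{1:T} \mid \pi) = \prod_{t=1}^{T} p(o_t \mid s_t)\, p(s_t \mid s_{t-1}, \pi)$
      - $q(s_{1:T} \mid \pi) = \prod_{t=1}^{T} q(s_t \mid \pi)$
      - $F(o_{1:T}, \pi) = -\mathbb{E}_{q(s_{1:T} \mid \pi)}\left[\log \dfrac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]$
      - Graphical model: the plan $\pi$ determines $a_{t-1}, a_t, a_{t+1}$; states $s_{t-1}, s_t, s_{t+1}$ emit observations $o_{t-1}, o_t, o_{t+1}$.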
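  • 32. Expected free energy
      $G(\pi) := \mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}\left[F(o_{1:T}, \pi)\right]$
      $= -\mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}\, \mathbb{E}_{q(s_{1:T} \mid \pi)}\left[\log \dfrac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]$
      $= -\mathbb{E}_{q(o_{1:T}, s_{1:T} \mid \pi)}\left[\log \dfrac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]$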
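  • 33. Active inference
      - Active inference (AIF) uses the expected free energy $G_t$ at time $t$, with the approximation $q(s_t \mid o_t, \pi) \approx p(s_t \mid o_t, \pi)$.
      $G_t(\pi) = -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log \dfrac{p(o_t, s_t \mid \pi)}{q(s_t \mid \pi)}\right]$
      $\approx -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log \dfrac{p(o_t \mid \pi)\, q(s_t \mid o_t, \pi)}{q(s_t \mid \pi)}\right]$
      $= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] - \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)\right]\right]$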
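  • 34. Active inference: the case $q = p$
      $G_t(\pi) = -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] - \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)\right]\right]$
      $= -\mathbb{E}_{p(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] - \mathbb{E}_{p(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[p(s_t \mid o_t, \pi) \,\|\, p(s_t \mid \pi)\right]\right]$
      $= \mathbb{E}_{p(s_t \mid \pi)}\left[\mathcal{H}\left(p(o_t \mid \pi)\right)\right] - I(\pi)$
      - Note: $p(s_t \mid s_{t-1}, \pi)$ is abbreviated as $p(s_t \mid \pi)$.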
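  • 35. Active inference: extrinsic and intrinsic value
      $-G_t(\pi) = \mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] + \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)\right]\right]$
      - Term 1: extrinsic value.
      - Term 2: Bayesian surprise, i.e. intrinsic value.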
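
The two terms of $-G_t(\pi)$ on slide 35 can be evaluated for a small discrete model. This is only a sketch under stated assumptions: one time step, a predicted state distribution `q_s` standing in for $q(s_t \mid \pi)$, a likelihood table `likelihood[s][o]` standing in for $p(o_t \mid s_t)$, and a hypothetical `log_preference` vector playing the role of $\log p(o_t \mid \pi)$. All function names and numbers are illustrative and do not come from the slides.

```python
import math

def predicted_observations(q_s, likelihood):
    """q(o) = sum_s q(s) p(o|s), where likelihood[s][o] = p(o|s)."""
    n_obs = len(likelihood[0])
    return [sum(q_s[s] * likelihood[s][o] for s in range(len(q_s)))
            for o in range(n_obs)]

def information_gain(q_s, likelihood):
    """Intrinsic value: E_{q(o)}[ KL[q(s|o) || q(s)] ]."""
    q_o = predicted_observations(q_s, likelihood)
    total = 0.0
    for o, p_o in enumerate(q_o):
        for s, p_s in enumerate(q_s):
            joint = p_s * likelihood[s][o]
            if joint > 0.0:
                # By Bayes' rule, q(s|o) / q(s) = p(o|s) / q(o).
                total += joint * math.log(likelihood[s][o] / p_o)
    return total

def negative_expected_free_energy(q_s, likelihood, log_preference):
    """Return (extrinsic value, intrinsic value); their sum is -G."""
    q_o = predicted_observations(q_s, likelihood)
    extrinsic = sum(p_o * lp for p_o, lp in zip(q_o, log_preference))
    intrinsic = information_gain(q_s, likelihood)
    return extrinsic, intrinsic

# Illustrative numbers (assumptions): two states, two observations.
q_s = [0.5, 0.5]
informative = [[0.9, 0.1], [0.1, 0.9]]    # observation reveals the state
uninformative = [[0.5, 0.5], [0.5, 0.5]]  # observation reveals nothing
log_preference = [math.log(0.5), math.log(0.5)]

print(negative_expected_free_energy(q_s, informative, log_preference))
print(negative_expected_free_energy(q_s, uninformative, log_preference))
```

With equal preferences over observations, the extrinsic terms coincide and the two cases differ only in the intrinsic term, which is zero when the observation carries no information about the state.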
  • 37. Control as inference and active inference
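  • 38. Active inference objective
      - Active inference (AIF) as formulated in [Millidge+ 20]; $-G_t(\phi)$ is the objective at time $t$.
      - Biased generative model: $\tilde{p}(s_t, o_t, a_t) = p(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t) \approx q(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t)$
      - Variational distribution: $q_\phi(s_t, a_t) = q_\phi(a_t \mid s_t)\, q(s_t)$
      $-G_t(\phi) = \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \dfrac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)}\right]$
      $\approx \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid a_t) + \log p(a_t \mid s_t) + \log q(s_t \mid o_t, a_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t)\right]$
      $= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] - \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log q_\phi(a_t \mid s_t) - \log p(a_t \mid s_t)\right] + \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log q(s_t \mid o_t, a_t) - \log q(s_t)\right]$
      $\approx \mathbb{E}_{q(o_t \mid a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] - \mathbb{E}_{q(s_t)}\left[D_{\mathrm{KL}}\left(q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t)\right)\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}\left(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t)\right)\right]$
      $= \mathbb{E}_{q(o_t \mid a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}\left(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t)\right)\right]$
      - The last line uses the uniform prior $p(a_t \mid s_t) = \dfrac{1}{|\mathcal{A}|}$ and drops the resulting constant $-\log |\mathcal{A}|$.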
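  • 39. AIF vs. CAI
      - CAI objective: $\mathbb{E}_{q(s_t, a_t)}\left[\log p(\mathcal{O}_t \mid s_t, a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]$
      - AIF objective: $\mathbb{E}_{q(o_t \mid a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}\left(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t)\right)\right]$
      - Terms 1 and 2 appear in both objectives; the third term appears only in AIF.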
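  • 40. Likelihood-AIF
      - Likelihood-AIF brings AIF closer to CAI by replacing $\tilde{p}(o_t)$ with $\tilde{p}(o_t \mid s_t)$.
      $-G_t(\phi) = \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \dfrac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)}\right]$
      $= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t) + \log p(s_t) + \log p(a_t \mid s_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t)\right]$
      $= \mathbb{E}_{q_\phi(s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t)\right] - D_{\mathrm{KL}}\left(q(s_t) \,\|\, p(s_t)\right) - \mathbb{E}_{q(s_t)}\left[D_{\mathrm{KL}}\left(q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t)\right)\right]$
      - With $q(s_t) = p(s_t)$ and $p(a_t \mid s_t) = \dfrac{1}{|\mathcal{A}|}$ (dropping the constant $-\log |\mathcal{A}|$):
      $-G_t(\phi) = \mathbb{E}_{q_\phi(s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]$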
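  • 41. Likelihood-AIF vs. CAI
      - CAI objective: $\mathbb{E}_{q_\phi(s_t, a_t)}\left[\log p(\mathcal{O}_t \mid s_t, a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]$
      - Likelihood-AIF objective: $\mathbb{E}_{q_\phi(s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]$
      - Term 2 is the same in both. Term 1 matches when $\log \tilde{p}(o_t \mid s_t) = \log p(\mathcal{O}_t \mid s_t, a_t)$.
      - AIF is formulated for POMDPs and CAI for MDPs.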
  • 42. CAI and AIF
  • 43. Summary
      1. Control as inference; amortized inference; variational RL
      2. Active inference
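
The free-energy identity from slides 29 and 30, $F(o, q) = -\log p(o) + D_{\mathrm{KL}}[q(s) \,\|\, p(s \mid o)]$, can also be verified numerically. The sketch below assumes a discrete state with a fixed observation; `prior` stands for $p(s)$, `likelihood_o` for $p(o \mid s)$ at the observed $o$, and every name and number is a hypothetical choice made for illustration.

```python
import math

def free_energy(q_s, prior, likelihood_o):
    """F(o, q) = -E_q[log p(o, s) - log q(s)] with p(o, s) = p(o|s) p(s)."""
    return -sum(q * (math.log(l * p) - math.log(q))
                for q, p, l in zip(q_s, prior, likelihood_o) if q > 0)

def posterior(prior, likelihood_o):
    """Return the exact posterior p(s|o) and the evidence p(o)."""
    joint = [l * p for p, l in zip(prior, likelihood_o)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence

def kl(q, p):
    """KL[q || p] in nats for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(q, p) if a > 0)

# Illustrative numbers (assumptions): three states, one fixed observation.
prior = [0.5, 0.3, 0.2]
likelihood_o = [0.1, 0.6, 0.3]
q_s = [0.2, 0.5, 0.3]  # an arbitrary approximate posterior

post, evidence = posterior(prior, likelihood_o)
lhs = free_energy(q_s, prior, likelihood_o)
rhs = -math.log(evidence) + kl(q_s, post)

print(lhs, rhs)                                  # the two sides agree
print(free_energy(post, prior, likelihood_o))    # equals -log p(o)
```

Because the KL term is non-negative, the free energy never falls below $-\log p(o)$ and reaches it exactly when $q(s)$ equals the true posterior.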