【DL Reading Group】 Mastering Diverse Domains through World Models
Jan 16, 2023
2023/1/13 Deep Learning JP http://deeplearning.jp/seminar-2/
Mastering Diverse Domains through World Models
Shohei Taniguchi, Matsuo Lab

Bibliographic information
Mastering Diverse Domains through World Models
• Authors
  • Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
• Summary
  • An improved version (ver. 3) of Dreamer, a reinforcement learning method that uses a world model
  • The first to collect a diamond in Minecraft with reinforcement learning from scratch
https://arxiv.org/abs/2301.04104
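Minecraft ObtainDiamond
• The task of obtaining a diamond in Minecraft
• Reward is given only when an intermediate item or the diamond is obtained
• A competition has been held at NeurIPS since 2019, and the task is one milestone for RL research
• Until now, no RL method trained from scratch had succeeded in obtaining a diamond
• There are successful examples among methods that use human demonstrations

Outline
• Background
  • World models x reinforcement learning
  • PlaNet, Dreamer, DreamerV2
• DreamerV3
• Summary
Some slides are reused from the following deck:
https://www.slideshare.net/ShoheiTaniguchi2/ss-238325780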
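A challenge in reinforcement learning: sample efficiency
• Training takes a large amount of time
• For robots and similar systems, training on the real machine that frequently is too costly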
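World Models x Reinforcement Learning
If a model of the environment can be acquired with deep learning, it should be possible to simulate the environment with that model and learn a policy from the simulation.
➡ World model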
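World Models x Reinforcement Learning
Training procedure
1. Collect data $D = \{x_1, a_1, r_1, \dots, x_T, a_T, r_T\}$ from the environment with the policy $\pi$
2. Train the world model $p_\psi(x_{1:T}, r_{1:T} \mid a_{1:T})$ on $D$
3. Update the policy $\pi$ using the world model
• Repeat steps 1 to 3
https://arxiv.org/abs/1903.00374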
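World Models [Ha and Schmidhuber, 2018]
• The paper that can be regarded as the starting point of world-model research
• World model training: VAE + MDN-RNN
• Policy training: CMA-ES
• Details are omitted in this talk (see the slides and links below)
https://www.slideshare.net/masa_s/ss-97848402
https://worldmodels.github.io/
https://arxiv.org/abs/1803.10122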
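PlaNet [Hafner et al., 2019]
• World model training:
  • Recurrent State Space Model
• Policy training: CEM
• Performance roughly on par with model-free methods
(Figure: top, rollout in the real environment; bottom, simulation by the world model)
(Figure: experimental results on DM Control Suite)
https://arxiv.org/abs/1811.04551
https://planetrl.github.io/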
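Gaussian State Space Model
• A model that uses a normal distribution for the state transition probability
  • $p_\psi(s_{t+1} \mid s_t, a_t) = \mathrm{Normal}\left(\mu_\psi(s_t, a_t), \mathrm{diag}\left(\sigma_\psi^2(s_t, a_t)\right)\right)$
• The functions $\mu_\psi$ and $\sigma_\psi^2$ are implemented with DNNs or similar
• In experiments this alone does not work well (vanishing gradients, etc.)
(Figure: graphical model over $o_t, a_t, r_t, s_t$ and $o_{t+1}, a_{t+1}, r_{t+1}, s_{t+1}$)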
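Recurrent State Space Model (RSSM)
• The state $s$ is modeled as two parts: $h$, which transitions deterministically, and $z$, which transitions stochastically (in the equations below the stochastic part is written $s_t$)
  • $h_{t+1} = f_\psi(h_t, s_t, a_t)$
  • $p_\psi(s_t \mid h_t) = \mathrm{Normal}\left(\mu_\psi(h_t), \mathrm{diag}\left(\sigma_\psi^2(h_t)\right)\right)$
• $f_\psi$ is an RNN-type function such as an LSTM
(Figure: graphical model over $x_t, a_t, r_t, s_t, h_t$ and $x_{t+1}, a_{t+1}, r_{t+1}, s_{t+1}, h_{t+1}$)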
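A minimal sketch of one RSSM transition may make the split between the deterministic and stochastic paths concrete. It assumes scalar states and hand-picked toy weights in place of the learned networks $f_\psi$, $\mu_\psi$ and $\sigma_\psi$; the function name `rssm_step` and all constants are hypothetical and not taken from the slides or the paper.

```python
import math
import random

def rssm_step(h, s, a, rng, w=(0.5, 0.3, 0.2)):
    """One toy RSSM transition with scalar states (all weights are assumptions)."""
    # Deterministic path: h_{t+1} = f_psi(h_t, s_t, a_t), here a single tanh unit.
    h_next = math.tanh(w[0] * h + w[1] * s + w[2] * a)
    # Stochastic path: s_{t+1} ~ Normal(mu_psi(h_{t+1}), sigma_psi(h_{t+1})^2).
    mu = 0.9 * h_next
    sigma = 0.1 + 0.05 * abs(h_next)
    s_next = rng.gauss(mu, sigma)
    return h_next, s_next

rng = random.Random(0)
h, s = 0.0, 0.0
trajectory = []
for a in [1.0, -1.0, 0.5]:  # an arbitrary action sequence
    h, s = rssm_step(h, s, a, rng)
    trajectory.append((h, s))
```

The deterministic state carries information across steps without sampling noise, while the sampled state keeps the model stochastic.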
Recurrent State Space Model (RSSM)
Using the RSSM improves performance considerably
Dreamer [Hafner et al., 2019]
• Based on PlaNet, with policy training changed to an actor-critic scheme
• Uses the $\lambda$-return for the value function
• Large performance improvement over PlaNet
https://arxiv.org/abs/1912.01603
https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
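Estimating the value function
Bellman equation (discount factor omitted):
$$V^\pi(s_t) = \mathbb{E}_\pi\left[ r(s_t, a_t) + V^\pi(s_{t+1}) \right]$$
Extended to $n$ steps:
$$V_n^\pi(s_t) = \mathbb{E}_\pi\left[ \sum_{k=0}^{n-1} r(s_{t+k}, a_{t+k}) + V^\pi(s_{t+n}) \right]$$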
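The $n$-step target can be sketched in a few lines. This assumes a single sampled trajectory (so the expectation is replaced by one rollout), no discounting as on the slide, and a reward sum that starts at the current step so that $n = 1$ recovers the one-step Bellman form; the names `n_step_return`, `rewards` and `values` are hypothetical.

```python
def n_step_return(rewards, values, t, n):
    """V_n(s_t) = sum_{k=0}^{n-1} r_{t+k} + V(s_{t+n}), undiscounted."""
    # rewards[i] is r(s_i, a_i); values[i] is the current estimate of V(s_i).
    return sum(rewards[t:t + n]) + values[t + n]

rewards = [1.0, 0.0, 2.0, 1.0]
values = [3.0, 2.5, 2.0, 1.0, 0.5]

one_step = n_step_return(rewards, values, t=0, n=1)    # 1.0 + 2.5
three_step = n_step_return(rewards, values, t=0, n=3)  # 1.0 + 0.0 + 2.0 + 1.0
```

Larger $n$ relies more on observed (or imagined) rewards and less on the bootstrapped value estimate.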
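Estimating the value function
$$V_n^\pi(s_t) = \mathbb{E}_\pi\left[ \sum_{k=0}^{n-1} r(s_{t+k}, a_{t+k}) + V^\pi(s_{t+n}) \right]$$
Taking the exponentially weighted average over $n = 1, \dots, \infty$ gives
$$\bar{V}^\pi(s_t, \lambda) = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} V_n^\pi(s_t)$$
This is called the $\lambda$-return.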
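Estimating the value function
Dreamer uses the $\lambda$-return as the target for the value function:
$$\theta \leftarrow \theta - \eta_\theta \nabla_\theta \mathbb{E}_{p_\psi, \pi_\phi}\left[ \left\| V_\theta^{\pi_\phi}(s_t) - \bar{V}^\pi(s_t, \lambda) \right\|^2 \right]$$
However, the exponential average is truncated at a suitable horizon $H$:
$$\bar{V}^\pi(s_t, \lambda) \approx (1 - \lambda) \sum_{n=1}^{H-1} \lambda^{n-1} V_n^\pi(s_t) + \lambda^{H-1} V_H^\pi(s_t)$$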
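The truncated $\lambda$-return can be computed directly from its definition. The sketch below assumes a single trajectory, no discounting, and the $n$-step return with the reward sum starting at the current step; the function and variable names are hypothetical. The direct double sum is written for clarity, not efficiency.

```python
def lambda_return(rewards, values, t, horizon, lam):
    """Truncated lambda-return: (1-lam) * sum_{n=1}^{H-1} lam^(n-1) V_n + lam^(H-1) V_H."""
    def v_n(n):
        # n-step return from time t, undiscounted.
        return sum(rewards[t:t + n]) + values[t + n]

    mix = sum(lam ** (n - 1) * v_n(n) for n in range(1, horizon))
    return (1.0 - lam) * mix + lam ** (horizon - 1) * v_n(horizon)

rewards = [1.0, 0.0, 2.0, 1.0]
values = [3.0, 2.5, 2.0, 1.0, 0.5]
target = lambda_return(rewards, values, t=0, horizon=3, lam=0.5)
```

The weights $(1-\lambda)\lambda^{n-1}$ for $n < H$ plus the tail weight $\lambda^{H-1}$ sum to one, so $\lambda = 0$ reduces to the one-step return and $\lambda = 1$ to the $H$-step return.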
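Effect of the $\lambda$-return
"No value" is the result when training with the policy gradient method
Using the $\lambda$-return improves performance regardless of $H$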
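DreamerV2 [Hafner et al., 2020]
An improved version of Dreamer
1. Uses a discrete categorical distribution for the latent variables
2. Adjusts the learning rate of the KL term so that the encoder is not over-regularized
• Achieves human-level performance on Atari

Discrete latent variables
• PlaNet and DreamerV1 use continuous latent variables modeled with a normal distribution
• DreamerV2 changes this to a discrete categorical distribution

Discrete latent variables
• With discrete latents, the reparameterization trick can no longer be used to estimate gradients
• The straight-through estimator is used instead
• The estimator is biased, but it is simple to implement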
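For reference, the straight-through estimator for a categorical sample is commonly written as follows. The symbols $\ell$ (logits), $p$ (class probabilities), $\tilde{z}$ (the value passed on) and $\mathrm{sg}(\cdot)$ (stop-gradient) are notation assumed here, not taken from the slides.

```latex
z \sim \mathrm{Categorical}(p), \qquad p = \mathrm{softmax}(\ell) \\
\tilde{z} = z + p - \mathrm{sg}(p) \\
\text{forward: } \tilde{z} = z, \qquad
\text{backward: } \frac{\partial \tilde{z}}{\partial \ell} := \frac{\partial p}{\partial \ell}
```

The forward pass uses the one-hot sample unchanged, while the backward pass treats the sample as if it were the probability vector, which is where the bias comes from.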
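KL Balancing
• In the world model loss, the KL term acts as a regularizer that pulls the encoder and the prior of the transition model toward each other
• However, especially early in training when the transition model has not yet been trained well, this KL regularization becomes too strong and hinders learning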
KL Balancing
• This is mitigated by adjusting the learning rates of the KL term with respect to the encoder and the transition model
• $\alpha$ is set to 0.8
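The slide's formula did not survive in this transcript. A common way to write KL balancing, which to the best of my knowledge matches the DreamerV2 paper, is given below; $q$ (encoder posterior), $p_\psi$ (transition prior) and $\mathrm{sg}(\cdot)$ (stop-gradient) are notation assumed here.

```latex
\mathcal{L}_{\mathrm{KL}}
  = \alpha \, \mathrm{KL}\big[\, \mathrm{sg}(q(z_t \mid h_t, x_t)) \,\big\|\, p_\psi(z_t \mid h_t) \,\big]
  + (1 - \alpha) \, \mathrm{KL}\big[\, q(z_t \mid h_t, x_t) \,\big\|\, \mathrm{sg}(p_\psi(z_t \mid h_t)) \,\big],
  \qquad \alpha = 0.8
```

Both terms have the same value, but gradients flow only into the prior in the first and only into the encoder in the second, so $\alpha = 0.8$ trains the prior toward the posterior faster than it regularizes the encoder toward the prior.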
Experiments
• Exceeds human performance on Atari
• Stronger than model-free methods such as DQN and Rainbow
Experiments: Ablation
• The categorical variables and KL balancing both have a large effect
DreamerV3
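DreamerV3
• Adds several techniques to make DreamerV2 a more general-purpose method
• The goal is to train with the same hyperparameters at all times, even when the domain changes
1. Transform observation and reward values with the symlog function
2. Normalize the $\lambda$-return values in the actor objective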
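Symlog Prediction
• When the domain changes, the scale of observation and reward values changes, so hyperparameters would have to be retuned each time
• To avoid this, the symlog function is applied to bring the values onto a roughly common scale
• Because the function is invertible, applying the inverse recovers the original values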
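A short sketch of the transform and its inverse, using the usual definitions $\mathrm{symlog}(x) = \mathrm{sign}(x)\ln(|x| + 1)$ and $\mathrm{symexp}(y) = \mathrm{sign}(y)(e^{|y|} - 1)$. The slides do not spell the formulas out, so treat these definitions as an assumption about what is meant.

```python
import math

def symlog(x):
    """sign(x) * ln(|x| + 1): compresses large magnitudes, keeps the sign."""
    return math.copysign(math.log1p(abs(x)), x)

def symexp(y):
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)

samples = [-1000.0, -1.0, 0.0, 0.5, 1e6]
compressed = [symlog(x) for x in samples]
recovered = [symexp(y) for y in compressed]
```

Near zero the function is close to the identity, while values with very different magnitudes end up within a few units of each other.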
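Normalizing the $\lambda$-return
• When the actor is trained with entropy regularization, tuning its coefficient is difficult because it depends on the scale and sparsity of the rewards
• If the reward values can be normalized well, the coefficient of the entropy term should be fixable regardless of the domain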
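Normalizing the $\lambda$-return
• The return is normalized by the width of its 5th to 95th percentile range
• Simply normalizing by the variance would overestimate the return when rewards are sparse, so this form is used to make the scale robust to outliers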
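A minimal sketch of percentile-range normalization over a batch of returns. The lower bound on the scale (`min_scale=1.0`) is an assumption added here so that near-zero ranges do not blow small returns up; it is not stated on the slide, and the helper names are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def normalize_returns(returns, low_q=5.0, high_q=95.0, min_scale=1.0):
    """Divide returns by the 5th-95th percentile range (floored at min_scale)."""
    ordered = sorted(returns)
    scale = percentile(ordered, high_q) - percentile(ordered, low_q)
    scale = max(min_scale, scale)  # assumption: never scale small returns up
    return [r / scale for r in returns]

dense = [float(i) for i in range(101)]   # returns spread over 0..100
sparse = [0.0] * 99 + [1000.0]           # one rare large return
dense_norm = normalize_returns(dense)
sparse_norm = normalize_returns(sparse)
```

In the sparse batch the single large return falls outside the 5th to 95th percentile range, so it does not determine the scale.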
Experiments
• High performance is achieved with the same hyperparameters across all domains and tasks
Experiments
• Performance is also confirmed to scale with model size
Experiments
Future prediction by the world model
Experiments
• For the first time, an RL agent succeeds in obtaining a diamond in Minecraft
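Summary
• Reviewed the development of Dreamer, a representative world-model method
• Honestly, V3 does feel like a collection of heuristics
• The results are impressive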