Publicidad

Más contenido relacionado

Presentaciones para ti(20)

Más de Deep Learning JP(20)

Publicidad

Último(20)

【DL輪読会】Mastering Diverse Domains through World Models

  1. Mastering Diverse Domains through World Models Shohei Taniguchi, Matsuo Lab
  2. ॻࢽ৘ใ Mastering Diverse Domains through World Models • ஶऀ • Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap • ֓ཁ • ੈքϞσϧΛ࢖ͬͨ‫ڧ‬Խֶशख๏Dreamerͷվળ൛ (ver. 3) • εΫϥονͷ‫ڧ‬ԽֶशͰॳΊͯMinecraftͰμΠϠϞϯυΛͱΔ͜ͱʹ੒ޭ https://arxiv.org/abs/2301.04104 2
  3. Minecraft ObtainDiamond • MinecraftͰμΠϠϞϯυΛͱΔλεΫ • ใु͸ɼதؒΞΠςϜ͔μΠϠΛͱͬͨͱ͖ͷΈಘΒΕΔ • NeurIPSͰ2019೥͔Βίϯϖ͕ߦΘΕ͓ͯΓɼRL‫ڀݚ‬ͷ1ͭϚΠϧετʔϯ • ͜Ε·ͰεΫϥονͷRLͰμΠϠ֫ಘ·Ͱ ੒ޭͨ͠ྫ͸ͳ͠ • ਓؒͷσϞΛ࢖͏ख๏Ͱͷ੒ޭྫ͸͋Γ
  4. ൃද֓ཁ • લఏ஌ࣝ • ੈքϞσϧ x ‫ڧ‬Խֶश • PlaNet, Dreamer, DreamerV2 • DreamerV3 • ·ͱΊ εϥΠυͷҰ෦ΛҎԼ͔Βྲྀ༻͍ͯ͠·͢ https://www.slideshare.net/ShoheiTaniguchi2/ss-238325780 4
  5. ‫ڧ‬Խֶशͷ՝୊ αϯϓϧޮ཰ • ֶशʹେྔͷ͕͔͔࣌ؒΔ • ϩϘοτͳͲ͸ͦΜͳʹසൟʹ࣮‫ֶͰػ‬शͤ͞Δͷ͸ίετతʹ‫͍͠ݫ‬ 5
  6. ੈքϞσϧ x ‫ڧ‬Խֶश ‫ڥ؀‬ͷϞσϧΛਂ૚ֶशͰ֫ಘͰ͖Ε͹ ͦͷϞσϧ಺Ͱ‫ڥ؀‬ΛγϛϡϨʔτͯ͠ ํࡦΛֶशͰ͖Δ͸ͣ ➡ ੈքϞσϧ 6
  7. ੈքϞσϧ x ‫ڧ‬Խֶश ֶशͷྲྀΕ 1. ํࡦ Ͱ‫͔ڥ؀‬Βσʔλ ΛूΊΔ 2. Λ༻͍ͯੈքϞσϧ Λֶश 3. ੈքϞσϧΛ༻͍ͯํࡦ Λߋ৽ • 1 ~ 3Λ‫܁‬Γฦ͢ π D D = {x1, a1, r1, …, xT, aT, rT} D pψ pψ (x1:T, r1:T ∣ a1:T) π https://arxiv.org/abs/1903.00374 7
  8. World Models [Ha and Schmidhuber,2018] • ੈքϞσϧ‫ܥ‬ͷ‫ڀݚ‬ͷ૸Γͱ͍͑Δ࿦จ • ੈքϞσϧͷֶशɿVAE + MDN-RNN • ํࡦͷֶशɿCMA-ES • ࠓճ͸ৄ͍͠಺༰͸ׂѪ͠·͢ ʢҎԼͷεϥΠυͳͲΛࢀরʣ https://www.slideshare.net/masa_s/ss-97848402 https://worldmodels.github.io/ https://arxiv.org/abs/1803.10122 8
  9. PlaNet [Hafner,et al.,2019] • ੈքϞσϧͷֶशɿ • Recurrent State Space Model • ํࡦͷֶशɿCEM • ϞσϧϑϦʔͱ΄΅ಉ౳ͷੑೳ ্ɿ࣮‫Ͱڥ؀‬ͷϩʔϧΞ΢τ ԼɿੈքϞσϧʹΑΔγϛϡϨʔγϣϯ DM Control SuiteͰͷ࣮‫݁ݧ‬Ռ https://arxiv.org/abs/1811.04551 https://planetrl.github.io/ 9
  10. Ψ΢ε‫ܕ‬ঢ়ଶۭؒϞσϧ Gaussian State Space Model • ঢ়ଶભҠ֬཰ʹਖ਼‫ن‬෼෍Λ࢖͏Ϟσϧ • • ؔ਺ ʹ͸DNNͳͲΛ༻͍Δ • ͜Εͩͱ࣮‫ݧ‬తʹ͏·͍͔͘ͳ͍ʢޯ഑ফࣦͳͲʣ pψ (st+1 ∣ st, at) = Normal (μψ (st, at), diag (σ2 ψ (st, at))) μψ, σ2 ψ ot at rt st ot+1 at+1 rt+1 st+1 10
  11. ࠶‫ؼ‬తঢ়ଶۭؒϞσϧ Recurrent State Space Model (RSSM) • ঢ়ଶ Λܾఆ࿦తʹભҠ͢Δ ͱ ֬཰తʹભҠ͢Δ ʹ෼͚ͯϞσϧԽ͢Δ • ͸LSTMͳͲͷRNN‫ܕ‬ͷؔ਺ s h z ht+1 = fψ (ht, st, at) pψ (st ∣ ht) = Normal (μψ (ht), diag (σ2 ψ (ht))) fψ xt at rt st xt+1 at+1 rt+1 st+1 ht ht+1 11
  12. RSSMΛ࢖͏ͱ͔ͳΓੑೳ্͕͕Δ ࠶‫ؼ‬తঢ়ଶۭؒϞσϧ Recurrent State Space Model (RSSM) 12
  13. Dreamer [Hafner,et al.,2019] • PlaNetΛϕʔεʹͯ͠ɺ ํࡦͷֶशΛActor-Critic‫ʹܕ‬มߋ • Ձ஋ؔ਺ʹ ऩӹΛ༻͍Δ • PlaNet͔Βੑೳ͕େ෯ʹվળ λ https://arxiv.org/abs/1912.01603 https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html 13
  14. Ձ஋ؔ਺ͷਪఆ ϕϧϚϯํఔࣜ εςοϓʹ֦ு͢Δͱ Vπ (st) = 𝔼 π [r (st, at)] + Vπ (st+1) n Vπ n (st) = 𝔼 π [ n−1 ∑ k=1 r (st+k, at+k) ] + Vπ (st+n) 14
  15. Ձ஋ؔ਺ͷਪఆ Ͱࢦ਺ฏ‫ۉ‬ΛͱΔͱ ͜ΕΛ ऩӹͱ‫Ϳݺ‬ Vπ n (st) = 𝔼 π [ n−1 ∑ k=1 r (st+k, at+k) ] + Vπ (st+n) n = 1,…, ∞ V̄π (st, λ) = (1 − λ) ∞ ∑ n=1 λn−1 Vπ n (st) λ 15
  16. Ձ஋ؔ਺ͷਪఆ DreamerͰ͸ɺ ऩӹΛՁ஋ؔ਺ͷλʔήοτͱ͢Δ ͨͩ͠ɺࢦ਺ฏ‫ۉ‬ͷ࿨͸ద౰ͳେ͖͞ʢ ͱ͢ΔʣͰଧͪ੾Δ λ θ ← θ − ηθ ∇θ 𝔼 pψ,πϕ [ V πϕ θ (st) − V̄π (st, λ) 2] H V̄π (st, λ) ≈ (1 − λ) H−1 ∑ n=1 λn−1 Vπ n (st) + λH−1 Vπ H (st) 16
  17. ऩӹͷޮՌ λ No value͸ํࡦޯ഑๏Ͱֶशͨ͠৔߹ͷ݁Ռ ऩӹΛ༻͍Δ͜ͱͰɺ ʹґΒͣੑೳ͕վળ λ H 17
  18. DreamerV2 [Hafner,et al.,2020] Dreamerͷվྑ൛ 1. જࡏม਺ʹ཭ࢄͳΧςΰϦΧϧ෼෍Λ࢖͏ 2. Τϯίʔμ͕ա౓ʹਖ਼ଇԽ͞Εͳ͍Α͏ʹ KL߲ͷֶश཰Λௐ੔͢Δ • AtariͰਓؒϨϕϧͷੑೳΛୡ੒ 18
  19. ཭ࢄજࡏม਺ • PlaNet΍DreamerV1Ͱ͸ɼ࿈ଓతͳજࡏม਺Λ࢖͍ɼਖ਼‫ن‬෼෍ͰϞσϧԽ • DreamerV2Ͱ͸ɼ཭ࢄͳΧςΰϦΧϧ෼෍ʹมߋ 19
  20. ཭ࢄજࡏม਺ • ཭ࢄʹͨ͜͠ͱͰɼޯ഑ͷਪఆʹreparameterization trick͸࢖͑ͳ͘ͳΔ • ୅ΘΓʹstraight-through estimatorͰਪఆ • ਪఆྔʹόΠΞε͕৐Δ͕ɼ࣮૷͕؆୯ 20
  21. KL Balancing • ੈքϞσϧͷϩεʹ͓͍ͯɼKL߲͸encoderͱભҠϞσϧͷpriorΛ͚ۙͮΔ ਖ਼ଇԽͷ໾ׂΛ͢Δ • ͔͠͠ɼಛʹֶशॳ‫ʹظ‬ભҠϞσϧ͕े෼ʹֶशͰ͖͍ͯͳ͍ঢ়ଶͩͱ ͜ͷKLਖ਼ଇԽ͕‫ͳ͘ڧ‬Γֶ͗ͯ͢शͷ๦͛ʹͳΔ 21
  22. KL Balancing • EncoderͱભҠϞσϧͷKL߲ʹ͍ͭͯͷֶश཰Λௐ੔͢Δ͜ͱͰܰ‫ݮ‬ • ͸0.8ʹઃఆ α 22
  23. ࣮‫ݧ‬ • AtariͰਓؒ௒͑ • ϞσϧϑϦʔͷDQN, RainbowͳͲΑΓ΋‫͍ڧ‬ 23
  24. ࣮‫ݧ‬ Ablation • ΧςΰϦΧϧม਺΍KL balancingͷޮՌ΋͔ͳΓେ͖͍ 24
  25. DreamerV3 25
  26. DreamerV3 • DreamerV2ΛΑΓ൚༻తʹ࢖͑Δख๏ʹ͢ΔͨΊʹ͍͔ͭ͘޻෉Λ௥Ճ • υϝΠϯ͕มΘͬͯ΋ৗʹಉ͡ϋΠύϥͰֶशͰ͖ΔΑ͏ʹ 1. ‫؍‬ଌ΍ใुͷ஋Λsymlogؔ਺Ͱม‫͢׵‬Δ 2. Actorͷ໨తؔ਺Ͱ͸ ऩӹͷ஋Λਖ਼‫ن‬Խ͢Δ λ 26
  27. Symlog Prediction • υϝΠϯ͕มΘΔͱɼ‫؍‬ଌ΍ใुͷ஋ͷεέʔϧ͕มΘΔͷͰɼ ஞҰϋΠύϥΛௐ੔͢Δඞཁ͕͋Δ • ͦΕΛ͠ͳ͍͍ͯ͘Α͏ʹɼsymlogؔ਺Λ͔͚Δ͜ͱͰ஋Λ͋Δఔ౓ἧ͑Δ • Մ‫ͳ਺ؔͳٯ‬ͷͰɼ‫਺ؔٯ‬Λ͔͚Ε͹‫ݩ‬ͷ஋ʹ໭ͤΔ 27
  28. ऩӹͷਖ਼‫ن‬Խ λ • Τϯτϩϐʔਖ਼ଇԽ෇͖ͰactorΛֶश͢Δ৔߹ɼͦͷ܎਺ͷνϡʔχϯά͸ ใुͷεέʔϧ΍εύʔεੑʹґଘ͢ΔͷͰ೉͍͠ • ͏·͘ใुͷ஋Λਖ਼‫ن‬ԽͰ͖Ε͹ɼυϝΠϯʹΑΒͣΤϯτϩϐʔ߲ͷ܎਺Λ ‫ݻ‬ఆͰ͖Δ͸ͣ 28
  29. ऩӹͷਖ਼‫ن‬Խ λ • ऩӹΛ5ʙ95%෼Ґ਺ͷ෯Ͱਖ਼‫ن‬Խ͢Δ • ୯७ʹ෼ࢄͰਖ਼‫ن‬Խ͢Δͱɼใु͕εύʔεͳͱ͖ʹɼऩӹ͕աେධՁ͞Εͯ ͠·͏ͷͰɼ֎Ε஋Λ஄͚ΔΑ͏ʹ͜ͷ‫͢ʹܗ‬Δ 29
  30. ࣮‫ݧ‬ • ͢΂ͯͷυϝΠϯɾλεΫͰಉ͡ϋΠύϥͰߴ͍ੑೳ͕ग़ͤΔ 30
  31. ࣮‫ݧ‬ • ϞσϧͷαΠζʹΑͬͯੑೳ͕εέʔϧ͢Δ͜ͱ΋֬ೝ 31
  32. ࣮‫ݧ‬ ੈքϞσϧʹΑΔະདྷ༧ଌ 32
  33. ࣮‫ݧ‬ • MinecraftͰॳΊͯRL agent͕μΠϠϞϯυΛͱΔ͜ͱʹ੒ޭ 33
  34. ·ͱΊ • ੈքϞσϧͷ୅දతͳख๏DreamerͷൃలΛղઆ • V3ʹؔͯ͠͸ਖ਼௚ώϡʔϦεςΟοΫͷմ‫ײ‬͸൱Ίͳ͍ • ݁Ռ͸͍͢͝ 34
Publicidad