DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Deep Dynamics Models for Learning Dexterous
Manipulation(PDDM)
Keno Harada, UT, B3
Bibliographic information
● Authors:
○ Anusha Nagabandi, Kurt Konolige, Sergey Levine, Vikash Kumar
○ Google Brain
● Paper: https://arxiv.org/pdf/1909.11652.pdf (CoRL 2019?)
● Blog:
○ Google: https://sites.google.com/view/pddm/
○ BAIR: https://bair.berkeley.edu/blog/2019/09/30/deep-dynamics/
● Lectures 10 and 11 of CS285 (http://rail.eecs.berkeley.edu/deeprlcourse/) explain the techniques behind PDDM in detail
Demo
(GIF from https://sites.google.com/view/pddm/)
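Research overview
● Dexterous manipulation with a multi-fingered hand is hard
○ The task cannot be achieved unless forces can be applied to the object from several directions at once
○ Many joints must be controlled to apply complex forces
○ Contacts are repeatedly made and broken, so analytic methods that require an accurate physical model struggle -> learning-based methods are a promising alternative
● Model-based reinforcement learning
○ Learns the dynamics of the environment
○ Needs less data than model-free methods, which makes it practical
○ Has rarely been applied to tasks as hard as dexterous manipulation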
Research overview
● Online planning with deep dynamics models (PDDM)
○ Model Predictive Control
■ Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning (https://arxiv.org/pdf/1708.02596.pdf)
○ Ensembles for model uncertainty estimation
■ Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (https://papers.nips.cc/paper/7725-deep-reinforcement-learning-in-a-handful-of-trials-using-probabilistic-dynamics-models.pdf)
● In one sentence: predict the dynamics with a bootstrap ensemble that accounts for uncertainty, and select actions with MPC
● Each component already existed; the authors present the combination as new and as the key to the method
Outline
● Learning the Dynamics
○ Challenges of model-based reinforcement learning
○ Accounting for uncertainty
○ Bootstrap ensembles
● Model Predictive Control
○ Random Shooting
○ Iterative Random-Shooting with Refinement
○ Filtering and Reward-Weighted Refinement
● PDDM
● Experimental results
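Learning the Dynamics
Challenges of model-based reinforcement learning
● Performance lags behind model-free methods
○ Model-based methods plan using the learned model
■ Even when the dynamics model is wrong, the planner picks actions that score a high reward under that model
■ The higher the dimensionality, the more likely the model is to mispredict (reportedly)
■ We want to know where the model is unsure of its predictions -> account for uncertainty
(Image from CS285 Lecture 11 slides)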
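Learning the Dynamics
Accounting for uncertainty
● Aleatoric (stochastic) uncertainty
○ Uncertainty inherent in the environment
○ Uncertainty in the data
■ The data itself is noisy
● Epistemic (model) uncertainty
○ Uncertainty because too little transition data has been collected and the NN is not trained well enough
(Image from CS285 Lecture 11 slides)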
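As a rough illustration of epistemic uncertainty, the sketch below measures how much the members of an ensemble disagree on the same input. The function name `ensemble_disagreement` and the toy prediction values are assumptions made for this example; they do not come from the slides or the paper.

```python
import numpy as np

def ensemble_disagreement(predictions):
    """Std across ensemble members (axis 0), averaged over state dimensions."""
    return np.asarray(predictions, dtype=float).std(axis=0).mean(axis=-1)

# Hypothetical next-state predictions from three models for one (state, action) pair.
preds_seen = [[1.00, 2.00], [1.01, 1.99], [0.99, 2.01]]   # region with plenty of data
preds_unseen = [[1.0, 2.0], [3.0, -1.0], [-2.0, 5.0]]     # region with little data

low = ensemble_disagreement(preds_seen)
high = ensemble_disagreement(preds_unseen)
```

Large disagreement suggests the models have seen little data near that input, which is the situation a planner should treat with caution.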
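Learning the Dynamics
Accounting for uncertainty
● Handling the uncertainty inherent in the environment
○ -> Have the NN output the parameters of a probability distribution and sample from it
● Handling the uncertainty from insufficient transition data and an undertrained NN
○ -> Train several dynamics models (bootstrap ensemble)
(Image from CS285 Lecture 11 slides)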
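The two remedies above can be sketched in a few lines. This is a minimal sketch of the general techniques, not the paper's exact training setup: `bootstrap_indices` resamples the dataset with replacement for each ensemble member, and `sample_next_state` draws from a diagonal Gaussian whose mean and log standard deviation are assumed to be the network outputs. Both function names and the Gaussian parameterization are assumptions for illustration.

```python
import numpy as np

def bootstrap_indices(num_points, num_models, rng):
    """One row of dataset indices per ensemble member, drawn with replacement."""
    return rng.integers(0, num_points, size=(num_models, num_points))

def sample_next_state(mean, log_std, rng):
    """Sample s' ~ N(mean, diag(exp(log_std)^2)) from predicted parameters."""
    mean = np.asarray(mean, dtype=float)
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
# Each of the 3 models would be trained on its own resampled copy of 1000 transitions.
idx = bootstrap_indices(num_points=1000, num_models=3, rng=rng)
# Stand-in for a network output: a predicted mean and log-std for a 4-D state.
next_state = sample_next_state(np.zeros(4), np.full(4, -1.0), rng)
```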
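Learning the Dynamics
Bootstrap ensembles
● Predict transitions with several dynamics models, and score a candidate action sequence by the average reward obtained over those models when the sequence is executed
(Image from CS285 Lecture 11 slides)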
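The scoring rule described above might look like the following sketch. The toy ensemble (three linear models that differ only in a gain `k`) and the quadratic reward are hypothetical stand-ins for learned networks and a task reward.

```python
import numpy as np

def evaluate_action_sequence(models, reward_fn, state, actions):
    """Average predicted return of one action sequence over all ensemble members."""
    returns = []
    for model in models:
        s = np.array(state, dtype=float)
        total = 0.0
        for a in actions:
            total += reward_fn(s, a)   # reward at the current predicted state
            s = model(s, a)            # roll the state forward with this member
        returns.append(total)
    return float(np.mean(returns))

# Hypothetical ensemble: s' = s + k * a, with a slightly different k per member.
models = [lambda s, a, k=k: s + k * a for k in (0.9, 1.0, 1.1)]
reward_fn = lambda s, a: -float(np.sum(s ** 2))   # higher reward near the origin

actions = [np.array([-1.0]), np.array([0.0])]
score = evaluate_action_sequence(models, reward_fn, state=[1.0], actions=actions)
```

Averaging over members penalizes action sequences that only look good under one model's errors.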
Learning the Dynamics
Bootstrap ensembles
Model Predictive Control
Slide from CS285 Lecture 11
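Model Predictive Control
Random shooting
● Generate several candidate action sequences of a fixed length
● Adopt the candidate sequence with the highest reward
○ The reward of each sequence is estimated with the learned model
○ Model Predictive Control executes only the first action, then runs random shooting again at the next step
(Slide from CS285 Lectures 10 and 11)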
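A minimal sketch of random shooting inside an MPC loop, under stated assumptions: actions are bounded in [-1, 1], the "learned" model is a hypothetical linear stand-in `s' = s + 0.1 * a`, and the same function doubles as the real environment so the example stays self-contained.

```python
import numpy as np

def random_shooting(model, reward_fn, state, horizon, num_candidates, action_dim, rng):
    """Return the first action of the best of `num_candidates` random sequences."""
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    states = np.tile(np.asarray(state, dtype=float), (num_candidates, 1))
    returns = np.zeros(num_candidates)
    for t in range(horizon):
        actions = candidates[:, t]
        returns += reward_fn(states, actions)   # reward predicted by the model rollout
        states = model(states, actions)
    best = int(np.argmax(returns))
    return candidates[best, 0]

model = lambda s, a: s + 0.1 * a                   # hypothetical learned dynamics
reward_fn = lambda s, a: -np.sum(s ** 2, axis=1)   # drive the state to the origin

rng = np.random.default_rng(0)
state = np.array([1.0, -1.0])
for _ in range(30):
    # MPC: replan at every step and execute only the first planned action.
    action = random_shooting(model, reward_fn, state, horizon=5,
                             num_candidates=500, action_dim=2, rng=rng)
    state = model(state, action)
```

Replanning at every step lets the controller correct for model error, at the cost of one full planning pass per action.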
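Model Predictive Control
Iterative Random-Shooting with Refinement
● Draw the candidate action sequences from the region that yielded high reward, narrowing the search step by step
○ Sample several times, then settle on the final action sequence
(Image from CS285 Lecture 10 slides)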
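One common way to realize this refinement is the cross-entropy method (CEM): sample from a Gaussian, keep the highest-return "elite" sequences, and refit the Gaussian to them. The sketch below assumes that variant; the elite count, iteration count, and toy model are illustrative choices, not values from the slides.

```python
import numpy as np

def rollout_returns(model, reward_fn, state, candidates):
    """Predicted return of every candidate sequence under the model."""
    states = np.tile(np.asarray(state, dtype=float), (candidates.shape[0], 1))
    returns = np.zeros(candidates.shape[0])
    for t in range(candidates.shape[1]):
        returns += reward_fn(states, candidates[:, t])
        states = model(states, candidates[:, t])
    return returns

def cem_plan(model, reward_fn, state, horizon, action_dim, rng,
             num_candidates=200, num_elites=20, num_iters=5):
    """Refit a Gaussian over action sequences to the elites for a few iterations."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(num_iters):
        noise = rng.standard_normal((num_candidates, horizon, action_dim))
        candidates = np.clip(mean + std * noise, -1.0, 1.0)
        returns = rollout_returns(model, reward_fn, state, candidates)
        elites = candidates[np.argsort(returns)[-num_elites:]]   # top sequences
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

model = lambda s, a: s + 0.1 * a                   # hypothetical learned dynamics
reward_fn = lambda s, a: -np.sum(s ** 2, axis=1)
plan = cem_plan(model, reward_fn, [1.0, -1.0], horizon=5, action_dim=2,
                rng=np.random.default_rng(0))
```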
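Model Predictive Control
Filtering and Reward-Weighted Refinement
● Accounts for correlation between time steps, and updates the sampling distribution for action sequences more effectively by taking the whole set of samples into account
(Figure callouts: weight samples by reward and update the distribution; account for correlation between time steps (?); filtering)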
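A sketch of the two ingredients named on this slide, under assumptions: the noise added to the mean action sequence is low-pass filtered across time steps, and the new mean is a softmax-weighted average of all candidates rather than an average of a few elites. The parameter names `beta` (filter coefficient), `sigma` (noise scale), and `gamma` (reward temperature) are chosen for this example and should be checked against the paper before relying on them.

```python
import numpy as np

def sample_filtered_actions(mean, num_candidates, beta, sigma, rng):
    """a_t = mean_t + n_t, where n_t = beta * u_t + (1 - beta) * n_{t-1}."""
    horizon, action_dim = mean.shape
    u = sigma * rng.standard_normal((num_candidates, horizon, action_dim))
    noise = np.zeros_like(u)
    prev = np.zeros((num_candidates, action_dim))   # n_{-1} = 0
    for t in range(horizon):
        prev = beta * u[:, t] + (1.0 - beta) * prev
        noise[:, t] = prev
    return mean + noise

def reward_weighted_mean(candidates, returns, gamma):
    """mean_t = sum_k w_k * a_t^k, with w_k proportional to exp(gamma * R_k)."""
    weights = np.exp(gamma * (returns - returns.max()))   # shift for stability
    weights /= weights.sum()
    return np.tensordot(weights, candidates, axes=1)      # shape (horizon, action_dim)

rng = np.random.default_rng(0)
cands = sample_filtered_actions(np.zeros((4, 2)), 50, beta=0.7, sigma=0.5, rng=rng)
new_mean = reward_weighted_mean(cands, returns=np.arange(50.0), gamma=0.1)
```

With a small `gamma` the update approaches a plain average of all samples, and with a large `gamma` it approaches picking the single best sequence.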
PDDM
Bootstrap ensemble
Model Predictive Control
Experimental results (model design)
Experimental results
● Valve Turning: turn a valve with a 9-DoF hand
● In-hand Reorientation: rotate a cube to a specified target orientation
● Handwriting: requires precise manipulation
● Baoding Balls: rotate two balls without dropping them
Valve Turning
In-hand reorientation
Handwriting
Baoding Balls
Baoding Balls (real)
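Summary
● Proposes PDDM, a model-based reinforcement learning method that solves dexterous manipulation tasks in a practical way by combining existing techniques well: a bootstrap ensemble accounts for uncertainty, and MPC selects action sequences using Filtering and Reward-Weighted Refinement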
Experimental setup details

Editor's notes

  1. The model must have enough capacity to represent the complex dynamical system. The use of ensembles is helpful, especially earlier in training, when non-ensembled models can overfit badly and thus exhibit overconfident and harmful behavior. There is not much difference between resetting model weights randomly at each training iteration versus warmstarting them from their previous values. Using a planning horizon that is either too long or too short can be detrimental: short horizons lead to greedy planning, while long horizons suffer from compounding errors in the predictions. PDDM, with action smoothing and soft updates, greatly outperforms the others. Medium values provide the best balance of dimensionality reduction and smooth integration of action samples versus loss of control authority. Here, too soft of a weighting leads to minimal movement of the hand, and too hard of a weighting leads to aggressive behaviors that frequently drop the objects.
  2. We confirm that most of the prior methods do in fact succeed, and we also see that even on this simpler task, policy gradient approaches such as NPG require prohibitively large amounts of data.
  3. When we increase the number of possible goals to 8 different options (90° and 45° rotations in the left, right, up, and down directions), we see that our method still succeeds, but the model-free approaches get stuck in local optima and are unable to fully achieve even the previously attainable goals. This inability to effectively address a "multi-task" or "multi-goal" setup is indeed a known drawback for model-free approaches, and it is particularly pronounced in such goal-conditioned tasks that require flexibility. These additional goals do not make the task harder for PDDM, because even in learning 90° rotations, it is building a model of its interactions rather than specifically learning to get to those angles.
  4. Prior model-based approaches don't actually solve this task (values below the grey line correspond to holding the pencil still near the middle of the paper).
  5. This task is particularly challenging due to the inter-object interactions, which can lead to drastically discontinuous dynamics and frequent failures from dropping the objects. We were unable to get the other model-based or model-free methods to succeed at this task (Figure 8), but PDDM solves it using just 100,000 data points, or 2.7 hours worth of data. Moving a single ball to a goal location in the hand, posing the hand, and performing clockwise rotations instead of the learned counter-clockwise ones.