Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
Younggyo Seo¹*, Kimin Lee²*, Ignasi Clavera², Thanard Kurutach², Jinwoo Shin¹, and Pieter Abbeel²
¹KAIST, ²UC Berkeley
*Equal contribution
https://sites.google.com/view/trajectory-mcl
Problem: Dynamics Generalization
● Model-based RL suffers from a dynamics generalization problem
[Figure: training, evaluation, and deployment environments]
● Transition dynamics follow a multi-modal distribution
Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
○ Approximates the multi-modal distribution by learning specialized prediction heads
● Multiple choice learning (MCL)
○ Updates only the most accurate prediction head, so that the heads specialize
● Adaptive planning
○ Plans with the prediction head that is most accurate over recent experience
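The adaptive planning step can be illustrated with a minimal sketch. It assumes scalar states and actions, mean squared error as the accuracy measure, and two hand-written heads; `select_head`, the toy heads, and the sample transitions are hypothetical names and values for illustration, not taken from the slides.

```python
from typing import Callable, List, Sequence, Tuple

# A transition is (state, action, next_state); scalars keep the toy example small.
Transition = Tuple[float, float, float]
# A prediction head maps (state, action) to a predicted next state.
Head = Callable[[float, float], float]


def select_head(heads: Sequence[Head], recent: Sequence[Transition]) -> int:
    """Return the index of the head with the lowest mean squared error on `recent`."""
    def mse(head: Head) -> float:
        return sum((head(s, a) - s_next) ** 2 for s, a, s_next in recent) / len(recent)

    errors = [mse(h) for h in heads]
    return min(range(len(heads)), key=errors.__getitem__)


# Hypothetical heads (assumption): the same action moves a heavier body less.
heads: List[Head] = [
    lambda s, a: s + a / 1.0,  # head 0: behaves like mass 1.0
    lambda s, a: s + a / 2.5,  # head 1: behaves like mass 2.5
]

# Recent experience from a toy environment whose true mass is 2.5.
recent = [(0.0, 1.0, 0.4), (0.4, 1.0, 0.8), (0.8, -0.5, 0.6)]

# The planner would roll out candidate action sequences with heads[best] only.
best = select_head(heads, recent)
```

Here `best` is 1, because the heavy-body head reproduces the recent transitions. A real planner would then use only that head to predict the outcome of candidate actions.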
Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
○ Key question: which prediction head is most accurate over a given set of transitions?
● Trajectory-wise multiple choice learning
○ Differences in dynamics are captured more distinctly by considering the prediction error over a trajectory segment rather than over individual transitions
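A minimal sketch of the head-selection rule may make the contrast with individual transitions concrete. The slides do not give the exact objective, so this assumes squared error summed over each segment, scalar states and actions, and two toy heads; `segment_error`, `trajectory_wise_mcl`, and the data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[float, float, float]  # (state, action, next_state)
Head = Callable[[float, float], float]   # (state, action) -> predicted next_state


def segment_error(head: Head, segment: Sequence[Transition]) -> float:
    """Squared prediction error of one head, summed over a trajectory segment."""
    return sum((head(s, a) - s_next) ** 2 for s, a, s_next in segment)


def trajectory_wise_mcl(
    heads: Sequence[Head], segments: Sequence[Sequence[Transition]]
) -> Tuple[List[int], float]:
    """Pick one winning head per segment; only the winner's error enters the loss."""
    winners: List[int] = []
    total = 0.0
    for segment in segments:
        errors = [segment_error(h, segment) for h in heads]
        winner = min(range(len(heads)), key=errors.__getitem__)
        winners.append(winner)
        total += errors[winner]  # in gradient-based training, only this head is updated
    return winners, total / len(segments)


# Toy heads (assumption): the same action moves a heavier body less.
heads: List[Head] = [lambda s, a: s + a / 1.0, lambda s, a: s + a / 2.5]

# The last transition of the light segment is noisy and, taken alone, looks "heavy".
light_segment = [(0.0, 1.0, 1.0), (1.0, 1.0, 2.0), (2.0, 1.0, 2.6)]
heavy_segment = [(0.0, 1.0, 0.4), (0.4, 1.0, 0.8)]

winners, loss = trajectory_wise_mcl(heads, [light_segment, heavy_segment])
# Treating the noisy transition as its own one-step segment flips the assignment.
noisy_alone, _ = trajectory_wise_mcl(heads, [[light_segment[-1]]])
```

In this toy data the whole light segment is assigned to head 0 and the heavy segment to head 1, while the noisy transition on its own would be assigned to head 1. Aggregating the error over a segment keeps each head's training samples consistent with one underlying dynamics mode.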
Context-conditional Multi-headed Dynamics Model
● We also introduce a context encoder for online adaptation to unseen environments
● The context encoder g captures contextual information from past experience
● See [Lee’20] for more information
[Lee’20] Kimin Lee, Younggyo Seo, Seunghyun Lee, Honglak Lee, and Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In ICML, 2020.
Analysis on Trajectory-wise MCL
● Specialization leads to superior generalization performance
[Figure: results on Hopper; panels: Transitions, Trajectory segment]
Analysis on Adaptive Planning
● Qualitative analysis
○ Manually assign the prediction heads specialized for [Mass: 2.5] to an agent with [Mass: 1.0]
○ Figure panels: [Mass: 1.0] with prediction heads specialized for [Mass: 2.5]; [Mass: 2.5] with prediction heads specialized for [Mass: 2.5]
○ The agent acts as if it has a heavyweight body!
Comparative Evaluation
● Superior generalization performance on six unseen environments
Conclusion
● For dynamics generalization
○ Context-conditional multi-headed dynamics model
○ Trajectory-wise multiple choice learning
○ Adaptive planning
Thank you!
