This document proposes a method called Trajectory-wise Multiple Choice Learning (MCL) to improve generalization in model-based reinforcement learning. The method uses a multi-headed dynamics model to approximate the multi-modal distribution of transition dynamics. Trajectory-wise MCL updates the prediction head that is most accurate over an entire trajectory segment, allowing each head to specialize. An adaptive planning method then uses the most accurate head based on recent experience. Evaluation shows the approach achieves superior generalization to new environments compared to baseline methods.
4. Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
Approximates multi-modal distribution by
learning specialized prediction heads
5. Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
Approximates multi-modal distribution by
learning specialized prediction heads
● Multiple choice learning (MCL)
Update the most accurate prediction
head for specialization
6. Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
Approximates multi-modal distribution by
learning specialized prediction heads
● Multiple choice learning (MCL)
Update the most accurate prediction
head for specialization
● Adaptive planning
Use the most accurate prediction head
over a recent experience for planning
7. Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
Transitions
Which prediction head is most
accurate over these transitions?
8. Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
Trajectory
segment
● Trajectory-wise multiple choice learning
Difference in dynamics is more distinctively captured
by considering prediction error over trajectory
segment
9. Context-conditional Multi-headed Dynamics Model
● We also introduce context encoder for online adaptation to unseen environments
● Context encoder g captures
contextual information from past
experience
● See [Lee’20] for more information
[Lee’20] Lee, Kimin, Younggyo Seo, Seunghyun Lee, Honglak Lee, Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In
ICML. 2020.
10. Analysis on Trajectory-wise MCL
Transitions Trajectory
segment
● Specialization leads to superior generalization performance
Hopper
11. Analysis on Adaptive Planning
● Qualitative analysis
○ Manually assign prediction heads specialized for [mass: 2.5] to [mass: 1.0]
[Mass: 1.0]
with prediction heads
specialized for [Mass: 2.5]
[Mass: 2.5]
with prediction heads
specialized for [Mass: 2.5]
Agent acts as if it has a heavyweight body!