State Representation Learning for
control: an overview
T. Lesort, N. Díaz-Rodríguez, J-F. Goudou, D. Filliat
Natalia Díaz Rodríguez, PhD 23rd April 2018
HUMAINT seminar, EU Joint Research Center, Sevilla
ABOUT ME
2
3
DREAM
Project:
• Learning and adapting in open-ended domains
• Knowledge compression, consolidation, and transfer
• Artificial curiosity and representation redescription
• Day: getting data and learning tasks. Night: Finding better representations
www.robotsthatdream.eu
4
5
This presentation: fresh work
6
State Representation
Learning (SRL) in RL context
7
SRL
Why learn states?
● Often, robot controllers require simple, ‘high-level’ inputs
(the ‘state’ of the world)
○ E.g., grasping: object position; driving: road direction
● High-dimensional visual input requires filtering to
extract this relevant information (filters are often hand-crafted)
○ @DREAM: Learning state = Discarding irrelevant
knowledge (e.g., during ‘off’ or ‘sleep time’).
● Controllers are easier to train in such a lower-dimensional space
○ This ‘high-level’ information can be less dependent on the
environment and help transfer learning
8
Learning states vs
learning representations
● Learning a state:
○ A particular case of representation learning
○ Entails a control/robotics context where actions are
possible
○ Can be an objective by itself, but is often present in more
general approaches
■ E.g., as an auxiliary task in RL
● Often we look for interpretable information, or information
with a physical meaning
9
Learning states vs
learning representations
● Focus in this survey:
○ Learning objectives relevant for SRL
○ Putting into perspective the common factors shared by very
different use cases
10
First, what
is a good
state?
11
The Markovian view: [Böhmer’15]
12
A good state representation is a
representation that is:
1. Markovian, i.e., it summarizes all the information necessary
to choose an action under the policy by looking
only at the current state.
2. Able to:
a. Represent the true value of the current state well enough
for policy improvement.
b. Generalize the learned value-function to unseen states
with similar futures.
3. Low dimensional for efficient estimation.
Further, a good state representation
should:
13
● Clearly disentangle the different factors of variation that have
different semantics
● Be sufficient for the task
● Be as efficient as possible
(i.e., easy to work with, e.g., factorizing the
data-generating factors)
● Be minimal (among all possible representations, the most
efficient one)
● In summary: exclude irrelevant information to encourage a
lower dimensionality
[Achille and Soatto, 2017]
Learning states: challenges
● Learning without direct supervision
○ True state never available
● Solution: Use ‘self-supervised’ learning
○ RL framework
○ Exploit observations, actions, rewards
● Various complementary approaches
○ Reconstruct observation
○ Learn forward/inverse model
○ ….
14
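To make this self-supervised setting concrete, here is a minimal Python/PyTorch sketch of what an SRL method actually sees: transitions of observations, actions and rewards, never the true state. Names such as Transition and SRLEncoder are hypothetical, not from the survey.

from dataclasses import dataclass
import torch
import torch.nn as nn

@dataclass
class Transition:
    obs: torch.Tensor       # o_t, e.g. a (flattened) image
    action: torch.Tensor    # a_t
    reward: float           # r_t (unused by some SRL objectives)
    next_obs: torch.Tensor  # o_{t+1}

class SRLEncoder(nn.Module):
    # Maps a raw observation to a low-dimensional state s_t = phi(o_t).
    def __init__(self, obs_dim: int, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs.flatten(start_dim=1))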
What can SRL be used for?
e.g., in computer vision for robotics
control
15
• RL and self-supervision
• Transfer learning
• Multimodal observation learning (e.g., autonomous vehicles)
• Multi-objective optimization
• Evolutionary strategies
Applications of SRL
16
Overview of existing approaches
Models that reconstruct the
observation
● Train a state that is sufficient for
reconstructing the input observation
○ AE, DAE, VAE
○ GANs
● Downside: sensitive to irrelevant
variations
18
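As an illustration of this family, a minimal autoencoder sketch (assuming flattened observations; the module name AutoEncoderSRL is hypothetical and does not correspond to a specific paper):

import torch
import torch.nn as nn

class AutoEncoderSRL(nn.Module):
    # State = bottleneck code; trained only to reconstruct the observation.
    def __init__(self, obs_dim: int, state_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, state_dim))
        self.decoder = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                     nn.Linear(256, obs_dim))

    def forward(self, obs):
        state = self.encoder(obs)
        recon = self.decoder(state)
        return state, recon

def reconstruction_loss(model, obs):
    _, recon = model(obs)
    # Sensitive to any pixel variation, relevant or not (the downside noted above).
    return nn.functional.mse_loss(recon, obs)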
AE example: Deep Spatial
Autoencoders
● Learns a representation that corresponds to object coordinates
(in image space):
● Torque control with a 2nd-order dynamical system requires
features and their velocities
Deep spatial autoencoders for visuomotor learning [Finn’15]
AE example: Deep Spatial
Autoencoders
Deep spatial autoencoders for visuomotor learning [Finn’15]
Forward models:
● Find a state from which it is easy to
predict the next state
○ Impose constraints (e.g., simple
linear model)
○ Naturally discard irrelevant features
● May be useful in, e.g., model-based RL (see the sketch below)
21
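A minimal sketch of a forward-model objective with an assumed simple linear transition model in state space (module and function names are hypothetical):

import torch
import torch.nn as nn

class LinearForwardModel(nn.Module):
    # Predicts s_{t+1} from (s_t, a_t) with a single linear layer.
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.transition = nn.Linear(state_dim + action_dim, state_dim)

    def forward(self, state, action):
        return self.transition(torch.cat([state, action], dim=-1))

def forward_loss(encoder, fwd_model, obs, action, next_obs):
    s_t = encoder(obs)
    s_next = encoder(next_obs)
    s_next_pred = fwd_model(s_t, action)
    # Encoder and transition model are trained jointly to make next states predictable.
    return nn.functional.mse_loss(s_next_pred, s_next)

In practice a loss like this is usually combined with other terms (reconstruction, reward prediction, priors) to avoid trivial constant-state solutions.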
22
Forward model example: Value Prediction Network [Oh’17]
23
• SRL happens in the transition (forward) model
• Learns abstract-state dynamics and predicts rewards/values to perform
planning (based on options)
Forward model example: Value Prediction Network [Oh’17]
Inverse models
● Train a state:
○ Sufficient to recover the action from two
consecutive observations
○ Useful for a direct control model
● Impose constraints on the model, e.g.:
○ Simple linear model
○ Focus on states that can be
controlled
24
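A minimal inverse-model sketch (hypothetical names; continuous actions assumed, hence an MSE rather than a cross-entropy loss):

import torch
import torch.nn as nn

class InverseModel(nn.Module):
    # Predicts the action a_t that led from o_t to o_{t+1}, from their encodings.
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))

    def forward(self, state, next_state):
        return self.net(torch.cat([state, next_state], dim=-1))

def inverse_loss(encoder, inv_model, obs, action, next_obs):
    a_pred = inv_model(encoder(obs), encoder(next_obs))
    # Pushes the state to keep exactly the features needed to recover the action,
    # i.e. the controllable part of the scene.
    return nn.functional.mse_loss(a_pred, action)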
Inverse model example:
Intrinsic Curiosity Module (ICM)
● Curiosity: error in an agent's prediction of
the consequence of its own actions in a
visual feature space (learned by a
self-supervised inverse dynamics model).
○ Allows reward to be optional
○ Bypasses difficulties of predicting pixels
● Cons: Not tested on robotics 3D images
25
The role of SRL in the intrinsic
curiosity model
26
● The fwd model encodes constraints and learns the state
space
● Its prediction error is the intrinsic (i.e., curiosity) reward
that encourages exploration of the space
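A hedged sketch of how such an intrinsic reward can be computed in an ICM-style setup, reusing the encoder and forward model sketched above (eta is a hypothetical scaling hyperparameter; the real ICM also trains its features with the inverse dynamics loss described above):

import torch

def curiosity_reward(encoder, fwd_model, obs, action, next_obs, eta: float = 0.5):
    with torch.no_grad():
        s_t = encoder(obs)
        s_next = encoder(next_obs)
        s_next_pred = fwd_model(s_t, action)
    # Intrinsic reward = forward prediction error in the learned feature space;
    # it is added to (or replaces) the extrinsic reward to drive exploration.
    return eta * 0.5 * (s_next_pred - s_next).pow(2).sum(dim=-1)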
● Encode high-level constraints on the
states:
○ Temporal continuity
○ Controllability
○ Inertia, etc.
● May exploit rewards
27
Priors models
In case you wonder…
we do not mean PRIORS in the Bayesian prior-probability sense... rather:
Priors models
28
Use a priori knowledge to learn representations more relevant to the
task
29
Robotic Priors
[Jonschkowski et al. 2015]
30
Robotic Priors
[Jonschkowski et al. 2015]
31
Robotic Priors
[Jonschkowski et al. 2015]
PVEs encode an observation into a position state and a velocity
estimate.
• Variation: Positions of relevant things vary (representation of
such positions should do as well).
• Slowness: Positions change slowly (velocities are slow)
• Inertia: Velocities change slowly (slowness applied to
velocities).
• Conservation: Velocity magnitudes change slowly (law of
conservation of energy: similar amount of energy
(”movement”) in consecutive time steps)
• Controllability: Controllable things (those whose
accelerations correlate with the actions of the robot) are
relevant.
Robotic Priors: Position Velocity Encoders
(PVE) [Jonschkowski et al. 2017]
32
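A minimal sketch of how a few of these priors translate into losses over batches of encoded states (loss forms paraphrased; the exact formulations and pair-sampling schemes used by Jonschkowski et al. differ):

import torch

def slowness_loss(s_t, s_next):
    # Slowness prior: positions change slowly, so consecutive states are close.
    return (s_next - s_t).pow(2).sum(dim=-1).mean()

def variation_loss(s_a, s_b):
    # Variation prior: states sampled at different moments should not collapse onto each other.
    return torch.exp(-(s_a - s_b).norm(dim=-1)).mean()

def inertia_loss(s_prev, s_t, s_next):
    # Inertia prior: velocities (finite differences of states) change slowly.
    v_t, v_next = s_t - s_prev, s_next - s_t
    return (v_next - v_t).pow(2).sum(dim=-1).mean()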
33
Robotic Priors: Position Velocity Encoders
(PVE) [Jonschkowski et al. 2017]
Robotic Priors: Position Velocity Encoders
(PVE) [Jonschkowski et al. 2017]
34
Other objective functions
35
Selectivity: [Thomas’17]
• A good representation should be a disentangled
representation of variation factors.
• If actions are known to be independent, it measures
the independence among variations of the
representation under each action.
• A model learns the mean and additional parameters for
the variance of the distribution (Eq. 6 in the paper)
• W, U, V fixed or learned
• Allows using KL-divergence to train a fwd model.
Multi-criteria objective functions:
Embed to Control (E2C)
36
[Watter’15]
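A hedged sketch of a multi-criteria objective in the spirit of E2C: a variational reconstruction term plus a penalty on deviations from the predicted (locally linear) latent dynamics. The real E2C loss is more involved; the function name and the weighting lam are hypothetical.

import torch

def e2c_style_loss(recon_loss, kl_loss, s_next_pred, s_next, lam: float = 1.0):
    # Reconstruction + KL keep the latent space informative and well-behaved;
    # the dynamics term keeps transitions (locally) linear and predictable.
    dynamics_loss = (s_next_pred - s_next).pow(2).sum(dim=-1).mean()
    return recon_loss + kl_loss + lam * dynamics_loss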
SRL models - summary
37
SRL models - summary
38
• Denoising (DAE), variational (VAE) and vanilla
auto-encoders
• Siamese networks
• GANs
• Other objectives
Learning tools
39
Autoencoders: A taxonomy [Kramer’91, Salakhutdinov'06, Bengio'09]
40
• Learn compact and relevant representations (undercomplete, i.e.,
with a bottleneck, or overcomplete [Zazakci'16]) by minimizing the
reconstruction error:
• DAEs inject noise in the input; VAEs in the stochastic hidden layer
• Cons: unable to ignore task-irrelevant features (mitigations:
domain randomization or saliency)
Stacked AE (RBMs or VAEs)
Autoencoders: VAE (makes
an AE generative)
41
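A minimal VAE sketch to make the "noise in the stochastic hidden layer" point concrete (hypothetical layer sizes; standard Gaussian prior and MSE reconstruction assumed):

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, obs_dim: int, state_dim: int):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, state_dim)
        self.logvar = nn.Linear(256, state_dim)
        self.dec = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, obs_dim))

    def forward(self, obs):
        h = self.enc(obs)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: noise is injected in the latent layer, not in the input (unlike a DAE).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(model, obs):
    recon, mu, logvar = model(obs)
    rec = nn.functional.mse_loss(recon, obs, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl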
• 2+ identical copies of parameters and loss function
• Discriminatively learns a similarity metric from data
• Suitable:
• In verification/recognition where #categories is large
and unknown
• Implementing priors (allows multiple passes in parallel)
• Cons: training compute roughly doubles (two forward passes per pair, though weights are shared)
Siamese
Networks
[Chopra’05]
42
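A hedged sketch of a contrastive loss as typically used with weight-sharing Siamese encoders (the margin value is a hypothetical hyperparameter):

import torch

def contrastive_loss(encoder, x1, x2, same: torch.Tensor, margin: float = 1.0):
    # 'same' is 1.0 for pairs that should be close in state space, 0.0 otherwise.
    d = (encoder(x1) - encoder(x2)).norm(dim=-1)
    # Pull similar pairs together, push dissimilar pairs at least 'margin' apart.
    return (same * d.pow(2) + (1 - same) * torch.relu(margin - d).pow(2)).mean()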
• A two-player adversarial (minimax) game with value
function V(G, D) (D outputs 1 for real, 0 for generated input):
Generative Adversarial Networks
[Goodfellow’14]
43
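For reference, the minimax value function from Goodfellow et al. (2014), with D(x) the probability that x is real:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]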
BiGAN: Bidirectional Generative
Adversarial Network (= ALI)
44
BiGAN: Bidirectional Generative
Adversarial Network
45
BiGAN for SRL:
observation
reconstruction:
Evaluating state
representations:
the effect of
dimensionality
46
• Few approaches use a
minimal state dimension
• Larger dimensions often:
• Improve performance
• But limit interpretability
• (Inverted) pendulum, cart-pole
• Atari games (mostly 2D), VizDoom (3D)
• Octopus arms, driving/mountain cars,
bouncing ball
• Mobile robots, labyrinths, navigation
grids
• Robotics manipulation skills:
• Grasping, balancing a pendulum,
poking, pushing a button
Evaluating learned representations:
Environments
47
Evaluation environments:
Control theory problems from
classic RL literature
48
https://gym.openai.com/envs/#classic_control
Octopus arm [Engel’06, Munk’16]
Ball in cup
Evaluating state representations:
Metrics
49
Metrics: KNN-MSE:
States that are neighbours in the ground truth
should be neighbours in the learned state space
Unsupervised state representation learning with
robotic priors: a robustness benchmark [Lesort’17]
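A hedged NumPy sketch of a KNN-MSE style computation (the exact normalisation and value of k in the benchmark may differ): for each point, look up its k nearest neighbours in the learned state space and measure how far apart the corresponding ground-truth states are.

import numpy as np

def knn_mse(learned_states: np.ndarray, true_states: np.ndarray, k: int = 5) -> float:
    # learned_states: (N, d_learned), true_states: (N, d_true), aligned by index.
    n = len(learned_states)
    total = 0.0
    for i in range(n):
        dists = np.linalg.norm(learned_states - learned_states[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        total += np.mean(np.sum((true_states[neighbours] - true_states[i]) ** 2, axis=1))
    return total / n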
• SRL for autonomous agents
• Assessment, reproducibility, interpretability (FAT)
• Automatically decide state dimension
• Efficient data gathering for SRL by blending in:
• Learning by demonstration -> with human in the loop!
• Few-shot learning
• GANs
• Intrinsic motivations via Goal Exploration Processes
• -> Happy to collaborate and co-supervise students/visits
Future challenges/directions
51
(MAIN) REFERENCES
• GAN Zoo and GANs presentation
• Deep learning Zoo
• Lesort et al. Unsupervised state representation learning
with robotic priors: a robustness benchmark, 2017,
DEMO video
• Lesort et al. State Representation Learning for control: an
Overview (submitted to Neural Networks, Elsevier)
• Doncieux et al. Representational Redescription in
Open-ended Learning: a Conceptual Framework (in prep.)
52
Thank you!
Ideas?
natalia.diaz@ensta-paristech.fr
@NataliaDiazRodr
Appendix
Future directions
(or deep learning beyond
classifying chihuahuas and
blueberry muffins)
54
Appendix
Unsupervised state
representation learning with
robotic priors: a robustness
benchmark
55
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
State Representation Learning with robotic priors
56
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
State Representation Learning with robotic priors
57
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
State Representation Learning with robotic priors
58
State Representation Learning with robotic priors
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
59
State Representation Learning with robotic priors
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
60
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
State Representation Learning with robotic priors
61
Effect of distractors and domain randomization without the reference
point prior
Robotic Priors
[Jonschkowski et al. 2015]
T. Lesort et al. Unsupervised state representation learning with robotic priors: a robustness benchmark,
2017, https://arxiv.org/abs/1709.05185 DEMO: https://www.youtube.com/watch?v=wFd0yJuJlQ4
62