Semantic Exploration from Language Abstractions and Pretrained Representations
NeurIPS 2022. Paper Summary
Machine Learning LABoratory
Seungjoon Lee. 2023-09-22. sjlee1218@postech.ac.kr
Contents
• Introduction
• Methods
• Experiments
• Conclusion
Caution!!!
• This is material I prepared to summarize a paper for my personal research meeting.
• Some of the contents may be incorrect!
• Some experiments are intentionally excluded because they are not directly
related to my research interests.
• Methods are simplified for easy explanation.
• Please send me an email (sjlee1218@postech.ac.kr) if you want to contact me
(for corrections or additions to the material, ideas to develop this paper, or discussion).
Situations
• Novelty-based RL exploration methods incentivize exploration using novelty
as intrinsic rewards.
• The novelty is calculated based on the degree to which an observation is new.
Complication
• Existing visual novelty-based methods can fail in partially observable, high-dimensional
state spaces, especially in 3D environments.
• This is because semantically similar states can be observed very differently depending
on the point of view.
Questions & Hypothesis
• Question:
• Can novelty-based exploration recognize that high-dimensional states which are
semantically similar but visually different are in fact similar?
• Hypothesis:
• Language abstraction can provide a semantics-based novelty intrinsic reward,
accelerating exploration.
Contributions
• This paper shows that novelty calculation using language abstraction can
accelerate RL exploration because
• 1) language can abstract a state space coarsely, and
• 2) language can abstract a state space semantically.
• Furthermore, this paper shows the above idea is applicable in environments
without language by using a vision-language model (VLM).
Methods
Problem Formulation
• Goal-conditioned MDP (S, A, G, P, R^e, γ).
• G: goal space. A goal is a language instruction in this paper.
• R^e : S × G → ℝ.
• 𝒪 : S → L, a language oracle used in the proof-of-concept (PoC) experiments.
• Oracle output is never observed by the agent and is distinct from the instruction g.
• The policy π_g(⋅ | s) that maximizes E[ Σ_{t=0}^{H} γ^t (r^e_t + β r^i_t) ] is considered
(see the return sketch below).
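To make the objective concrete, here is a minimal Python sketch (not from the paper) of the discounted return with the intrinsic bonus β r^i_t; the reward values, γ, and β are illustrative.

```python
# Minimal sketch: discounted return with an intrinsic bonus,
# E[sum_t gamma^t (r^e_t + beta * r^i_t)].
# Reward lists and coefficients below are illustrative, not from the paper.

def mixed_return(extrinsic, intrinsic, gamma=0.99, beta=0.1):
    """Compute sum_t gamma^t * (r^e_t + beta * r^i_t) for one episode."""
    total = 0.0
    for t, (re, ri) in enumerate(zip(extrinsic, intrinsic)):
        total += (gamma ** t) * (re + beta * ri)
    return total

# Example: sparse extrinsic reward at the end, dense intrinsic novelty bonuses.
print(mixed_return(extrinsic=[0.0, 0.0, 1.0], intrinsic=[0.8, 0.5, 0.1]))
```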
Method Outline
• Novelty calculation baseline + RL + (Language encoder or pretrained VLM)
• Novelty calculation baseline: Random Network Distillation
• The RL agent cannot see the oracle language and does not share any parameters
with the pretrained models.
Method - Novelty Calculation Baseline
Outline
• Random Network Distillation (RND)
• RND gives higher intrinsic rewards when the agent visits unfamiliar states,
using a trainable network.
• Today's paper refers to the original RND as visual RND (Vis-RND).
Method - Novelty Calculation Baseline
Environment interaction diagram
• RND makes intrinsic rewards using two state encoders:
• a fixed target function f_fixed and a trainable predictor function f_ψ.
Method - Novelty Calculation Baseline
Calculation of intrinsic reward
• Intrinsic reward: r^i = ||f_fixed(s) − f_ψ(s)||².
• Target function f_fixed : S → ℝ^k.
• Deterministic, randomly initialized, fixed NN.
• Predictor function f_ψ : S → ℝ^k.
• Trainable NN with parameter ψ.
Method - Novelty Calculation Baseline
Training of the state encoder
• f_ψ is trained to mimic the random feature f_fixed(s).
• L(ψ) = ||f_fixed(s) − f_ψ(s)||².
• f_ψ implicitly stores the visit counts / familiarity of states (see the RND sketch below).
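A minimal RND sketch in PyTorch covering the intrinsic reward and the predictor update; the observation dimensionality, network sizes, and optimizer settings are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, k = 128, 64  # illustrative sizes

# Fixed, randomly initialized target f_fixed : S -> R^k (never trained).
f_fixed = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, k))
for p in f_fixed.parameters():
    p.requires_grad_(False)

# Trainable predictor f_psi : S -> R^k.
f_psi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, k))
opt = torch.optim.Adam(f_psi.parameters(), lr=1e-4)

def intrinsic_reward(s):
    # r^i = ||f_fixed(s) - f_psi(s)||^2, large for unfamiliar states.
    with torch.no_grad():
        return (f_fixed(s) - f_psi(s)).pow(2).sum(dim=-1)

def update_predictor(s_batch):
    # L(psi) = ||f_fixed(s) - f_psi(s)||^2; minimizing it makes visited states familiar.
    loss = F.mse_loss(f_psi(s_batch), f_fixed(s_batch))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```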
Method - Novelty Calculation Baseline
Training of a RL agent
• RL agent: on-policy IMPALA
• Value loss: L(ϕ) = Σ_t [y_t − V_ϕ(s_t, g)]²,
where y_t = Σ_{k=t}^{t+n−1} γ^{k−t} r_k + γ^n V_ϕ(s_{t+n}, g) and r_k = r^e_k + β r^i_k.
• Policy loss: L(θ) = −Σ_t [log π_θ(a_t | s_t, g) (r_t + γ y_{t+1} − V_ϕ(s_t, g))]
(see the code sketch below).
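A sketch of the simplified n-step losses written on this slide. The real IMPALA agent uses V-trace off-policy corrections; the 1-D trajectory tensors, β, γ, and n used here are illustrative assumptions.

```python
import torch

def n_step_targets(r_e, r_i, values, beta=0.1, gamma=0.99, n=5):
    """y_t = sum_{k=t}^{t+n-1} gamma^{k-t} r_k + gamma^n V(s_{t+n}, g), r_k = r^e_k + beta r^i_k."""
    values = values.detach()                   # targets do not backprop into V
    r = r_e + beta * r_i                       # mixed reward, shape [T]
    T = r.shape[0]
    y = torch.zeros(T)
    for t in range(T):
        horizon = min(n, T - t)
        y[t] = sum((gamma ** (k - t)) * r[k] for k in range(t, t + horizon))
        if t + n < values.shape[0]:            # bootstrap with V(s_{t+n}, g) when available
            y[t] += (gamma ** n) * values[t + n]
    return y

def impala_style_losses(log_probs, values, r_e, r_i, y, beta=0.1, gamma=0.99):
    """Value loss sum_t [y_t - V]^2; policy loss -sum_t log pi(a_t|s_t,g)(r_t + gamma y_{t+1} - V)."""
    T = y.shape[0]
    value_loss = ((y - values[:T]) ** 2).sum()
    r = r_e + beta * r_i
    adv = (r[:T - 1] + gamma * y[1:T] - values[:T - 1]).detach()  # advantage treated as constant
    policy_loss = -(log_probs[:T - 1] * adv).sum()
    return value_loss, policy_loss
```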
Method - Language Encoder
• Language-RND (Lang-RND) gives higher intrinsic rewards when it receives
unfamiliar language.
• The language comes from the oracle; f_fixed : L → ℝ^k is a fixed random LSTM.
• Lang-RND shows language's coarse abstraction is helpful for RL exploration.
Method - Oracle Language Distillation
• Language Distillation (LD) gives higher intrinsic rewards when it receives a
visual observation with an unfamiliar linguistic meaning.
• f_fixed : S → L, the oracle.
• f_ψ : S → L, trained to generate text captions like the oracle, with a CNN encoder
and an LSTM decoder.
• r^i = −Σ_{k=1}^{K} log (f_ψ(s))_k[(f_fixed(s))_k], where K is the length of the oracle caption
and (f_ψ(s))_k[(f_fixed(s))_k] is the probability f_ψ assigns to the k-th oracle token
(see the sketch below).
• LD shows semantic meaning can accelerate RL exploration.
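A minimal sketch of the LD intrinsic reward as the negative log-likelihood of the oracle caption under the trained captioner; the per-position probability interface assumed for f_ψ is an illustration, not the paper's implementation.

```python
import torch

def ld_intrinsic_reward(token_probs, oracle_tokens):
    """LD intrinsic reward: r^i = - sum_{k=1}^{K} log (f_psi(s))_k[(f_fixed(s))_k].

    token_probs: assumed captioner output, per-position distribution over the
        vocabulary given state s, shape [K, vocab_size].
    oracle_tokens: oracle caption f_fixed(s) as token ids, shape [K].
    """
    picked = token_probs[torch.arange(oracle_tokens.shape[0]), oracle_tokens]
    return -(picked.clamp_min(1e-8)).log().sum()
```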
Method - VLM Encoder
• Network Distillation (ND) gives higher intrinsic rewards when it receives a
visual observation with an unfamiliar linguistic meaning.
• f_fixed : S → ℝ^k, pretrained so that the visual embedding is aligned with the
corresponding language embedding.
• ND shows this paper's idea is applicable in environments without language
(see the sketch below).
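A minimal ND sketch: the same distillation machinery as RND, but with a frozen pretrained VLM image encoder as the target. The `pretrained_image_encoder` argument is a stand-in for whatever language-aligned image tower is used (ALIGN in the paper); the predictor architecture is an illustrative assumption.

```python
import torch
import torch.nn as nn

def make_nd(pretrained_image_encoder, obs_dim, k):
    # f_fixed: frozen, language-aligned image encoder S -> R^k (stand-in module).
    f_fixed = pretrained_image_encoder
    for p in f_fixed.parameters():
        p.requires_grad_(False)
    # f_psi: trainable predictor S -> R^k (illustrative architecture).
    f_psi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, k))
    return f_fixed, f_psi

def nd_intrinsic_reward(f_fixed, f_psi, s):
    # Large when the state's language-aligned embedding is still unfamiliar to f_psi.
    with torch.no_grad():
        return (f_fixed(s) - f_psi(s)).pow(2).sum(dim=-1)
```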
Experiments
PoC: Is Language a Meaningful Abstraction?
• Using oracle language, the authors run a proof of concept (PoC) showing:
• 1) Language abstraction forces an RL agent to explore many more states,
• because language coarsely abstracts states.
• 2) Language abstraction forces an RL agent to explore semantically diverse
states,
• because language semantically abstracts states.
PoC
Environment
• Playroom Environment:
• Rooms with various household objects.
• Tasks: lift, put, find.
• Goal: an instruction like “find <object>”.
• If the goal is achieved, the reward is +1 and the episode ends.
• Oracle language is generated by the Unity engine.
PoC - Coarse Abstraction by Language
Results
• Claim:
• If language first abstracts the state space coarsely, then novelty computed from that
abstraction accelerates RL exploration.
PoC: Methods Taxonomy
• Vis-RND: state space abstracted coarsely? X; semantic meaning considered? X;
trainable network: S → ℝ^k; target function: fixed random NN.
• Lang-RND: state space abstracted coarsely? O; semantic meaning considered? △;
trainable network: L → ℝ^k; target function: fixed random NN.
PoC - Coarse Abstraction by Language
Results
• Claim:
• If language first abstracts the state space coarsely, then novelty computed from that
abstraction accelerates RL exploration.
• Exploration with language novelty solves the tasks much faster than
exploration with visual novelty.
(Figure: trajectory comparison between Lang-RND and Vis-RND.
Lang-RND: state -> language -> random feature; Vis-RND: state -> random feature.)
PoC - Coarse Abstraction by Language
Why coarse? And so what?
• States are coarsely grouped into language captions by the Unity oracle.
• Because the random language features are made from the oracle language, the
random feature space also coarsely abstracts the states.
• Therefore, the agent has to explore more widely to get higher intrinsic rewards.
PoC - Semantic Diversity from Images
• Claims:
• 1) Coarse abstraction alone is not enough; semantics must be considered.
• 2) We can compute language abstraction-based novelty from visual states.
PoC: Methods Taxonomy
• Vis-RND: state space abstracted coarsely? X; semantic meaning considered? X;
trainable network: S → ℝ^k; target function: fixed random NN.
• Lang-RND: state space abstracted coarsely? O; semantic meaning considered? △;
trainable network: L → ℝ^k; target function: fixed random NN.
• Shuffled Language Distillation (S-LD): state space abstracted coarsely? O; semantic meaning
considered? X; trainable network: S → L; target function: fixed random NN whose output
distribution is the same as the oracle's.
• Language Distillation (LD): state space abstracted coarsely? O; semantic meaning considered? O;
trainable network: S → L; target function: Unity oracle.
PoC - Semantic Diversity from Images
Results
• Claims:
• 1) Coarse abstraction alone is not enough; semantics must be considered.
• 2) We can compute language abstraction-based novelty from visual states.
• Exploration with a coarse + meaningful embedding helps more than with a
coarse + meaningless embedding.
(Figure: LD: state -> meaningful text; S-LD: state -> meaningless text; S-LD output examples.)
PoC - Semantic Diversity
Why semantically diverse? And so what?
• LD gives higher intrinsic rewards when visiting states with new semantics.
• The RL agent should explore semantically diverse states to get higher r^i.
PoC - Semantic Diversity
Why semantically diverse? And so what?
• LD gives higher intrinsic rewards when visiting states with new semantics.
• The RL agent should explore semantically diverse states to get higher r^i.
• The dramatic gap between LD and S-LD is due to the environment choice,
because the oracle captions the agent's interactions.
• The LD agent interacts more, so it keeps getting higher intrinsic rewards from new captions.
Experiments - Intrinsic Rewards with VLM
• The authors use a VLM encoder to eliminate the need for the language oracle.
• The agent gets higher intrinsic rewards when visiting states with an
unfamiliar linguistic embedding.
Experiments - Intrinsic Rewards with VLM
Results
• With the coarse and semantic embedding of a VLM, ALM-ND learns faster than
Vis-RND, without oracle language.
• ALM-ND uses an ALIGN model encoder, pretrained to align image
embeddings with the corresponding text embeddings.
Conclusion
• Conclusion:
• Novelty calculation using language abstraction can accelerate RL exploration
because it abstracts the state space 1) coarsely and 2) semantically.
• Novelty calculation using language abstraction works in various settings: on-policy
and off-policy agents, different novelty calculations, and different 3D domains, even
without an oracle language.
• Limitations:
• There is no 2D-environment performance comparison with existing visual novelty methods.
• The quality of the pretrained VLM strongly affects the resulting RL sample efficiency.
Appendix
Contents
• Related works and rationale
• Methods - novelty calculation baseline: Random Network Distillation PoC
• Methods - novelty calculation baseline: Never Give Up
• Methods - construction of S-LD
• More experiments
Why is This New?
• The existing family of intrinsic-reward exploration methods can fail in 3D state spaces,
because they all use visual state representations.
• This method abstracts states semantically, avoiding useless exploration.
• Existing RL methods with language require environment-specific annotations or
semantic parsers.
• This method can be applied to any visually natural environment using a pretrained VLM.
• Existing RL with pretrained embeddings mainly feeds the embedding directly into the
agent.
• This method shows large pretrained models can instead be used to guide exploration.
Rationale
• Why would VLM representations be helpful for semantic novelty-based
exploration? Intuitions?
• 1. Language is inherently abstract.
• Language links situations that are superficially distinct but causally related,
and vice versa.
• 2. Language carries important information efficiently, ignoring miscellaneous
noise.
Random Network Distillation
Proof of Concept
• Question: can ||f_ψ(s) − f_fixed(s)||² be a novelty measure?
• Dataset: many images of 0 and N images of another digit (e.g., 5,000 images of 0
and 10 images of 1).
• f_ψ is trained to min_ψ ||f_ψ(s) − f_fixed(s)||².
(Figure: MSE for unseen data vs. N, the number of target-class images in the training data.)
Novelty Calculation Baseline - Never Give Up
Outline
• Never Give Up (NGU) makes r^i based on how new the state is within the current episode.
• NGU components:
• f_ψ, a state encoder: S → ℝ^k.
• M, a memory of f(s) for all states s visited in the current episode.
• M is distinct from the experience replay buffer of the whole game.
Novelty Calculation Baseline - Never Give Up
Environment interaction diagram
• r^i is made by the encoder f_ψ and a non-parametric buffer M of encoded states.
• f_ψ does not share any parameters with the RL agent.
Novelty Calculation Baseline - Never Give Up
Calculation of intrinsic reward
• r^i = R(f(s′), M) ∝ Σ_{f(x) ∈ knn(f(s′), M)} ||f(s′) − f(x)||².
• knn(f(s′), M) is the set of k-nearest neighbors of f(s′) in the episodic memory M.
• r^i is bigger when the encoded state is far from the already-stored encoded states
(see the sketch below).
• M is filled with f(s) for all states s visited so far in this episode.
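A minimal sketch of the simplified episodic novelty written on this slide (a sum of squared distances to the k nearest stored embeddings). The actual NGU reward uses an inverse-kernel pseudo-count, so this keeps the slide's simplification; the handling of an empty memory is an assumption.

```python
import torch

def episodic_novelty(embedding, memory, k=10):
    """embedding: [d], encoding f(s') of the new state;
    memory: [N, d], encodings f(s) of states visited this episode."""
    if memory.shape[0] == 0:
        return torch.tensor(1.0)  # first state of the episode: treated as maximally novel
    dists = ((memory - embedding) ** 2).sum(dim=-1)          # squared distances to stored embeddings
    knn = torch.topk(dists, k=min(k, dists.shape[0]), largest=False).values
    return knn.sum()                                          # r^i ∝ sum of k-NN squared distances
```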
Novelty Calculation Baseline - Never Give Up
Training of the state encoder
• f_ψ is trained to extract visual features related only to the agent's actions.
• a_t = h(f_ψ(s_t), f_ψ(s_{t+1})), where h is an MLP (see the sketch below).
• f_ψ should extract the features relevant to the agent's actions.
• In today's paper, only Vis-NGU trains f_ψ in this way.
• Lang-NGU and LSE-NGU use a fixed pretrained f_ψ (CLIP, ALM, etc.).
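A minimal inverse-dynamics training sketch for f_ψ, assuming a discrete action space; the observation dimensionality, embedding size, and MLP are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, k, n_actions = 128, 32, 8  # illustrative sizes

f_psi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, k))
h = nn.Sequential(nn.Linear(2 * k, 128), nn.ReLU(), nn.Linear(128, n_actions))  # inverse model
opt = torch.optim.Adam(list(f_psi.parameters()) + list(h.parameters()), lr=1e-4)

def inverse_dynamics_update(s_t, s_tp1, a_t):
    # Predict a_t from (f_psi(s_t), f_psi(s_{t+1})), i.e. a_t = h(f_psi(s_t), f_psi(s_{t+1})).
    logits = h(torch.cat([f_psi(s_t), f_psi(s_tp1)], dim=-1))
    loss = F.cross_entropy(logits, a_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```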
Novelty Calculation Baseline - Never Give Up
Training of a RL agent
• RL agent: DRQN + ε-greedy.
• Q function loss: L(ϕ) = ||(r^e_t + β r^i_t) + γ Q_ϕ(s_{t+1}, a_{t+1}) − Q_ϕ(s_t, a_t)||²,
where (s_t, a_t, r^e_t, r^i_t, s_{t+1}) ~ experience replay buffer of the whole game
(see the sketch below).
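A minimal sketch of the Q loss exactly as written on this slide; a full DRQN would carry recurrent state and use a target network, and the batched tensor shapes here are assumptions.

```python
import torch

def q_loss(q_t, q_tp1, a_t, a_tp1, r_e, r_i, beta=0.1, gamma=0.99):
    """L(phi) = ||(r^e_t + beta r^i_t) + gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)||^2.

    q_t, q_tp1: Q-values at s_t and s_{t+1}, shape [batch, n_actions];
    a_t, a_tp1: taken actions, shape [batch]; r_e, r_i: rewards, shape [batch]."""
    q_sa = q_t.gather(1, a_t.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # target treated as a constant
        target = (r_e + beta * r_i) + gamma * q_tp1.gather(1, a_tp1.unsqueeze(1)).squeeze(1)
    return ((target - q_sa) ** 2).mean()
```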
S-LD Construction
• S-LD uses a fixed target network f_fixed : S → L whose output distribution is the
same as the oracle's, but whose state-to-caption mapping is random.
• f_fixed construction procedure (see the sketch below):
• Get the empirical oracle language distribution from π_LD, which is trained by LD.
• Get an image embedding as a real number using a random fixed NN.
• Map the random real number to language according to the oracle distribution.
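A minimal sketch of the construction described above: hash each state to a real number with a fixed random network, then map that number to a caption via the empirical caption distribution. The caption list, probabilities, and the sigmoid-based pseudo-uniform hashing are illustrative assumptions, not the paper's procedure.

```python
import torch
import torch.nn as nn

# Hypothetical captions and an assumed empirical oracle caption distribution.
captions = ["lift the ball", "open the drawer", "walk near the table"]
probs = torch.tensor([0.5, 0.3, 0.2])
cum_probs = torch.cumsum(probs, dim=0)

obs_dim = 128
rand_net = nn.Linear(obs_dim, 1)          # fixed random NN: state -> real number
for p in rand_net.parameters():
    p.requires_grad_(False)

def shuffled_caption(s):
    # Squash the random scalar into (0, 1) as a crude pseudo-uniform score, then
    # invert the empirical CDF so captions follow the oracle's frequencies while
    # having no semantic link to the state s.
    u = torch.sigmoid(rand_net(s)).item()
    idx = int(torch.searchsorted(cum_probs, torch.tensor(u)).clamp(max=len(captions) - 1))
    return captions[idx]
```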
Pretrained Image Model instead of VLM
• A CNN encoder pretrained on ImageNet is compared.
• The language embedding gives much better performance on the harder tasks
(Put, Find).
Oracle Language vs. Image Embedding from VLM
• Methods using image embeddings are not significantly worse than methods using the
oracle language.
Visited State Heatmap in City Environment
• NGU variants explore the City environment using only intrinsic rewards.
• Language abstraction makes the set of visited states wider.
(Figure: observation examples in the City environment.)