Info-Wasserstein-GAIL
Yunzhu Li, Jiaming Song, Stefano Ermon, “Inferring The Latent Structure
of Human Decision-Making from Raw Visual Inputs”, arXiv, 2017
Sungjoon Choi
(sungjoon.choi@cpslab.snu.ac.kr)
Latent Structure of Human Demos
2
[Figure: demonstration panels labeled Pass / code: 0, Pass / code: 1, Turn / code: 0, Turn / code: 1]
• Introduction
• Backgrounds
• Generative Adversarial Imitation Learning (GAIL)
• Policy gradient
• InfoGAN
• Wasserstein GAN
• InfoGAIL
• Experiments
Contents
3
• The goal of imitation learning is to match expert
behavior.
• However, demonstrations often show significant
variability due to latent factors.
• This paper presents InfoGAIL, an algorithm that
can infer the latent structure of human decision
making.
• The method can not only imitate, but also learn
interpretable representations.
Imitation Learning
4
• The goal of this paper is to develop an imitation
learning framework that is able to autonomously
discover and disentangle the latent factors of
variation underlying human decision making.
• In short, this paper combines generative
adversarial imitation learning (GAIL), InfoGAN,
and Wasserstein GAN with some reward
heuristics.
Introduction
5
• We will NOT go into details.
GAIL
6
• But we will go over some basics of policy gradient methods.
Policy Gradient
7
Now we get rid of the expectation
over a policy function!
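One standard way to read this step is the likelihood-ratio (log-derivative) identity. The notation below is assumed rather than taken from the slide: J(θ) is the expected return, τ is a trajectory drawn from p_θ, and R(τ) is its return.

```latex
\nabla_\theta J(\theta)
  = \nabla_\theta \int p_\theta(\tau)\, R(\tau)\, d\tau
  = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau
  = \mathbb{E}_{\tau \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(\tau)\, R(\tau) \right]
```

The gradient is again an expectation under p_θ, so it can be estimated by averaging over sampled trajectories.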
Policy Gradient
8
Step-based PG
9
Step-based PG
10
In other words, we are now considering a dynamics model!
Step-based PG
11
We no longer have to care about
complex models in an MDP!
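A sketch of why the model drops out, assuming the usual trajectory factorization of an MDP (the symbols p(s_0), π_θ, and the horizon T are assumed notation, not copied from the slide):

```latex
p_\theta(\tau) = p(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)

\nabla_\theta \log p_\theta(\tau)
  = \nabla_\theta \Big[ \log p(s_0)
      + \sum_{t=0}^{T-1} \log \pi_\theta(a_t \mid s_t)
      + \sum_{t=0}^{T-1} \log p(s_{t+1} \mid s_t, a_t) \Big]
  = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```

The initial-state and transition terms do not depend on θ, so their gradients vanish and only the policy terms remain.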
Step-based PG (REINFORCE)
12
Now we have the REINFORCE algorithm!
This method has been used in many deep learning methods
where the objective function is NOT differentiable.
Step-based PG (PG)
13
For all trajectories, and for all instances in a trajectory,
the policy gradient (PG) is simply a weighted MLE, where the weight is
the sum of future rewards, i.e., the Q value.
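The weighted-MLE view can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a linear-softmax policy over discrete actions; the function names (`reward_to_go`, `weighted_mle_gradient`) and the parameter layout are hypothetical and not from the slides.

```python
import numpy as np

def reward_to_go(rewards, gamma=1.0):
    """Sum of (optionally discounted) future rewards from each time step onward."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_mle_gradient(theta, states, actions, weights):
    """Gradient of sum_t w_t * log pi_theta(a_t | s_t).

    Assumed shapes: theta (state_dim, n_actions), states (T, state_dim),
    actions (T,) integer indices, weights (T,).
    """
    probs = softmax(states @ theta)              # (T, n_actions)
    onehot = np.eye(theta.shape[1])[actions]     # (T, n_actions)
    # For a softmax, d log pi(a|s) / d logits = onehot(a) - probs.
    return states.T @ ((onehot - probs) * weights[:, None])
```

With all weights equal to 1 this is the ordinary maximum-likelihood gradient; REINFORCE only changes the per-sample weights to the reward-to-go.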
• Now, we know where (18) came from, right?
GAIL
14
• Interpretable Imitation Learning
• Utilized information theoretic regularization.
• Simply added InfoGAN to GAIL.
• Utilizing Raw Visual Inputs via Transfer Learning
• Used a Deep Residual Network.
Visual InfoGAIL
15
• Rather than using a single unstructured noise vector,
InfoGAN decomposes the input noise vector into two
parts: (1) z, incompressible noise and (2) c, the latent code
that targets the salient structured semantic features of the
data distribution.
• InfoGAN proposes an information-theoretic regularization:
there should be high mutual information between latent
codes c and generator distribution G(z, c). Thus I(c; G(z, c))
should be high.
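For reference, the InfoGAN paper makes this regularizer tractable with a variational lower bound. The auxiliary distribution Q, the weight λ, and the GAN value function V(D, G) below follow that paper's notation and do not appear on this slide.

```latex
L_I(G, Q)
  = \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[ \log Q(c \mid x) \big] + H(c)
  \;\le\; I\big(c;\, G(z, c)\big)

\min_{G, Q} \max_{D} \; V(D, G) - \lambda\, L_I(G, Q)
```

Maximizing L_I with respect to G and Q pushes up a lower bound on the mutual information, which is why a posterior network Q is trained alongside the generator.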
InfoGAN
16
• Reward Augmentation
• A general framework to incorporate prior knowledge in imitation
learning by providing additional incentives to the agent without
interfering with the imitation learning process.
• Added a surrogate state-based reward that reflects our biases over
the desired behaviors.
• Can be seen as
• a hybrid between imitation and reinforcement learning
• side information provided to the generator
• Wasserstein GAN (WGAN)
• The discriminator network in WGAN solves a regression problem
instead of a classification problem.
• Suffers less from the vanishing gradient and mode collapse problems.
Improved Optimization
17
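Reward augmentation can be sketched as adding a weighted surrogate term to the imitation signal. This is only an illustration: the `collision_penalty` surrogate, the dictionary-style state, and the `weight` value are hypothetical choices, not the ones used in the paper.

```python
def collision_penalty(state):
    """Hypothetical state-based surrogate reward: penalize collisions only."""
    return -1.0 if state.get("collided", False) else 0.0

def augmented_reward(imitation_reward, state, surrogate_reward, weight=0.1):
    """Imitation signal plus a weighted surrogate reward that encodes prior knowledge."""
    return imitation_reward + weight * surrogate_reward(state)
```

Because the surrogate depends only on the state, it acts as side information for the policy without changing how the discriminator is trained.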
• Wasserstein Generative Adversarial Learning
WGAN?
18
Example 1 in WGAN
WGAN, practically
48
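A minimal sketch of a WGAN critic update, assuming a linear critic f(x) = x @ w and plain gradient descent (the WGAN paper uses RMSProp and a neural network; the names `critic_loss` and `critic_step` are hypothetical). The critic outputs an unbounded score and the weights are clipped after every step.

```python
import numpy as np

def critic_loss(w, real, fake):
    """Negative Wasserstein estimate for a linear critic f(x) = x @ w (to be minimized)."""
    return -(np.mean(real @ w) - np.mean(fake @ w))

def critic_step(w, real, fake, lr=0.05, clip=0.01):
    """One gradient step on critic_loss, followed by weight clipping."""
    # For a linear critic the gradient of the loss with respect to w is closed form.
    grad = -(real.mean(axis=0) - fake.mean(axis=0))
    w = w - lr * grad
    # Clipping keeps the weights in a box, a crude way to bound the Lipschitz constant.
    return np.clip(w, -clip, clip)
```

There is no sigmoid or log in the loss, which is the sense in which the critic solves a regression-like problem rather than a classification problem.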
• Variance Reduction
• Reduce variance in the policy gradient method.
• Replay buffer method with prioritized replay.
• Good for cases where rewards are rare.
• Baseline variance reduction methods.
Improved Optimization
49
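The effect of a baseline can be checked exactly on a toy problem. The sketch below assumes a two-action Bernoulli policy parameterized by a logit, so the score function is (a - p); the setup and the name `grad_estimator_stats` are hypothetical and only illustrate the idea.

```python
import numpy as np

def grad_estimator_stats(p, rewards, baseline):
    """Exact mean and variance of g = (a - p) * (r(a) - baseline), a ~ Bernoulli(p).

    rewards[0] is the reward for a = 0 and rewards[1] the reward for a = 1.
    """
    actions = np.array([0.0, 1.0])
    probs = np.array([1.0 - p, p])
    r = np.array(rewards, dtype=float)
    g = (actions - p) * (r - baseline)        # per-action gradient sample
    mean = np.sum(probs * g)
    var = np.sum(probs * (g - mean) ** 2)
    return mean, var
```

Subtracting a constant baseline leaves the expected gradient unchanged, because the score function has zero mean, but it can shrink the variance considerably when rewards share a large common offset.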
Finally, InfoGAIL
50
Sample data similar to InfoGAN.
Update D similar to WGAN.
Initialize policy from behavior cloning.
Update Q similar to GAN or GAIL.
Update policy with TRPO.
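The order of these updates can be sketched as a loop skeleton. Every callable here is a placeholder supplied by the caller and the function name `train_infogail` is hypothetical; the sketch only fixes the order of the steps, and it assumes the policy passed in was already initialized by behavior cloning.

```python
def train_infogail(policy, n_iters, sample_codes, rollout,
                   update_d, update_q, update_policy):
    """Run the update order from the slide; all callables are placeholders."""
    for _ in range(n_iters):
        codes = sample_codes()                 # sample latent codes from a prior
        trajs = rollout(policy, codes)         # roll out the policy conditioned on the codes
        update_d(trajs)                        # discriminator update (WGAN-style on the slide)
        update_q(trajs, codes)                 # posterior network update
        policy = update_policy(policy, trajs)  # policy update (TRPO on the slide)
    return policy
```

Passing the updates in as callables keeps the skeleton independent of any particular deep learning framework.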
Network Architectures
51
Latent codes are added to G.
Latent codes are also added to D.
Actions are added to D.
The posterior network Q adopts the same
architecture as D except that the output is
a softmax over the discrete latent variables,
or a factored Gaussian over continuous
latent variables.
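The two output heads of Q can be sketched as log-likelihood functions. This is a minimal NumPy illustration with hypothetical function names; it assumes the network body has already produced logits for the discrete code, and a mean and log standard deviation for the continuous code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log of a softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))

def discrete_code_log_prob(logits, code_index):
    """log Q(c | s, a) for a discrete latent code under a softmax head."""
    return log_softmax(logits)[code_index]

def gaussian_code_log_prob(mean, log_std, code):
    """log Q(c | s, a) for a continuous code under a factored (diagonal) Gaussian head."""
    var = np.exp(2.0 * log_std)
    per_dim = -0.5 * np.log(2.0 * np.pi) - log_std - 0.5 * (code - mean) ** 2 / var
    return np.sum(per_dim)   # dimensions are independent, so log-probabilities add
```

Maximizing these log-probabilities for the codes that generated each trajectory is what ties the latent code to the observed behavior.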
[Figure: network diagrams; labels as they appear on the slide]
G (policy): Input Image, Action, Disc. Latent Code, Cont. Latent Code
D (cost): Input Image, Action, Disc. Latent Code, Score
Q (regularizer): Input Image, Action, Disc. Latent Code, Disc. Latent Code, Cont. Latent Code
Train policy function G with TRPO, and iterate.
Experiments
53
[Figure: result panels labeled Pass / code: 0, Pass / code: 1, Turn / code: 0, Turn / code: 1]
Experiments
54
InfoGAIL
