VARIANTS OF
GANs
Jaejun Yoo
Ph.D. Candidate @KAIST
13th May, 2017
Understood from a novice graduate student's perspective
Hello, I'm Jaejun Yoo
- Ph.D. Candidate
- Medical Image Reconstruction,
- http://jaejunyoo.blogspot.com/
- CVPR 2017 NTIRE Challenge:
Ranked 3rd
Topological Data Analysis, EEG
Goals of this lecture
1. A deeper understanding of GANs
2. Building a foundation for following the flow of subsequent GAN research
• Understand the problems of the original GAN and the reasons behind them
• Introduce variants of GAN, focusing on papers that either solve the major problems or propose broadly new directions
BACKGROUND
PREREQUISITES
Generative Models
“FACE IMAGES”
PREREQUISITES
Generative Models
* Figure adopted from the BEGAN paper, released 31 Mar. 2017, David Berthelot et al., Google (link)
Generated Images by Neural Network
PREREQUISITES
Generative Models
“What I cannot create, I do not understand”
PREREQUISITES
Generative Models
“What I cannot create, I do not understand”
If the network can learn how to draw a cat and a dog separately,
it must be able to classify them, i.e. feature learning follows naturally.
PREREQUISITES
Taxonomy of Machine Learning
From Yann LeCun (NIPS 2016); from David Silver, Reinforcement Learning (UCL course on RL, 2015)
PREREQUISITES
Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
y = f(x)
PREREQUISITES
Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
PREREQUISITES
Taxonomy of Machine Learning
From Yann LeCun (NIPS 2016); from David Silver, Reinforcement Learning (UCL course on RL, 2015)
PREREQUISITES
[A run of image-only slides, each adopted from Namju Kim, Kakao Brain (SlideShare, AI Forum, 2017)]
* Figure adopted from NIPS 2016 Tutorial: GAN paper, Ian Goodfellow 2016
GAN
SCHEMATIC OVERVIEW
[Diagram: z → G → x → D → Real or Fake?]
Diagram of
Standard GAN
Gaussian noise as an input for G
Diagram of Standard GAN, with an analogy: the generator G as a counterfeiter, the discriminator D as the police.
SCHEMATIC OVERVIEW
Diagram of
Standard GAN
Data distribution
Model distribution
Discriminator
SCHEMATIC OVERVIEW
* Figure adopted from Generative Adversarial Nets, Ian Goodfellow et al. 2014
Minimax problem of GAN
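The objective this slide shows, from Goodfellow et al. (2014):
$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$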
THEORETICAL RESULTS
Show that…
1. The minimax problem of GAN has a global optimum at $p_g = p_{data}$
2. The proposed algorithm can find that global optimum
TWO-STEP APPROACH
THEORETICAL RESULTS
Proposition 1.
THEORETICAL RESULTS
Main Theorem
THEORETICAL RESULTS
Convergence of the proposed algorithm
SUMMARY
• Supervised / Unsupervised / Reinforcement Learning
• Generative Models
• Variational Inference Technique
• Adversarial Training
• Reduce the gap between $Q_{model}$ and $P_{data}$
TAKE HOME KEYPOINTS
RELATED WORKS (APPLICATIONS)
* CycleGAN Jun-Yan Zhu et al. 2017
* SRGAN, Christian Ledig et al. 2017
Super-resolution
Domain
Adaptation
Img2Img
Translation
DIFFICULTIES
DIFFICULTIES CONVERGENCE OF THE MODEL
DIFFICULTIES MODE COLLAPSE (SAMPLE DIVERSITY)
* Slide adopted from NIPS 2016 Tutorial, Ian Goodfellow
DIFFICULTIES
TEMPORARY SOLUTION
DIFFICULTIES HOW TO EVALUATE THE QUALITY?
TEMPORARY SOLUTION
SUMMARY
“TRAINING GAN IS HARD”
• Power balance (NO learning)
• Convergence (oscillation)
• Mode collapse
• Evaluation (GAN training loss is intractable)
HOW TO SOLVE THESE PROBLEMS?
DCGAN
LET’S CAREFULLY SELECT THE ARCHITECTURE!
MOTIVATION
TRAINING IS TOOOO HARD…
(Ahhh…it just does not work…orz…)
SCHEMATIC OVERVIEW
Guideline for stable learning
“However, after extensive model exploration we identified a family of architectures that resulted in
stable training across a range of datasets and allowed for higher resolution and deeper generative models.”
"Most GANs today are at least loosely based on the DCGAN architecture."
- NIPS 2016 Tutorial by Ian Goodfellow
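As a concrete illustration of the guideline, a sketch of a DCGAN-style generator (sizes here are illustrative for 64x64 RGB output, not necessarily the paper's exact configuration):

```python
# Sketch of a DCGAN-style generator: fractional-strided convolutions instead
# of pooling/upsampling, batchnorm in every hidden block, no fully-connected
# hidden layers, ReLU inside the generator, and Tanh at the output.
import torch
import torch.nn as nn

G = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False), nn.Tanh(),  # 64x64 RGB in [-1, 1]
)
img = G(torch.randn(16, 100, 1, 1))  # z is fed as a 1x1 spatial map
```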
Okay, learning is finished and the model converged. Then…
KEYPOINTS
“How to show that our network or generator learned
A MEANINGFUL FUNCTION?”
KEYPOINTS
• The generator DOES NOT MEMORIZE the images.
• There are NO SHARP TRANSITIONS while walking in the latent space.
• The generator UNDERSTANDS the features of the data.
Okay, learning is finished and the model converged. Then…
Show that
RESULTS
* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)
What can GAN do?
RESULTS
* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)
What can GAN do?
“Walking in the latent space”
z-space
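"Walking in the latent space", as code: a minimal sketch in which a Linear layer stands in for a trained DCGAN generator (purely illustrative):

```python
# Interpolate between two latent points and render each step; smooth,
# semantically gradual transitions indicate G learned a meaningful mapping.
import torch

G = torch.nn.Linear(100, 64 * 64)   # stand-in for a trained generator
z0, z1 = torch.randn(100), torch.randn(100)
frames = [G((1 - a) * z0 + a * z1).view(64, 64)   # linear walk in z-space
          for a in torch.linspace(0, 1, 9)]
```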
RESULTS
* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)
What can GAN do?
RESULTS
* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)
What can GAN do?
“Forgetting the feature it learned”
RESULTS
What can GAN do?
“Vector arithmetic“
(e.g. word2vec)
RESULTS
What can GAN do?
“Vector arithmetic“
(e.g. word2vec)
RESULTS
* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)
What can GAN do?
“Vector arithmetic“
(e.g. word2vec)
RESULTS
Neural network understanding “Rotation”
* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)
What can GAN do?
“Understand the meaning of the data“
(e.g. code: rotation, category, and etc.)
SUMMARY
1. Guideline for stable learning
2. Good analysis on the results
• Show that the generator DOES NOT MEMORIZE the images
• Show that there are NO SHARP TRANSITIONS while walking in the latent space
• Show that the generator UNDERSTANDS the features of the data
Unrolled GAN
LET’S GIVE EXTRA INFORMATION TO THE NETWORK
(allow it to ‘see into the future’)
Convergence of the proposed algorithm
MOTIVATION
Impossible to achieve in practice
MOTIVATION
WHAT HAPPENS?
* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
MOTIVATION
WHAT HAPPENS?
* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
“The GAN procedure normally does not cover the whole distribution,
even when targeting a mode-covering divergence such as KL”
MOTIVATION
WHAT HAPPENS?
$G^* = \min_G \max_D V(G, D)$   vs.   $G^* = \max_D \min_G V(G, D)$
MOTIVATION
WHAT HAPPENS?
$G^* = \min_G \max_D V(G, D)$   vs.   $G^* = \max_D \min_G V(G, D)$
* A single output which can fool the current discriminator the most
SCHEMATIC OVERVIEW
* https://www.youtube.com/watch?v=JmON4S0kl04
“Adversarial games are not guaranteed to converge using
gradient descent, e.g. rock-paper-scissors.”
UNROLLED GAN
* Figure adopted from Unrolled GAN, Luke Metz et al. 2016
Let’s “UNROLL” the discriminator to see several modes!
UNROLLED GAN
* Figure adopted from Unrolled GAN, Luke Metz et al. 2016
UNROLLED GAN
* Figure adopted from Unrolled GAN, Luke Metz et al. 2016
* Here, only the “G” part is unrolled
(∵ in practice, “D” usually overpowers “G”)
UNROLLED GAN
The Missing Gradient Term
“How the discriminator would
react to a change in the generator.”
Trade-off between … (let's consider both cases)
RESULTS
Increased stability in terms of power balance
* Figure adopted from Unrolled GAN, Luke Metz et al. 2016
SUMMARY
1. Address the mode collapsing problem
2. Unrolling the optimization problem
• Make the discriminator as optimal as possible
IMPLEMENTATION
* Codes from the jupyter notebook of Ben Poole (2nd Author)
: https://github.com/poolio/unrolled_gan
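A minimal, self-contained PyTorch sketch of the same idea (toy sizes and hyperparameters of my own choosing, not the notebook's TensorFlow code): the generator update is backpropagated through K virtual discriminator steps, kept differentiable via create_graph=True.

```python
import torch

torch.manual_seed(0)
K, LR_D = 5, 0.1  # unrolling steps, virtual D step size

G = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
D = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
bce = torch.nn.functional.binary_cross_entropy_with_logits

def d_forward(x, params):
    # Functional forward pass of D with an explicit parameter list, so we can
    # evaluate D at virtually-updated ("unrolled") parameters.
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1.t() + b1) @ w2.t() + b2

def d_loss(params, real, fake):
    return (bce(d_forward(real, params), torch.ones(len(real), 1)) +
            bce(d_forward(fake, params), torch.zeros(len(fake), 1)))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          # toy 1-D data distribution
    z = torch.randn(64, 2)

    # Ordinary discriminator update (one real step).
    opt_d.zero_grad()
    d_loss(list(D.parameters()), real, G(z).detach()).backward()
    opt_d.step()

    # Generator update through K unrolled (virtual) SGD steps of D;
    # create_graph=True lets gradients flow back through the unrolling,
    # supplying the "missing gradient term" from the slides.
    params = list(D.parameters())
    for _ in range(K):
        grads = torch.autograd.grad(d_loss(params, real, G(z)), params, create_graph=True)
        params = [p - LR_D * g for p, g in zip(params, grads)]
    opt_g.zero_grad()
    bce(d_forward(G(z), params), torch.ones(64, 1)).backward()
    opt_g.step()
```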
InfoGAN
LET’S USE ADDITIONAL CONSTRAINTS FOR THE GENERATOR
MOTIVATION
“We want to get a disentangled representation space EXPLICITLY.”
Neural network understanding “Rotation”
* Figure adopted from DCGAN paper (link)
MOTIVATION
“We want to get a disentangled representation space EXPLICITLY.”
Neural network understanding “Digit Type”
* Figure adopted from infoGAN paper (link)
Code
MOTIVATION
* Slide adopted from Takato Horii’s slides in SlideShare (link)
• When the generator learns data representations, infoGAN imposes an extra
constraint to make the network learn the feature space in a disentangled way.
• Unlike the standard GAN, the generator takes a pair of variables as input:
1. Gaussian noise z (source of incompressible noise)
2. latent code c (semantic feature of data distribution)
infoGAN
SCHEMATIC OVERVIEW
[Diagram: z → G → x → D → Real or Fake?]
Diagram of
Standard GAN
SCHEMATIC OVERVIEW
[Diagram: (c, z) → G → x → D → Real or Fake?]
add an extra “code” variable
Diagram of
infoGAN
1. Gaussian noise z (source of incompressible noise)
2. latent code c (semantic feature of data distribution)
SCHEMATIC OVERVIEW
For example, $c \sim \mathrm{Cat}(K = 10,\ p = 0.1)$: a uniform categorical code over ten classes $0, \dots, 9$, each with probability $1/10$.
SCHEMATIC OVERVIEW
[Diagram: (c, z) → G → x → D → Real or Fake?, plus an auxiliary head I for the Mutual Info.]
infoGAN: maximize I(c, G(z,c))
Diagram of infoGAN: impose an extra constraint to learn a disentangled feature space
SCHEMATIC OVERVIEW
“The information in the latent code c should not be lost in the generation process.”
SCHEMATIC OVERVIEW
INFOGAN
* Figure adopted from Wikipedia “Mutual Information”
Changed minimax problem: $\min_G \max_D \; V_I(D, G) = V(D, G) - \lambda\, I(c; G(z, c))$
Mutual information: $I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$, where the entropy $H(\cdot)$ measures uncertainty.
∴ We need to minimize the conditional entropy $H(c \mid G(z, c))$, which is intractable in general.
VARIATIONAL INFORMATION MAXIMIZATION
Changed Minimax problem:
Let’s MAXIMIZE the LOWER BOUND which is tractable !
RESULTS
MNIST dataset
* Figure adopted from infoGAN paper (link)
RESULTS
3D FACE dataset
* Figure adopted from infoGAN paper (link)
RESULTS
* Figure adopted from infoGAN paper (link)
RESULTS
* Figure adopted from infoGAN paper (link)
IMPLEMENTATION
[Diagram: (c, z) → G → x → D, plus the auxiliary head I/Q]
Diagram of
infoGAN
Train Q separately
SUMMARY
1. Add an additional constraint to improve the performance
• Mutual information
• Adds almost no computational cost (Q shares layers with D)
2. Learn better feature space
3. Unsupervised way to learn implicit features in the dataset
4. Variational method
IMPLEMENTATION
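A hypothetical minimal sketch of the auxiliary loss (stand-in Linear modules instead of real G, D/Q networks):

```python
# infoGAN auxiliary loss for a categorical code c; in a real model Q would
# share D's body and add a softmax head on top.
import torch
import torch.nn.functional as F

K, BATCH, Z_DIM = 10, 64, 62
code_idx = torch.randint(K, (BATCH,))        # sample c ~ Cat(K=10, p=0.1)
c = F.one_hot(code_idx, K).float()
z = torch.randn(BATCH, Z_DIM)                # incompressible noise

G = torch.nn.Linear(Z_DIM + K, 784)          # stand-ins for real networks
Q = torch.nn.Linear(784, K)

x = G(torch.cat([z, c], dim=1))              # generate from (z, c)
mi_loss = F.cross_entropy(Q(x), code_idx)    # = E[-log Q(c|x)]; minimizing it
                                             # (w.r.t. G and Q) maximizes the
                                             # lower bound on I(c; G(z, c))
mi_loss.backward()                           # gradients reach both G and Q
```

Because the cross-entropy term is minimized with respect to both G and Q, the gradient pushes G to generate images from which c is recoverable, which is exactly the "information should not be lost" constraint above.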
f-GAN
LET'S USE f-DIVERGENCES RATHER THAN FIXING A SINGLE ONE.
Here, I have heavily reused the slides from S. Nowozin’s (1st author) NIPS 2016 workshop for GAN.
You can easily find the related information (slides) at:
http://www.nowozin.net/sebastian/blog/nips-2016-generative-adversarial-training-workshop-talk.html
[Diagram: model family 𝒫 with model P and data distribution Q]
MOTIVATION
LEARNING PROBABILISTIC MODELS
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
Assumptions on P :
• tractable sampling
• tractable parameter gradient with respect to sample
• tractable likelihood function
MOTIVATION
LEARNING PROBABILISTIC MODELS
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
[Goodfellow et al., 2014]
$z \to \mathrm{Lin}(100, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 784) \to \mathrm{Sigmoid}$
Random input → Generator → Output, with $z \sim \mathrm{Uniform}_{100}$
MOTIVATION
Likelihood-free Model
Integral Probability Metrics [Müller, 1997], [Sriperumbudur et al., 2010]
$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left( \int f \,\mathrm{d}P - \int f \,\mathrm{d}Q \right)$
• P: Expectation • Q: Expectation • Structure in $\mathcal{F}$
• Examples: Energy statistic [Szekely, 1997]; Kernel MMD [Gretton et al., 2012], [Smola et al., 2007]; Wasserstein distance [Cuturi, 2013]; DISCO Nets [Bouchacourt et al., 2016]

Proper scoring rules [Gneiting and Raftery, 2007]
$S(P, Q) = \int S(P, x) \,\mathrm{d}Q(x)$
• P: Distribution • Q: Expectation
• Examples: Log-likelihood [Fisher, 1922], [Good, 1952]; Quadratic score [Bernardo, 1979]

f-divergences [Ali and Silvey, 1966]
$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x$
• P: Distribution • Q: Distribution
• Examples: Kullback-Leibler divergence [Kullback and Leibler, 1951]; Jensen-Shannon divergence; Total variation; Pearson $\chi^2$
LEARNING PROBABILISTIC MODELS
SCHEMATIC OVERVIEW
Variational representation of divergences
[Nguyen et al., 2010], [Reid and Williamson, 2011], [Goodfellow et al., 2014]
(Each of the three families above can be written variationally, in terms of expectations only.)
LEARNING PROBABILISTIC MODELS
SCHEMATIC OVERVIEW
Neural sampler samples vs. training samples:
How do we measure the distance based only on empirical samples from $P_\theta(x)$ and $Q(x)$?
TRAINING NEURAL SAMPLERS
SCHEMATIC OVERVIEW
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
“Show that the GAN approach is a special case of an existing
more general variational divergence estimation approach.”
Let’s generalize the GAN objective to arbitrary 𝒇𝒇-divergences!
SCHEMATIC OVERVIEW
Neural sampler distribution: $P_\theta(x)$; true distribution: $Q(x)$.
We could minimize some distance (divergence) between the distributions if we had $P_\theta(x)$ and $Q(x)$ themselves.
TRAINING NEURAL SAMPLERS
SCHEMATIC OVERVIEW
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
• Divergence between two distributions:
$D_f(Q \,\|\, P) = \int_{\mathcal{X}} p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x$
• f: generator function, convex, $f(1) = 0$  [Ali and Silvey, 1966]
f-DIVERGENCE
TRY WHAT YOU WANT! (the fun of picking your flavor)
f-DIVERGENCE
• Divergence between two distributions:
$D_f(Q \,\|\, P) = \int_{\mathcal{X}} p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x$
• Every convex function $f$ has a Fenchel conjugate $f^*$ such that
$f(u) = \sup_{t \in \mathrm{dom}\, f^*} \{\, t u - f^*(t) \,\}$  [Nguyen, Wainwright, Jordan, 2010]
“Any convex f can be represented as a point-wise max of linear functions”
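As a concrete instance (a standard conjugate pair, also tabulated in the f-GAN paper): for the KL generator $f(u) = u \log u$, the conjugate is
$f^*(t) = \sup_u \{\, t u - u \log u \,\} = \exp(t - 1).$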
Estimating f-divergences from samples
f-DIVERGENCE
$D_f(Q \,\|\, P) = \int_{\mathcal{X}} p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x$
$= \int_{\mathcal{X}} p(x) \sup_{t \in \mathrm{dom}\, f^*} \left\{ t\, \frac{q(x)}{p(x)} - f^*(t) \right\} \mathrm{d}x$
$\ge \sup_{T \in \mathcal{T}} \left( \int_{\mathcal{X}} q(x)\, T(x)\, \mathrm{d}x - \int_{\mathcal{X}} p(x)\, f^*(T(x))\, \mathrm{d}x \right)$
$= \sup_{T \in \mathcal{T}} \big( \mathbb{E}_{x \sim Q}[T(x)] - \mathbb{E}_{x \sim P}[f^*(T(x))] \big)$
Approximate using samples: the first term from Q, the second term from P.
Estimating f-divergences from samples (cont.)
f-DIVERGENCE
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
• GAN:
$\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[\log D_\omega(x)] + \mathbb{E}_{x \sim P_\theta}[\log(1 - D_\omega(x))]$
• f-GAN:
$\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[T_\omega(x)] - \mathbb{E}_{x \sim P_\theta}[f^*(T_\omega(x))]$
• GAN discriminator / variational function correspondence: $\log D_\omega(x) = T_\omega(x)$
• GAN minimizes the Jensen-Shannon divergence (which was also pointed out in Goodfellow et al., 2014)
f-GAN and GAN objectives
f-GAN
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
$\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}\big[g_f(V_\omega(x))\big] + \mathbb{E}_{x \sim P_\theta}\big[-f^*\!\big(g_f(V_\omega(x))\big)\big]$
Comparison of the objectives
f-GAN
* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
• Double-loop algorithm [Goodfellow et al., 2014]
• Algorithm:
• Inner loop: tighten divergence lower bound
• Outer loop: minimize generator loss
• In practice the inner loop is run only for one step (two backprops)
• Missing justification for this practice
• Single-step algorithm (proposed)
• Algorithm: simultaneously take (one backprop)
• a positive gradient step w.r.t. the variational function $T_\omega(x)$
• a negative gradient step w.r.t. the generator function $P_\theta(x)$
• Does this converge?
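A minimal sketch of the single-step update, instantiated for the KL divergence (where the output activation is $g_f(v) = v$ and $f^*(t) = e^{t-1}$); network sizes and toy data are illustrative, not from the paper:

```python
import torch

T = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
G = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt_t = torch.optim.Adam(T.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(128, 1) + 2.0                          # samples from Q (data)
    fake = G(torch.randn(128, 4))                             # samples from P_theta (model)
    F_val = T(real).mean() - torch.exp(T(fake) - 1).mean()    # variational bound

    opt_t.zero_grad(); opt_g.zero_grad()
    F_val.backward()                 # ONE backprop for both players
    for p in T.parameters():
        p.grad.neg_()                # ascend on the variational function T
    opt_t.step()                     # (optimizers minimize, so flip the sign)
    opt_g.step()                     # descend on the generator
```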
THEORETICAL RESULTS
Algorithm: Double-Loop versus Single-Step
GENERAL ALGORITHM
f-GAN
* Please note that in S. Nowozin’s paper, P represents the real distribution and 𝑄𝑄𝜃𝜃 stands for the parametric model we set.
THEORETICAL RESULTS
Local convergence of the algorithm 1
• Assumptions
  • F is locally (strongly) convex with respect to $\theta$: $\nabla_\theta^2 F \succ 0$
  • F is (strongly) concave with respect to $\omega$: $\nabla_\omega^2 F \prec 0$
  where $\nabla^2 F = \begin{pmatrix} \nabla_\theta^2 F & \nabla_\theta \nabla_\omega F \\ \nabla_\omega \nabla_\theta F & \nabla_\omega^2 F \end{pmatrix}$
• Local convergence: define $J(\theta, \omega) = \frac{1}{2} \|\nabla_\theta F\|^2 + \frac{1}{2} \|\nabla_\omega F\|^2$; then
  $J(\theta^t, \omega^t) \le \left(1 - \frac{\delta^2}{L}\right)^{t} J(\theta^0, \omega^0)$
  ($\delta$: strong convexity parameter, $L$: smoothness parameter)
Geometric rate of convergence!
THEORETICAL RESULTS
Local convergence of the algorithm 1
$V(x, y) = xy + \frac{\delta}{2}(x^2 - y^2) \qquad V(x, y) = x y^2 + \frac{\delta}{2}(x^2 - y^2)$
VIDEOS
RESULTS
Synthetic 1D Univariate
Approximate a mixture of Gaussians by a Gaussian to
• Validate the approach
• Demonstrate the properties of different divergences [Minka, 2005]
Compare the exact optimization of the divergence with the GAN approach
* Please note that in S. Nowozin’s paper, P represents the real distribution and 𝑄𝑄𝜃𝜃 stands for the parametric model we set.
RESULTS
* Figure adopted from f-GAN paper (link)
Synthetic 1D Univariate
* Please note that in S. Nowozin’s paper, P represents the real distribution and 𝑄𝑄𝜃𝜃 stands for the parametric model we set.
RESULTS
* Figure adopted from f-GAN paper (link)
SUMMARY
• Generalize GAN objective to arbitrary 𝒇𝒇-divergences
• Simplify GAN algorithm + local convergence proof
• Demonstrate different divergences
ETC.
WHY GAN GENERATES SHARPER IMAGES?
* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
ETC.
WHY GAN GENERATES SHARPER IMAGES?
* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
• LSUN experiment: No (visually)
• Empirical contradiction to intuition from [Theis et al., 2015],
[Huszar, 2015]
• Why?
• Intuition: strong inductive bias of model class
ETC.
DOES THE DIVERGENCE MATTER?
EBGAN
LET’S USE ENERGY-BASED MODEL FOR THE DISCRIMINATOR
MOTIVATION
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
We want to learn
the data manifold!
⟺ Do not want to give penalty to both $y$ and $\hat{y}$.
⟺ The pen can fall either way (there is no wrong way to fall).
MOTIVATION
ENERGY-BASED MODEL
Energy-based models capture dependencies between variables by associating a scalar energy (a measure of compatibility) to each configuration of the variables.
Inference, i.e., making a prediction or decision, consists in setting the value of observed variables and finding values of the remaining variables that minimize the energy.
Learning consists in finding an energy function that associates low energies to correct values of the remaining variables, and higher energies to incorrect values.
A loss functional, minimized during learning, is used to measure the quality of the available energy functions.
* Quoted from “A Tutorial on Energy-Based Learning”, Yann Lecun et al., 2006
MOTIVATION
ENERGY-BASED MODEL
WITHIN the COMMON inference/learning FRAMEWORK, the wide choice of
energy functions and loss functionals allows for the design of many types of
statistical models, both probabilistic and non-probabilistic.
* Quoted from “A Tutorial on Energy-Based Learning”, Yann Lecun et al., 2006
MOTIVATION
ENERGY-BASED MODEL
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
“LET’S USE IT!”
MOTIVATION
ENERGY-BASED MODEL
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
“BUT HOW do we choose where to push up?“
MOTIVATION
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
MOTIVATION
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
LIMITATIONS
→ Limit the space
→ Every interesting case is intractable
→ How to pick the point to push up?
→ Limit the model or space
MOTIVATION
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
MOTIVATION
* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
SCHEMATIC OVERVIEW
[Image-only slides from Yann Lecun's talk in NIPS 2016 (link)]
Architecture: the discriminator is an auto-encoder
Loss functions: a hinge loss (see below)
EBGAN
* Slides from Yann Lecun's talk in NIPS 2016 (link)
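The loss functions referred to above, as defined in the EBGAN paper, where $D(x)$ is the auto-encoder reconstruction error $\|\mathrm{Dec}(\mathrm{Enc}(x)) - x\|$, $m$ is a positive margin, and $[\cdot]^+ = \max(0, \cdot)$:
$\mathcal{L}_D(x, z) = D(x) + \big[\, m - D(G(z)) \,\big]^+ \qquad \mathcal{L}_G(z) = D\big(G(z)\big)$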
THEORETICAL RESULTS
* Slides from Yann Lecun’s talk in NIPS 2016 (link)
RESULTS
* Slides from Yann Lecun’s talk in NIPS 2016 (link)
RESULTS
* Slides from Yann Lecun’s talk in NIPS 2016 (link)
SUMMARY
1. New framework using an energy-based model
• Discriminator as an energy function
• Low values on the data manifold
• Higher values everywhere else
• Generator produces contrastive samples
2. Stable learning
WGAN
MOTIVATION
What does it mean to learn a probability model?
: we learned it from f-GAN!
[Diagram: model family 𝒫 with model $P_\theta$ and data distribution Q]
LEARNING PROBABILISTIC MODELS
Assumptions on P : tractable sampling, parameter gradient with respect to sample, likelihood function
… $\min_\theta \mathrm{KL}[\,Q \,\|\, P_\theta\,] \;\Longleftrightarrow\; \max_\theta \sum_{i=1}^{N} \log P(x_i \mid \theta)$
MOTIVATION
What if the supports of the two distributions do not overlap?
Is $\mathrm{KL}[\,Q \,\|\, P_\theta\,]$ still defined?
→ Add a noise term to the model distribution.
TEMPORARY? OR UNSATISFYING SOLUTION
* Note that this is a very rough explanation
Integral Probability Metrics [Müller, 1997], [Sriperumbudur et al., 2010]
$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left( \int f \,\mathrm{d}P - \int f \,\mathrm{d}Q \right)$
• P: Expectation • Q: Expectation • Structure in $\mathcal{F}$
• Examples: Energy statistic [Szekely, 1997]; Kernel MMD [Gretton et al., 2012], [Smola et al., 2007]; Wasserstein distance [Cuturi, 2013]; DISCO Nets [Bouchacourt et al., 2016]

Proper scoring rules [Gneiting and Raftery, 2007]
$S(P, Q) = \int S(P, x) \,\mathrm{d}Q(x)$
• P: Distribution • Q: Expectation
• Examples: Log-likelihood [Fisher, 1922], [Good, 1952]; Quadratic score [Bernardo, 1979]

f-divergences [Ali and Silvey, 1966]
$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x$
• P: Distribution • Q: Distribution
• Examples: Kullback-Leibler divergence [Kullback and Leibler, 1951]; Jensen-Shannon divergence; Total variation; Pearson $\chi^2$
LEARNING PROBABILISTIC MODELS
REVIEW!
[Goodfellow et al., 2014]
$z \to \mathrm{Lin}(100, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 784) \to \mathrm{Sigmoid}$
Random input → Generator → Output, with $z \sim \mathrm{Uniform}_{100}$
REVIEW!
Likelihood-free Model
WELL KNOWN FOR BEING DELICATE AND UNSTABLE FOR TRAINING
THEORETIC RESULTS
Different distances
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Different distances
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Different distances
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Different distances
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Slide courtesy of Sungbin Lim, DeepBio, 2017
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
THEORETIC RESULTS
Slide courtesy of Sungbin Lim, DeepBio, 2017
WASSERSTEIN DISTANCE
Slide courtesy of Sungbin Lim, DeepBio, 2017
WASSERSTEIN DISTANCE
Slide courtesy of Sungbin Lim, DeepBio, 2017
WASSERSTEIN DISTANCE
THEORETIC RESULTS
* Figure adopted from WGAN paper (link)
THEORETIC RESULTS
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Slide courtesy of Sungbin Lim, DeepBio, 2017
THEORETIC RESULTS
Wasserstein distance is a continuous function of $\theta$ under mild assumptions!
THEORETIC RESULTS
Wasserstein distance is the weakest one (it induces the weakest topology among these distances)
THEORETIC RESULTS
Highly intractable!
(the infimum runs over all joint distributions…)
THEORETIC RESULTS
Replace the problem with its dual, which is (somewhat) tractable!
THEORETIC RESULTS
Okay! We can use the neural network!
(up to a constant)
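The two formulations behind these slides, as given in the WGAN paper: the primal Earth-Mover (Wasserstein-1) distance and its Kantorovich-Rubinstein dual over 1-Lipschitz functions,
$W(Q, P_\theta) = \inf_{\gamma \in \Pi(Q, P_\theta)} \mathbb{E}_{(x, y) \sim \gamma}\big[\, \|x - y\| \,\big] = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim Q}[f(x)] - \mathbb{E}_{x \sim P_\theta}[f(x)].$
Approximating the sup with a weight-clipped neural network gives the critic, which is why the estimate holds only up to a (Lipschitz) constant.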
IMPLEMENTATION
RESULTS
* Figure adopted from WGAN paper (link)
RESULTS
* Figure adopted from WGAN paper (link)
RESULTS
* Figure adopted from WGAN paper (link)
Stable without Batch normalization!
RESULTS
* Figure adopted from WGAN paper (link)
Stable without DCGAN structure (generator)!
SUMMARY
1. Provide a comprehensive theoretical analysis of how the EM
distance behaves in comparison to the others
2. Introduce Wasserstein-GAN that minimizes a reasonable and
efficient approximation of the EM distance.
3. Empirically show that WGANs cure the main training problems of
GANs (e.g. stability, power balance, mode collapse)
4. Evaluation criteria (learning curve)
IMPLEMENTATION
THAT SIMPLE!
(after all those mathematics…)
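For reference, a minimal WGAN loop (toy networks of my own choosing; lr = 5e-5 with RMSProp, clipping constant 0.01, and n_critic = 5 follow the paper's defaults):

```python
import torch

critic = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)

for step in range(10000):
    for _ in range(5):                          # n_critic steps per G step
        real = torch.randn(64, 1) * 0.5 + 2.0   # toy data distribution
        fake = G(torch.randn(64, 8)).detach()
        loss_c = critic(fake).mean() - critic(real).mean()
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():           # clip weights to keep the critic
            p.data.clamp_(-0.01, 0.01)          # (roughly) K-Lipschitz
    loss_g = -critic(G(torch.randn(64, 8))).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The negative critic loss tracks the EM-distance estimate, which is the learning-curve evaluation criterion the summary mentions.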
• A repo with implementations of almost every GAN in both PyTorch and TensorFlow:
https://github.com/wiseodd/generative-models
IMPLEMENTATION
THANK YOU
jaejun.yoo@kaist.ac.kr
MLE & KL DIVERGENCE
What is convergence in distribution?

Más contenido relacionado

La actualidad más candente

GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionPreferred Networks
 
Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Databricks
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and ApplicationsHoang Nguyen
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
 
Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)Vitaly Bondar
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningKhang Pham
 
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)Weiwei Guo
 
Horovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made EasyHorovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made EasyAlexander Sergeev
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveJustin Basilico
 
Kaggleのテクニック
KaggleのテクニックKaggleのテクニック
KaggleのテクニックYasunori Ozaki
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
 
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...Jihoo Kim
 
Evolution of the StyleGAN family
Evolution of the StyleGAN familyEvolution of the StyleGAN family
Evolution of the StyleGAN familyVitaly Bondar
 

La actualidad más candente (20)

GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 
Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
 
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
 
Horovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made EasyHorovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made Easy
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Kaggleのテクニック
KaggleのテクニックKaggleのテクニック
Kaggleのテクニック
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Pegasus
PegasusPegasus
Pegasus
 
Adversarial training Basics
Adversarial training BasicsAdversarial training Basics
Adversarial training Basics
 
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
 
Tabnet presentation
Tabnet presentationTabnet presentation
Tabnet presentation
 
Evolution of the StyleGAN family
Evolution of the StyleGAN familyEvolution of the StyleGAN family
Evolution of the StyleGAN family
 

Destacado

Anomaly Detection by ADGM / LVAE
Anomaly Detection by ADGM / LVAEAnomaly Detection by ADGM / LVAE
Anomaly Detection by ADGM / LVAEPreferred Networks
 
Anomaly Detection using Deep Auto-Encoders | Gianmario Spacagna
Anomaly Detection using Deep Auto-Encoders | Gianmario SpacagnaAnomaly Detection using Deep Auto-Encoders | Gianmario Spacagna
Anomaly Detection using Deep Auto-Encoders | Gianmario SpacagnaData Science Milan
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksYunjey Choi
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learningAdam Gibson
 
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016Taehoon Kim
 

Destacado (6)

Anomaly Detection by ADGM / LVAE
Anomaly Detection by ADGM / LVAEAnomaly Detection by ADGM / LVAE
Anomaly Detection by ADGM / LVAE
 
Dcgan
DcganDcgan
Dcgan
 
Anomaly Detection using Deep Auto-Encoders | Gianmario Spacagna
Anomaly Detection using Deep Auto-Encoders | Gianmario SpacagnaAnomaly Detection using Deep Auto-Encoders | Gianmario Spacagna
Anomaly Detection using Deep Auto-Encoders | Gianmario Spacagna
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
 

Similar a Variants of GANs - Jaejun Yoo

[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of FunctionsJaeJun Yoo
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewQuantUniversity
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417Shuai Zhang
 
Big Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning DemystifiedBig Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning DemystifiedMatt Stubbs
 
Nips 2016 tutorial generative adversarial networks review
Nips 2016 tutorial  generative adversarial networks reviewNips 2016 tutorial  generative adversarial networks review
Nips 2016 tutorial generative adversarial networks reviewMinho Heo
 
brief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANsbrief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANsParham Zilouchian
 
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Codiax
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSangwoo Mo
 
Age and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdfAge and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdfMohammedMuzammil83
 
Slides galvin-widjaja
Slides galvin-widjajaSlides galvin-widjaja
Slides galvin-widjajaCodePolitan
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoTim Riser
 
An Extensive Review on Generative Adversarial Networks GAN’s
An Extensive Review on Generative Adversarial Networks GAN’sAn Extensive Review on Generative Adversarial Networks GAN’s
An Extensive Review on Generative Adversarial Networks GAN’sijtsrd
 
Project_Final_Review.pdf
Project_Final_Review.pdfProject_Final_Review.pdf
Project_Final_Review.pdfDivyaGugulothu
 
Face-GAN project report.pptx
Face-GAN project report.pptxFace-GAN project report.pptx
Face-GAN project report.pptxAndleebFatima16
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical ImagingSanghoon Hong
 
IRJET - Deep Learning Approach to Inpainting and Outpainting System
IRJET -  	  Deep Learning Approach to Inpainting and Outpainting SystemIRJET -  	  Deep Learning Approach to Inpainting and Outpainting System
IRJET - Deep Learning Approach to Inpainting and Outpainting SystemIRJET Journal
 
PROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATION
PROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATIONPROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATION
PROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATIONWilly Marroquin (WillyDevNET)
 

Similar a Variants of GANs - Jaejun Yoo (20)

[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417
 
Big Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning DemystifiedBig Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning Demystified
 
Nips 2016 tutorial generative adversarial networks review
Nips 2016 tutorial  generative adversarial networks reviewNips 2016 tutorial  generative adversarial networks review
Nips 2016 tutorial generative adversarial networks review
 
brief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANsbrief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANs
 
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
Age and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdfAge and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdf
 
Slides galvin-widjaja
Slides galvin-widjajaSlides galvin-widjaja
Slides galvin-widjaja
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
An Extensive Review on Generative Adversarial Networks GAN’s
An Extensive Review on Generative Adversarial Networks GAN’sAn Extensive Review on Generative Adversarial Networks GAN’s
An Extensive Review on Generative Adversarial Networks GAN’s
 
Age and Gender Detection.docx
Age and Gender Detection.docxAge and Gender Detection.docx
Age and Gender Detection.docx
 
Project_Final_Review.pdf
Project_Final_Review.pdfProject_Final_Review.pdf
Project_Final_Review.pdf
 
Face-GAN project report.pptx
Face-GAN project report.pptxFace-GAN project report.pptx
Face-GAN project report.pptx
 
Face-GAN project report
Face-GAN project reportFace-GAN project report
Face-GAN project report
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical Imaging
 
IRJET - Deep Learning Approach to Inpainting and Outpainting System
IRJET -  	  Deep Learning Approach to Inpainting and Outpainting SystemIRJET -  	  Deep Learning Approach to Inpainting and Outpainting System
IRJET - Deep Learning Approach to Inpainting and Outpainting System
 
Image captioning
Image captioningImage captioning
Image captioning
 
PROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATION
PROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATIONPROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATION
PROGRESSIVE GROWING OF GAN S FOR I MPROVED QUALITY , STABILITY , AND VARIATION
 

Más de JaeJun Yoo

[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniquesJaeJun Yoo
 
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...JaeJun Yoo
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooJaeJun Yoo
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsJaeJun Yoo
 
[PR12] Spectral Normalization for Generative Adversarial Networks
[PR12] Spectral Normalization for Generative Adversarial Networks[PR12] Spectral Normalization for Generative Adversarial Networks
[PR12] Spectral Normalization for Generative Adversarial NetworksJaeJun Yoo
 
Introduction to ambient GAN
Introduction to ambient GANIntroduction to ambient GAN
Introduction to ambient GANJaeJun Yoo
 
[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmaxJaeJun Yoo
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalizationJaeJun Yoo
 
[PR12] Capsule Networks - Jaejun Yoo
[PR12] Capsule Networks - Jaejun Yoo[PR12] Capsule Networks - Jaejun Yoo
[PR12] Capsule Networks - Jaejun YooJaeJun Yoo
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun YooJaeJun Yoo
 
[PR12] PixelRNN- Jaejun Yoo
[PR12] PixelRNN- Jaejun Yoo[PR12] PixelRNN- Jaejun Yoo
[PR12] PixelRNN- Jaejun YooJaeJun Yoo
 
[Pr12] dann jaejun yoo
[Pr12] dann   jaejun yoo[Pr12] dann   jaejun yoo
[Pr12] dann jaejun yooJaeJun Yoo
 

Más de JaeJun Yoo (12)

[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques
 
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
[PR12] Spectral Normalization for Generative Adversarial Networks
[PR12] Spectral Normalization for Generative Adversarial Networks[PR12] Spectral Normalization for Generative Adversarial Networks
[PR12] Spectral Normalization for Generative Adversarial Networks
 
Introduction to ambient GAN
Introduction to ambient GANIntroduction to ambient GAN
Introduction to ambient GAN
 
[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization
 
[PR12] Capsule Networks - Jaejun Yoo
[PR12] Capsule Networks - Jaejun Yoo[PR12] Capsule Networks - Jaejun Yoo
[PR12] Capsule Networks - Jaejun Yoo
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
 
[PR12] PixelRNN- Jaejun Yoo
[PR12] PixelRNN- Jaejun Yoo[PR12] PixelRNN- Jaejun Yoo
[PR12] PixelRNN- Jaejun Yoo
 
[Pr12] dann jaejun yoo
[Pr12] dann   jaejun yoo[Pr12] dann   jaejun yoo
[Pr12] dann jaejun yoo
 

Último

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Variants of GANs - Jaejun Yoo

  • 1. VARIANTS OF GANs Jaejun Yoo Ph.D. Candidate @KAIST 13th May, 2017 — understood from a novice graduate student's perspective
  • 2. Hello, I am Jaejun Yoo - Ph.D. Candidate - Medical Image Reconstruction, Topological Data Analysis, EEG - http://jaejunyoo.blogspot.com/ - CVPR 2017 NTIRE Challenge: Ranked 3rd
  • 3. Goals of this lecture 1. A deeper understanding of GANs 2. Building the groundwork to follow the flow of subsequent GAN research • Understand the problems of the original GAN and the reasons behind them • Introduce variants of GAN, focusing on papers that fix the major problems or propose a new direction in the big picture
  • 6. PREREQUISITES Generative Models * Figure adopted from BEGAN paper released at 31. Mar. 2017 David Berthelot et al. Google (link) Generated Images by Neural Network
  • 7. PREREQUISITES Generative Models “What I cannot create, I do not understand”
  • 8. PREREQUISITES Generative Models “What I cannot create, I do not understand” If the network can learn how to draw a cat and a dog separately, it must also be able to classify them, i.e. feature learning follows naturally.
  • 9. PREREQUISITES Taxonomy of Machine Learning From Yann LeCun (NIPS 2016); from David Silver, Reinforcement Learning (UCL course on RL, 2015)
  • 10. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017) y = f(x)
  • 11. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 12. PREREQUISITES Taxonomy of Machine Learning From Yann LeCun (NIPS 2016); from David Silver, Reinforcement Learning (UCL course on RL, 2015)
  • 13. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 14. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 15. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 16. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 17. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 18. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 19. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 20. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 21. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)
  • 22. PREREQUISITES Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017) * Figure adopted from NIPS 2016 Tutorial: GAN paper, Ian Goodfellow 2016
  • 23. GAN
  • 24. SCHEMATIC OVERVIEW z G D x Real or Fake? Diagram of Standard GAN Gaussian noise as an input for G
  • 25. z G D x Real or Fake? Diagram of Standard GAN — the counterfeiter (G) vs. the police (D) SCHEMATIC OVERVIEW
  • 26. z G D x Real or Fake? Diagram of Standard GAN — the counterfeiter (G) vs. the police (D), with model distribution Q and data distribution P SCHEMATIC OVERVIEW
  • 27. Diagram of Standard GAN Data distribution Model distribution Discriminator SCHEMATIC OVERVIEW * Figure adopted from Generative Adversarial Nets, Ian Goodfellow et al. 2014
  • 28. Minimax problem of GAN THEORETICAL RESULTS Show that… 1. The minimax problem of GAN has a global optimum at $p_g = p_{data}$ 2. The proposed algorithm can find that global optimum — a TWO STEP APPROACH
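For reference, the minimax problem these slides analyze is the original objective from Goodfellow et al. (2014):

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

For a fixed G the optimal discriminator is $D^*(x) = p_{data}(x) / (p_{data}(x) + p_g(x))$, and plugging it back in shows the global optimum is reached exactly at $p_g = p_{data}$.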
  • 31. THEORETICAL RESULTS Convergence of the proposed algorithm
  • 32. SUMMARY • Supervised / Unsupervised / Reinforcement Learning • Generative Models • Variational Inference Technique • Adversarial Training • Reduce the gap between $Q_{model}$ and $P_{data}$ TAKE HOME KEYPOINTS
  • 33. RELATED WORKS (APPLICATIONS) * CycleGAN, Jun-Yan Zhu et al. 2017 * SRGAN, Christian Ledig et al. 2017 — Super-resolution, Domain Adaptation, Img2Img Translation
  • 37. DIFFICULTIES MODE COLLAPSE (SAMPLE DIVERSITY) * Slide adopted from NIPS 2016 Tutorial, Ian Goodfellow
  • 40. DIFFICULTIES HOW TO EVALUATE THE QUALITY?
  • 41. DIFFICULTIES HOW TO EVALUATE THE QUALITY?
  • 42. DIFFICULTIES HOW TO EVALUATE THE QUALITY? TEMPORARY SOLUTION
  • 43. SUMMARY “TRAINING GAN IS HARD” • Power balance (NO learning) • Convergence (oscillation) • Mode collapse • Evaluation (GAN training loss is intractable)
  • 44. SUMMARY “TRAINING GAN IS HARD” • Power balance (NO learning) • Convergence (oscillation) • Mode collapse • Evaluation (GAN training loss is intractable) HOW TO SOLVE THESE PROBLEMS?
  • 45. DCGAN LET’S CAREFULLY SELECT THE ARCHITECTURE!
  • 46. MOTIVATION TRAINING IS TOOOO HARD… (Ahhh…it just does not work…orz…)
  • 48. SCHEMATIC OVERVIEW Guideline for stable learning “However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for higher resolution and deeper generative models.”
  • 49. SCHEMATIC OVERVIEW Guideline for stable learning “However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for higher resolution and deeper generative models.” "Most GANs today are at least loosely based on the DCGAN architecture." - NIPS 2016 Tutorial by Ian Goodfellow
  • 50. Okay, learning is finished and the model converged. Then… KEYPOINTS “How to show that our network or generator learned A MEANINGFUL FUNCTION?”
  • 51. KEYPOINTS Okay, learning is finished and the model converged. Then… Show that • The generator DID NOT MEMORIZE the images. • There are NO SHARP TRANSITIONS while walking in the latent space. • The generator UNDERSTANDS the features of the data.
  • 52. RESULTS * Figure adopted from DCGAN, Alec Radford et al. 2016 (link) What can GAN do?
  • 53. RESULTS * Figure adopted from DCGAN, Alec Radford et al. 2016 (link) What can GAN do? “Walking in the latent space” z-space
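As an aside, "walking in the latent space" is easy to reproduce; below is a minimal sketch assuming a trained generator G (the name and interface are ours, not from the DCGAN release):

    # Interpolate between two latent points; smooth image transitions along
    # the path suggest the generator learned a meaningful manifold rather
    # than memorizing training images.
    import numpy as np

    def walk(z0, z1, n_steps=10):
        alphas = np.linspace(0.0, 1.0, n_steps)
        return np.stack([(1 - a) * z0 + a * z1 for a in alphas])

    z0, z1 = np.random.randn(100), np.random.randn(100)
    zs = walk(z0, z1)
    # images = G(zs)  # G: assumed trained generator mapping z to images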
  • 54. RESULTS * Figure adopted from DCGAN, Alec Radford et al. 2016 (link) What can GAN do?
  • 55. RESULTS * Figure adopted from DCGAN, Alec Radford et al. 2016 (link) What can GAN do? “Forgetting the feature it learned”
  • 56. RESULTS What can GAN do? “Vector arithmetic“ (e.g. word2vec)
  • 57. RESULTS What can GAN do? “Vector arithmetic“ (e.g. word2vec)
  • 58. RESULTS * Figure adopted from DCGAN, Alec Radford et al. 2016 (link) What can GAN do? “Vector arithmetic“ (e.g. word2vec)
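The vector arithmetic demo admits an equally small sketch; everything here is illustrative (random stand-ins for the collected z vectors, hypothetical G):

    import numpy as np

    # DCGAN averages the z vectors of several samples per visual concept
    # before doing arithmetic, which makes the result much more stable.
    zs_smiling_woman = np.random.randn(3, 100)  # stand-ins for collected z's
    zs_neutral_woman = np.random.randn(3, 100)
    zs_neutral_man   = np.random.randn(3, 100)

    z = (zs_smiling_woman.mean(axis=0)
         - zs_neutral_woman.mean(axis=0)
         + zs_neutral_man.mean(axis=0))
    # image = G(z[None, :])  # ideally: a smiling man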
  • 59. RESULTS Neural network understanding “Rotation” * Figure adopted from DCGAN, Alec Radford et al. 2016 (link) What can GAN do? “Understand the meaning of the data“ (e.g. code: rotation, category, etc.)
  • 60. SUMMARY 1. Guideline for stable learning 2. Good analysis of the results • Show that the generator DID NOT MEMORIZE the images • Show that there are NO SHARP TRANSITIONS while walking in the latent space • Show that the generator UNDERSTANDS the features of the data
  • 61. Unrolled GAN LET’S GIVE EXTRA INFORMATION TO THE NETWORK (allow it to ‘see into the future’)
  • 62. Convergence of the proposed algorithm MOTIVATION Impossible to achieve in practice
  • 63. MOTIVATION WHAT HAPPENS? * Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
  • 64. MOTIVATION WHAT HAPPENS? * Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016 “The GAN procedure normally does not cover the whole distribution, even when targeting a mode-covering divergence such as KL” (the two KL directions are spelled out below)
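To spell out the mode-covering vs. mode-seeking distinction behind this quote (a standard fact, not from the slide):

    KL(p_{data} \| p_g) = \int p_{data}(x) \log \frac{p_{data}(x)}{p_g(x)} \, dx

The forward KL blows up where $p_{data} > 0$ but $p_g \approx 0$, so minimizing it forces the model to cover every mode; the reverse $KL(p_g \| p_{data})$ instead punishes $p_g > 0$ off the data manifold, so minimizing it prefers a few good modes — closer to what GAN training tends to do in practice.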
  • 65. MOTIVATION WHAT HAPPENS? $G^* = \min_G \max_D V(G, D)$ vs. $G^* = \max_D \min_G V(G, D)$
  • 66. MOTIVATION WHAT HAPPENS? $G^* = \min_G \max_D V(G, D)$ vs. $G^* = \max_D \min_G V(G, D)$ * In the max-min case the generator collapses to a single output that fools the current discriminator most
  • 67. SCHEMATIC OVERVIEW * https://www.youtube.com/watch?v=JmON4S0kl04 “Adversarial games are not guaranteed to converge using gradient descent, e.g. rock, scissor, paper.”
  • 68. UNROLLED GAN * Figure adopted from Unrolled GAN, Luke Metz et al. 2016 Let’s “UNROLL” the discriminator to see several modes!
  • 69. UNROLLED GAN * Figure adopted from Unrolled GAN, Luke Metz et al. 2016
  • 70. UNROLLED GAN * Figure adopted from Unrolled GAN, Luke Metz et al. 2016 * Here, only the “G” part is unrolled (∵In practice, “D” usually over-powers “G”)
  • 71. UNROLLED GAN The Missing Gradient Term: “How the discriminator would react to a change in the generator.” Trade-off between the two cases (let's think through both).
  • 72. RESULTS Increased stability in terms of power balance * Figure adopted from Unrolled GAN, Luke Metz et al. 2016
  • 73. SUMMARY 1. Addresses the mode collapse problem 2. Unrolls the optimization problem • Makes the discriminator as close to optimal as possible
  • 74. IMPLEMENTATION * Codes from the jupyter notebook of Ben Poole (2nd Author) : https://github.com/poolio/unrolled_gan
  • 75. IMPLEMENTATION * Codes from the jupyter notebook of Ben Poole (2nd Author) : https://github.com/poolio/unrolled_gan
  • 76. IMPLEMENTATION * Codes from the jupyter notebook of Ben Poole (2nd Author) : https://github.com/poolio/unrolled_gan
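A compressed PyTorch-style rendering of the same idea (hypothetical names; D is assumed to output probabilities; the notebook above is the authoritative version):

    import copy
    import torch

    def unroll_discriminator(D, real, fake, k=5, lr=1e-3):
        # Take k inner SGD steps on a COPY of D: the "look into the future".
        D_k = copy.deepcopy(D)
        opt = torch.optim.SGD(D_k.parameters(), lr=lr)
        for _ in range(k):
            loss = -(torch.log(D_k(real) + 1e-8).mean()
                     + torch.log(1 - D_k(fake.detach()) + 1e-8).mean())
            opt.zero_grad(); loss.backward(); opt.step()
        return D_k

    # Generator step is then taken against the unrolled discriminator:
    # g_loss = torch.log(1 - unroll_discriminator(D, real, fake)(fake) + 1e-8).mean()
    # The full method also backpropagates through the k inner steps; stopping
    # those gradients, as this sketch does, is a common simplification.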
  • 77. InfoGAN LET’S USE ADDITIONAL CONSTRAINTS FOR THE GENERATOR
  • 78. MOTIVATION “We want to get a disentangled representation space EXPLICITLY.” Neural network understanding “Rotation” * Figure adopted from DCGAN paper (link)
  • 79. MOTIVATION “We want to get a disentangled representation space EXPLICITLY.” Neural network understanding “Digit Type” * Figure adopted from infoGAN paper (link) Code
  • 80. MOTIVATION * Slide adopted from Takato Horii’s slides in SlideShare (link)
  • 81. infoGAN SCHEMATIC OVERVIEW • While the generator learns data representations, infoGAN imposes an extra constraint so that the network learns the feature space in a disentangled way. • Unlike the standard GAN, the generator takes a pair of variables as input: 1. Gaussian noise z (source of incompressible noise) 2. latent code c (semantic features of the data distribution)
  • 82. z G D x Real or Fake? Diagram of Standard GAN SCHEMATIC OVERVIEW
  • 83. c z G D x Real or Fake? add an extra “code” variable Diagram of infoGAN 1. Gaussian noise z (source of incompressible noise) 2. latent code c (semantic feature of data distribution) SCHEMATIC OVERVIEW
  • 84. c z G D x Real or Fake? add an extra “code” variable Diagram of infoGAN 1. Gaussian noise z (source of incompressible noise) 2. latent code c (semantic feature of data distribution), e.g. for MNIST $c \sim \mathrm{Cat}(K = 10, p = 0.1)$ — a one-hot code over the ten digit classes SCHEMATIC OVERVIEW
  • 85. c z G D x Real or Fake? add an extra “code” variable Diagram of infoGAN 1. Gaussian noise z (source of incompressible noise) 2. latent code c (semantic feature of data distribution) * SCHEMATIC OVERVIEW
  • 86. c z G D x I Real or Fake? Mutual Info. infoGAN : maximize I(c,G(z,c)) Diagram of infoGAN Impose an extra constraint to learn disentangled feature space SCHEMATIC OVERVIEW
  • 87. “The information in the latent code c should not be lost in the generation process.” c z G D x I Real or Fake? Mutual Info. infoGAN : maximize I(c,G(z,c)) Diagram of infoGAN Impose an extra constraint to learn disentangled feature space SCHEMATIC OVERVIEW
  • 88. INFOGAN * Figure adopted from Wikipedia “Mutual Information” Changed minimax problem: $\min_G \max_D V_I(D, G) = V(D, G) - \lambda I(c; G(z, c))$ Mutual information: $I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$ ∴ We need to minimize the conditional entropy $H(c \mid G(z, c))$.
  • 89. INFOGAN * Figure adopted from Wikipedia “Mutual Information” The conditional entropy $H(c \mid G(z, c))$ measures the remaining uncertainty about the code c after observing the generated sample.
  • 90. INFOGAN * Figure adopted from Wikipedia “Mutual Information” Minimizing $H(c \mid G(z, c))$ directly requires the posterior $P(c \mid x)$ — which is intractable.
  • 91. VARIATIONAL INFORMATION MAXIMIZATION Changed minimax problem: $\min_G \max_D V(D, G) - \lambda L_I(G, Q)$ — let's MAXIMIZE a tractable LOWER BOUND $L_I(G, Q) \leq I(c; G(z, c))$ instead!
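For a categorical code, the variational bound reduces to an ordinary cross-entropy against an auxiliary head Q that shares most layers with D; a minimal sketch (all names assumed):

    import torch
    import torch.nn.functional as F

    # Since I(c; G(z,c)) >= H(c) + E[log Q(c|x)], and H(c) is a constant,
    # maximizing the lower bound L_I is just minimizing this cross-entropy.
    def info_loss(Q_net, fake_images, c_idx, lam=1.0):
        logits = Q_net(fake_images)                  # (batch, K) code logits
        return lam * F.cross_entropy(logits, c_idx)  # = -E[log Q(c|x)] + const

    # The same term is added to both the generator and the Q/D updates,
    # which is why infoGAN adds essentially no computational cost.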
  • 92. RESULTS MNIST dataset * Figure adopted from infoGAN paper (link)
  • 93. RESULTS 3D FACE dataset * Figure adopted from infoGAN paper (link)
  • 94. RESULTS * Figure adopted from infoGAN paper (link)
  • 95. RESULTS * Figure adopted from infoGAN paper (link)
  • 97. SUMMARY 1. Add an additional constraint to improve the performance • Mutual information • No adding on the computational cost 2. Learn better feature space 3. Unsupervised way to learn implicit features in the dataset 4. Variational method
  • 100. $f$-GAN LET’S USE $f$-DIVERGENCES RATHER THAN FIXING A SINGLE ONE. Here I have heavily reused the slides from S. Nowozin’s (1st author) NIPS 2016 workshop talk on GANs. You can easily find the related information (slides) at: http://www.nowozin.net/sebastian/blog/nips-2016-generative-adversarial-training-workshop-talk.html
  • 101. MOTIVATION LEARNING PROBABILISTIC MODELS — find $P \in \mathcal{P}$ close to the data distribution $Q$. * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
  • 102. MOTIVATION LEARNING PROBABILISTIC MODELS Assumptions on P: • tractable sampling • tractable parameter gradient with respect to sample • tractable likelihood function * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
  • 103. MOTIVATION Likelihood-free Model [Goodfellow et al., 2014]: random input $z \sim \mathrm{Uniform}_{100}$, generator $z \to \mathrm{Lin}(100, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 784) \to \mathrm{Sigmoid}$, output a 784-dimensional (28×28) sample
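That generator is small enough to write out; a PyTorch sketch with the layer sizes from the slide (training details not specified here):

    import torch
    import torch.nn as nn

    generator = nn.Sequential(
        nn.Linear(100, 1200), nn.ReLU(),
        nn.Linear(1200, 1200), nn.ReLU(),
        nn.Linear(1200, 784), nn.Sigmoid(),   # 784 = 28 x 28 MNIST pixels
    )

    z = torch.rand(64, 100)                   # z ~ Uniform(0, 1)^100
    x_fake = generator(z).view(-1, 1, 28, 28)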
  • 104. LEARNING PROBABILISTIC MODELS SCHEMATIC OVERVIEW — three families of distances:
  Integral Probability Metrics [Müller, 1997], [Sriperumbudur et al., 2010]: $\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \int f \, dP - \int f \, dQ \right|$ (needs P and Q only through expectations; structure in $\mathcal{F}$) — examples: energy statistic [Szekely, 1997], kernel MMD [Gretton et al., 2012], [Smola et al., 2007], Wasserstein distance [Cuturi, 2013], DISCO Nets [Bouchacourt et al., 2016]
  Proper scoring rules [Gneiting and Raftery, 2007]: $S(P, Q) = \int S(P, x) \, dQ(x)$ (P as a distribution, Q through expectations) — examples: log-likelihood [Fisher, 1922], [Good, 1952], quadratic score [Bernardo, 1979]
  $f$-divergences [Ali and Silvey, 1966]: $D_f(P \| Q) = \int q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) dx$ (needs both P and Q as distributions) — examples: Kullback-Leibler divergence [Kullback and Leibler, 1952], Jensen-Shannon divergence, total variation, Pearson $\chi^2$
  • 105. LEARNING PROBABILISTIC MODELS SCHEMATIC OVERVIEW Variational representation of divergences [Nguyen et al., 2010], [Reid and Williamson, 2011], [Goodfellow et al., 2014] connects the three families above (each requiring P and Q either as a distribution or only through expectations).
  • 106. TRAINING NEURAL SAMPLERS SCHEMATIC OVERVIEW Neural sampler samples vs. training samples: how do we measure the distance based only on empirical samples from $P_\theta(x)$ and $Q(x)$? * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
  • 107. SCHEMATIC OVERVIEW “Show that the GAN approach is a special case of an existing, more general variational divergence estimation approach.” Let’s generalize the GAN objective to arbitrary $f$-divergences!
  • 108. TRAINING NEURAL SAMPLERS SCHEMATIC OVERVIEW Neural sampler distribution $P_\theta(x)$ vs. true distribution $Q(x)$: we could minimize some distance (divergence) between the distributions if we had $P_\theta(x)$ and $Q(x)$ themselves. * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
  • 109. $f$-DIVERGENCE [Ali and Silvey, 1966] • Divergence between two distributions: $D_f(Q \| P) = \int_{\mathcal{X}} p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) dx$ • $f$: generator function, convex, $f(1) = 0$
  • 110. $f$-DIVERGENCE TRY WHAT YOU WANT! (pick whichever flavor you like)
  • 111. $f$-DIVERGENCE Estimating $f$-divergences from samples • Divergence between two distributions: $D_f(Q \| P) = \int_{\mathcal{X}} p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) dx$ • Every convex function $f$ has a Fenchel conjugate $f^*$ so that $f(u) = \sup_{t \in \mathrm{dom} f^*} \{ tu - f^*(t) \}$ [Nguyen, Wainwright, Jordan, 2010] — “any convex $f$ can be represented as a point-wise max of linear functions”
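As a concrete instance (a standard convex-analysis computation, added for reference): for the KL divergence, $f(u) = u \log u$ and

    f^*(t) = \sup_u \{ tu - u \log u \} = e^{t - 1},

which is exactly the conjugate that appears in the KL row of the $f$-GAN objective below.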
  • 113. $f$-DIVERGENCE Estimating $f$-divergences from samples (cont.) $D_f(Q \| P) = \int_{\mathcal{X}} p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) dx = \int_{\mathcal{X}} p(x) \sup_{t \in \mathrm{dom} f^*} \left\{ t \frac{q(x)}{p(x)} - f^*(t) \right\} dx \geq \sup_{T \in \mathcal{T}} \left( \int_{\mathcal{X}} q(x) T(x) \, dx - \int_{\mathcal{X}} p(x) f^*(T(x)) \, dx \right) = \sup_{T \in \mathcal{T}} \; \mathbb{E}_{x \sim Q}[T(x)] - \mathbb{E}_{x \sim P}[f^*(T(x))]$ — approximate the first term using samples from Q and the second using samples from P. * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
  • 114. $f$-GAN and GAN objectives • GAN: $\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[\log D_\omega(x)] + \mathbb{E}_{x \sim P_\theta}[\log(1 - D_\omega(x))]$ • $f$-GAN: $\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[T_\omega(x)] - \mathbb{E}_{x \sim P_\theta}[f^*(T_\omega(x))]$ • GAN discriminator–variational function correspondence: $\log D_\omega(x) = T_\omega(x)$ • GAN minimizes the Jensen-Shannon divergence (which was also pointed out in Goodfellow et al., 2014) * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
  • 115. $f$-GAN Comparison of the objectives: $\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[g_f(V_\omega(x))] + \mathbb{E}_{x \sim P_\theta}[-f^*(g_f(V_\omega(x)))]$, where $g_f$ is a divergence-specific output activation (a code sketch follows). * Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.
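A minimal sketch of this objective for one concrete divergence (KL, where $g_f(v) = v$ and $f^*(t) = e^{t-1}$); `V_net` is the variational function and all names are ours:

    import torch

    def f_gan_losses_kl(V_net, real, fake):
        T_real = V_net(real)                       # g_f(v) = v for KL
        f_star_fake = torch.exp(V_net(fake) - 1)   # f*(t) = exp(t - 1)
        F = T_real.mean() - f_star_fake.mean()     # lower bound on KL(Q || P_theta)
        variational_loss = -F                      # ascend F w.r.t. omega
        generator_loss = -f_star_fake.mean()       # descend F w.r.t. theta
        return variational_loss, generator_loss

    # In the single-step algorithm both losses come from one forward pass;
    # detach `fake` for the variational step, keep its graph for the G step.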
  • 116. • Double-loop algorithm [Goodfellow et al., 2014] • Algorithm: • Inner loop: tighten divergence lower bound • Outer loop: minimize generator loss • In practice the inner loop is run only for one step (two backprops) • Missing justification for this practice • Single-step algorithm (proposed) • Algorithm: simultaneously take (one backprop) • a positive gradient step w.r.t. variational function 𝑇𝑇𝜔𝜔(𝑥𝑥) • a negative gradient step w.r.t. generator function 𝑃𝑃𝜃𝜃 𝑥𝑥 • Does this converge? THEORETICAL RESULTS Algorithm: Double-Loop versus Single-Step
  • 117. GENERAL ALGORITHM f-GAN * Please note that in S. Nowozin’s paper, P represents the real distribution and $Q_\theta$ stands for the parametric model we set.
  • 119. THEORETICAL RESULTS Local convergence of Algorithm 1 • Assumptions: F is locally (strongly) convex with respect to $\theta$ and (strongly) concave with respect to $\omega$, i.e. $\nabla_\theta^2 F \succ 0$ and $\nabla_\omega^2 F \prec 0$ • Local convergence: define $J(\theta, \omega) = \frac{1}{2} \|\nabla_\theta F\|^2 + \frac{1}{2} \|\nabla_\omega F\|^2$; then $J(\theta^t, \omega^t) \leq \left(1 - \frac{\delta^2}{L}\right)^t J(\theta^0, \omega^0)$, where $\delta$ is the strong convexity parameter and $L$ the smoothness parameter — a geometric rate of convergence!
  • 120. VIDEOS $V(x, y) = xy + \frac{\delta}{2}(x^2 - y^2)$ and $V(x, y) = xy^2 + \frac{\delta}{2}(x^2 - y^2)$
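These dynamics are easy to reproduce; a small numpy sketch of simultaneous gradient steps on the first toy game (our own illustration, not code from the talk):

    import numpy as np

    # V(x, y) = x*y + (delta/2) * (x**2 - y**2); x descends V, y ascends V.
    def simulate(delta, steps=2000, lr=0.05):
        x, y = 1.0, 1.0
        for _ in range(steps):
            gx, gy = y + delta * x, x - delta * y   # dV/dx, dV/dy
            x, y = x - lr * gx, y + lr * gy
        return x, y

    print(simulate(delta=0.0))  # delta = 0: the iterates spiral outward
    print(simulate(delta=0.2))  # delta > 0: converges toward the saddle (0, 0)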
  • 121. RESULTS Synthetic 1D Univariate Approximate a mixture of Gaussians by a Gaussian to • Validate the approach • Demonstrate the properties of different divergences [Minka, 2005] Compare the exact optimization of the divergence with the GAN approach * Please note that in S. Nowozin’s paper, P represents the real distribution and 𝑄𝑄𝜃𝜃 stands for the parametric model we set.
  • 122. RESULTS * Figure adopted from f-GAN paper (link) Synthetic 1D Univariate * Please note that in S. Nowozin’s paper, P represents the real distribution and 𝑄𝑄𝜃𝜃 stands for the parametric model we set.
  • 123. RESULTS * Figure adopted from f-GAN paper (link)
  • 125. SUMMARY • Generalize the GAN objective to arbitrary $f$-divergences • Simplify the GAN algorithm + local convergence proof • Demonstrate different divergences
  • 126. ETC. WHY GAN GENERATES SHARPER IMAGES? * Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
  • 127. ETC. WHY GAN GENERATES SHARPER IMAGES? * Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016
  • 128. • LSUN experiment: No (visually) • Empirical contradiction to intuition from [Theis et al., 2015], [Huszar, 2015] • Why? • Intuition: strong inductive bias of model class Q ETC. DOES THE DIVERGENCE MATTER?
  • 129. EBGAN LET’S USE ENERGY-BASED MODEL FOR THE DISCRIMINATOR
  • 130. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link) We want to learn the data manifold!
  • 131. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link) We want to learn the data manifold! ⟺ Do not want to give a penalty to both $y$ and $\hat{y}$.
  • 132. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link) We want to learn the data manifold! ⟺ Do not want to give a penalty to both $y$ and $\hat{y}$. ⟺ The pen can fall either way (there is no wrong way to fall).
  • 133. MOTIVATION ENERGY-BASED MODEL Energy-based models capture dependencies between variables by associating a scalar energy (a measure of compatibility) to each configuration of the variables. Inference, i.e., making a prediction or decision, consists in setting the value of observed variables and finding values of the remaining variables that minimize the energy. Learning consists in finding an energy function that associates low energies to correct values of the remaining variables, and higher energies to incorrect values. A loss functional, minimized during learning, is used to measure the quality of the available energy functions. * Quoted from “A Tutorial on Energy-Based Learning”, Yann Lecun et al., 2006
  • 134. MOTIVATION ENERGY-BASED MODEL (continued) WITHIN the COMMON inference/learning FRAMEWORK, the wide choice of energy functions and loss functionals allows for the design of many types of statistical models, both probabilistic and non-probabilistic. * Quoted from “A Tutorial on Energy-Based Learning”, Yann Lecun et al., 2006
  • 135. MOTIVATION ENERGY-BASED MODEL * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link) “LET’S USE IT!”
  • 136. MOTIVATION ENERGY-BASED MODEL * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link) “BUT HOW do we choose where to push up?“
  • 137. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
  • 138. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link) LIMITATIONS → Limit the space → Every interesting case is intractable → How to pick the point to push up? → Limit the model or space ⋮ ⋮
  • 139. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
  • 140. MOTIVATION * Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)
  • 141. SCHEMATIC OVERVIEW * Slides from Yann Lecun’s talk in NIPS 2016 (link)
  • 142. * Slides from Yann Lecun’s talk in NIPS 2016 (link) SCHEMATIC OVERVIEW
  • 143. * Slides from Yann Lecun’s talk in NIPS 2016 (link) SCHEMATIC OVERVIEW
  • 144. Architecture: discriminator is an auto-encoder Loss functions: EBGAN * Slides from Yann Lecun’s talk in NIPS 2016 (link)
  • 145. Architecture: discriminator is an auto-encoder Loss functions: hinge loss EBGAN * Slides from Yann Lecun’s talk in NIPS 2016 (link)
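The two losses fit in a few lines; a sketch assuming an auto-encoder `AE` whose per-sample reconstruction error serves as the energy $D(x)$, with margin m (names assumed):

    import torch

    def energy(AE, x):
        # D(x) = reconstruction error of the auto-encoder discriminator
        return ((AE(x) - x) ** 2).flatten(1).mean(dim=1)

    def d_loss(AE, real, fake, m=10.0):
        # push energy down on data, up on fakes (but only below the margin)
        return energy(AE, real).mean() + torch.relu(m - energy(AE, fake.detach())).mean()

    def g_loss(AE, fake):
        return energy(AE, fake).mean()   # generator seeks low-energy samples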
  • 146. THEORETICAL RESULTS * Slides from Yann Lecun’s talk in NIPS 2016 (link)
  • 147. RESULTS * Slides from Yann Lecun’s talk in NIPS 2016 (link)
  • 148. RESULTS * Slides from Yann Lecun’s talk in NIPS 2016 (link)
  • 149. SUMMARY 1. New framework using an energy-based model • Discriminator as an energy function • Low values on the data manifold • Higher values everywhere else • The generator produces contrastive samples 2. Stable learning
  • 150. WGAN
  • 151. MOTIVATION What does it mean to learn a probability model? — we learned this from $f$-GAN! Find $P_\theta \in \mathcal{P}$ close to $Q$. LEARNING PROBABILISTIC MODELS Assumptions on P: tractable sampling, parameter gradient with respect to sample, likelihood function
  • 152. MOTIVATION What does it mean to learn a probability model? — we learned this from $f$-GAN! LEARNING PROBABILISTIC MODELS Assumptions on P: tractable sampling, parameter gradient with respect to sample, likelihood function … $\min_\theta KL[Q \| P_\theta] \iff \max_\theta \sum_{i=1}^{N} \log P(x_i \mid \theta)$
  • 153. MOTIVATION What if the supports of the two distributions do not overlap? Is $KL[Q \| P_\theta]$ still defined?
  • 154. MOTIVATION What if the supports of the two distributions do not overlap? Add a noise term to the model distribution. * Note that this is a very rough explanation
  • 155. MOTIVATION What if the supports of the two distributions do not overlap? Add a noise term to the model distribution — a TEMPORARY (OR UNSATISFYING) SOLUTION. * Note that this is a very rough explanation
  • 156. LEARNING PROBABILISTIC MODELS REVIEW! The same taxonomy as on slide 104: Integral Probability Metrics $\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} |\int f \, dP - \int f \, dQ|$ (energy statistic, kernel MMD, Wasserstein distance, DISCO Nets), proper scoring rules $S(P, Q) = \int S(P, x) \, dQ(x)$ (log-likelihood, quadratic score), and $f$-divergences $D_f(P \| Q) = \int q(x) f(p(x)/q(x)) \, dx$ (KL, JS, total variation, Pearson $\chi^2$).
  • 157. REVIEW! Likelihood-free Model [Goodfellow et al., 2014]: $z \sim \mathrm{Uniform}_{100}$, generator $z \to \mathrm{Lin}(100, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 1200) \to \mathrm{ReLU} \to \mathrm{Lin}(1200, 784) \to \mathrm{Sigmoid}$
  • 158. REVIEW! Likelihood-free Model — the same generator, WELL KNOWN FOR BEING DELICATE AND UNSTABLE TO TRAIN
  • 159. LEARNING PROBABILISTIC MODELS REVIEW! The same taxonomy once more; note that the Wasserstein distance sits in the Integral Probability Metric family.
  • 161. THEORETICAL RESULTS Different distances Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 162. THEORETICAL RESULTS Different distances Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 163. THEORETICAL RESULTS Different distances Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 164. THEORETICAL RESULTS Different distances Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 165. THEORETICAL RESULTS Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 166. THEORETICAL RESULTS Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 167. THEORETICAL RESULTS Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 168. Slide courtesy of Sungbin Lim, DeepBio, 2017 THEORETICAL RESULTS
  • 169. THEORETICAL RESULTS Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 171. Slide courtesy of Sungbin Lim, DeepBio, 2017 WASSERSTEIN DISTANCE
  • 172. Slide courtesy of Sungbin Lim, DeepBio, 2017 WASSERSTEIN DISTANCE
  • 173. THEORETICAL RESULTS * Figure adopted from WGAN paper (link)
  • 174. THEORETICAL RESULTS Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 175. THEORETICAL RESULTS Slide courtesy of Sungbin Lim, DeepBio, 2017
  • 176. THEORETICAL RESULTS The Wasserstein distance is a continuous function of $\theta$ under mild assumptions!
  • 179. THEORETICAL RESULTS Replace the problem with its dual, which is (somehow) tractable!
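The dual in question is the Kantorovich-Rubinstein duality (standard statement, reproduced here for reference):

    W(P_{data}, P_\theta) = \sup_{\|f\|_L \leq 1} \; \mathbb{E}_{x \sim P_{data}}[f(x)] - \mathbb{E}_{x \sim P_\theta}[f(x)]

where the supremum is over 1-Lipschitz functions; WGAN approximates $f$ with a neural network (the critic), enforcing the Lipschitz constraint — only up to an unknown constant — by weight clipping.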
  • 180. THEORETICAL RESULTS Okay! We can use a neural network (up to a constant)!
  • 182. RESULTS * Figure adopted from WGAN paper (link)
  • 183. RESULTS * Figure adopted from WGAN paper (link)
  • 184. RESULTS * Figure adopted from WGAN paper (link) Stable without Batch normalization!
  • 185. RESULTS * Figure adopted from WGAN paper (link) Stable without DCGAN structure (generator)!
  • 186. SUMMARY 1. Provide a comprehensive theoretical analysis of how the EM distance behaves in comparison to the other distances 2. Introduce Wasserstein-GAN, which minimizes a reasonable and efficient approximation of the EM distance 3. Empirically show that WGAN cures the main training problems of GANs (e.g. stability, power balance, mode collapse) 4. A meaningful evaluation criterion (the loss curve tracks sample quality)
  • 187. IMPLEMENTATION THAT SIMPLE! (after all those mathematics…)
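For completeness, here is roughly what “that simple” means in code — a sketch of one WGAN training step under the usual assumptions (hypothetical names; n_critic and the clipping constant are the paper's defaults):

    import torch

    def wgan_step(critic, G, c_opt, g_opt, real, z_dim=100, n_critic=5, clip=0.01):
        for _ in range(n_critic):                        # train the critic more often
            z = torch.randn(real.size(0), z_dim)
            loss_c = -(critic(real).mean() - critic(G(z).detach()).mean())
            c_opt.zero_grad(); loss_c.backward(); c_opt.step()
            for p in critic.parameters():
                p.data.clamp_(-clip, clip)               # weight clipping (Lipschitz)
        z = torch.randn(real.size(0), z_dim)
        loss_g = -critic(G(z)).mean()                    # no log, no sigmoid
        g_opt.zero_grad(); loss_g.backward(); g_opt.step()
        return -loss_c.item()                            # ~ EM estimate: the learning curve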
  • A repo with almost every GAN implemented in both PyTorch and TensorFlow versions: https://github.com/wiseodd/generative-models IMPLEMENTATION
  • 190. MLE & KL DIVERGENCE
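To close the loop on this last slide: maximum likelihood and KL minimization coincide because, as $N \to \infty$,

    \frac{1}{N} \sum_{i=1}^{N} \log P_\theta(x_i) \;\to\; \mathbb{E}_{x \sim Q}[\log P_\theta(x)] = -KL(Q \| P_\theta) - H(Q),

and the entropy $H(Q)$ does not depend on $\theta$, so maximizing the average log-likelihood is exactly minimizing $KL(Q \| P_\theta)$.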