Deep Generative Learning for All

Deep Generative Learning for All
(a.k.a. The GenAI Hype)
Xavier Giro-i-Nieto
@DocXavi
xavigiro.upc@gmail.com
Associate Professor (on leave)
Universitat Politècnica de Catalunya
Institut de Robòtica Industrial
ELLIS Unit Barcelona
Spring 2020
[Summer School website]
2
Acknowledgements
Santiago Pascual
santi.pascual@upc.edu
@santty128
PhD 2019
Universitat Politecnica de Catalunya
Technical University of Catalonia
Albert Pumarola
apumarola@iri.upc.edu
@AlbertPumarola
PhD 2021
Universitat Politècnica de Catalunya
Technical University of Catalonia
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Gerard I. Gállego
PhD Student
Universitat Politècnica de Catalunya
gerard.ion.gallego@upc.edu
@geiongallego
3
Acknowledgements
Eduard Ramon
Applied Scientist
Amazon Barcelona
@eram1205
Wentong Liao
Applied Scientist
Amazon Barcelona
Ciprian Corneanu
Applied Scientist
Amazon Seattle
Laia Tarrés
PhD Student
Universitat Politècnica de Catalunya
laia.tarres@upc.edu
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Image generation
5
#StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and
Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
6
#DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022.
Image generation
7
#DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional
Image Generation with CLIP Latents." 2022. [blog]
Text-to-Image generation
8
Text-to-Video generation
#Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al.
"Make-a-video: Text-to-video generation without text-video data." arXiv 2022.
“A dog wearing a Superhero
outfit with red cape flying
through the sky”
Synthetic labels to train discriminative models
9
#BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio
Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
Video Super-resolution
10
#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for
GAN-based video generation. ACM Transactions on Graphics 2020.
Human Motion Transfer
11
#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
Speech Enhancement
12
Recover lost information/add enhancing details by learning the natural distribution of audio
samples.
original
enhanced
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
14
Discriminative vs Generative Models
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
a. Pθ(Y|X): Discriminative Models
b. Pθ(X): Generative Models
c. Pθ(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(Y|X): Discriminative Models
16
Slide credit:
Albert Pumarola (UPC 2019)
Classification Regression
Text Prob. of being a Potential Customer
Image
Audio Speech Translation
Jim Carrey
What Language?
X=Data
Y=Labels
θ = Model parameters
Discriminative Modeling
Pθ(Y|X)
17
[Figure: input → Network (θ) → output class probabilities, e.g. 0.01, 0.09, 0.9]
Figure credit: Javier Ruiz (UPC TelecomBCN)
Discriminative model: Tell me the probability of some ‘Y’ responses given ‘X’ inputs.
Pθ(Y | X = [pixel1, pixel2, …, pixel784])
Pθ(Y|X): Discriminative Models
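As a minimal illustration (not part of the slides), such a discriminative model can be sketched in PyTorch as a small network over the 784 pixels whose softmax output gives Pθ(Y|X); the layer sizes and the 10 classes are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical discriminative model: P_theta(Y | X = [pixel1, ..., pixel784])
model = nn.Sequential(
    nn.Linear(784, 128),   # theta = weights & biases
    nn.ReLU(),
    nn.Linear(128, 10),    # 10 candidate classes (assumed)
)

x = torch.rand(1, 784)                   # one flattened input image
probs = torch.softmax(model(x), dim=-1)  # P_theta(Y | X), sums to 1 over the classes
print(probs)
```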
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
19
Slide Concept: Albert Pumarola (UPC 2019)
Pθ(X): Generative Models
Classification Regression Generative
Text Prob. of being a Potential Customer
“What about Ron magic?” offered Ron.
To Harry, Ron was loud, slow and soft
bird. Harry did not like to think about
birds.
Image
Audio Language Translation
Music Composer and Interpreter
MuseNet Sample
Jim Carrey
What Language?
Discriminative Modeling: Pθ(Y|X)
Generative Modeling: Pθ(X)
X = Data
Y = Labels
θ = Model parameters
Each real sample xi comes from an M-dimensional probability distribution P(X).
X = {x1, x2, …, xN}
Pθ(X): Generative Models
21
1) We want our model with parameters θ to output samples with distribution Pθ(X), matching the distribution of our training data P(X).
2) We can then sample points from Pθ(X) that plausibly look as if they were drawn from P(X).
P(X): distribution of the training data.
Pλ,μ,σ(X): model fit to the distribution of the training data. Example: Gaussian Mixture Models (GMM).
Pθ(X): Generative Models
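A minimal sketch of the GMM example (toy 1-D data and hyperparameters are assumptions): fit Pλ,μ,σ(X) to training samples with scikit-learn, then draw new points from the fitted model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy training data with two modes, standing in for samples of P(X)
X = np.concatenate([np.random.normal(-2, 0.5, 500),
                    np.random.normal(3, 1.0, 500)]).reshape(-1, 1)

# Fit the model distribution P_{lambda,mu,sigma}(X)
gmm = GaussianMixture(n_components=2).fit(X)

# Draw new points that plausibly look as if they came from P(X)
new_samples, _ = gmm.sample(10)
print(new_samples.ravel())
```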
22
What are the parameters θ we need to estimate in deep neural networks ?
θ = (weights & biases)
[Figure: ? → Network (θ) → output]
Pθ(X): Generative Models
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(X|Y): Conditioned Generative Models
Conditional probabilities P(X|Y) model conditioning variables in the generative process:
X = {x1, x2, …, xN}
Y = {y1, y2, …, yN}
Example conditioning variables Y: object classes (DOG, CAT, TRUCK, PIZZA), film genres (THRILLER, SCI-FI, HISTORY), phonemes (/aa/, /e/, /o/).
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. Generative Adversarial Networks (GANs)
b. Auto-regressive
c. Variational Autoencoders (VAEs)
d. Diffusion
Our learned model should be able to make up new samples from the distribution,
not just copy and paste existing samples!
26
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
Sampling
Philip Isola, Generative Models of Images. MIT 2023.
Sampling
Slide concept: Albert Pumarola (UPC 2019)
Learn
Sample Out
Training Dataset
Generated Samples
Feature space
Manifold Pθ(X)
“Model the data distribution so that we can sample new points out of the
distribution”
Sampling
Sampling
z
Generated Samples
How could we generate diverse samples from a deterministic deep neural network ?
Generator
(θ)
Sampling
Generated Samples
How could we generate diverse samples from a deterministic deep neural network ?
Generator
(θ)
Sample z from a known prior, for example, a multivariate normal distribution N(0, I).
Example: dim(z)=2
x’
z
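A sketch of this idea (the generator here is an arbitrary, untrained network standing in for Gθ): all diversity comes from the random z ~ N(0, I), since the network itself is deterministic.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(            # deterministic network (theta); architecture is illustrative
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 784), nn.Tanh(),
)

z = torch.randn(5, 2)                 # 5 draws from the prior N(0, I), with dim(z) = 2
x_prime = generator(z)                # 5 different generated samples x'
print(x_prime.shape)                  # torch.Size([5, 784])
```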
Slide concept: Albert Pumarola (UPC 2019)
Learn
Training Dataset
Interpolated Samples
Feature space
Manifold Pθ(X)
Traversing the learned manifold through interpolation.
Interpolation
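Such a traversal can be sketched as a linear interpolation between two latent codes (one common choice; the generator is assumed to be already trained).

```python
import torch

z1, z2 = torch.randn(2), torch.randn(2)                        # two latent codes
alphas = torch.linspace(0, 1, steps=8)
z_path = torch.stack([(1 - a) * z1 + a * z2 for a in alphas])
# x_path = generator(z_path)   # decoding each point traverses the learned manifold
print(z_path.shape)            # torch.Size([8, 2])
```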
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
35
Credit: Santiago Pascual [slides] [video]
36
Generator & Discriminator
We have two modules: Generator (G) and Discriminator (D).
● They “fight” against each other during training → Adversarial Learning
D’s goal: classify between real samples and those produced by G.
G’s goal: fool D into misclassifying.
Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
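A minimal sketch of the two modules for flattened 28×28 images (the architectures are illustrative assumptions, not those of the original paper).

```python
import torch.nn as nn

z_dim = 100

# G: maps a latent code z to a fake sample x'
G = nn.Sequential(
    nn.Linear(z_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

# D: binary classifier between real (x) and generated (x') samples
D = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),   # probability that the input is real
)
```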
37
Discriminator
Discriminator network D → binary classifier between real (x) and generated (x’) samples.
[Figure: x’ → Discriminator (θ) → Generated (1); x → Discriminator (θ) → Real (0)]
38
Generator & Discriminator
[Figure: GAN block diagram: latent random variable z → Generator → generated sample; real-world samples from a database; Discriminator → real/generated → loss]
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to
detect whether money is real or fake.
100
100
FAKE: It’s
not even
green
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect
whether money is real or fake.
100
100
FAKE:
There is no
watermark
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect
whether money is real or fake.
100
100
FAKE:
Watermark
should be
rounded
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to
detect whether money is real or fake.
After enough iterations, and if the counterfeiter is good enough (in terms of G network it
means “has enough parameters”), the police should be confused.
REAL?
FAKE?
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Adversarial Training
Alternate between training the discriminator and the generator.
[Figure: GAN block diagram: latent random variable → Generator (neural network) → generated sample; real-world images → sample; Discriminator (neural network) → real/generated → loss]
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
1. Fix generator weights, draw samples from both real world and generated images.
2. Train discriminator to distinguish between real world and generated images.
[Figure: same GAN block diagram; backprop error to update discriminator weights]
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
[Figure: same GAN block diagram; backprop error to update discriminator weights]
Figure: Kevin McGuinness (DCU)
In the set up of the figure, which ground truth label for a generated image should we use to train the discriminator ? Consider a binary encoding of “1” (Real) and “0” (Fake).
Adversarial Training: Generator
1. Fix discriminator weights
2. Sample from generator by injecting noise.
3. Backprop error through discriminator to update generator weights
[Figure: same GAN block diagram; backprop error through the discriminator to update generator weights]
Figure: Kevin McGuinness (DCU)
Adversarial Training: Generator
[Figure: same GAN block diagram; backprop error to update generator weights]
Figure: Kevin McGuinness (DCU)
In the set up of the figure, which ground truth label for a generated image should we use to train the generator ? Consider a binary encoding of “1” (Real) and “0” (Fake).
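Putting the two questions together, a hedged sketch of the alternating updates with the convention “1” = Real, “0” = Fake: the discriminator is trained with label 0 for generated images, while the generator is trained against label 1 so that it fools D. Data, architectures and hyperparameters are toy assumptions.

```python
import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
dataloader = [torch.rand(64, 784) for _ in range(10)]   # stand-in for real-world images

for real in dataloader:
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Train D (generator weights fixed): real -> 1, generated -> 0
    fake = G(torch.randn(b, z_dim)).detach()
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Train G (discriminator weights fixed): push D(fake) towards the "real" label 1
    fake = G(torch.randn(b, z_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```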
Adversarial Training: How to make it work ?
Soumith Chintala, “How to train a GAN ? Tips and tricks to make GAN work”. Github 2016.
NeurIPS Barcelona 2016
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Non-Conditional GANs
51
Slide credit: Víctor Garcia
[Figure: random seed (z) → Generator G(·) → generated sample; Real World samples; Discriminator D(·) → Real/Generated]
52
Conditional GANs (cGAN)
Slide credit: Víctor Garcia
Conditional Adversarial Networks
[Figure: condition → Generator G(·) → generated sample; Real World samples; Discriminator D(·), also given the condition → Real/Generated]
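One common way to implement the conditioning (an assumption here; embedding layers or projection discriminators are alternatives) is to concatenate the condition to the inputs of both networks.

```python
import torch

def generate(G, z, y):
    """Condition y (e.g. a one-hot class vector) is concatenated to the random seed z."""
    return G(torch.cat([z, y], dim=1))

def discriminate(D, x, y):
    """The discriminator also receives the condition, so it judges the pair (x, y)."""
    return D(torch.cat([x, y], dim=1))
```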
53
Learn more about GANs
Ian Goodfellow.
NeurIPS Barcelona 2016.
Mihaela Rosca & Jeff Donahue.
UCL x Deepmind 2020.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Manifold Pθ(X)
Encode Decode
“Generate”
56
Auto-Encoder (AE)
z
Feature space
● Learns Pθ(X) with a reconstruction loss.
● Proposed as a pre-training stage for the encoder (“self-supervised learning”).
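A minimal auto-encoder sketch trained with a reconstruction loss (sizes are illustrative assumptions).

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32))     # x -> z (feature space)
decoder = nn.Sequential(nn.Linear(32, 784))     # z -> reconstruction ("generate")

x = torch.rand(64, 784)
z = encoder(x)
x_hat = decoder(z)
recon_loss = nn.functional.mse_loss(x_hat, x)   # reconstruction loss to learn P_theta(X)
```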
57
Auto-Encoder (AE)
Encode Decode
“Generate”
z
Feature space
Manifold Pθ(X)
Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ?
?
58
Auto-Encoder (AE)
No, because the noise (or encoded noise) would be out of the learned manifold.
Encode Decode
“Generate”
z
Feature space
Manifold Pθ(X)
Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ?
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
60
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Encoder: Predict the mean μ(X) and covariance Σ(X) of a multivariate normal distribution.
Encode
Encode
Loss term to follow a normal
distribution N(0, I).
61
Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
Maths 101: Multivariate normal distribution
62
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Decoder: Trained to reconstruct the input data from a z sampled from N(μ, Σ).
Encode
z
Decode Reconstruction
loss term.
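A sketch of the two loss terms, assuming the encoder outputs a mean mu and a log-variance logvar per latent dimension (diagonal covariance): a reconstruction term plus a KL term pushing N(μ, Σ) towards N(0, I).

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                  # reconstruction loss term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL( N(mu, sigma^2) || N(0, I) )
    return recon + kl
```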
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
z
Encode Decode
Challenge:
We cannot backprop through the sampling of z because “sampling” is not differentiable!
64
Reparametrization Trick
z
Solution: Reparameterization trick
Sample ε from N(0, I) and define z from it, multiplying by σ and summing μ: z = μ + σ ⊙ ε.
65
Reparametrization Trick
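A sketch of the trick: the randomness is moved into an auxiliary noise ε that needs no gradient, so backprop can flow through μ and σ.

```python
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)   # sigma
    eps = torch.randn_like(std)     # eps ~ N(0, I); sampling happens outside the computation graph
    return mu + std * eps           # z = mu + sigma * eps, differentiable w.r.t. mu and sigma
```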
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Generative behaviour
z
67
How can we now generate new samples once the underlying generating
distribution is learned ?
z1
We can sample from our prior N(0,I), discarding the encoder path.
z2
z3
68
Generative behaviour
69
Generative behaviour
N(0, I)
Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
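At generation time the encoder path is simply discarded; a sketch (the decoder stands in for g(z) and is assumed to be already trained).

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())   # g(z); assumed trained

z = torch.randn(3, 32)     # z1, z2, z3 ~ N(0, I)
samples = decoder(z)       # new samples, no encoder involved
```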
70
Generative behaviour
#NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
71
Walking around the z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
Generative behaviour
Learn more about VAEs
72
Andriy Mnih (UCL - Deepmind 2020)
Max Welling - University of Amsterdam (2020)
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Forward Diffusion Process
Philip Isola, Generative Models of Images. MIT 2023.
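The slide shows the forward process as a figure; as a hedged sketch of the usual formulation (Gaussian noise added with a fixed schedule βt, which is an assumption since the slide gives no equations), a clean image x0 can be diffused directly to any step t in closed form.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product of (1 - beta_t)

def q_sample(x0, t):
    """Diffuse a clean image x0 directly to step t (closed form of the forward process)."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

x0 = torch.rand(1, 3, 64, 64)
xT = q_sample(x0, T - 1)     # essentially pure noise, with the same shape as x0
```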
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Denoising Autoencoder (DAE)
Encode Decode
“Generate”
#DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust
features with denoising autoencoders." ICML 2008.
Philip Isola, Generative Models of Images. MIT 2023.
Reverse Denoising process
Data Manifold Pθ(x0)
[Figure: xT (noise) → … → x0 (image); a network (CNN / U-Net) learns to denoise step by step]
Reverse Denoising process
What is the dimension of the latent variable in diffusion models ?
Same dimensionality as the diffused data.
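A hedged sketch of the iterative structure only: starting from noise xT with the data’s dimensionality, a denoising network (a plain convolution stands in for the U-Net here) is applied step by step; real samplers predict the noise and re-inject a scaled noise term at each step.

```python
import torch
import torch.nn as nn

denoiser = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for the CNN / U-Net denoiser

x = torch.randn(1, 3, 64, 64)          # x_T: the "latent" is noise with the data's dimensionality
with torch.no_grad():
    for t in reversed(range(50)):      # denoise step by step, from t = T-1 down to 0
        x = denoiser(x)                # one (heavily simplified) reverse step
x0 = x                                 # final image estimate
```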
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Motivation
PixelRNN
An RNN predicts the probability of each sample xi
with a categorical output
distribution: Softmax
83
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
PixelRNN
84
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
Why are not all completions identical ?
(aka how can AR offer a generative behaviour ?)
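A sketch of the answer as it is usually explained: at each position the model outputs a softmax over the 256 pixel intensities and the next value is sampled from that categorical distribution, so repeated runs give different completions (an untrained LSTM stands in for PixelRNN; sizes are assumptions).

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=1, hidden_size=64, batch_first=True)
to_logits = nn.Linear(64, 256)            # categorical distribution over pixel intensities

pixels = [torch.zeros(1, 1, 1)]           # first (given) pixel
h = None
for _ in range(28 * 28 - 1):
    out, h = rnn(pixels[-1], h)
    probs = torch.softmax(to_logits(out[:, -1]), dim=-1)
    nxt = torch.multinomial(probs, 1)     # sampling, not argmax => completions differ across runs
    pixels.append(nxt.float().view(1, 1, 1) / 255.0)
image = torch.cat(pixels, dim=1)          # (1, 784, 1): one generated image, pixel by pixel
```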
PixelCNN
85
#PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with
pixelcnn decoders. NeurIPS 2016.
Wavenet
86
Wavenet used dilated convolutions to produce synthetic audio, sample by sample, conditioned on a receptive field of size T:
#Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal
Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
Auto-regressive (at test).
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Text completion
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Condition Generated completions
In a shocking finding, scientist
discovered a herd of unicorns
living in a remote, previously
unexplored valley, in the Andes
Mountains. Even more surprising to
the researchers was the fact that
the unicorns spoke perfect
English.
The scientist named the population,
after their distinctive horn, Ovid’s
Unicorn. These four-horned, silver-white
unicorns were previously unknown to
science.
Now, after almost two centuries, the
mystery of what sparked this odd
phenomenon is finally solved.
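The quoted objective (“predict the next word, given all of the previous words”) can be sketched as a shifted-target cross-entropy loss; a toy embedding plus linear layer stands in for the transformer here (so each prediction only sees one previous token), and the vocabulary size and dimensions are assumptions.

```python
import torch
import torch.nn as nn

vocab, d = 50257, 64                      # GPT-2-like vocabulary size (assumed), toy embedding size
embed = nn.Embedding(vocab, d)
lm_head = nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (1, 16))    # a toy tokenized "text"
logits = lm_head(embed(tokens[:, :-1]))      # predict token t from the preceding tokens
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1)   # targets are the inputs shifted by one
)
```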
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
GPT-2/3 can also solve tasks they were not trained for (zero-shot learning).
Text Reading Comprehension
The 2008 Summer Olympics torch relay was run from March 24
until August 8, 2008, prior to the 2008 Summer Olympics,
with the theme of “one world, one dream”. Plans for the
relay were announced on April 26, 2007, in Beijing, China.
The relay, also called by the organizers as the “Journey of
Harmony”, lasted 129 days and carried the torch 137,000 km
(85,000 mi) – the longest distance of any Olympic torch
relay since the tradition was started ahead of the 1936
Summer Olympics.
After being lit at the birthplace of the Olympic Games in
Olympia, Greece on March 24, the torch traveled to the
Panathinaiko Stadium in Athens, and then to Beijing,
arriving on March 31. From Beijing, the torch was following
a route passing through six continents. The torch has
visited cities along the Silk Road, symbolizing ancient
links between China and the rest of the world. The relay
also included an ascent with the flame to the top of Mount
Everest on the border of Nepal and Tibet, China from the
Chinese side, which was closed specially for the event.
Q: What was the theme?
A: “one world, one dream”.
Q: What was the length of the race?
A: 137,000 km
Q: Was it larger than previous ones?
A: No
Q: Where did the race begin?
A: Olympia, Greece
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Zero-shot task performances
(GPT-2 was never trained for these tasks)
#iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML
2020.
GPT-2 / GPT-3
#ChatGPT [blog]
#GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
ChatGPT / GPT-4
Discussion
Learn more about AR models
Nal Kalchbrenner, Mediterranean Machine Learning
Summer School 2022.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
97
Source: David Foster
Recommended books
Interview of David Foster for Machine
Learning Street Talk (2023)
Recommended courses
Deep Unsupervised Learning
(UC Berkeley CS294-158-SP2020)
1 de 99

Recomendados

Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision) por
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
870 vistas26 diapositivas
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018 por
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
1.3K vistas61 diapositivas
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona... por
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
353 vistas42 diapositivas
Generative Models and Adversarial Training (D3L4 2017 UPC Deep Learning for ... por
Generative Models and Adversarial Training  (D3L4 2017 UPC Deep Learning for ...Generative Models and Adversarial Training  (D3L4 2017 UPC Deep Learning for ...
Generative Models and Adversarial Training (D3L4 2017 UPC Deep Learning for ...Universitat Politècnica de Catalunya
1.3K vistas26 diapositivas
Style space analysis paper review ! por
Style space analysis paper review !Style space analysis paper review !
Style space analysis paper review !taeseon ryu
427 vistas21 diapositivas
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa... por
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...Deep Learning JP
286 vistas25 diapositivas

Más contenido relacionado

La actualidad más candente

Introduction to Diffusion Models por
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
5.5K vistas28 diapositivas
Generating Diverse High-Fidelity Images with VQ-VAE-2 por
Generating Diverse High-Fidelity Images with VQ-VAE-2Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2harmonylab
14.1K vistas21 diapositivas
Basic Generative Adversarial Networks por
Basic Generative Adversarial NetworksBasic Generative Adversarial Networks
Basic Generative Adversarial NetworksDong Heon Cho
467 vistas25 diapositivas
Human-level Control Through Deep Reinforcement Learning (Presentation) por
Human-level Control Through Deep Reinforcement Learning (Presentation)Human-level Control Through Deep Reinforcement Learning (Presentation)
Human-level Control Through Deep Reinforcement Learning (Presentation)Muhammed Kocabaş
1.8K vistas46 diapositivas
【機械学習勉強会】画像の翻訳 ”Image-to-Image translation” por
【機械学習勉強会】画像の翻訳 ”Image-to-Image translation” 【機械学習勉強会】画像の翻訳 ”Image-to-Image translation”
【機械学習勉強会】画像の翻訳 ”Image-to-Image translation” yoshitaka373
849 vistas16 diapositivas
PR-409: Denoising Diffusion Probabilistic Models por
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsHyeongmin Lee
1.2K vistas28 diapositivas

La actualidad más candente(20)

Introduction to Diffusion Models por Sangwoo Mo
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
Sangwoo Mo5.5K vistas
Generating Diverse High-Fidelity Images with VQ-VAE-2 por harmonylab
Generating Diverse High-Fidelity Images with VQ-VAE-2Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2
harmonylab14.1K vistas
Basic Generative Adversarial Networks por Dong Heon Cho
Basic Generative Adversarial NetworksBasic Generative Adversarial Networks
Basic Generative Adversarial Networks
Dong Heon Cho467 vistas
Human-level Control Through Deep Reinforcement Learning (Presentation) por Muhammed Kocabaş
Human-level Control Through Deep Reinforcement Learning (Presentation)Human-level Control Through Deep Reinforcement Learning (Presentation)
Human-level Control Through Deep Reinforcement Learning (Presentation)
Muhammed Kocabaş1.8K vistas
【機械学習勉強会】画像の翻訳 ”Image-to-Image translation” por yoshitaka373
【機械学習勉強会】画像の翻訳 ”Image-to-Image translation” 【機械学習勉強会】画像の翻訳 ”Image-to-Image translation”
【機械学習勉強会】画像の翻訳 ”Image-to-Image translation”
yoshitaka373849 vistas
PR-409: Denoising Diffusion Probabilistic Models por Hyeongmin Lee
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic Models
Hyeongmin Lee1.2K vistas
[DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima... por Deep Learning JP
 [DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima... [DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
[DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
Deep Learning JP4.6K vistas
Variational Autoencoder por Mark Chang
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
Mark Chang7.7K vistas
(CVPR2021 Oral) RobustNet: Improving Domain Generalization in Urban-Scene Seg... por Sungha Choi
(CVPR2021 Oral) RobustNet: Improving Domain Generalization in Urban-Scene Seg...(CVPR2021 Oral) RobustNet: Improving Domain Generalization in Urban-Scene Seg...
(CVPR2021 Oral) RobustNet: Improving Domain Generalization in Urban-Scene Seg...
Sungha Choi888 vistas
音情報処理における特徴表現 por NU_I_TODALAB
音情報処理における特徴表現音情報処理における特徴表現
音情報処理における特徴表現
NU_I_TODALAB6.2K vistas
RUTILEA社内勉強会第4回 「敵対的生成ネットワーク(GAN)」 por TRUE_RUTILEA
RUTILEA社内勉強会第4回 「敵対的生成ネットワーク(GAN)」RUTILEA社内勉強会第4回 「敵対的生成ネットワーク(GAN)」
RUTILEA社内勉強会第4回 「敵対的生成ネットワーク(GAN)」
TRUE_RUTILEA352 vistas
Reducing the Dimensionality of Data with Neural Networks por Nagayoshi Yamashita
Reducing the Dimensionality of Data with Neural NetworksReducing the Dimensionality of Data with Neural Networks
Reducing the Dimensionality of Data with Neural Networks
Nagayoshi Yamashita1.9K vistas
MobileNet - PR044 por Jinwon Lee
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
Jinwon Lee8.5K vistas
Introduction to Computer Vision using OpenCV por Dylan Seychell
Introduction to Computer Vision using OpenCVIntroduction to Computer Vision using OpenCV
Introduction to Computer Vision using OpenCV
Dylan Seychell1.9K vistas

Similar a Deep Generative Learning for All

GAN - Theory and Applications por
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
9.5K vistas41 diapositivas
EuroSciPy 2019 - GANs: Theory and Applications por
EuroSciPy 2019 - GANs: Theory and ApplicationsEuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and ApplicationsEmanuele Ghelfi
1.1K vistas41 diapositivas
Lecture17 xing fei-fei por
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-feiTianlu Wang
417 vistas120 diapositivas
Adversarial examples in deep learning (Gregory Chatel) por
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)MeetupDataScienceRoma
1.3K vistas39 diapositivas
Using model-based statistical inference to learn about evolution por
Using model-based statistical inference to learn about evolutionUsing model-based statistical inference to learn about evolution
Using model-based statistical inference to learn about evolutionErick Matsen
1.9K vistas73 diapositivas
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute... por
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya
646 vistas73 diapositivas

Similar a Deep Generative Learning for All(20)

GAN - Theory and Applications por Emanuele Ghelfi
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
Emanuele Ghelfi9.5K vistas
EuroSciPy 2019 - GANs: Theory and Applications por Emanuele Ghelfi
EuroSciPy 2019 - GANs: Theory and ApplicationsEuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and Applications
Emanuele Ghelfi1.1K vistas
Lecture17 xing fei-fei por Tianlu Wang
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
Tianlu Wang417 vistas
Adversarial examples in deep learning (Gregory Chatel) por MeetupDataScienceRoma
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)
MeetupDataScienceRoma1.3K vistas
Using model-based statistical inference to learn about evolution por Erick Matsen
Using model-based statistical inference to learn about evolutionUsing model-based statistical inference to learn about evolution
Using model-based statistical inference to learn about evolution
Erick Matsen1.9K vistas
Distributed Meta-Analysis System por jarising
Distributed Meta-Analysis SystemDistributed Meta-Analysis System
Distributed Meta-Analysis System
jarising8.3K vistas
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B... por NTNU
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
NTNU459 vistas
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B... por Albert Orriols-Puig
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
Albert Orriols-Puig437 vistas
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo... por Codiax
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Codiax161 vistas
ISBA 2022 Susie Bayarri lecture por Pierre Jacob
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
Pierre Jacob448 vistas
Gf o2014talk por Bob O'Hara
Gf o2014talkGf o2014talk
Gf o2014talk
Bob O'Hara721 vistas
Deep Generative Models por Chia-Wen Cheng
Deep Generative Models Deep Generative Models
Deep Generative Models
Chia-Wen Cheng1.7K vistas
Striving to Demystify Bayesian Computational Modelling por Marco Wirthlin
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational Modelling
Marco Wirthlin280 vistas
Dirty data science machine learning on non-curated data por Gael Varoquaux
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
Gael Varoquaux20K vistas
Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ... por Chris Hammerschmidt
Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ...Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ...
Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ...
Chris Hammerschmidt232 vistas
Algoritma genetika por Hendra Arie
Algoritma genetikaAlgoritma genetika
Algoritma genetika
Hendra Arie168 vistas

Más de Universitat Politècnica de Catalunya

Towards Sign Language Translation & Production | Xavier Giro-i-Nieto por
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
291 vistas94 diapositivas
The Transformer - Xavier Giró - UPC Barcelona 2021 por
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021Universitat Politècnica de Catalunya
259 vistas53 diapositivas
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI... por
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
183 vistas92 diapositivas
Open challenges in sign language translation and production por
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and productionUniversitat Politècnica de Catalunya
187 vistas83 diapositivas
Generation of Synthetic Referring Expressions for Object Segmentation in Videos por
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
522 vistas42 diapositivas
Discovery and Learning of Navigation Goals from Pixels in Minecraft por
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftUniversitat Politècnica de Catalunya
193 vistas40 diapositivas

Más de Universitat Politècnica de Catalunya(20)

Último

BLOTTING TECHNIQUES SPECIAL por
BLOTTING TECHNIQUES SPECIALBLOTTING TECHNIQUES SPECIAL
BLOTTING TECHNIQUES SPECIALMuhammadImranMirza2
14 vistas56 diapositivas
Exploring the nature and synchronicity of early cluster formation in the Larg... por
Exploring the nature and synchronicity of early cluster formation in the Larg...Exploring the nature and synchronicity of early cluster formation in the Larg...
Exploring the nature and synchronicity of early cluster formation in the Larg...Sérgio Sacani
1.5K vistas12 diapositivas
selection of preformed arch wires during the alignment stage of preadjusted o... por
selection of preformed arch wires during the alignment stage of preadjusted o...selection of preformed arch wires during the alignment stage of preadjusted o...
selection of preformed arch wires during the alignment stage of preadjusted o...MaherFouda1
8 vistas100 diapositivas
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... por
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...SwagatBehera9
6 vistas36 diapositivas
Presentation on experimental laboratory animal- Hamster por
Presentation on experimental laboratory animal- HamsterPresentation on experimental laboratory animal- Hamster
Presentation on experimental laboratory animal- HamsterKanika13641
6 vistas8 diapositivas
Cyanobacteria as a Biofertilizer (BY- Ayushi).pptx por
Cyanobacteria as a Biofertilizer (BY- Ayushi).pptxCyanobacteria as a Biofertilizer (BY- Ayushi).pptx
Cyanobacteria as a Biofertilizer (BY- Ayushi).pptxAyushiKardam
5 vistas13 diapositivas

Último(20)

Exploring the nature and synchronicity of early cluster formation in the Larg... por Sérgio Sacani
Exploring the nature and synchronicity of early cluster formation in the Larg...Exploring the nature and synchronicity of early cluster formation in the Larg...
Exploring the nature and synchronicity of early cluster formation in the Larg...
Sérgio Sacani1.5K vistas
selection of preformed arch wires during the alignment stage of preadjusted o... por MaherFouda1
selection of preformed arch wires during the alignment stage of preadjusted o...selection of preformed arch wires during the alignment stage of preadjusted o...
selection of preformed arch wires during the alignment stage of preadjusted o...
MaherFouda18 vistas
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... por SwagatBehera9
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
SwagatBehera96 vistas
Presentation on experimental laboratory animal- Hamster por Kanika13641
Presentation on experimental laboratory animal- HamsterPresentation on experimental laboratory animal- Hamster
Presentation on experimental laboratory animal- Hamster
Kanika136416 vistas
Cyanobacteria as a Biofertilizer (BY- Ayushi).pptx por AyushiKardam
Cyanobacteria as a Biofertilizer (BY- Ayushi).pptxCyanobacteria as a Biofertilizer (BY- Ayushi).pptx
Cyanobacteria as a Biofertilizer (BY- Ayushi).pptx
AyushiKardam5 vistas
Gel Filtration or Permeation Chromatography por Poonam Aher Patil
Gel Filtration or Permeation ChromatographyGel Filtration or Permeation Chromatography
Gel Filtration or Permeation Chromatography
Poonam Aher Patil12 vistas
2. Natural Sciences and Technology Author Siyavula.pdf por ssuser821efa
2. Natural Sciences and Technology Author Siyavula.pdf2. Natural Sciences and Technology Author Siyavula.pdf
2. Natural Sciences and Technology Author Siyavula.pdf
ssuser821efa13 vistas
RADIATION PHYSICS.pptx por drpriyanka8
RADIATION PHYSICS.pptxRADIATION PHYSICS.pptx
RADIATION PHYSICS.pptx
drpriyanka815 vistas
Real Science Radio - Dr Paul Homan Climate Change.pptx por Fred Williams
Real Science Radio - Dr Paul Homan Climate Change.pptxReal Science Radio - Dr Paul Homan Climate Change.pptx
Real Science Radio - Dr Paul Homan Climate Change.pptx
Fred Williams8 vistas
Vegetable grafting: A new crop improvement approach.pptx por Himul Suthar
Vegetable grafting: A new crop improvement approach.pptxVegetable grafting: A new crop improvement approach.pptx
Vegetable grafting: A new crop improvement approach.pptx
Himul Suthar9 vistas
Geometrical qualities of the generalised Schwarzschild spacetimes por Orchidea Maria Lecian
Geometrical qualities of the generalised Schwarzschild spacetimesGeometrical qualities of the generalised Schwarzschild spacetimes
Geometrical qualities of the generalised Schwarzschild spacetimes
Determination of color fastness to rubbing(wet and dry condition) by crockmeter. por ShadmanSakib63
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
ShadmanSakib638 vistas
INTRODUCTION TO PLANT SYSTEMATICS.pptx por RASHMI M G
INTRODUCTION TO PLANT SYSTEMATICS.pptxINTRODUCTION TO PLANT SYSTEMATICS.pptx
INTRODUCTION TO PLANT SYSTEMATICS.pptx
RASHMI M G 5 vistas
Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy... por Anmol Vishnu Gupta
Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy...Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy...
Evaluation and Standardization of the Marketed Polyherbal drug Patanjali Divy...
Paper Chromatography or Paper partition chromatography por Poonam Aher Patil
Paper Chromatography or Paper partition chromatographyPaper Chromatography or Paper partition chromatography
Paper Chromatography or Paper partition chromatography

Deep Generative Learning for All

  • 1. Deep Generative Learning for All (a.k.a. The GenAI Hype) Xavier Giro-i-Nieto @DocXavi xavigiro.upc@gmail.com Associate Professor (on leave) Universitat Politècnica de Catalunya Institut de Robòtica Industrial ELLIS Unit Barcelona Spring 2020 [Summer School website]
  • 2. 2 Acknowledgements Santiago Pascual santi.pascual@upc.edu @santty128 PhD 2019 Universitat Politecnica de Catalunya Technical University of Catalonia Albert Pumarola apumarola@iri.upc.edu @AlbertPumarola PhD 2021 Universitat Politècnica de Catalunya Technical University of Catalonia Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Gerard I. Gállego PhD Student Universitat Politècnica de Catalunya gerard.ion.gallego@upc.edu @geiongallego
  • 3. 3 Acknowledgements Eduard Ramon Applied Scientist Amazon Barcelona @eram1205 Wentong Liao Applied Scientist Amazon Barcelona Ciprian Corneanu Applied Scientist Amazon Seattle Laia Tarrés PhD Student Universitat Politècnica de Catalunya laia.tarres@upc.edu
  • 4. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 5. Image generation 5 #StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
  • 6. 6 #DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022. Image generation
  • 7. 7 #DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional Image Generation with CLIP Latents." 2022. [blog] Text-to-Image generation
  • 8. 8 Text-to-Video generation #Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al. "Make-a-video: Text-to-video generation without text-video data." arXiv 2022. “A dog wearing a Superhero outfit with red cape flying through the sky”
  • 9. Synthetic labels to train discriminative models 9 #BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
  • 10. Video Super-resolution 10 #TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics 2020.
  • 11. Human Motion Transfer 11 #EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
  • 12. Speech Enhancement 12 Recover lost information/add enhancing details by learning the natural distribution of audio samples. original enhanced
  • 13. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 14. 14 Discriminative vs Generative Models Philip Isola, Generative Models of Images. MIT 2023.
  • 15. Outline 1. Motivation 2. Discriminative vs Generative Models a. Pθ (Y|X): Discriminative Models b. Pθ (X): Generative Models c. Pθ (X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 16. Pθ (Y|X): Discriminative Models 16 Slide credit: Albert Pumarola (UPC 2019) Classification Regression Text Prob. of being a Potential Customer Image Audio Speech Translation Jim Carrey What Language? X=Data Y=Labels θ = Model parameters Discriminative Modeling Pθ (Y|X)
  • 17. 17 0.01 0.09 0.9 input Network (θ) output class Figure credit: Javier Ruiz (UPC TelecomBCN) Discriminative model: Tell me the probability of some ‘Y’ responses given ‘X’ inputs. Pθ (Y | X = [pixel1 , pixel2 , …, pixel784 ]) Pθ (Y|X): Discriminative Models
  • 18. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 19. 19 Slide Concept: Albert Pumarola (UPC 2019) Pθ (X): Generative Models Classification Regression Generative Text Prob. of being a Potential Customer “What about Ron magic?” offered Ron. To Harry, Ron was loud, slow and soft bird. Harry did not like to think about birds. Image Audio Language Translation Music Composer and Interpreter MuseNet Sample Jim Carrey What Language? Discriminative Modeling Pθ (Y|X) Generative Modeling Pθ (X) X=Data Y=Labels θ = Model parameters
  • 20. Each real sample xi comes from an M-dimensional probability distribution P(X). X = {x1 , x2 , …, xN } Pθ (X): Generative Models
  • 21. 21 1) We want our model with parameters θ to output samples with distribution Pθ (X), matching the distribution of our training data P(X). 2) We can sample points from Pθ (X) plausibly looking how P(X) distributed. P(X) Distribution of training data Pλ,μ,σ (X) Distribution of training data Example: Gaussian Mixture Models (GMM) Pθ (X): Generative Models
  • 22. 22 What are the parameters θ we need to estimate in deep neural networks ? θ = (weights & biases) output Network (θ) ? Pθ (X): Generative Models
  • 23. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 24. Pθ (X|Y): Conditioned Generative Models Joint probabilities P(X|Y) to model conditioning variables on the generative process: X = {x1 , x2 , …, xN } Y = {y1 , y2 , …, yN } DOG CAT TRUCK PIZZA THRILLER SCI-FI HISTORY /aa/ /e/ /o/
  • 25. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. Generative Adversarial Networks (GANs) b. Auto-regressive c. Variational Autoencoders (VAEs) d. Diffusion
  • 26. Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples! 26 Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow) Sampling
  • 27. Philip Isola, Generative Models of Images. MIT 2023. Sampling
  • 28. Slide concept: Albert Pumarola (UPC 2019) Learn Sample Out Training Dataset Generated Samples Feature space Manifold Pθ (X) “Model the data distribution so that we can sample new points out of the distribution” Sampling
  • 29. Sampling z Generated Samples How could we generate diverse samples from a deterministic deep neural network ? Generator (θ)
  • 30. Sampling Generated Samples How could we generate diverse samples from a deterministic deep neural network ? Generator (θ) Sample z from a known prior, for example, a multivariate normal distribution N(0, I). Example: dim(z)=2 x’ z
  • 31. Slide concept: Albert Pumarola (UPC 2019) Learn Training Dataset Interpolated Samples Feature space Manifold Pθ (X) Traversing the learned manifold through interpolation. Interpolation
  • 32. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 33. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 34. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 35. 35 Credit: Santiago Pascual [slides] [video]
  • 36. 36 Generator & Discriminator We have two modules: Generator (G) and Discriminator (D). ● They “fight” against each other during training→ Adversarial Learning D’s goal: Classify between real samples and those produced by G. G’s goal: Fool D to missclassify. Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
  • 37. 37 Discriminator Discriminator network D → binary classifier between real (x) and generated (x’). samples. Generated (1) Discriminator (θ) x’ Discriminator (θ) x Real (0)
  • 39. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 40. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. 100 100 FAKE: It’s not even green Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 41. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. 100 100 FAKE: There is no watermark Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 42. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. 100 100 FAKE: Watermark should be rounded Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 43. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. After enough iterations, and if the counterfeiter is good enough (in terms of G network it means “has enough parameters”), the police should be confused. REAL? FAKE? Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 44. Adversarial Training Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Generated Alternate between training the discriminator and generator Neural Network Neural Network Figure: Kevin McGuinness (DCU)
  • 45. Adversarial Training: Discriminator Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Generated 1. Fix generator weights, draw samples from both real world and generated images 2. Train discriminator to distinguish between real world and generated images Backprop error to update discriminator weights Figure: Kevin McGuinness (DCU)
  • 46. Adversarial Training: Discriminator Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Backprop error to update discriminator weights Figure: Kevin McGuinness (DCU) In the set up of the figure, which ground truth label for a generated image should we use to train the discriminator ? Consider a binary encoding of “1” (Real) and “0” (Fake). Generated
  • 47. Adversarial Training: Generator 1. Fix discriminator weights 2. Sample from generator by injecting noise. 3. Backprop error through discriminator to update generator weights Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Backprop error to update generator weights Figure: Kevin McGuinness (DCU) Generated
  • 48. Adversarial Training: Generator Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Backprop error to update generator weights Figure: Kevin McGuinness (DCU) In the set up of the figure, which ground truth label for a generated image should we use to train the generator ? Consider a binary encoding of “1” (Real) and “0” (Fake). Generated
  • 49. Adversarial Training: How to make it work ? Soumith Chintala, “How to train a GAN ? Tips and tricks to make GAN work”. Github 2016. NeurIPS Barcelona 2016
  • 50. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive
  • 51. Non-Conditional GANs 51 Slide credit: Víctor Garcia Discriminator D(·) Generator G(·) Real World Random seed (z) Real/Generated
  • 52. 52 Conditional GANs (cGAN) Slide credit: Víctor Garcia Conditional Adversarial Networks Real World Real/Generated Condition Discriminator D(·) Generator G(·)
  • 53. 53 Learn more about GANs Ian Goodfellow. NeurIPS Barcelona 2016. Mihaela Rosca & Jeff Donahue. UCL x Deepmind 2020.
  • 54. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 55. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 56. Manifold Pθ (X) Encode Decode “Generate” 56 Auto-Encoder (AE) z Feature space ● Learns Pθ (X) with a reconstruction loss. ● Proposed as a pre-training stage for the encoder (“self-supervised learning”).
  • 57. 57 Auto-Encoder (AE) Encode Decode “Generate” z Feature space Manifold Pθ (X) Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ? ?
  • 58. 58 Auto-Encoder (AE) No, because the noise (or encoded noise) would be out of the learned manifold. Encode Decode “Generate” z Feature space Manifold Pθ (X) Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ?
  • 59. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 60. Variational Auto-Encoder (VAE). Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013. Encoder: predicts the mean μ(X) and covariance Σ(X) of a multivariate normal distribution over the latent code. A loss term (a KL divergence) forces this predicted distribution to stay close to a standard normal N(0, I).
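  For a diagonal Gaussian encoder this loss term has a simple closed form; a one-line sketch, assuming the encoder outputs tensors mu and logvar (the log-variance), which are names not used on the slide:

      # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims, averaged over the batch
      kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=1).mean()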
  • 61. Maths 101: Multivariate normal distribution. Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
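  A quick way to sample from a multivariate normal N(μ, Σ) in code, which also prefigures the reparameterization trick seen later: scale standard normal noise by a square root (Cholesky factor) of the covariance and add the mean. The numeric values below are arbitrary illustrative assumptions.

      import torch

      mu = torch.tensor([0.5, -1.0])                   # mean vector
      Sigma = torch.tensor([[1.0, 0.3], [0.3, 2.0]])   # covariance matrix
      L = torch.linalg.cholesky(Sigma)                 # Sigma = L @ L.T
      eps = torch.randn(1000, 2)                       # eps ~ N(0, I)
      samples = mu + eps @ L.T                         # samples ~ N(mu, Sigma)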
  • 62. Variational Auto-Encoder (VAE). Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013. Decoder: trained to reconstruct the input data from a latent z sampled from N(μ(X), Σ(X)), via a reconstruction loss term.
  • 63. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 64. Reparametrization Trick. Challenge: we cannot backpropagate through the sampling of z ~ N(μ(X), Σ(X)), because “sampling” is not differentiable!
  • 65. Reparametrization Trick. Solution: sample ε ~ N(0, I) and define z deterministically from it, multiplying by the predicted standard deviation and adding the predicted mean, z = μ(X) + σ(X) ⊙ ε, so that gradients can flow through μ and σ.
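  A compact sketch combining slides 60-65: the encoder predicts μ and log σ², z is obtained with the reparameterization trick, and the loss is reconstruction plus the KL term. Architecture, dimensions and the MSE reconstruction term are illustrative assumptions rather than the original paper's exact setup.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class VAE(nn.Module):
          def __init__(self, in_dim=784, z_dim=32):
              super().__init__()
              self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
              self.mu = nn.Linear(256, z_dim)        # mean of q(z|x)
              self.logvar = nn.Linear(256, z_dim)    # log-variance of q(z|x)
              self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                       nn.Linear(256, in_dim))

          def forward(self, x):
              h = self.enc(x)
              mu, logvar = self.mu(h), self.logvar(h)
              eps = torch.randn_like(mu)                     # eps ~ N(0, I)
              z = mu + torch.exp(0.5 * logvar) * eps         # reparameterization trick
              return self.dec(z), mu, logvar

      def vae_loss(x, x_hat, mu, logvar):
          recon = F.mse_loss(x_hat, x, reduction="sum")                  # reconstruction term
          kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0)  # KL to N(0, I)
          return recon + kl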
  • 66. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive
  • 67. Generative behaviour. How can we now generate new samples once the underlying generating distribution has been learned?
  • 68. Generative behaviour. We can sample latent codes z1, z2, z3, … from our prior N(0, I), discarding the encoder path.
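  A sketch of this generative use, reusing the VAE class sketched above (its name and z_dim are assumptions): sample from the prior and run only the decoder.

      vae = VAE()                                   # trained VAE (weights assumed loaded)
      z = torch.randn(16, 32)                       # z ~ N(0, I), one row per new sample
      with torch.no_grad():
          new_samples = vae.dec(z)                  # decoder path only; the encoder is discarded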
  • 69. Generative behaviour. Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
  • 70. Generative behaviour. #NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
  • 71. Generative behaviour. Walking around the dimensions of the z manifold gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
  • 72. Learn more about VAEs. Andriy Mnih (UCL - DeepMind, 2020). Max Welling - University of Amsterdam (2020).
  • 73. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 74. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 75. Forward Diffusion Process Philip Isola, Generative Models of Images. MIT 2023.
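  The slide illustrates the forward process pictorially; as a concrete reference, here is a sketch of the standard DDPM-style forward process, which progressively corrupts an image x0 with Gaussian noise. The linear noise schedule and the number of steps are illustrative assumptions.

      import torch

      T = 1000
      betas = torch.linspace(1e-4, 0.02, T)            # noise schedule (assumed linear)
      alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

      def q_sample(x0, t):
          # Sample x_t ~ q(x_t | x_0) = N( sqrt(alphas_bar_t) * x_0, (1 - alphas_bar_t) * I )
          eps = torch.randn_like(x0)
          a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast over image dims
          return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps, eps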
  • 76. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 77. Denoising Autoencoder (DAE). Trained to reconstruct a clean input from a corrupted (noisy) version of it. #DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust features with denoising autoencoders." ICML 2008.
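  Relative to the plain autoencoder sketched earlier, a denoising autoencoder only changes the training input: the model sees a corrupted copy but must reconstruct the clean target. The model and variable names, and the noise level, are illustrative assumptions.

      # Reusing the AutoEncoder sketched after slide 56 (model) and a batch of clean inputs x:
      noisy_x = x + 0.3 * torch.randn_like(x)                  # corrupt the input (arbitrary noise level)
      loss = torch.nn.functional.mse_loss(model(noisy_x), x)   # reconstruct the *clean* target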
  • 78. Philip Isola, Generative Models of Images. MIT 2023. Reverse Denoising process
  • 79. Reverse Denoising process. A network (typically a CNN / U-Net) learns to denoise step by step, mapping pure noise xT back to an image x0 on the data manifold Pθ(x0). What is the dimension of the latent variable in diffusion models? The same dimensionality as the diffused data.
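  A hedged sketch of how such a denoising network is typically trained in DDPM-style models (by predicting the injected noise) and of one reverse step. It reuses q_sample, betas, alphas_bar and T from the forward-process sketch above, and assumes model is a U-Net taking (x_t, t); all of these names are assumptions.

      import torch
      import torch.nn.functional as F

      def training_step(model, x0):
          t = torch.randint(0, T, (x0.size(0),))
          x_t, eps = q_sample(x0, t)                  # corrupt x0 with known noise
          eps_hat = model(x_t, t)                     # U-Net predicts the injected noise
          return F.mse_loss(eps_hat, eps)             # simple denoising objective

      @torch.no_grad()
      def reverse_step(model, x_t, t):
          # One step x_t -> x_{t-1} of the learned reverse (denoising) process.
          beta_t = betas[t]
          alpha_t = 1.0 - beta_t
          eps_hat = model(x_t, torch.full((x_t.size(0),), t))
          mean = (x_t - beta_t / (1.0 - alphas_bar[t]).sqrt() * eps_hat) / alpha_t.sqrt()
          noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
          return mean + beta_t.sqrt() * noise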
  • 80. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 81. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 83. PixelRNN. An RNN predicts the probability of each pixel xi, given the previously generated pixels, with a categorical output distribution (softmax). #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
  • 84. PixelRNN. #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016. Why aren't all completions of the same context identical? (i.e., how can an auto-regressive model behave generatively?)
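  A toy sketch of auto-regressive pixel sampling in the spirit of PixelRNN; the GRU-based architecture is an illustrative assumption, not the paper's exact model. Each pixel is drawn from the predicted categorical distribution rather than taken greedily, which is why repeated completions of the same context differ.

      import torch
      import torch.nn as nn

      class TinyPixelAR(nn.Module):
          def __init__(self, n_values=256, hidden=128):
              super().__init__()
              self.embed = nn.Embedding(n_values, hidden)
              self.rnn = nn.GRU(hidden, hidden, batch_first=True)
              self.head = nn.Linear(hidden, n_values)      # softmax over the 256 pixel values

          def forward(self, x):                             # x: (batch, seq_len) of pixel values
              h, _ = self.rnn(self.embed(x))
              return self.head(h)                           # logits for the *next* pixel at each position

      @torch.no_grad()
      def complete(model, context, n_new):
          x = context.clone()                               # (1, seq_len) of already-known pixels
          for _ in range(n_new):
              logits = model(x)[:, -1]                      # distribution over the next pixel
              probs = torch.softmax(logits, dim=-1)
              nxt = torch.multinomial(probs, 1)             # stochastic sampling -> diverse completions
              x = torch.cat([x, nxt], dim=1)
          return x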
  • 85. PixelCNN. Replaces the RNN with masked convolutions over the already generated pixels. #PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with pixelcnn decoders. NeurIPS 2016.
  • 86. WaveNet. WaveNet used dilated causal convolutions to produce synthetic audio sample by sample, each sample conditioned on a receptive field of T previous samples. #Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
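  A minimal sketch of a stack of dilated causal 1-D convolutions in the WaveNet spirit; the paper's gated activations, residual and skip connections are omitted for brevity, so this is an assumption-laden simplification. Doubling the dilation at each layer makes the receptive field T grow exponentially with depth.

      import torch
      import torch.nn as nn

      class DilatedCausalStack(nn.Module):
          def __init__(self, channels=32, n_layers=8, kernel_size=2):
              super().__init__()
              self.layers = nn.ModuleList()
              self.pads = []
              for i in range(n_layers):
                  dilation = 2 ** i                               # 1, 2, 4, 8, ... receptive field doubles
                  self.pads.append((kernel_size - 1) * dilation)  # left-pad so the model stays causal
                  self.layers.append(nn.Conv1d(channels, channels, kernel_size, dilation=dilation))

          def forward(self, x):                                   # x: (batch, channels, time)
              for pad, conv in zip(self.pads, self.layers):
                  x = torch.relu(conv(nn.functional.pad(x, (pad, 0))))
              return x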
  • 87. The Transformer. Auto-regressive at test time: each new token is predicted from the previously generated ones. Figure: Jay Alammar, “The Illustrated Transformer” (2018). #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. Attention is all you need. NeurIPS 2017.
  • 88. The Transformer Figure: Jay Alammar, “The illustrated Transformer” (2018)
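  A sketch of the auto-regressive decoding loop used at test time, assuming model is any decoder (e.g., a Transformer decoder) that maps a token sequence to next-token logits and eos_id marks the end of the sequence; both names are assumptions rather than anything defined in the slides.

      import torch

      @torch.no_grad()
      def autoregressive_decode(model, prompt_ids, max_new_tokens=50, eos_id=0):
          ids = prompt_ids.clone()                      # (1, prompt_len) token ids
          for _ in range(max_new_tokens):
              logits = model(ids)[:, -1, :]             # logits for the next token only
              probs = torch.softmax(logits, dim=-1)
              nxt = torch.multinomial(probs, 1)         # sample (greedy alternative: argmax)
              ids = torch.cat([ids, nxt], dim=1)
              if nxt.item() == eos_id:
                  break
          return ids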
  • 89. Text completion #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Condition Generated completions In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
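  For reference, this kind of text completion with a pretrained GPT-2 can be reproduced with the Hugging Face transformers library, an external tool not mentioned in the slides; the prompt and sampling settings below are illustrative assumptions. Sampling (do_sample=True) is what makes repeated completions of the same condition differ.

      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")

      prompt = "In a shocking finding, scientists discovered a herd of unicorns"
      inputs = tokenizer(prompt, return_tensors="pt")
      output_ids = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=50)
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))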
  • 90. Zero-shot learning #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. GPT-2/3 can also solve tasks for which it was not trained for (zero-shot learning). Text Reading Comprehension The 2008 Summer Olympics torch relay was run from March 24 until August 8, 2008, prior to the 2008 Summer Olympics, with the theme of “one world, one dream”. Plans for the relay were announced on April 26, 2007, in Beijing, China. The relay, also called by the organizers as the “Journey of Harmony”, lasted 129 days and carried the torch 137,000 km (85,000 mi) – the longest distance of any Olympic torch relay since the tradition was started ahead of the 1936 Summer Olympics. After being lit at the birthplace of the Olympic Games in Olympia, Greece on March 24, the torch traveled to the Panathinaiko Stadium in Athens, and then to Beijing, arriving on March 31. From Beijing, the torch was following a route passing through six continents. The torch has visited cities along the Silk Road, symbolizing ancient links between China and the rest of the world. The relay also included an ascent with the flame to the top of Mount Everest on the border of Nepal and Tibet, China from the Chinese side, which was closed specially for the event. Q: What was the theme? A: “one world, one dream”. Q: What was the length of the race? A: 137,000 km Q: Was it larger than previous ones? A: No Q: Where did the race begin? A: Olympia, Greece
  • 91. Zero-shot learning #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Zero-shot task performances (GPT-2 was never trained for these tasks)
  • 92. #iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML 2020. GPT-2 / GPT-3
  • 93. #ChatGPT [blog] #GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog] ChatGPT / GPT-4
  • 95. Learn more about AR models Nal Kalchbrenner, Mediterranean Machine Learning Summer School 2022.
  • 96. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 98. Recommended books Interview of David Foster for Machine Learning Street Talk (2023)
  • 99. Recommended courses Deep Unsupervised Learning (UC Berkeley CS294-158-SP2020)