https://telecombcn-dl.github.io/dlai-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand-new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
2. Acknowledgements
Santiago Pascual
santi.pascual@upc.edu
@santty128
PhD
Universitat Politècnica de Catalunya
Technical University of Catalonia
Albert Pumarola
apumarola@iri.upc.edu
@AlbertPumarola
PhD Student
Universitat Politècnica de Catalunya
Technical University of Catalonia
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
4. Today
● We are going to make a shift in the modeling paradigm:
discriminative → generative.
● Is this a type of neural network? No → fully connected networks,
convolutional networks, or recurrent networks all fit, depending on
how we condition the data points (sequentially, globally, etc.).
● This is a very fun topic with which we can make a network
paint, sing or write.
6. What we are used to doing with Neural Nets
Slide credit: Albert Pumarola (UPC 2019)
[Figure: examples of discriminative tasks: text classification (probability of being a potential customer), image classification (Jim Carrey), audio tasks (what language?, language translation)]
Discriminative Modeling: P(y|x)
7. What are generative models?
Slide credit: Albert Pumarola (UPC 2019)
[Figure: the discriminative tasks from the previous slide (text classification, image classification, audio language identification and translation) alongside generative ones: a language model writing text (“What about Ron magic?” offered Ron. To Harry, Ron was loud, slow and soft bird. Harry did not like to think about birds.) and MuseNet composing and interpreting music]
Discriminative Modeling: P(y|x)
Generative Modeling: P(x)
8. What are generative models?
Slide credit: Albert Pumarola (UPC 2019)
“Model the data distribution so that we can sample new samples out of the distribution”
[Figure: learn a modeled data space from the training dataset, then draw generated samples out of it]
9. What are generative models?
Slide credit: Albert Pumarola (UPC 2019)
“Model the data distribution so that we can sample new samples out of the distribution”
[Figure: the same pipeline, now showing interpolated samples from the modeled data space]
10. What is a generative model?
1) We want our model, with parameters θ = {weights, biases}, to output samples distributed according to P_model, matching the distribution of our training data P_data (or P(X)).
2) We can then sample points from P_model that plausibly look P_data-distributed.
Note that we have not mentioned any network structure or model input data format, just a requirement on what we want at the output of our model.
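As a minimal illustration of this requirement (not from the slides: a toy 1-D Gaussian stands in for a real dataset, and maximum likelihood stands in for training), fitting the parameters θ of a simple P_model and then sampling from it looks like:

```python
import numpy as np

# Toy "training data": samples from an unknown P_data (here secretly N(2, 0.5)).
rng = np.random.default_rng(0)
x_data = rng.normal(loc=2.0, scale=0.5, size=10_000)

# The simplest generative model: theta = (mu, sigma), fit by maximum likelihood.
theta = {"mu": x_data.mean(), "sigma": x_data.std()}

# Sampling step: draw brand-new points from P_model.
x_gen = rng.normal(theta["mu"], theta["sigma"], size=10_000)
print(theta)
```

A neural generative model replaces the two-parameter Gaussian with a network, but the contract is the same: learn θ so that samples from P_model look P_data-distributed.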
11. What is a generative model?
We can generate any type of data, like speech waveform samples or image pixels.
Each sample has M dimensions. For example, scanning and unrolling a 32×32 image with 3 color channels gives a vector of M = 32×32×3 = 3072 dimensions.
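In code, the "scan and unroll" step is just a reshape; a quick NumPy sketch (the array contents are dummy zeros):

```python
import numpy as np

# A 32x32 RGB image: height x width x 3 color channels.
image = np.zeros((32, 32, 3))

# "Scan and unroll" the pixels into a single flat vector.
x = image.reshape(-1)
print(x.shape)  # (3072,)
```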
12. What is a generative model?
Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples!
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
14. Super-resolution
#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. (2020). Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics (TOG), 39(4), 75-1.
15. Image translation
#pix2pixHD Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J., & Catanzaro, B. (2018). High-resolution image synthesis and semantic manipulation with conditional GANs. CVPR 2018.
16. Image generation
#StyleGAN Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. CVPR 2019.
17. Image generation
#StarGANv2 Choi, Y., Uh, Y., Yoo, J., & Ha, J. W. (2020). StarGAN v2: Diverse image synthesis for multiple domains. CVPR 2020. [code]
18. Human Motion Transfer
#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. (2019). Everybody dance now. ICCV 2019.
19. Speech Enhancement
Recover lost information / add enhancing details by learning the natural distribution of audio samples.
[Audio: original vs. enhanced sample]
20. Refinement of Synthetic Training Data
Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. Learning from simulated and unsupervised
images through adversarial training. CVPR 2017 [Best paper award]
Refiner = Generator
28. Generator & Discriminator
We have two modules: the Generator (G) and the Discriminator (D).
● They “fight” against each other during training → Adversarial Learning.
● G’s mission: fool D into misclassifying.
● D’s mission: discriminate between G’s samples and real samples.
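The two modules can be sketched in PyTorch as two small networks (not from the slides: the layer sizes, the 100-dim latent, and the 784-dim flattened-image data space are illustrative assumptions):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 784  # e.g. 28x28 images, flattened (illustrative sizes)

# Generator G: latent vector z -> fake sample x_hat in data space.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),   # outputs scaled to [-1, 1]
)

# Discriminator D: sample x -> probability that x is real.
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # a batch of latent codes
x_hat = G(z)                      # G's mission: make these fool D
p_real = D(x_hat)                 # D's mission: push these scores towards 0
print(x_hat.shape, p_real.shape)
```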
29. Generator & Discriminator
Generator network G → generates samples x̂ that follow a probability distribution P_g, and makes it close to P_data (the distribution of our dataset samples).
Q: What do we input to our model in order to make up new samples through a neural network?
[Figure: a generator network with an unknown input, producing x̂ → generated samples]
30. Generator & Discriminator
Generator network G → generates samples x̂ that follow a probability distribution P_g, and makes it close to P_data (the distribution of our dataset samples).
Q: What do we input to our model in order to make up new samples through a neural network?
A: We can sample from a simplistic prior, for example, a Gaussian distribution N(0, I).
[Figure: a latent vector z fed into the generator network, producing x̂ → generated samples]
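A toy NumPy sketch of this idea (the fixed, hand-written map below is a stand-in for a trained generator network, not any real model): any deterministic function g applied to Gaussian noise z induces some output distribution P_g; training adjusts g so that P_g approaches P_data.

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.standard_normal(size=(5, 2))   # 5 draws from the prior N(0, I), 2-D

# A fixed linear map + tanh standing in for a trained generator network.
W = np.array([[1.0, -0.5],
              [0.3,  2.0]])

def g(z):
    # Deterministic mapping: pushes the Gaussian prior into some other P_g.
    return np.tanh(z @ W)

x_hat = g(z)
print(x_hat.shape)
```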
31. Generator & Discriminator
Q: How can we evaluate the quality of the generated samples?
A: With another network trained to do so.
Discriminator network D → outputs the probability that a sample came from the training data rather than from G.
[Figure: a sample x fed into the discriminator network, which answers “Real cat? Yes or No”]
33. Example: DCGAN for image generation
[Figure: the DCGAN generator architecture]
#DCGAN Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. ICLR 2016.
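A scaled-down DCGAN-style generator sketch in PyTorch (the channel counts and the 32×32 output size are illustrative assumptions, not the paper's exact configuration): project the latent code, then upsample with fractionally-strided convolutions, BatchNorm, and ReLU, with Tanh at the output.

```python
import torch
import torch.nn as nn

G = nn.Sequential(
    nn.ConvTranspose2d(100, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),  # 1x1 -> 4x4
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 4x4 -> 8x8
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),     # 8x8 -> 16x16
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                          # 16x16 -> 32x32
)

z = torch.randn(8, 100, 1, 1)   # latent vectors treated as 1x1 "images"
imgs = G(z)
print(imgs.shape)  # torch.Size([8, 3, 32, 32])
```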
34. Outline
1. Generative Models
2. Motivating Applications
3. Generative Adversarial Networks (GANs)
a. Generator & Discriminator Networks
b. Adversarial Training
35. Adversarial Training Analogy: is it fake money?
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
[Figure: a counterfeit $100 bill] D: FAKE: it’s not even green.
36. Adversarial Training Analogy: is it fake money?
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
[Figure: a counterfeit $100 bill] D: FAKE: there is no watermark.
37. Adversarial Training Analogy: is it fake money?
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
[Figure: a counterfeit $100 bill] D: FAKE: the watermark should be rounded.
38. Adversarial Training Analogy: is it fake money?
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
After enough iterations, and if the counterfeiter is good enough (in terms of the G network, this means “has enough parameters”), the police should be confused.
[Figure: a counterfeit $100 bill] D: REAL? FAKE?
40. Adversarial Training: Discriminator
1. Fix generator weights, draw samples from both real world and generated images
2. Train discriminator to distinguish between real world and generated images
[Figure: a latent random variable feeds the Generator to produce fake samples; real-world images provide real samples; the Discriminator classifies each sample as Real or Fake, and the loss is backpropagated to update the discriminator weights]
Figure: Kevin McGuinness
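The discriminator step can be sketched in PyTorch on toy 2-D data (all network sizes and the shifted-Gaussian "real" data are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup so the update step is visible end to end.
G = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.randn(64, 2) + 3.0        # stand-in for real-world samples
z = torch.randn(64, 10)                  # latent random variable
x_fake = G(z).detach()                   # 1) generator weights stay fixed (detach)

# 2) train D: real samples -> label 1, generated samples -> label 0
loss_D = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
opt_D.zero_grad()
loss_D.backward()                        # backprop error to update D's weights only
opt_D.step()
print(float(loss_D))
```

The `detach()` is what "fix generator weights" means in practice: no gradient flows back into G during this step.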
41. Adversarial Training: Generator
1. Fix discriminator weights
2. Sample from generator
3. Backprop error through discriminator to update generator weights
[Figure: the same pipeline, but now the loss is backpropagated through the frozen Discriminator to update the generator weights]
Figure: Kevin McGuinness
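A matching PyTorch sketch of the generator step (toy sizes are illustrative assumptions): G is trained with "real" labels, so the gradient through D pushes D(G(z)) towards 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)  # 1) only G's parameters get updated
bce = nn.BCELoss()

z = torch.randn(64, 10)                  # 2) sample from the generator's prior
x_fake = G(z)                            #    (no detach this time)

# 3) backprop through D into G: G is rewarded when D labels its samples "real" (1)
loss_G = bce(D(x_fake), torch.ones(64, 1))
opt_G.zero_grad()
loss_G.backward()                        # error flows through D down to G's weights
opt_G.step()
print(float(loss_G))
```

"Fix discriminator weights" is enforced here by giving the optimizer only G's parameters: gradients still flow through D, but only G's weights change.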
43. Adversarial Training: Why does it work?
#SRGAN Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W. (2017). Photo-realistic single
image super-resolution using a generative adversarial network. CVPR 2017.
44. Adversarial Training: How to make it work?
Soumith Chintala, “How to train a GAN? Tips and tricks to make GANs work”. GitHub 2016.
NeurIPS Barcelona 2016
45. Outline
1. Generative Models
2. Motivating Applications
3. Generative Adversarial Networks (GANs)
a. Generator & Discriminator Networks
b. Adversarial Training
c. Conditional GANs
d. GANs at UPC
50. Conditional GANs (cGAN)
#pix2pix Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial
networks." CVPR 2017.
51. Conditional GANs (cGAN)
#pix2pix Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial
networks." CVPR 2017.
[Figure: the Generator produces generated pairs; real-world ground-truth pairs come from the dataset; the Discriminator scores pairs and produces the loss]
Figure: Kevin McGuinness
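A minimal sketch of the conditioning mechanism (all sizes are illustrative assumptions, and pix2pix itself uses image-to-image convolutional networks rather than these toy MLPs): both G and D receive the condition y, so D judges (sample, condition) pairs rather than samples alone.

```python
import torch
import torch.nn as nn

cond_dim, latent_dim, data_dim = 5, 10, 2  # illustrative sizes

# Both networks take the condition y concatenated to their usual input.
G = nn.Sequential(nn.Linear(latent_dim + cond_dim, 32), nn.ReLU(),
                  nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim + cond_dim, 32), nn.LeakyReLU(0.2),
                  nn.Linear(32, 1), nn.Sigmoid())

y = torch.randn(8, cond_dim)               # the condition (e.g. a label or input image)
z = torch.randn(8, latent_dim)
x_hat = G(torch.cat([z, y], dim=1))        # generator conditioned on y
p_pair = D(torch.cat([x_hat, y], dim=1))   # discriminator scores the pair (x_hat, y)
print(x_hat.shape, p_pair.shape)
```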
52. Outline
1. Generative Models
2. Motivating Applications
3. Generative Adversarial Networks (GANs)
a. Generator & Discriminator Networks
b. Adversarial Training
c. Conditional GANs
d. GANs at UPC
53. GANs @ UPC: Pioneers
“How to train a GAN” tutorial, NeurIPS Barcelona 2016.
Víctor Garcia, Santi Pascual, Junting Pan
55. GANs @ UPC: Speech denoising
#SEGAN Pascual, Santiago, Antonio Bonafonte, and Joan Serra. "SEGAN: Speech enhancement generative
adversarial network." Interspeech 2017.
56. GANs @ UPC: Visual Saliency prediction
#SalGAN Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” CVPRW 2017.
[Figure: SalGAN’s Generator and Discriminator]
57. GANs @ UPC: Visual Saliency prediction
#PathGAN Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “PathGan: Visual
Scan-path Prediction with Generative Adversarial Networks“ ECCV EPIC Workshop 2018.
58. GANs @ UPC: Face Hallucination from Speech
#Wav2Pix Amanda Duarte, Francisco Roldan, Miquel Tubau, Janna Escur, Santiago Pascual, Amaia
Salvador, Eva Mohedano et al. “Wav2Pix: Speech-conditioned Face Generation using Generative
Adversarial Networks” ICASSP 2019
Generated faces conditioned on spoken utterances of known YouTubers.
59. GANs @ UPC: Human Motion Transfer
Ventura, L., Duarte, A., Giró-i-Nieto, X. “Can everybody sign now? Exploring sign language video generation from 2D poses”. ECCV Workshop 2020.
60. GANs @ UPC: GANimation
#GANimation Pumarola, A., Agudo, A., Martinez, A. M., Sanfeliu, A., & Moreno-Noguer, F. Ganimation:
Anatomically-aware facial animation from a single image. ECCV 2018. (Best paper honourable mention)
61. GANs @ UPC: GANimation
#GANimation Pumarola, A., Agudo, A., Martinez, A. M., Sanfeliu, A., & Moreno-Noguer, F. Ganimation:
Anatomically-aware facial animation from a single image. ECCV 2018. (Best paper honourable mention)
62. GANs @ UPC: Cloth Transfer
Pumarola, A., Goswami, V., Vicente, F., De la Torre, F., & Moreno-Noguer, F. (2019). Unsupervised image-to-video
clothing transfer. ICCV Workshops 2019.
63. Outline
1. Generative Models
2. Motivating Applications
3. Generative Adversarial Networks (GANs)
a. Generator & Discriminator Networks
b. Adversarial Training
c. Conditional GANs
d. GANs at UPC
64. Learn more
Victòria Miró / Oriol Esteve, “La meitat de les notícies que consumirem el 2022 seran falses” [“Half of the news we consume in 2022 will be fake”]. TN Vespre, TV3 (November 26, 2017).