https://telecombcn-dl.github.io/2019-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and deep Q-networks for reinforcement learning have reshaped the signal processing landscape. This course covers the basic principles and applications of deep learning for computer vision problems, such as image classification, object detection and image captioning.
Objectives of this lecture
● Understand the ImageNet task of image classification.
● Know the most popular CNN architectures for computer vision.
● Raise awareness of the importance of data for deep learning models.
Convolutional Layer (Conv)
Figure Credit: Ranzato
The goal is to estimate the parameters of multiple convolutional filters.
100 convolutional filters of size 3x3 amount to 900 parameters.
The number of parameters does not depend on the input image size!
In summary: 900 parameters (Conv) vs 10^12 (FC).
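As a sanity check of these numbers, here is the arithmetic in plain Python. The slide does not state the sizes behind the 10^12 figure; the sketch assumes something like a 1000x1000 input densely connected to 10^6 hidden units, which is a purely illustrative choice.

    # Conv: 100 filters of size 3x3 on a single-channel input (biases ignored).
    conv_params = 100 * 3 * 3             # 900, independent of the image size

    # Fully connected: every input pixel connects to every hidden unit
    # (assumed sizes: 1000x1000 input, 10^6 hidden units).
    fc_params = (1000 * 1000) * 10**6     # 10^12 parameters

    print(conv_params, fc_params)         # 900 vs 1000000000000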
Feature Maps
Each learned convolutional filter detects a different pattern (“feature”).
The responses of a convolutional filter at all spatial locations define a feature map.
Figure Credit: Ranzato
Conv Layer & Feature Maps
A convolutional layer is a module that transforms input feature maps into other feature maps, which encode increasingly abstract concepts.
Figure Credit: Ranzato
Notice that the number of input feature maps defines the depth of the convolutional filters...
Figure Credit: Ranzato
...and the number of convolutional filters defines the number of channels of the output feature map.
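Both statements can be checked in a few lines of PyTorch (tensor sizes chosen arbitrarily for illustration):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 4, 32, 32)      # 4 input feature maps of size 32x32
    conv = nn.Conv2d(in_channels=4,    # filter depth = 4, matching the input
                     out_channels=8,   # 8 filters -> 8 output feature maps
                     kernel_size=3, padding=1)
    print(conv(x).shape)               # torch.Size([1, 8, 32, 32])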
Pooling Layer
Figure Credit: Ranzato
Pooling is a downsampling operation along the spatial dimensions (width, height), sketched in code below:
● It progressively reduces the spatial size of the representation, which greatly reduces computation.
● It provides invariance to small local changes.
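A minimal PyTorch sketch: 2x2 max pooling with stride 2 halves width and height while leaving the number of feature maps unchanged.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 32, 32)                  # 8 feature maps of 32x32
    pool = nn.MaxPool2d(kernel_size=2, stride=2)   # spatial downsampling by 2
    print(pool(x).shape)                           # torch.Size([1, 8, 16, 16])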
Convolutional Neural Networks for Vision
LeNet-5: several convolutional layers, combined with pooling layers and followed by a small number of fully connected layers.
#LeNet-5 LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document
recognition. Proceedings of the IEEE, 86(11), 2278-2324.
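A LeNet-5-style stack in modern PyTorch, as a sketch rather than a faithful reproduction (the 1998 network used trainable subsampling layers and a different connectivity pattern); it expects 1x32x32 inputs:

    import torch.nn as nn

    # Conv + pool blocks followed by a small number of fully connected layers.
    lenet_like = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # -> 6x14x14
        nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # -> 16x5x5
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
        nn.Linear(120, 84), nn.Tanh(),
        nn.Linear(84, 10),                                            # 10 classes
    )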
ImageNet Challenge
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet
large scale visual recognition challenge." International Journal of Computer Vision 115, no. 3 (2015): 211-252. [web]
ImageNet Challenge: 2012
Slide credit: Rob Fergus (NYU)
AlexNet cut the top-5 error by 9.8 points with respect to the runner-up, which was based on SIFT + Fisher Vectors.
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet
large scale visual recognition challenge." International Journal of Computer Vision 115, no. 3 (2015): 211-252. [web]
Filters learned by AlexNet
Visualization of the 96 filters of size 11x11 learned by the bottom layer.
#AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep
convolutional neural networks." NIPS 2012
The first layers learn edges and textures, while deeper layers capture more abstract concepts.
ImageNet Challenge: 2013
Slide credit: Rob Fergus (NYU)
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet
large scale visual recognition challenge." International Journal of Computer Vision 115, no. 3 (2015): 211-252. [web]
Zeiler-Fergus (ZF)
The development of better convnets is otherwise reduced to trial and error; visualization can help in proposing better architectures.
Zeiler, M. D., & Fergus, R. "Visualizing and understanding convolutional networks." ECCV 2014.
GoogLeNet (Inception)
22 layers!
#Inception Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
Two auxiliary softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time.
...and no fully connected layers are needed (12 times fewer parameters than AlexNet!)
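The building block is the Inception module: parallel 1x1, 3x3 and 5x5 convolutions plus a pooled branch, concatenated along the channel axis, with 1x1 convolutions as cheap dimensionality reduction. A simplified PyTorch sketch (channel sizes borrowed from the inception(3a) stage; some ReLUs omitted for brevity):

    import torch
    import torch.nn as nn

    class InceptionModule(nn.Module):
        def __init__(self, in_ch):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                    nn.Conv2d(96, 128, 3, padding=1))
            self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                    nn.Conv2d(16, 32, 5, padding=2))
            self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(in_ch, 32, 1))

        def forward(self, x):
            # Concatenate branches: 64 + 128 + 32 + 32 = 256 output channels.
            return torch.cat([self.b1(x), self.b3(x),
                              self.b5(x), self.bp(x)], dim=1)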
VGG
#VGG Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale
image recognition." ICLR 2015. [video] [slides] [project]
VGG: Stacked 3x3 convolutions
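The key idea: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, with fewer parameters and an extra non-linearity in between. A back-of-the-envelope check in plain Python (C channels in and out, biases ignored):

    C = 256
    two_3x3 = 2 * (3 * 3 * C * C)   # 18*C^2 = 1,179,648 parameters
    one_5x5 = 5 * 5 * C * C         # 25*C^2 = 1,638,400 parameters
    print(two_3x3 < one_5x5)        # True: same receptive field, ~28% fewer weights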
VGG: Other details
● No pooling between some of the convolutional layers.
● Convolution stride of 1 (no skipping).
ResNet
Deeper plain networks (e.g., 34 layers instead of 18) turn out to be more difficult to train.
Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
#ResNet He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image
recognition." CVPR 2016. [slides]
Canziani, Alfredo, Adam Paszke, and Eugenio Culurciello. "An analysis of deep neural network models for
practical applications." arXiv preprint arXiv:1605.07678 (2016).
Ensembles of Models (Hikvision)
● More than 20 models, including VGG, Inception, ResNet and variations of them.
● Novel data augmentation.
● Novel learning rate policy.
● …and “some small tricks” (a common way of combining the models is sketched below).
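The slides do not detail how the models are combined; a common baseline, sketched here with a hypothetical helper, is to average the softmax outputs of the individual models.

    import torch

    def ensemble_predict(models, x):
        # One class-probability vector per model, averaged across the ensemble.
        probs = [m(x).softmax(dim=1) for m in models]
        return torch.stack(probs).mean(dim=0)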
The end of the challenge
Electronic Frontier Foundation: “Measuring the Progress of AI Research” (2017)
ResNeXt = ResNet + Inception
#ResNext Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. "Aggregated residual
transformations for deep neural networks." CVPR 2017 [code]
DenseNet
Dense block of 5 layers with a growth rate of k = 4.
Connect every layer to every other layer with the same feature-map size.
#DenseNet Huang, Gao, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. "Densely
connected convolutional networks." CVPR 2017. [code]
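A minimal sketch of a dense block (PyTorch; the paper's BN-ReLU-Conv composite functions are simplified to bare convolutions): each layer receives the concatenation of all preceding feature maps and contributes k new ones.

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        def __init__(self, in_ch, k=4, n_layers=5):
            super().__init__()
            # Layer i sees in_ch + i*k channels and adds k (the growth rate).
            self.layers = nn.ModuleList(
                nn.Conv2d(in_ch + i * k, k, kernel_size=3, padding=1)
                for i in range(n_layers)
            )

        def forward(self, x):
            for layer in self.layers:
                x = torch.cat([x, layer(x)], dim=1)   # concatenate, never sum
            return x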
Neural Architecture Search (NAS)
#AutoML Real, Esteban, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan,
Quoc V. Le, and Alexey Kurakin. "Large-scale evolution of image classifiers." ICML 2017. [blog]
#NasNet Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable
architectures for scalable image recognition." CVPR 2018.
#AdaNet Cortes, Corinna, Xavier Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. "Adanet:
Adaptive structural learning of artificial neural networks." ICML 2017. [blog]