3. LeNet: Hello World!
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner,
Gradient-based learning applied to document recognition, Proc. IEEE 86(11):
2278–2324, 1998.
Repeated C(5x5)-P(2x2) pairs
Average pooling
Sigmoid or tanh activation function
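As a concrete illustration, a minimal PyTorch sketch of this stack (sizes follow LeNet-5's 32x32 grayscale input; this is a simplification, since the original used partial C3 connectivity and RBF outputs):

import torch
import torch.nn as nn

# LeNet-5-style stack: repeated C(5x5)-P(2x2) pairs with average pooling
# and tanh activations, followed by fully connected layers.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # C1: 32x32 -> 28x28
    nn.AvgPool2d(2),                             # S2: 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # C3: 14x14 -> 10x10
    nn.AvgPool2d(2),                             # S4: 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # C5 (as a dense layer)
    nn.Linear(120, 84), nn.Tanh(),               # F6
    nn.Linear(84, 10),                           # 10 digit classes
)
out = lenet(torch.randn(1, 1, 32, 32))           # -> torch.Size([1, 10])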
4. ILSVRC
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
More than 1.2 million images in 1000 classes
Impressive new CNN structures from ILSVRC
www.image-net.org/challenges/LSVRC/
5. AlexNet: ILSVRC 2012 winner
C(11x11)P-C(5x5)P-C(3x3)-C(3x3)-C(3x3)P
Max pooling
ReLU activation function
8 layers
A. Krizhevsky, I. Sutskever, and G. Hinton,
ImageNet Classification with Deep Convolutional Neural Networks, NIPS
2012
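A hedged PyTorch sketch of that convolutional stack (channel counts follow the paper; the strides and padding shown are the common single-GPU rendering of the original two-GPU model):

import torch.nn as nn

# AlexNet feature extractor: C(11x11)P-C(5x5)P-C(3x3)-C(3x3)-C(3x3)P,
# with ReLU after every convolution and 3x3/stride-2 max pooling.
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)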
6. VGGNet: ILSVRC 2014 2nd
All convolutional layer kernels are of size 3x3
Max pooling of size 2x2 is applied after every 2 or 3 convolutional layers
Pooling stride is 2
Stacking building blocks of the same shape
K. Simonyan and A. Zisserman,
Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR
2015
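One such building block as a PyTorch sketch (channel counts are illustrative; the full network just stacks these blocks with increasing width):

import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    # 2-3 stacked 3x3 convolutions, then a 2x2 max pool with stride 2.
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1), nn.ReLU()]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves H and W
    return nn.Sequential(*layers)

block = vgg_block(64, 128, n_convs=2)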
7. GoogLeNet: ILSVRC 2014 Winner
Let the network choose the kernel size itself (parallel branches in the Inception module)
Pointwise convolution (1x1 convolution) reduces parameters
22 layers
C. Szegedy et al.,
Going deeper with convolutions, CVPR 2015
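A simplified Inception-style module in PyTorch showing both ideas: parallel branches with different kernel sizes, and 1x1 convolutions that cut channels before the expensive 3x3/5x5 kernels (branch widths are illustrative, and the pooling branch of the real module is omitted):

import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)  # pointwise only
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 32, kernel_size=1),  # reduce...
                                nn.Conv2d(32, 64, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),
                                nn.Conv2d(16, 32, kernel_size=5, padding=2))

    def forward(self, x):
        # Concatenating the branches lets the network weight each kernel
        # size as it sees fit.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)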
8. ResNet: ILSVRC 2015 Winner
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun,
Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
ResNet: 152 layers
9. ResNet: ILSVRC 2015 Winner
Introduce skip connections
Pointwise convolutions reduce and restore the number of feature maps
152 layers, top-5 error rate 3.57% vs. 5.1% for a human expert
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun,
Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
Directly performing 3x3 convolutions (256 → 256 channels):
Parameters: 256x256x3x3 ~ 600K
Residual bottleneck module structure:
Parameters:
64x256x1x1 ~ 16K
64x64x3x3 ~ 36K
256x64x1x1 ~ 16K
Total ~70K
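The arithmetic is easy to check (Python; bias terms are ignored, as in the slide's rough counts):

direct  = 256 * 256 * 3 * 3  # dense 3x3, 256 -> 256 channels: 589,824 (~600K)
squeeze = 64 * 256 * 1 * 1   # 1x1 reduce,  256 -> 64:          16,384 (~16K)
spatial = 64 * 64 * 3 * 3    # 3x3 conv,     64 -> 64:          36,864 (~36K)
expand  = 256 * 64 * 1 * 1   # 1x1 restore,  64 -> 256:         16,384 (~16K)
print(direct, squeeze + spatial + expand)  # 589824 vs. 69632: roughly 8.5x fewer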
10. ResNet: ILSVRC 2015 Winner
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun,
Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
Problem:
As network depth increases, accuracy saturates (which might be unsurprising) and then degrades rapidly.
Deeper networks are not easy to optimize.
Cause:
During training, some neurons can "die" (output zero) and become ineffective. This loses information, sometimes very important information.
Solution:
Skip connections carry important information from earlier layers forward to later layers.
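A minimal PyTorch sketch of the bottleneck unit with its skip connection (batch normalization is omitted for brevity; the real unit applies it after every convolution):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    # 1x1 reduce -> 3x3 -> 1x1 restore, plus an identity skip connection
    # that carries the input forward unchanged.
    def __init__(self, ch=256, mid=64):
        super().__init__()
        self.reduce = nn.Conv2d(ch, mid, kernel_size=1)
        self.conv3 = nn.Conv2d(mid, mid, kernel_size=3, padding=1)
        self.restore = nn.Conv2d(mid, ch, kernel_size=1)

    def forward(self, x):
        out = F.relu(self.reduce(x))
        out = F.relu(self.conv3(out))
        out = self.restore(out)
        return F.relu(out + x)  # skip connection: identity + residual

y = Bottleneck()(torch.randn(1, 256, 14, 14))  # output shape matches input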
11. Xception: Depthwise Separable Convolutions
François Chollet
Xception: Deep Learning with Depthwise Separable Convolutions, CVPR 2017
Important Hypothesis:
The mapping of cross-channel correlations and spatial correlations in the
feature maps of convolutional neural networks can be entirely decoupled.
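Under this hypothesis a convolution factors into a per-channel (depthwise) spatial convolution plus a 1x1 (pointwise) convolution across channels. A minimal PyTorch sketch (the paper actually applies the pointwise convolution first and argues the ordering is unimportant):

import torch.nn as nn

def separable_conv(in_ch, out_ch):
    # Depthwise: one 3x3 filter per input channel (groups=in_ch) handles
    # spatial correlations; the 1x1 conv then mixes across channels.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )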
12. Xception: Depthwise Separable Convolutions
François Chollet
Xception: Deep Learning with Depthwise Separable Convolutions, CVPR 2017
13. ResNeXt: Group Convolutions ILSVRC 2016 2nd
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He,
Aggregated Residual Transformations for Deep Neural Networks, CVPR 2017
Introduce group convolution into the ResNet unit, thus introducing a new dimension, "cardinality" (the number of groups), to ResNet.
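In PyTorch this is just the groups argument of nn.Conv2d; a quick sketch (channel sizes follow the ResNeXt-50 32x4d block, where the cardinality is 32):

import torch.nn as nn

# Splitting the 3x3 conv into 32 groups of 4 channels gives 32x fewer
# parameters than the dense version with the same width.
dense   = nn.Conv2d(128, 128, kernel_size=3, padding=1)             # 128*128*9 = 147,456 weights
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)  # 128*4*9   =   4,608 weights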
14. ResNeXt: Group Convolutions ILSVRC 2016 2nd
A clearer case:
Group convolution reduces complexity compared to a similar ResNet structure, and gains better performance at the same complexity.
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He,
Aggregated Residual Transformations for Deep Neural Networks, CVPR 2017
15. ShuffleNet: pointwise group conv + channel shuffle
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun,
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, CVPR 2018
Channel shuffle: helps information flow across feature-map groups
(B, g x n, H, W) – reshape (B, g, n, H, W) – transpose (B, n, g, H, W) –
reshape (B, g x n, H, W)
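That recipe translates directly into a few lines of PyTorch (the function name is ours; the worked example shows how channels from different groups end up interleaved):

import torch

def channel_shuffle(x, g):
    # (B, g*n, H, W) -> (B, g, n, H, W) -> (B, n, g, H, W) -> (B, g*n, H, W)
    B, C, H, W = x.shape
    n = C // g
    return x.view(B, g, n, H, W).transpose(1, 2).reshape(B, C, H, W)

x = torch.arange(8).float().view(1, 8, 1, 1)     # channels 0..7, g=2 groups of n=4
print(channel_shuffle(x, 2).flatten().tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]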
16. ShuffleNet: pointwise group conv + channel shuffle
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun,
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, CVPR 2018
Pointwise group convolution:
Reduces complexity, allowing more feature maps within the same budget; this is especially important for small networks.
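A quick sketch of the saving (240 channels with g = 3 matches the ShuffleNet 1x configuration; exact numbers vary with model size):

import torch.nn as nn

# A grouped 1x1 conv uses 1/g of the parameters of a dense one with the
# same width, freeing budget for more feature maps.
pw_dense   = nn.Conv2d(240, 240, kernel_size=1)            # 240*240 = 57,600 weights
pw_grouped = nn.Conv2d(240, 240, kernel_size=1, groups=3)  # 240*80  = 19,200 weights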