SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
CNN Popular Architectures
and Transfer Learning
Palacode Narayana Iyer Anantharaman
16th Oct 2018
Motivation : Why study these?
Understanding popular architectures help us in many ways:
• We develop a better understanding of the subject by studying high performant
architectures
• This helps perform our own research on newer models
• Learning their design philosophy help us to design our models more effectively.
• Use them as a backbone for transfer learning, selecting the right architecture
References
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/
Current trend: Deeper Models work better
• CNNs consistently outperform other
approaches for the core tasks of CV
• Deeper models work better
• Increasing the number of parameters in layers
of CNN without increasing their depth is not
effective at increasing test set performance.
• Shallow models overfit at around 20 million
parameters while deep ones can benefit from
having over 60 million.
• Key insight: Model performs better when it is
architected to reflect composition of simpler
functions than a single complex function. This
may also be explained off viewing the
computation as a chain of dependencies
VGG Net
VGG net
ResNet
Recent Results (Credits: CS231n Stanford)
Resnet Motivation
ResNet Approach
• Generally: Deeper the better - See the fig
• Issue: Hard to train deeper networks effectively :
Validation error goes up not just due to
overfitting but increase in training error
• Solution: Use skip connections to propagate the
activations to reduce the impact
• Rationale: Imagine how to get an identity output
through a deep network accurately.
ResNet Hypothesis : Rationale
• Deeper models should perform at least as well as shallower models
• More parameters, more degrees of freedom to get the training error down
• More depth, more ability to model abstractions
• Solution by construction is copying the learned layers of a shallower
model and setting the additional layers to identity mapping
ResNet
• Very deep network: Uses 152 layers
• Shallower versions (e.g. Resnet50) are
available
• The “go to” backbone network for many
applications such as Faster RCNN
• Pre trained weights are available for
Keras, TensorFlow
ResNet Details
• Default input size: (224, 224, 3)
• Each stage reduces the width, height dimensions by a factor of 2
• This property is leveraged in later implementations such as pyramidal networks
ImageNet Results summary table
GoogleNet
• Design choices of filter sizes: (3, 3), (5, 5) and so on – Which to choose for each
convolution layer?
• Why not try all of them and choose the best?
• Trying each possible value on every layer and experimenting manually is not a solution
• Inception layer allows multiple filter sizes and learn their contributions through
parameters automatically
Inception Layer Naïve Architecture
Inception Layer Naïve Architecture (Fig Udacity Deep Learning)
Inception Architecture with bottleneck layer
Bottleneck Layer
Example
• Consider an input volume (28, 28, 192) and an output volume (28, 28, 32)
• How many computations are needed if we use a 5 x 5 filter?
• Each filter will be 5 x 5 x 192, we will be moving this over a 28 x 28 surface and
we have 32 of them
• 28 x 28 x 32 x 5 x 5 x 192 = 120M
• If we need multiple such filters, we need to add up corresponding computations
for each of them
• On a very deep network these many computations are prohibitively large even
when we use powerful hardware
• By reducing the dimensionality of the input before final convolutions, we get a
manageable number of computations
Example with bottleneck layer
• In our example, we can transform the 28 x 28 x 192 in to same sized surface but
much reduced depth (say 16) using 1 x 1 convolutions
• The bottleneck layer has the shape (28 x 28 x 16)
• Perform the required convolutions (e.g. 5 x 5) on the bottleneck layer to generate
the final output volume
• Computations: #computations between input to bottleneck layer +
#computations between bottleneck to output. In our example this is 12M
GoogleNet architecture with Inception layer
State of the art : SENet
• ImageNet 2017 topper in multiple categories
• A novel technique to weight the contributions of the channels of a
convolutional layer
SeNet : Squeeze and Excitation Network
• SeNet is the winning architecture of ImageNet 2017 in multiple categories
• Error rate on image classification: 2.251%
• Key Idea:
• In authors’ words: “Improve the representational power of the network by explicitly
modelling interdependencies between channels of its convolutional features”
• Simple explanation: Add parameters to each channel of a convolutional block so that network
can adaptively adjust the weighting of each feature map
SeNet Rationale
• Deep CNN’s learn increasing levels of abstractions from lower to higher layers. Lower
layers have higher resolution and can extract basic elements of information
• Higher layers can detect faces or generate text etc and deal with abstract information
• All of this works by fusing the spatial and channel information of an image.
• The network weights each of its channels equally when creating the output feature maps.
• SENets change this by adding a content aware mechanism to weight each channel
adaptively. In it’s most basic form this could mean adding a single parameter to each
channel and giving it a linear scalar how relevant each one is.
SENet Architecture
• Get a global understanding of each channel by squeezing the feature maps to a
single numeric value. This results in a vector of size n, where n is equal to the
number of convolutional channels.
• Afterwards, it is fed through a two-layer neural network, which outputs a vector
of the same size. These n values can now be used as weights on the original
features maps, scaling each channel based on its importance.
Code Illustration of the key idea
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Prove
High level steps
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Prove
Adding squeeze excitation technique to ResNet

Más contenido relacionado

La actualidad más candente

MNIST and machine learning - presentation
MNIST and machine learning - presentationMNIST and machine learning - presentation
MNIST and machine learning - presentation
Steve Dias da Cruz
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptx
NoorUlHaq47
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
Simplilearn
 

La actualidad más candente (20)

Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
U-Net (1).pptx
U-Net (1).pptxU-Net (1).pptx
U-Net (1).pptx
 
Resnet
ResnetResnet
Resnet
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
MNIST and machine learning - presentation
MNIST and machine learning - presentationMNIST and machine learning - presentation
MNIST and machine learning - presentation
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORK
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptx
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Deep learning
Deep learningDeep learning
Deep learning
 
Digit recognition
Digit recognitionDigit recognition
Digit recognition
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 

Similar a Convolutional Neural Networks : Popular Architectures

PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
Jinwon Lee
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
milad abbasi
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
taeseon ryu
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
ssuser3aa461
 

Similar a Convolutional Neural Networks : Popular Architectures (20)

PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
TensorFlow.pptx
TensorFlow.pptxTensorFlow.pptx
TensorFlow.pptx
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
 
Cnn
CnnCnn
Cnn
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
Introduction to computer vision
Introduction to computer visionIntroduction to computer vision
Introduction to computer vision
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
 
EfficientNet
EfficientNetEfficientNet
EfficientNet
 
Handwritten Digit Recognition and performance of various modelsation[autosaved]
Handwritten Digit Recognition and performance of various modelsation[autosaved]Handwritten Digit Recognition and performance of various modelsation[autosaved]
Handwritten Digit Recognition and performance of various modelsation[autosaved]
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
 

Más de ananth

Más de ananth (20)

Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier
 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligence
 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligence
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overview
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distance
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 

Último

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 

Último (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 

Convolutional Neural Networks : Popular Architectures

  • 1. CNN Popular Architectures and Transfer Learning Palacode Narayana Iyer Anantharaman 16th Oct 2018
  • 2. Motivation : Why study these? Understanding popular architectures help us in many ways: • We develop a better understanding of the subject by studying high performant architectures • This helps perform our own research on newer models • Learning their design philosophy help us to design our models more effectively. • Use them as a backbone for transfer learning, selecting the right architecture
  • 4. Current trend: Deeper Models work better • CNNs consistently outperform other approaches for the core tasks of CV • Deeper models work better • Increasing the number of parameters in layers of CNN without increasing their depth is not effective at increasing test set performance. • Shallow models overfit at around 20 million parameters while deep ones can benefit from having over 60 million. • Key insight: Model performs better when it is architected to reflect composition of simpler functions than a single complex function. This may also be explained off viewing the computation as a chain of dependencies
  • 5.
  • 9. Recent Results (Credits: CS231n Stanford)
  • 11. ResNet Approach • Generally: Deeper the better - See the fig • Issue: Hard to train deeper networks effectively : Validation error goes up not just due to overfitting but increase in training error • Solution: Use skip connections to propagate the activations to reduce the impact • Rationale: Imagine how to get an identity output through a deep network accurately.
  • 12. ResNet Hypothesis : Rationale • Deeper models should perform at least as well as shallower models • More parameters, more degrees of freedom to get the training error down • More depth, more ability to model abstractions • Solution by construction is copying the learned layers of a shallower model and setting the additional layers to identity mapping
  • 13. ResNet • Very deep network: Uses 152 layers • Shallower versions (e.g. Resnet50) are available • The “go to” backbone network for many applications such as Faster RCNN • Pre trained weights are available for Keras, TensorFlow
  • 14. ResNet Details • Default input size: (224, 224, 3) • Each stage reduces the width, height dimensions by a factor of 2 • This property is leveraged in later implementations such as pyramidal networks
  • 16. GoogleNet • Design choices of filter sizes: (3, 3), (5, 5) and so on – Which to choose for each convolution layer? • Why not try all of them and choose the best? • Trying each possible value on every layer and experimenting manually is not a solution • Inception layer allows multiple filter sizes and learn their contributions through parameters automatically
  • 17. Inception Layer Naïve Architecture
  • 18. Inception Layer Naïve Architecture (Fig Udacity Deep Learning)
  • 19. Inception Architecture with bottleneck layer
  • 21. Example • Consider an input volume (28, 28, 192) and an output volume (28, 28, 32) • How many computations are needed if we use a 5 x 5 filter? • Each filter will be 5 x 5 x 192, we will be moving this over a 28 x 28 surface and we have 32 of them • 28 x 28 x 32 x 5 x 5 x 192 = 120M • If we need multiple such filters, we need to add up corresponding computations for each of them • On a very deep network these many computations are prohibitively large even when we use powerful hardware • By reducing the dimensionality of the input before final convolutions, we get a manageable number of computations
  • 22. Example with bottleneck layer • In our example, we can transform the 28 x 28 x 192 in to same sized surface but much reduced depth (say 16) using 1 x 1 convolutions • The bottleneck layer has the shape (28 x 28 x 16) • Perform the required convolutions (e.g. 5 x 5) on the bottleneck layer to generate the final output volume • Computations: #computations between input to bottleneck layer + #computations between bottleneck to output. In our example this is 12M
  • 23. GoogleNet architecture with Inception layer
  • 24. State of the art : SENet • ImageNet 2017 topper in multiple categories • A novel technique to weight the contributions of the channels of a convolutional layer
  • 25. SeNet : Squeeze and Excitation Network • SeNet is the winning architecture of ImageNet 2017 in multiple categories • Error rate on image classification: 2.251% • Key Idea: • In authors’ words: “Improve the representational power of the network by explicitly modelling interdependencies between channels of its convolutional features” • Simple explanation: Add parameters to each channel of a convolutional block so that network can adaptively adjust the weighting of each feature map
  • 26. SeNet Rationale • Deep CNN’s learn increasing levels of abstractions from lower to higher layers. Lower layers have higher resolution and can extract basic elements of information • Higher layers can detect faces or generate text etc and deal with abstract information • All of this works by fusing the spatial and channel information of an image. • The network weights each of its channels equally when creating the output feature maps. • SENets change this by adding a content aware mechanism to weight each channel adaptively. In it’s most basic form this could mean adding a single parameter to each channel and giving it a linear scalar how relevant each one is.
  • 27. SENet Architecture • Get a global understanding of each channel by squeezing the feature maps to a single numeric value. This results in a vector of size n, where n is equal to the number of convolutional channels. • Afterwards, it is fed through a two-layer neural network, which outputs a vector of the same size. These n values can now be used as weights on the original features maps, scaling each channel based on its importance.
  • 28. Code Illustration of the key idea Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7 Credits: Paul-Louis Prove
  • 29. High level steps Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7 Credits: Paul-Louis Prove
  • 30. Adding squeeze excitation technique to ResNet