Designing Network Design Spaces
Ilija Radosavovic, et al., “Designing Network Design Spaces”
3rd May, 2020
PR12 Paper Review (PR243)
JinWon Lee
Samsung Electronics
Introduction
• Over the past several years, better architectures have resulted in
considerable progress in a wide range of visual recognition tasks.
  - e.g., VGG, ResNet, MobileNet, EfficientNet
• While manual network design has led to large advances, finding well-
optimized networks manually can be challenging, especially as the
number of design choices increases.
• A popular approach to address this limitation is neural architecture
search (NAS).
• However, NAS does not enable the discovery of network design
principles that deepen our understanding and allow us to generalize to
new settings.
Introduction
• In this work, the authors present a new network design paradigm
that combines the advantages of manual design and NAS.
• Instead of focusing on designing individual network instances, they
design design spaces that parametrize populations of networks.
Exploring Randomly Wired Neural Networks for
Image Recognition (PR-155)
• Design a network generator, not an
individual network!
Introduction
• The authors start with a relatively unconstrained design space they call
AnyNet and apply a human-in-the-loop methodology to arrive at a
low-dimensional design space consisting of simple “regular”
networks, RegNet.
• RegNet design space generalizes to various compute regimes,
schedule lengths and network block types.
• They analyze the RegNet design space and arrive at interesting
findings that do not match the current practice of network design.
Tools for Design Space Design
• Rather than designing or searching for a single best model under
specific settings, the authors study the behavior of populations of
models.
• They rely on the concept of network design spaces introduced by
Radosavovic et al., “On network design spaces for visual
recognition,” ICCV 2019.
• The core idea is that the quality of a design space can be quantified
by sampling a set of models from that design space and
characterizing the resulting model error distribution.
Tools for Design Space Design
• To obtain a distribution of models, sample and train n models from a
design space.
• A primary tool for analyzing design space quality is the error
empirical distribution function (EDF). The error EDF of n models with
errors $e_i$ is given by:
$$F(e) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[e_i < e]$$
• $F(e)$ gives the fraction of models with
error less than $e$.
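The EDF above is straightforward to compute from a list of trained-model errors. A minimal Python sketch follows; the helper name `error_edf` and the error values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def error_edf(errors):
    """Return F(e) = (1/n) * sum_i 1[e_i < e] for a list of model errors."""
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one model error")
    sorted_errors = sorted(errors)

    def F(e):
        # bisect_left counts how many sorted errors are strictly less than e
        return bisect_left(sorted_errors, e) / n

    return F


# Hypothetical top-1 errors (%) of five sampled-and-trained models
errors = [38.2, 41.5, 36.9, 45.0, 39.7]
F = error_edf(errors)
print(F(40.0))  # 3 of 5 models have error < 40.0 -> 0.6
```

Sorting once and using binary search keeps each evaluation of F at O(log n), which matters when plotting the EDF over a fine grid of error values.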
Tools for Design Space Design
• Given a population of trained models, we can plot and analyze
various network properties versus network error.
• For these plots, an empirical bootstrap is applied to estimate the
likely range in which the best models fall.
The blue shaded regions are ranges containing the best models with 95% confidence, and the black vertical line
marks the most likely best value.
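One way to implement such a bootstrap, sketched under stated assumptions: each round resamples a fraction of the (property, error) pairs with replacement and records the property value of the lowest-error pair; the spread of those values gives a 95% interval, and their mean a most likely best value. The 25% resampling fraction, the 10,000 rounds, the helper name `bootstrap_best_value`, and the toy data are assumptions of this sketch.

```python
import random
import statistics


def bootstrap_best_value(pairs, frac=0.25, rounds=10_000, seed=0):
    """Estimate where the best models fall along one network property.

    pairs: list of (property_value, error) tuples, one per trained model.
    frac and rounds are assumed defaults, not values taken from the slide.
    Returns (low, high, mean): a 95% interval and the most likely best value.
    """
    rng = random.Random(seed)
    k = max(1, int(len(pairs) * frac))
    best_values = []
    for _ in range(rounds):
        sample = rng.choices(pairs, k=k)               # resample with replacement
        best_prop, _ = min(sample, key=lambda p: p[1])  # lowest-error model in the sample
        best_values.append(best_prop)
    best_values.sort()
    low = best_values[int(0.025 * rounds)]             # 2.5th percentile
    high = best_values[int(0.975 * rounds) - 1]        # 97.5th percentile
    return low, high, statistics.mean(best_values)


# Hypothetical data: depth vs. error, with the lowest error at depth 20
pairs = [(d, 30.0 + 0.5 * abs(d - 20)) for d in range(4, 40)]
low, high, best = bootstrap_best_value(pairs)
print(low, high, round(best, 1))
```

Resampling only a fraction of the population, rather than all of it, keeps the interval from collapsing onto the single best model in the original sample.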
Tools for Design Space Design
• To summarize:
1. generate distributions of models obtained by sampling and
training n models from a design space.
2. compute and plot error EDFs to summarize design space quality.
3. visualize various properties of a design space and use an
empirical bootstrap to gain insight.
4. use these insights to refine the design space.
The AnyNet Design Space
• Given an input image, a network consists of a simple stem, followed by the
network body that performs the bulk of the computation, and a final network
head that predicts the output classes.
• Keep the stem and head fixed and as simple as possible, and instead focus on
the structure of the network body.
• The network body consists of 4 stages operating at progressively reduced
resolution; each stage consists of a sequence of identical blocks.
AnyNetX
• Most experiments use the standard residual bottleneck block
with group convolution. The authors refer to this as the X block, and to the
AnyNet design space built on it as AnyNetX.
AnyNetX
• The AnyNetX design space has 16 degrees of freedom, since each
network consists of 4 stages and each stage $i$ has 4 parameters: the
number of blocks $d_i$, block width $w_i$, bottleneck ratio $b_i$, and group
width $g_i$.
• Resolution $r = 224$ (fixed)
• To obtain valid models, the authors perform log-uniform sampling of $d_i \le 16$,
$w_i \le 1024$ and divisible by 8, $b_i \in \{1, 2, 4\}$, and $g_i \in \{1, 2, 4, 8, 16, 32\}$.
• There are $(16 \cdot 128 \cdot 3 \cdot 6)^4 \approx 10^{18}$ possible model configurations in
the AnyNetX design space.
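The sampling ranges and the size estimate can be checked with a short sketch. It assumes that log-uniform sampling over a discrete set means drawing uniformly in log-space and snapping to the nearest allowed value, and it skips the compatibility checks between $w_i$, $b_i$, and $g_i$ that a real sampler would need; all names are hypothetical.

```python
import math
import random

# Per-stage choices for AnyNetX, as listed on the slide
DEPTHS = list(range(1, 17))            # d_i <= 16
WIDTHS = list(range(8, 1025, 8))       # w_i <= 1024, divisible by 8
BOTTLENECKS = [1, 2, 4]                # b_i
GROUP_WIDTHS = [1, 2, 4, 8, 16, 32]    # g_i


def log_uniform_choice(rng, values):
    """Draw uniformly in log-space, then snap to the nearest allowed value (assumption)."""
    log_target = rng.uniform(math.log(values[0]), math.log(values[-1]))
    return min(values, key=lambda v: abs(math.log(v) - log_target))


def sample_anynetx(rng):
    """Sample one unconstrained AnyNetX config: 4 stages, each with (d, w, b, g)."""
    return [
        {
            "d": log_uniform_choice(rng, DEPTHS),
            "w": log_uniform_choice(rng, WIDTHS),
            "b": log_uniform_choice(rng, BOTTLENECKS),
            "g": log_uniform_choice(rng, GROUP_WIDTHS),
        }
        for _ in range(4)
    ]


# Size of the design space: per-stage choices, raised to the 4th power (4 stages)
per_stage = len(DEPTHS) * len(WIDTHS) * len(BOTTLENECKS) * len(GROUP_WIDTHS)
total = per_stage ** 4
print(per_stage, f"{total:.2e}")  # 36864 1.85e+18
```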
Design Space Design Aims
1. To simplify the structure of the design space.
2. To improve the interpretability of the design space.
3. To improve or maintain the design space quality.
4. To maintain model diversity in the design space.
AnyNetX(A, B, C)
• The unconstrained AnyNetX design space is referred to as AnyNetXA.
• AnyNetXA → AnyNetXB: share the bottleneck ratio, $b_i = b$ for all stages $i$.
• AnyNetXB → AnyNetXC: share the group width, $g_i = g$ for all stages $i$.
AnyNetX(D, E)
• AnyNetXD comes from examining typical network structures of both good
and bad networks from AnyNetXC.
  - A pattern emerges: good networks have increasing widths.
• AnyNetXD constraint: AnyNetXC & $w_{i+1} \ge w_i$.
• In addition to stage widths $w_i$ increasing with $i$, the stage depths $d_i$
likewise tend to increase for the best models.
• AnyNetXE constraint: AnyNetXD & $d_{i+1} \ge d_i$.
• Finally, the constraints on $w_i$ and $d_i$ each reduce the design space by $4!$,
with a cumulative reduction of $O(10^7)$ from AnyNetXA.
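The constraints that define AnyNetXB through AnyNetXE can be written as nested checks on a sampled configuration. This hypothetical helper reports the most constrained space a configuration belongs to; the per-stage dict format (`d`, `w`, `b`, `g`) is an illustrative assumption, not any released code.

```python
def design_space_level(stages):
    """Return the most constrained space among AnyNetXA..AnyNetXE that contains a config.

    stages: 4 dicts with keys "d", "w", "b", "g" (hypothetical config format).
    Each level adds one constraint on top of the previous level.
    """
    ds = [s["d"] for s in stages]
    ws = [s["w"] for s in stages]
    checks = [
        ("B", len({s["b"] for s in stages}) == 1),         # shared bottleneck ratio b
        ("C", len({s["g"] for s in stages}) == 1),         # shared group width g
        ("D", all(x <= y for x, y in zip(ws, ws[1:]))),    # w_{i+1} >= w_i
        ("E", all(x <= y for x, y in zip(ds, ds[1:]))),    # d_{i+1} >= d_i
    ]
    level = "A"
    for name, satisfied in checks:
        if not satisfied:
            break  # the spaces are nested, so stop at the first violated constraint
        level = name
    return level


stages = [
    {"d": 1, "w": 64, "b": 1, "g": 8},
    {"d": 2, "w": 128, "b": 1, "g": 8},
    {"d": 4, "w": 256, "b": 1, "g": 8},
    {"d": 2, "w": 512, "b": 1, "g": 8},
]
print(design_space_level(stages))  # D (widths increase, but depth drops in stage 4)
```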
AnyNetX(D, E)
Linear Fits
• To gain further insight into the model structure, the best 20 models
from AnyNetXE are shown in a single plot.
• While there is significant variance in the individual models (gray
curves), in the aggregate a pattern emerges.
• In particular, the same plot shows the line $w_j = 48 \cdot (j + 1)$ for
$0 \le j \le 20$.
Linear Fits
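• Inspired by AnyNetXD and AnyNetXE, block widths are given a linear
parameterization:
$$u_j = w_0 + w_a \cdot j \quad \text{for } 0 \le j < d, \quad w_0 > 0,\ w_a > 0$$
• To quantize $u_j$, an additional parameter $w_m > 0$ is introduced, and $s_j$
is computed for each block $j$ from:
$$u_j = w_0 \cdot w_m^{s_j}$$
• Then $u_j$ is quantized by rounding $s_j$ (denoted $\lfloor s_j \rceil$) and computing
the quantized per-block widths $w_j$ via:
$$w_j = w_0 \cdot w_m^{\lfloor s_j \rceil}$$
• Converting the per-block $w_j$ to the per-stage format gives the stage widths
and depths:
$$w_i = w_0 \cdot w_m^{i}, \qquad d_i = \sum_j \mathbf{1}[\lfloor s_j \rceil = i]$$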
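The quantized linear parameterization translates almost line for line into code. In this sketch, stages are formed by grouping consecutive blocks with the same rounded $\lfloor s_j \rceil$, matching the $d_i$ formula above. The final rounding of each width to a multiple of `q = 8` is an assumption borrowed from the AnyNetX "divisible by 8" rule, the requirement $w_m > 1$ keeps the logarithm well defined, and the example parameter values are illustrative only.

```python
import math
from itertools import groupby


def regnet_widths(d, w_0, w_a, w_m, q=8):
    """Return per-stage (widths, depths) from the quantized linear parameterization.

    q: widths are also rounded to a multiple of q (an assumption of this sketch).
    """
    if w_0 <= 0 or w_a <= 0 or w_m <= 1:
        raise ValueError("this sketch needs w_0 > 0, w_a > 0 and w_m > 1")
    u = [w_0 + w_a * j for j in range(d)]                    # u_j = w_0 + w_a * j
    s = [math.log(u_j / w_0) / math.log(w_m) for u_j in u]   # solve u_j = w_0 * w_m ** s_j
    k = [round(s_j) for s_j in s]                            # rounded s_j (Python rounds ties to even)
    widths, depths = [], []
    # u_j increases with j, so blocks sharing a rounded s_j are consecutive: one stage
    for stage_exp, blocks in groupby(k):
        depths.append(len(list(blocks)))                      # d_i = #blocks with rounded s_j = i
        widths.append(round(w_0 * w_m ** stage_exp / q) * q)  # w_0 * w_m ** i, multiple of q
    return widths, depths


print(regnet_widths(d=13, w_0=24, w_a=36.44, w_m=2.49))
# ([24, 56, 152, 368], [1, 1, 4, 7])
```

Note how six scalars (here four, plus $b$ and $g$) fully determine the per-stage widths and depths that AnyNetX needed 16 free parameters to describe.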
Linear Fits
• $e_{fit}$ is the mean log-ratio between the observed per-block widths $w_j$ and the linear fit $u_j$.
The RegNet Design Space
• The design space of RegNet contains only simple, regular models:
  - $d < 64$
  - $w_0, w_a < 256$
  - $1.5 \le w_m \le 3$
  - $b$ and $g$ are the same as in AnyNetX
• Setting $w_m = 2$ and $w_0 = w_a$ improves performance, but to maintain
the diversity of models these constraints are not imposed on the RegNet design space.
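Since a RegNet model is fully described by six parameters, sampling from the design space is short. The upper bounds below come from the slide; the lower bounds on $d$, $w_0$, and $w_a$, the distributions (uniform integer depth, log-uniform $w_0$ and $w_a$, uniform $w_m$), and drawing $b$ and $g$ uniformly from the AnyNetX sets (rather than log-uniformly) are simplifying assumptions of this sketch.

```python
import math
import random

BOTTLENECKS = [1, 2, 4]               # b, as in AnyNetX
GROUP_WIDTHS = [1, 2, 4, 8, 16, 32]   # g, as in AnyNetX


def log_uniform(rng, lo, hi):
    """Sample a float uniformly in log-space on [lo, hi]."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def sample_regnet(rng):
    """Draw the six RegNet parameters (d, w_0, w_a, w_m, b, g)."""
    return {
        "d": rng.randint(1, 63),            # d < 64 (lower bound 1 assumed)
        "w_0": log_uniform(rng, 8, 255),    # w_0 < 256 (lower bound 8 assumed)
        "w_a": log_uniform(rng, 8, 255),    # w_a < 256 (lower bound 8 assumed)
        "w_m": rng.uniform(1.5, 3.0),       # 1.5 <= w_m <= 3
        "b": rng.choice(BOTTLENECKS),       # uniform choice: a simplification
        "g": rng.choice(GROUP_WIDTHS),      # uniform choice: a simplification
    }


print(sample_regnet(random.Random(0)))
```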
Design Space Summary
Design Space Generalization
Common Design Patterns
• The deeper the model, the better the performance.
• Double the number of channels whenever the spatial activation size
is reduced.
• Skip connection is good.
• Bottleneck is good.
• Depthwise separable convolution is popular for low compute regime.
• Inverted bottleneck is also good.
RegNet Trends
• The depth of the best models is stable across regimes, with an optimal
depth of ~20 blocks (60 layers).
• This is in contrast to the common practice of using deeper models for
higher flop regimes.
RegNet Trends
• The best models use a bottleneck ratio $b$ of 1.0, which effectively
removes the bottleneck.
• The width multiplier $w_m$ of good models is ~2.5, similar but not
identical to the popular recipe of doubling widths across stages.
RegNet Trends
• The remaining parameters ($g$, $w_a$, $w_0$) increase with complexity.
Complexity Analysis
• While not a common measure of network complexity, activations can
heavily affect runtime on memory-bound hardware accelerators.
• For the top models, activations increase with the square root of flops,
while parameters increase linearly.
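To make the three complexity measures concrete, here is a rough counter for one X block (1x1 conv, 3x3 group conv, 1x1 conv) at stride 1. It assumes that flops means multiply-adds and that activations means the number of elements in each conv layer's output tensor; batch norm, biases, and the projection shortcut are ignored, so the numbers are approximations, and `x_block_complexity` is a hypothetical helper.

```python
def x_block_complexity(w, b, g, r):
    """Approximate (flops, params, activations) of one stride-1 X block.

    w: block width, b: bottleneck ratio, g: group width, r: spatial size (r x r).
    Batch norm, biases, and the projection shortcut are ignored.
    """
    w_b = w // b                              # bottleneck width
    if w % b or w_b % g:
        raise ValueError("w must be divisible by b, and w // b by g")
    params = (
        w * w_b                               # 1x1 conv: w -> w_b
        + 9 * w_b * g                         # 3x3 group conv: w_b // g groups of width g
        + w_b * w                             # 1x1 conv: w_b -> w
    )
    flops = params * r * r                    # stride 1: every weight is applied at r * r positions
    activations = r * r * (w_b + w_b + w)     # output elements of the three conv layers
    return flops, params, activations


print(x_block_complexity(w=64, b=1, g=16, r=56))  # (54591488, 17408, 602112)
```

Doubling $w$ with $b$, $g$, and $r$ fixed multiplies the 1x1-conv cost by four but the activations by only two, which gives some intuition for why activations can grow more slowly than flops.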
RegNetX Constrained
• Using these findings, the RegNetX design space is refined into RegNetXC:
  - $b = 1$, $d \le 40$, and $w_m \ge 2$
  - Parameters and activations are limited, following the complexity analysis
  - Further depth limit: $12 \le d \le 28$
Alternate Design Choices
• The inverted bottleneck ($b < 1$) degrades the EDF slightly, and depthwise
conv performs even worse relative to $b = 1$ and $g \ge 1$.
• For RegNetX, a fixed resolution of 224x224 is best, even at higher flops.
• Adding the Squeeze-and-Excitation (SE) op yields good gains; the resulting design space is called RegNetY.
Comparison to Existing Networks
• The higher flop models have a large number of blocks in the third
stage and a small number of blocks in the last stage.
• The group width $g$ increases with complexity, but depth $d$ saturates
for large models.
State of the Art Comparison: Mobile Regime
ResNeXt Comparison
EfficientNet Comparison
At low flops, EfficientNet outperforms RegNetY. At intermediate flops,
RegNetY outperforms EfficientNet, and at higher flops both RegNetX and
RegNetY perform better.
Test Set Evaluation
Additional Ablations
• Fixed Depth
  - Surprisingly, fixed-depth networks can match the performance of variable-depth networks
for all flop regimes.
• Fewer Stages
  - Top RegNet models at high flops have few blocks in the fourth stage, but 3-stage networks
perform considerably worse.
• Inverted Bottleneck
  - In a high-compute regime, $b < 1$ degrades results further.
Additional Ablations
• Swish vs ReLU
  - Swish outperforms ReLU at low flops, but ReLU is better at high flops.
  - Interestingly, if $g$ is restricted to 1 (depthwise conv), Swish performs much
better than ReLU.
Optimization Settings
• Initial learning rate and weight decay are stable across complexity regimes.
(Figure panels: RegNet, EfficientNet)

Más contenido relacionado

La actualidad más candente

Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
taeseon ryu
 

La actualidad más candente (20)

Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
 
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
 
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
 
Gnn overview
Gnn overviewGnn overview
Gnn overview
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM
 
PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
Notes of AI for everyone - by Andrew Ng
Notes of AI for everyone - by Andrew NgNotes of AI for everyone - by Andrew Ng
Notes of AI for everyone - by Andrew Ng
 
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial NetworksA Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networks
 
Yapay Sinir Aglari
Yapay Sinir AglariYapay Sinir Aglari
Yapay Sinir Aglari
 
Adversarial training Basics
Adversarial training BasicsAdversarial training Basics
Adversarial training Basics
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNetIntroduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 

Similar a PR243: Designing Network Design Spaces

ConvNeXt.pptx
ConvNeXt.pptxConvNeXt.pptx
ConvNeXt.pptx
YanhuaSi
 

Similar a PR243: Designing Network Design Spaces (20)

Designing Network Design Spaces
Designing Network Design SpacesDesigning Network Design Spaces
Designing Network Design Spaces
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
 
ResNet.pptx
ResNet.pptxResNet.pptx
ResNet.pptx
 
ResNet.pptx
ResNet.pptxResNet.pptx
ResNet.pptx
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit Recognition
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
ConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explainedConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explained
 
EfficientNet
EfficientNetEfficientNet
EfficientNet
 
ConvNeXt.pptx
ConvNeXt.pptxConvNeXt.pptx
ConvNeXt.pptx
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 

Más de Jinwon Lee

PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
Jinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
Jinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
Jinwon Lee
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
Jinwon Lee
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
Jinwon Lee
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
Jinwon Lee
 

Más de Jinwon Lee (20)

PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

PR243: Designing Network Design Spaces

  • 1. Designing Network Design Spaces Ilija Radosavovic, et al., “Designing Network Design Spaces” 3rd May, 2020 PR12 Paper Review JinWon Lee Samsung Electronics
  • 3. Introduction • Over the past several years better architectures have resulted in considerable progress in a wide range of visual recognition tasks.  Ex)VGG, ResNet, MobileNet, EfficientNet, etc. • While manual network design has led to large advances, finding well- optimized networks manually can be challenging, especially as the number of design choices increases. • A popular approach to address this limitation is neural architecture search (NAS). • However, it does not enable discovery of network design principles that deepen our understanding and allow us to generalize to new settings.
  • 4. Introduction • In this work, the authors present a new network design paradigm that combines the advantages of manual design and NAS. • Instead of focusing on designing individual network instances, they design design spaces that parametrize populations of networks.
  • 5. Exploring RandomlyWired Neural Networks for Image Recognition(PR-155) • Design a Network Generator not an Individual Network!
  • 6. Introduction • The authors start with a relatively unconstrained design space we call AnyNet and apply human-in- the-loop methodology to arrive at a low-dimensional design space consisting of simple “regular” networks, RegNet. • RegNet design space generalizes to various compute regimes, schedule lengths and network block types. • They analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design.
  • 7. Tools for Design Space Design • Rather than designing or searching for a single best model under specific settings, the authors study the behavior of populations of models. • They rely on the concept of network design spaces introduced by Radosavovic et al., “On network design spaces for visual recognition.”, ICCV2019. • Core idea of the paper is that we can quantify the quality of a design space by sampling a set of models from that design space and characterizing the resulting model error distribution.
  • 8. Tools for Design Space Design
    • To obtain a distribution of models, sample and train $n$ models from a design space.
    • The primary tool for analyzing design space quality is the error empirical distribution function (EDF). The error EDF of $n$ models with errors $e_i$ is:
      $F(e) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[e_i < e]$
    • $F(e)$ gives the fraction of models with error less than $e$.
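The error EDF is simple enough to compute directly from a list of trained-model errors. The following is a minimal sketch: `error_edf` and `edf_curve` are hypothetical helper names, and the error values in the usage lines are made up for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence


def error_edf(errors: Sequence[float], e: float) -> float:
    """F(e) = (1/n) * sum_i 1[e_i < e]: fraction of models with error below e."""
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one model error")
    return sum(1 for e_i in errors if e_i < e) / n


def edf_curve(errors: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Evaluate F at many thresholds via one sort plus a binary search per query."""
    if not errors:
        raise ValueError("need at least one model error")
    ranked = sorted(errors)
    n = len(ranked)
    # bisect_left returns how many sorted errors are strictly less than t
    return [bisect_left(ranked, t) / n for t in thresholds]


if __name__ == "__main__":
    errs = [38.2, 40.1, 41.5, 39.0, 45.3]  # made-up top-1 errors (%)
    print(error_edf(errs, 40.0))           # 0.4
    print(edf_curve(errs, [39.5, 42.0]))   # [0.4, 0.8]
```

To compare two design spaces, evaluate both curves on a shared grid of thresholds. The space whose curve rises earlier, meaning it has more models at low error, is the better one.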
  • 9. Tools for Design Space Design
    • Given a population of trained models, various network properties can be plotted and analyzed against network error.
    • For these plots, an empirical bootstrap is applied to estimate the likely range in which the best models fall. The blue shaded regions are ranges containing the best models with 95% confidence, and the black vertical line marks the most likely best value.
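The bootstrap can be sketched as follows. This assumes a procedure along the lines of the RegNet paper: repeatedly resample a fraction of the (property, error) pairs with replacement, keep the property value of the lowest-error model in each resample, and read a 95% interval off the collected values. The defaults `frac=0.25` and `n_boot=10_000` are assumptions, and the median stands in for the "most likely best value".

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_best_range(
    props: Sequence[float],
    errors: Sequence[float],
    frac: float = 0.25,    # assumed fraction of pairs drawn per round
    n_boot: int = 10_000,  # assumed number of bootstrap rounds
    conf: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (low, high, median) of the property value of the best model.

    Each round draws k pairs with replacement, keeps the property of the
    pair with minimum error, and records it.
    """
    if len(props) != len(errors) or not props:
        raise ValueError("props and errors must be non-empty and equal length")
    rng = random.Random(seed)
    pairs = list(zip(props, errors))
    k = max(1, int(frac * len(pairs)))
    best_vals: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=k)
        best_prop, _ = min(sample, key=lambda pe: pe[1])
        best_vals.append(best_prop)
    best_vals.sort()
    tail = (1.0 - conf) / 2.0
    low = best_vals[int(tail * (n_boot - 1))]
    high = best_vals[int((1.0 - tail) * (n_boot - 1))]
    median = best_vals[(n_boot - 1) // 2]  # stand-in for the "most likely" value
    return low, high, median
```

With depth as the property, for example, `(low, high)` would correspond to the shaded band and `median` to the vertical line.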
  • 10. Tools for Design Space Design
    • To summarize:
      1. Generate distributions of models by sampling and training $n$ models from a design space.
      2. Compute and plot error EDFs to summarize design space quality.
      3. Visualize various properties of the design space and use an empirical bootstrap to gain insight.
      4. Use these insights to refine the design space.
  • 11. The AnyNet Design Space
    • Given an input image, a network consists of a simple stem, followed by the network body that performs the bulk of the computation, and a final network head that predicts the output classes.
    • The stem and head are kept fixed and as simple as possible, so that the analysis focuses on the structure of the network body.
    • The network body consists of 4 stages operating at progressively reduced resolution; each stage consists of a sequence of identical blocks.
  • 12. AnyNetX
    • Most of the experiments use the standard residual bottleneck block with group convolution. The authors refer to this as the X block, and to the AnyNet design space built on it as AnyNetX.
  • 13. AnyNetX
    • The AnyNetX design space has 16 degrees of freedom: each network consists of 4 stages, and each stage $i$ has 4 parameters: the number of blocks $d_i$, block width $w_i$, bottleneck ratio $b_i$, and group width $g_i$.
    • Resolution $r = 224$ (fixed).
    • To obtain valid models, they perform log-uniform sampling of $d_i \le 16$, $w_i \le 1024$ and divisible by 8, $b_i \in \{1, 2, 4\}$, and $g_i \in \{1, 2, \ldots, 32\}$.
    • There are $(16 \cdot 128 \cdot 3 \cdot 6)^4 \approx 10^{18}$ possible model configurations in the AnyNetX design space.
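The configuration count and a per-stage sampler can be reproduced in a short sketch. The sketch assumes that $g_i$ ranges over powers of two (1, 2, 4, ..., 32), which is consistent with the factor 6 in the count. `log_uniform_choice` and `sample_anynetx` are hypothetical helpers, and the validity checks a real implementation would need (for example, that the group width is compatible with $w_i / b_i$) are omitted.

```python
import math
import random
from typing import Dict, List

DEPTHS = list(range(1, 17))        # d_i <= 16                -> 16 options
WIDTHS = list(range(8, 1025, 8))   # w_i <= 1024, multiple of 8 -> 128 options
BOTTLENECKS = [1, 2, 4]            # b_i                       -> 3 options
GROUPS = [1, 2, 4, 8, 16, 32]      # g_i, assumed powers of two -> 6 options


def num_configs(stages: int = 4) -> int:
    """Number of distinct AnyNetX configurations: (options per stage) ** stages."""
    per_stage = len(DEPTHS) * len(WIDTHS) * len(BOTTLENECKS) * len(GROUPS)
    return per_stage ** stages


def log_uniform_choice(rng: random.Random, options: List[int]) -> int:
    """Draw a target uniformly in log space, then snap to the nearest option."""
    lo, hi = math.log(min(options)), math.log(max(options))
    target = rng.uniform(lo, hi)
    return min(options, key=lambda v: abs(math.log(v) - target))


def sample_anynetx(rng: random.Random, stages: int = 4) -> List[Dict[str, int]]:
    """One AnyNetX configuration: independent (d, w, b, g) for every stage."""
    return [
        {
            "d": log_uniform_choice(rng, DEPTHS),
            "w": log_uniform_choice(rng, WIDTHS),
            "b": log_uniform_choice(rng, BOTTLENECKS),
            "g": log_uniform_choice(rng, GROUPS),
        }
        for _ in range(stages)
    ]
```

`num_configs()` evaluates to $36864^4 \approx 1.8 \times 10^{18}$, which matches the slide's $\approx 10^{18}$.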
  • 14. Design Space Design Aims
    1. To simplify the structure of the design space.
    2. To improve the interpretability of the design space.
    3. To improve or maintain the design space quality.
    4. To maintain model diversity in the design space.
  • 15. AnyNetX (A, B, C)
    • The unconstrained AnyNetX design space is referred to as AnyNetXA.
    • Sharing a bottleneck ratio $b_i = b$ across all stages $i$: AnyNetXA → AnyNetXB.
    • Sharing a group width $g_i = g$ across all stages $i$: AnyNetXB → AnyNetXC.
  • 16. AnyNetX (D, E)
    • AnyNetXD comes from examining the typical network structures of both good and bad networks from AnyNetXC.
      - A pattern emerges: good networks have increasing widths.
    • AnyNetXD constraint: AnyNetXC & $w_{i+1} \ge w_i$.
    • In addition to stage widths $w_i$ increasing with $i$, the stage depths $d_i$ likewise tend to increase for the best models.
    • AnyNetXE constraint: AnyNetXD & $d_{i+1} \ge d_i$.
    • Finally, the constraints on $w_i$ and $d_i$ each reduce the design space by $4!$, for a cumulative reduction of $O(10^7)$ from AnyNetXA.
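The nested constraints A → E can be written as a single membership check. This is a minimal sketch in which a configuration is a list of four per-stage dicts with keys `d`, `w`, `b`, `g` (a hypothetical representation). The helper `monotone_fraction` shows where the factor $4!$ comes from: of the $4! = 24$ orderings of four distinct stage widths, only one is non-decreasing.

```python
import itertools
from typing import Dict, List, Sequence

Stage = Dict[str, int]  # keys: "d", "w", "b", "g"


def _non_decreasing(values: Sequence[int]) -> bool:
    return all(a <= nxt for a, nxt in zip(values, values[1:]))


def satisfies(stages: List[Stage], space: str) -> bool:
    """True if a config lies in AnyNetX{space}; each space adds one constraint."""
    order = "ABCDE"
    if space not in order:
        raise ValueError(f"unknown space {space!r}")
    checks = [
        lambda s: True,                                    # A: unconstrained
        lambda s: len({st["b"] for st in s}) == 1,         # B: shared b_i = b
        lambda s: len({st["g"] for st in s}) == 1,         # C: shared g_i = g
        lambda s: _non_decreasing([st["w"] for st in s]),  # D: w_{i+1} >= w_i
        lambda s: _non_decreasing([st["d"] for st in s]),  # E: d_{i+1} >= d_i
    ]
    return all(check(stages) for check in checks[: order.index(space) + 1])


def monotone_fraction(values: Sequence[int]) -> float:
    """Fraction of orderings of `values` that are non-decreasing."""
    perms = list(itertools.permutations(values))
    return sum(1 for p in perms if _non_decreasing(p)) / len(perms)
```

For example, `monotone_fraction([64, 128, 256, 512])` gives `1/24`.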
  • 18. Linear Fits
    • To gain further insight into the model structure, the best 20 models from AnyNetXE are shown in a single plot.
    • While there is significant variance across the individual models (gray curves), in aggregate a pattern emerges.
    • In particular, the same plot shows the line $w_j = 48 \cdot (j + 1)$ for $0 \le j \le 20$.
  • 19. Linear Fits
    • Inspired by AnyNetXD and AnyNetXE, the block widths are given a linear parameterization:
      $u_j = w_0 + w_a \cdot j \quad \text{for } 0 \le j < d,\ w_0 > 0,\ w_a > 0$
    • To quantize $u_j$, an additional parameter $w_m$ is introduced, and $s_j$ is computed for each block from:
      $u_j = w_0 \cdot w_m^{s_j}$
    • Then $u_j$ is quantized by simply rounding $s_j$ (denoted $\lfloor s_j \rceil$), which gives the quantized per-block width:
      $w_j = w_0 \cdot w_m^{\lfloor s_j \rceil}$
    • Converting the per-block $w_j$ to the per-stage format:
      $w_i = w_0 \cdot w_m^{i}, \qquad d_i = \sum_j \mathbf{1}\left[\lfloor s_j \rceil = i\right]$
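The quantized linear parameterization maps four numbers $(d, w_0, w_a, w_m)$ to per-stage widths and depths. In this sketch, `regnet_widths` is a hypothetical name. The final snapping of each stage width to a multiple of `q = 8` is an assumption about practical implementations and is not part of the formulas.

```python
import math
from typing import List, Tuple


def regnet_widths(d: int, w_0: float, w_a: float, w_m: float, q: int = 8
                  ) -> Tuple[List[int], List[int]]:
    """Return per-stage (widths, depths) from the quantized linear parameterization.

    u_j = w_0 + w_a * j              linear per-block widths, 0 <= j < d
    s_j = log(u_j / w_0) / log(w_m)  solves u_j = w_0 * w_m ** s_j
    w_j = w_0 * w_m ** round(s_j)    quantize by rounding s_j
    Blocks sharing round(s_j) == i form stage i, so d_i counts them.
    """
    if d < 1 or w_0 <= 0 or w_a <= 0 or w_m <= 1:
        raise ValueError("need d >= 1, w_0 > 0, w_a > 0, w_m > 1")
    stage_idx = [round(math.log((w_0 + w_a * j) / w_0) / math.log(w_m))
                 for j in range(d)]
    widths, depths = [], []
    for i in sorted(set(stage_idx)):
        w_i = w_0 * w_m ** i
        widths.append(int(round(w_i / q)) * q)  # assumed snap to a multiple of q
        depths.append(stage_idx.count(i))
    return widths, depths
```

With `d=8, w_0=48, w_a=48, w_m=2`, the linear widths $48(j+1)$ collapse into four stages. The call returns widths `[48, 96, 192, 384]` and depths `[1, 1, 3, 3]`.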
  • 20. Linear Fits
    • $e_{fit}$ is the mean log-ratio of predicted to observed per-block widths.
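A fit of this kind can be sketched as a grid search over $(w_0, w_a, w_m)$, with $d$ fixed to the observed depth, that minimizes $e_{fit}$. Two points here are assumptions. First, taking the absolute value of each log-ratio, so that over- and under-predictions do not cancel. Second, the grids themselves, which are arbitrary. All function names are hypothetical.

```python
import itertools
import math
from typing import Iterable, List, Sequence, Tuple


def linear_widths(d: int, w_0: float, w_a: float, w_m: float) -> List[float]:
    """Per-block quantized widths w_j = w_0 * w_m ** round(s_j)."""
    out = []
    for j in range(d):
        s_j = math.log((w_0 + w_a * j) / w_0) / math.log(w_m)
        out.append(w_0 * w_m ** round(s_j))
    return out


def e_fit(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean (absolute, by assumption) log-ratio of predicted to observed widths."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty width lists")
    return sum(abs(math.log(p / o)) for p, o in zip(predicted, observed)) / len(observed)


def fit_linear(observed: Sequence[float], w0_grid: Iterable[float],
               wa_grid: Iterable[float], wm_grid: Iterable[float]
               ) -> Tuple[float, Tuple[float, float, float]]:
    """Grid search returning (best e_fit, (w_0, w_a, w_m))."""
    d = len(observed)
    best: Tuple[float, Tuple[float, float, float]] = (math.inf, (0.0, 0.0, 0.0))
    for w_0, w_a, w_m in itertools.product(w0_grid, wa_grid, wm_grid):
        err = e_fit(linear_widths(d, w_0, w_a, w_m), observed)
        if err < best[0]:
            best = (err, (w_0, w_a, w_m))
    return best
```

A network generated exactly by the parameterization has $e_{fit} = 0$ at its own parameters. Real AnyNetX models would land somewhere above zero.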
  • 21. The RegNet Design Space
    • The RegNet design space contains only simple, regular models:
      - $d < 64$
      - $w_0, w_a < 256$
      - $1.5 \le w_m \le 3$
      - $b$ and $g$ are the same as in AnyNet
    • Setting $w_m = 2$ and $w_0 = w_a$ gives good performance, but to maintain model diversity these constraints are not applied to the RegNet design space.
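The RegNet design space is described by 6 shared parameters instead of AnyNetX's 16 per-stage ones. The sampler below uses the ranges on the slide. Several details are assumptions for illustration: the lower bounds of 8, log-uniform sampling for $w_0$ and $w_a$, uniform sampling for $d$ and $w_m$, and powers of two for $g$. `sample_regnet` is a hypothetical name.

```python
import math
import random
from typing import Dict, Union


def sample_regnet(rng: random.Random) -> Dict[str, Union[int, float]]:
    """Draw one RegNet configuration (d, w_0, w_a, w_m, b, g)."""

    def log_uniform(lo: float, hi: float) -> float:
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))

    return {
        "d": rng.randint(1, 63),                # d < 64
        "w_0": round(log_uniform(8, 255)),      # w_0 < 256 (lower bound 8 assumed)
        "w_a": log_uniform(8, 255),             # w_a < 256 (lower bound 8 assumed)
        "w_m": rng.uniform(1.5, 3.0),           # 1.5 <= w_m <= 3
        "b": rng.choice([1, 2, 4]),             # shared bottleneck ratio
        "g": rng.choice([1, 2, 4, 8, 16, 32]),  # shared group width (assumed set)
    }
```

Restricting to $w_m = 2$ and $w_0 = w_a$ would amount to overwriting those two fields after sampling. The slide explains why the design space keeps them free.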
  • 25. Common Design Patterns
    • The deeper the model, the better the performance.
    • Double the number of channels whenever the spatial activation size is reduced.
    • Skip connections are good.
    • Bottlenecks are good.
    • Depthwise separable convolutions are popular in the low-compute regime.
    • Inverted bottlenecks are also good.
  • 26. RegNet Trends
    • The depth of the best models is stable across regimes, with an optimal depth of ~20 blocks (60 layers).
    • This contrasts with the common practice of using deeper models for higher-flop regimes.
  • 27. RegNet Trends
    • The best models use a bottleneck ratio $b$ of 1.0, which effectively removes the bottleneck.
    • The width multiplier $w_m$ of good models is ~2.5, similar but not identical to the popular recipe of doubling widths across stages.
  • 28. RegNet Trends
    • The remaining parameters ($g$, $w_a$, $w_0$) increase with complexity.
  • 29. Complexity Analysis
    • While activations are not a common measure of network complexity, they can heavily affect runtime on memory-bound hardware accelerators.
    • Activations increase with the square root of flops, while parameters increase linearly.
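The square-root relationship can be illustrated at the level of a single convolution, taking activations to be the number of elements in the conv's output tensor. The explanation is an assumption and is not the paper's empirical fit. If a model grows mainly in width, multiplying widths by $s$ multiplies flops and parameters by $s^2$ but activations only by $s$. Activations then grow like $\sqrt{\text{flops}}$ while parameters track flops linearly. `conv_stats` is a hypothetical helper that assumes stride 1 and "same" padding.

```python
from typing import Dict


def conv_stats(w_in: int, w_out: int, r: int, k: int = 3, groups: int = 1) -> Dict[str, int]:
    """Multiply-add flops, parameters and activations of one k x k conv layer.

    Assumes stride 1 and 'same' padding, so the output is r x r x w_out.
    Biases are ignored.
    """
    if w_in % groups or w_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    params = k * k * (w_in // groups) * w_out
    flops = params * r * r   # every weight is applied once per output position
    acts = w_out * r * r     # elements in the output tensor
    return {"flops": flops, "params": params, "acts": acts}
```

Doubling the width (for example, comparing `conv_stats(64, 64, 56)` with `conv_stats(128, 128, 56)`) quadruples flops and parameters but only doubles activations. Doubling the resolution instead quadruples activations and leaves parameters unchanged, which is one reason width-heavy scaling is comparatively activation-friendly.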
  • 30. RegNetX Constrained
    • Using these findings, the RegNetX design space is refined into RegNetXC:
      - $b = 1$, $d \le 40$, and $w_m \ge 2$
      - Parameters and activations are limited, following the complexity analysis
      - Further depth limit: $12 \le d \le 28$
  • 31. Alternate Design Choices
    • The inverted bottleneck ($b < 1$) degrades the EDF slightly, and depthwise conv performs even worse relative to $b = 1$ and $g \ge 1$.
    • For RegNetX, a fixed resolution of 224×224 is best, even at higher flops.
    • The Squeeze-and-Excitation (SE) op yields good gains; RegNetX with SE is called RegNetY.
  • 36. Comparison to Existing Networks
    • The higher-flop models have a large number of blocks in the third stage and a small number of blocks in the last stage.
    • The group width $g$ increases with complexity, but depth $d$ saturates for large models.
  • 37. State of the Art Comparison: Mobile Regime
  • 39. EfficientNet Comparison
    • At low flops, EfficientNet outperforms RegNetY. At intermediate flops, RegNetY outperforms EfficientNet, and at higher flops both RegNetX and RegNetY perform better.
  • 41. Additional Ablations
    • Fixed depth
      - Surprisingly, fixed-depth networks can match the performance of variable-depth networks for all flop regimes.
    • Fewer stages
      - Top RegNet models at high flops have few blocks in the fourth stage, but 3-stage networks perform considerably worse.
    • Inverted bottleneck
      - In a high-compute regime, $b < 1$ degrades results further.
  • 42. Additional Ablations
    • Swish vs. ReLU
      - Swish outperforms ReLU at low flops, but ReLU is better at high flops.
      - Interestingly, if $g$ is restricted to 1 (depthwise conv), Swish performs much better than ReLU.
  • 43. Optimization Settings
    • The initial learning rate and weight decay are stable across complexity regimes.
    [Figure panels: RegNet, EfficientNet]