3. Key Requirements for Commercial Computer Vision Usage
• Data centers (clouds)
Rarely safety-critical
Low power is nice to have
Real-time is preferable
• Gadgets – Smartphones, Self-driving cars, Drones, etc.
Usually safety-critical (except smartphones)
Low power is a must-have
Real-time is required
Slide Credit: Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
4. What’s the “Right” Neural Network for Use in a Gadget?
• Desirable Properties
Sufficiently high accuracy
Low computational complexity
Low energy usage
Small model size
Slide Credit: Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
5. Why Small Deep Neural Networks?
• Small DNNs train faster on distributed hardware
• Small DNNs are more deployable on embedded processors
• Small DNNs are easily updatable over-the-air (OTA)
Slide Credit: Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
6. Techniques for Small Deep Neural Networks
• Remove Fully-Connected Layers
• Kernel Reduction (3x3 → 1x1)
• Channel Reduction
• Evenly Spaced Downsampling
• Depthwise Separable Convolutions (see the sketch after this slide)
• Shuffle Operations
• Distillation & Compression
Slide Credit: Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
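To make the depthwise separable convolution item concrete, here is a minimal PyTorch sketch of one such block in the MobileNet style (a depthwise 3x3 followed by a pointwise 1x1, each with BatchNorm and ReLU); the class name and exact layer ordering are illustrative assumptions, not taken from the slides:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Illustrative MobileNet-style block: depthwise 3x3 + pointwise 1x1."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Depthwise stage: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise stage: 1x1 convolution mixes channels into out_channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))
```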
9. Recap – VGG, Inception-v3
• VGG – uses only 3x3 convolutions
A stack of 3x3 conv layers has the same effective receptive field as a 5x5 or 7x7 conv layer
Deeper means more non-linearities
Fewer parameters: two stacked 3x3 layers cost 2 x (3 x 3 x C x C) = 18C² weights vs 5 x 5 x C x C = 25C² for one 5x5 layer, with C input and output channels (worked out in the snippet after this slide)
This also has a regularization effect
• Inception-v3
Factorization of filters (e.g., a 5x5 conv into two stacked 3x3 convs, and an nxn conv into a 1xn followed by an nx1 conv)
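As a quick check of the parameter comparison above, a small pure-Python snippet (C = 256 is an arbitrary illustrative channel count):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    # Weight count of a single k x k convolution layer, ignoring biases
    return k * k * c_in * c_out

C = 256  # any value gives the same 18:25 ratio
two_3x3 = 2 * conv_params(3, C, C)  # 2 x (3 x 3 x C x C) = 18C^2
one_5x5 = conv_params(5, C, C)      # 5 x 5 x C x C       = 25C^2
print(two_3x3, one_5x5)             # 1179648 1638400
print(one_5x5 / two_3x3)            # ~1.39: the 5x5 layer costs ~39% more
```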
14. Standard Convolution vs Depthwise Separable Convolution
• Standard convolutions have the computational cost of
DK x DK x M x N x DF x DF
• Depthwise separable convolutions cost
DK x DK x M x DF x DF + M x N x DF x DF
• Reduction in computation: 1/N + 1/DK²
With 3x3 depthwise separable convolutions (DK = 3), we get 8 to 9 times fewer computations (see the numeric check after the definitions below)
DK : width/height of filters
DF : width/height of feature maps
M : number of input channels
N : number of output channels (number of filters)
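The reduction factor can be checked numerically; a minimal sketch with illustrative layer dimensions (DK = 3, DF = 14, M = N = 512 are assumed values, not from the slides):

```python
def standard_cost(dk, df, m, n):
    # Multiply-adds of a standard convolution: DK x DK x M x N x DF x DF
    return dk * dk * m * n * df * df

def separable_cost(dk, df, m, n):
    # Depthwise (DK x DK x M x DF x DF) plus pointwise (M x N x DF x DF)
    return dk * dk * m * df * df + m * n * df * df

dk, df, m, n = 3, 14, 512, 512
ratio = separable_cost(dk, df, m, n) / standard_cost(dk, df, m, n)
print(ratio)               # ~0.113
print(1 / n + 1 / dk**2)   # same value from the closed form 1/N + 1/DK^2
print(1 / ratio)           # ~8.8, i.e. "8 to 9 times fewer computations"
```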
17. Width Multiplier & Resolution Multiplier
• Width Multiplier – Thinner Models
For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN, where typical settings of α are 1, 0.75, 0.5 and 0.25
• Resolution Multiplier – Reduced Representation
The second hyper-parameter for reducing the computational cost of a neural network is a resolution multiplier ρ with 0 < ρ ≤ 1, typically set implicitly so that the input resolution of the network is 224, 192, 160 or 128 (ρ = 1, 0.857, 0.714, 0.571)
• Computational cost:
DK x DK x αM x ρDF x ρDF + αM x αN x ρDF x ρDF
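Folding both multipliers into the cost formula, a short sketch (the layer dimensions are illustrative; because the depthwise term is small, the cost scales roughly with α² and ρ²):

```python
def mobilenet_layer_cost(dk, df, m, n, alpha=1.0, rho=1.0):
    # DK x DK x aM x (rDF)^2 + aM x aN x (rDF)^2, per the formula above
    am, an, rdf = alpha * m, alpha * n, rho * df
    return dk * dk * am * rdf * rdf + am * an * rdf * rdf

base  = mobilenet_layer_cost(3, 14, 512, 512)
thin  = mobilenet_layer_cost(3, 14, 512, 512, alpha=0.5)
small = mobilenet_layer_cost(3, 14, 512, 512, alpha=0.5, rho=0.571)
print(thin / base)    # ~0.25: cost shrinks roughly as alpha^2
print(small / base)   # ~0.083: and further as rho^2
```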