Once-for-All: Train One Network and
Specialize it for Efficient Deployment
[ICLR 2020]
2022. 03. 20. (Sun)
Presented by: 김동현
w/ Fundamental Team: 김채현, 박종익, 양현모, 이근배, 이재윤, 송헌
Contents
● Problem and Approach
● Key Challenge
● How to Train Once-for-all Network
● How to Deploy Once-for-all Network
● Evaluations
● Discussions
● Conclusion
Main Problem to Solve
● DNN models are deployed on a wide variety of hardware platforms.
○ A survey reports 23.14 billion IoT devices as of 2018.
○ These devices have different resource constraints, so it is not feasible to deploy the same model on all of them.
● The optimal neural network architecture varies by deployment environment
(e.g., number of arithmetic units, application requirements).
● It is computationally prohibitive to find the optimal architecture for every environment by training a separate model for each one.
● How, then, can a specialized model be found cost-efficiently for each platform?
Suggested Approach
● Train a Once-for-All (OFA) network that can serve various environments without additional training.
○ Sub-networks of many different scales (about 10^19) can be drawn from one OFA network.
○ Each hardware platform can find the specialized model that meets its requirements (e.g., latency).
Key Challenges for Once-for-All Network
Requirements
1. Each sub-network architecture should be a part of the largest network.
2. Sub-networks should share parameters with larger networks.
3. The optimal model architecture for a given hardware platform should be easy to find.
Challenges
1. How to design the sub-network architecture space based on an OFA network.
2. How to let sub-networks share parameters with larger networks.
3. How to select the optimal model for the hardware (in terms of latency and accuracy).
Contents
● Problem and Approach
● Key Challenge
● How to Train Once-for-all Network: Challenges #1, #2
● How to Deploy Once-for-all Network: Challenge #3
● Evaluations
● Discussions
● Conclusion
Q&A
Training OFA Network - Network Architecture
● Assumption: Follow the common practice of CNN models (e.g., ResNet).
○ A model consists of groups of layers (i.e., units).
● Architecture Search Space
○ # Layers (L): the depth of each unit is chosen from {2, 3, 4}
○ # Channels (C): the expansion ratio in each layer is chosen from {3, 4, 6}
○ Kernel Size (Ks): chosen from {3, 5, 7}
○ Input Dimension: ranges from 128 to 224 with a stride
● Number of available sub-networks: ((3 * 3)^2 + (3 * 3)^3 + (3 * 3)^4)^5 = about 10^19
(Figure: the units L1, L2, L3, ... of the network, annotated with the channel width C, the kernel size Ks, and the number of units.)
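The sub-network count above can be checked with a few lines of arithmetic. This is a minimal sketch that only re-evaluates the slide's formula; the constant names are illustrative, and NUM_UNITS = 5 is read off the outer exponent of the formula rather than stated separately on the slide.

```python
# Choices per layer and per unit, as listed in the search space above.
KERNEL_SIZES = (3, 5, 7)
EXPAND_RATIOS = (3, 4, 6)
DEPTHS = (2, 3, 4)
NUM_UNITS = 5  # assumption: taken from the exponent in the slide's formula


def count_subnetworks():
    """Evaluate ((3*3)^2 + (3*3)^3 + (3*3)^4)^5 from the search-space sizes."""
    per_layer = len(KERNEL_SIZES) * len(EXPAND_RATIOS)      # 3 * 3 choices
    per_unit = sum(per_layer ** depth for depth in DEPTHS)  # one term per depth
    return per_unit ** NUM_UNITS


total = count_subnetworks()
print(f"{total:.2e}")
```

The result is roughly 2 × 10^19, which matches the "about 10^19" order of magnitude on the slide. The input resolution is not part of this count.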
Training OFA Network - Sharing Parameters
How sub-networks share parameters:
● Elastic Kernel Size
○ Directly reusing the center of a larger kernel as the smaller kernel can hurt performance.
○ When changing the kernel size, pass the weights through a transformation matrix:
■ Each layer holds its own transformation matrices for the elastic kernels.
● 25 * 25 parameters for 7x7 -> 5x5.
● 9 * 9 parameters for 5x5 -> 3x3.
● E.g., 5x5 kernel = (center 5x5 of the 7x7 kernel) * transformation matrix
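The kernel transformation can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: `shrink_kernel` is a hypothetical helper, the weight layout (out_channels, in_channels, k, k) and the right-multiplication by the matrix are choices made for this sketch, and the identity initialization is only there to make the example easy to check.

```python
import numpy as np


def shrink_kernel(kernel, transform):
    """Derive a (k-2)x(k-2) kernel from a kxk kernel.

    kernel:    array of shape (out_channels, in_channels, k, k)
    transform: array of shape ((k-2)**2, (k-2)**2), e.g. 25x25 for 7x7 -> 5x5
    """
    k = kernel.shape[-1]
    center = kernel[:, :, 1:k - 1, 1:k - 1]                  # crop the center
    flat = center.reshape(center.shape[0], center.shape[1], -1)
    # The same matrix is applied to every (out, in) channel pair of the layer.
    return (flat @ transform).reshape(center.shape)


rng = np.random.default_rng(0)
kernel_7x7 = rng.normal(size=(8, 4, 7, 7))
to_5x5 = np.eye(25)  # assumption: identity init, so the result equals the center crop
to_3x3 = np.eye(9)

kernel_5x5 = shrink_kernel(kernel_7x7, to_5x5)
kernel_3x3 = shrink_kernel(kernel_5x5, to_3x3)
extra_params_per_layer = to_5x5.size + to_3x3.size  # 25*25 + 9*9
```

With learned (non-identity) matrices, the 5x5 and 3x3 kernels are no longer plain crops of the 7x7 kernel, which is the point of the transformation.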
How sub-networks share parameters:
● Elastic Depth (= #Layers)
○ A sub-network of depth D keeps the first D of the unit's L layers and skips the last L - D.
○ This is simpler than keeping an arbitrary subset of the L layers.
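A minimal sketch of elastic depth, assuming a unit is just an ordered list of callables. `run_unit` and the toy layers are hypothetical names for illustration only.

```python
def run_unit(layers, x, depth):
    """Apply only the first `depth` layers of a unit; the remaining ones are skipped."""
    if not 1 <= depth <= len(layers):
        raise ValueError("depth must be between 1 and the number of layers")
    for layer in layers[:depth]:
        x = layer(x)
    return x


# Toy unit with L = 4 layers; each "layer" is a plain function for illustration.
unit = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * 10]

shallow = run_unit(unit, 1, depth=2)  # uses layers 0 and 1 only
full = run_unit(unit, 1, depth=4)     # uses all four layers
```

Because every depth setting is a prefix of the same list, the shallow sub-network's layers are exactly the first layers of the deeper one, so their parameters are shared by construction.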
How sub-networks share parameters:
● Elastic Width (= #Channels)
○ For the given expansion ratio, select channels through a channel sorting method:
1. Calculate the L1 norm of each channel's weights.
2. Sort the channels by L1 norm in descending order (a larger L1 norm means a more important channel).
3. Keep the top-K channels.
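The three steps above can be sketched in plain Python. `top_k_channels` is a hypothetical helper, and the toy weights are made-up numbers chosen only so the ranking is easy to follow.

```python
def top_k_channels(channel_weights, k):
    """Return the indices of the k channels with the largest L1 norm, most important first."""
    l1 = [sum(abs(w) for w in weights) for weights in channel_weights]
    order = sorted(range(len(l1)), key=lambda i: l1[i], reverse=True)
    return order[:k]


# Four channels with three weights each (illustrative values, not real weights).
weights = [
    [0.1, -0.2, 0.1],   # L1 norm about 0.4
    [1.0, -1.5, 0.5],   # L1 norm 3.0
    [0.0, 0.3, -0.3],   # L1 norm 0.6
    [-2.0, 0.1, 0.4],   # L1 norm 2.5
]
kept = top_k_channels(weights, k=2)
```

Here `kept` holds the indices of channels 1 and 3. A smaller K always selects a prefix of the same sorted order, so narrower sub-networks reuse the most important channels of wider ones.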
Training OFA Network - Training Process
Progressive Shrinking
1. Train the full model (i.e., the maximum value for each configuration dimension).
● The trained full-size model is then used as the teacher for knowledge distillation.
● Note: Full model != Best model
2. Sample sub-networks varying kernel sizes and fine-tune.
a. At each step, sample one sub-net with different kernel sizes.
b. Calculate the loss: Loss = KD_ratio * distillation loss against the full model + sub-net loss
c. Update the weights (sub-nets share weights with the full model, so updating the sub-net's weights updates the full model's weights).
3. Sample sub-networks varying depth and fine-tune.
4. Sample sub-networks varying channel expansion ratio and fine-tune.
Note1: Input image size is randomly chosen for each training batch.
Note2: Refer to Appendix B of the paper for implementation details of progressive shrinking.
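The progressive shrinking schedule can be sketched as a sequence of phases, each enlarging the set of choices a sub-network may be sampled from. This is a simplified illustration under assumptions: `PHASES`, `sample_subnet`, and `combined_loss` are hypothetical names, the resolution list is a placeholder (the slide only gives the 128 to 224 range), and the real schedule has additional sub-stages described in the paper's appendix.

```python
import random

NUM_UNITS = 5
RESOLUTIONS = (128, 160, 192, 224)  # placeholder values inside the 128-224 range

# Each phase keeps the earlier choices and unlocks one more elastic dimension.
PHASES = [
    ("full model",     {"kernel": [7],       "depth": [4],       "expand": [6]}),
    ("elastic kernel", {"kernel": [3, 5, 7], "depth": [4],       "expand": [6]}),
    ("elastic depth",  {"kernel": [3, 5, 7], "depth": [2, 3, 4], "expand": [6]}),
    ("elastic width",  {"kernel": [3, 5, 7], "depth": [2, 3, 4], "expand": [3, 4, 6]}),
]


def sample_subnet(choices, rng):
    """Sample one sub-network configuration for a training step."""
    depths = [rng.choice(choices["depth"]) for _ in range(NUM_UNITS)]
    return {
        "resolution": rng.choice(RESOLUTIONS),  # Note1: random per batch
        "depth": depths,
        "kernel": [[rng.choice(choices["kernel"]) for _ in range(d)] for d in depths],
        "expand": [[rng.choice(choices["expand"]) for _ in range(d)] for d in depths],
    }


def combined_loss(subnet_loss, distill_loss, kd_ratio):
    """Sub-net loss plus the weighted distillation term from the full model."""
    return kd_ratio * distill_loss + subnet_loss


rng = random.Random(0)
for name, choices in PHASES:
    config = sample_subnet(choices, rng)
    print(name, config["resolution"], config["depth"])
```

In an actual training loop, each sampled configuration would select the active layers, channels, and kernels of the shared network before the forward pass; that part is omitted here.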
Deploying Specialized Model w/ OFA Network
Problem:
● Derive the specialized sub-network for a given deployment scenario (e.g., a latency constraint).
Solution:
● Train an accuracy predictor (3-layer feed-forward neural network).
○ f(architecture, input image size) => accuracy
○ Training data: randomly sample 16K sub-networks and measure their accuracy on 10K validation images.
● Build a latency lookup table (details in the ProxylessNAS paper).
○ One latency lookup table is built for each hardware platform.
● Conduct an evolutionary search that uses the predictor and the lookup table.
○ Mutate a known sub-network by resampling parts of it, then predict the performance of the result.
○ Add the mutated sub-network to the child pool if it satisfies the constraint (latency).
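The search loop described above can be sketched as follows. This is a simplified illustration, not the paper's procedure: all function names are hypothetical, the lookup table is keyed only by (kernel size, expansion ratio), the accuracy predictor is passed in as any callable, only mutation is shown, and the latency constraint is assumed to be satisfiable (otherwise the sampling loops would not terminate).

```python
import random

KERNELS, EXPANDS, DEPTHS, NUM_UNITS = (3, 5, 7), (3, 4, 6), (2, 3, 4), 5


def random_unit(rng):
    """One unit: a tuple of (kernel, expand) layers with a random depth."""
    depth = rng.choice(DEPTHS)
    return tuple((rng.choice(KERNELS), rng.choice(EXPANDS)) for _ in range(depth))


def random_arch(rng):
    return tuple(random_unit(rng) for _ in range(NUM_UNITS))


def latency_ms(arch, table):
    """Estimate latency by summing per-layer entries of the lookup table."""
    return sum(table[layer] for unit in arch for layer in unit)


def mutate(arch, rng, prob=0.1):
    """Resample each unit independently with probability `prob`."""
    return tuple(random_unit(rng) if rng.random() < prob else unit for unit in arch)


def evolutionary_search(predict_acc, table, max_latency, rng,
                        pop_size=20, generations=10, num_parents=5):
    # Initial population: random architectures that meet the latency constraint.
    population = []
    while len(population) < pop_size:
        arch = random_arch(rng)
        if latency_ms(arch, table) <= max_latency:
            population.append(arch)
    for _ in range(generations):
        # Keep the architectures with the highest predicted accuracy as parents.
        population.sort(key=predict_acc, reverse=True)
        parents = population[:num_parents]
        children = []
        while len(children) < pop_size - num_parents:
            child = mutate(rng.choice(parents), rng)
            if latency_ms(child, table) <= max_latency:  # constraint check
                children.append(child)
        population = parents + children
    return max(population, key=predict_acc)
```

A caller would supply a lookup table measured on the target device and a trained predictor; no sub-network is trained or evaluated on real data inside the loop, which is what keeps the per-device search cheap.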
Q&A
Evaluation
● ImageNet Dataset
● Evaluated on various hardware platforms:
○ Samsung S7 Edge, Note8, Note10, Google Pixel1, Pixel2, LG G8, NVIDIA 1080Ti and V100 GPUs, Jetson TX2, Intel Xeon CPU, Xilinx ZU9EG and ZU3EG FPGAs
● Please refer to the paper for the detailed training configurations.
Performance of sub-networks on ImageNet
● Top-1 accuracy under 224x224 resolution.
● Progressive Shrinking (PS) yields higher accuracy.
○ With PS: 74.8% top-1 accuracy (D=4, W=3, K=3), which is on par with MobileNetV3-Large.
○ Without PS (the same architecture taken from the full model): 71.5%, which is 3.3 percentage points lower.
Reduced Design Cost
● The paper compares OFA with hardware-aware NAS methods.
○ NAS: the design cost grows linearly with the number of deployment scenarios (N).
○ The total CO2 emissions of OFA are:
■ 16× lower than ProxylessNAS
■ 19× lower than FBNet
■ 1,300× lower than MnasNet
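The linear-cost argument can be made concrete with a toy cost model. This is only a sketch of the reasoning: both functions and their parameters are hypothetical, and the numbers in the usage lines are arbitrary units, not measurements from the paper.

```python
def nas_design_cost(num_scenarios, cost_per_scenario):
    """Search and training are repeated per scenario, so cost is linear in N."""
    return num_scenarios * cost_per_scenario


def ofa_design_cost(num_scenarios, train_once_cost, search_cost):
    """One shared training run, plus a cheap search for each scenario."""
    return train_once_cost + num_scenarios * search_cost


# Arbitrary illustrative units: the shared training run is amortized over N.
for n in (1, 10, 100):
    print(n, nas_design_cost(n, 100), ofa_design_cost(n, 500, 1))
```

With these placeholder values the one-time training dominates for a single scenario, while the per-scenario approach becomes far more expensive as N grows.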
OFA under Different Computational Resource Constraints
● Better accuracy under the same constraints:
○ (Left): MACs, (Right): Latency
○ OFA achieves higher accuracy while requiring less computation.
○ It also outperforms "OFA - Train from scratch", which is trained from scratch without pretraining.
Discussions
● Would the same approach work for other models and tasks (e.g., Transformers, NLP)?
● The architecture search space is limited to certain model families.
○ E.g., how could the method be applied to models such as HRNet?
Conclusion
● The Once-for-All (OFA) network allows training one large model and deploying various sub-networks without additional training.
● OFA proposes the Progressive Shrinking algorithm to train sub-networks that share parameters, which greatly reduces the design cost.
● The paper shows that OFA achieves higher accuracy on the ImageNet dataset.
● With a trained OFA network, optimal sub-networks can be found for various deployment environments.
Q&A
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

Once-for-All: Train One Network and Specialize it for Efficient Deployment

  • 1. Once-for-All: Train One Network and Specialize it for Efficient Deployment [ICLR 2020] 2022. 03. 20. (Sun) Presented by: 김동현 w/ Fundamental Team: 김채현, 박종익, 양현모, 이근배, 이재윤, 송헌
  • 2. Contents ● Problem and Approach ● Key Challenge ● How to Train Once-for-all Network ● How to Deploy Once-for-all Network ● Evaluations ● Discussions ● Conclusion
  • 4. Main Problem to Solve ● DNN models are deployed on many different hardware platforms. ○ One survey counts 23.14 billion IoT devices as of 2018. ○ The devices have different resource constraints, so the same model cannot be deployed to all of them. ● The optimal neural network architecture varies with the deployment environment (e.g., number of arithmetic units, application requirements).
  • 5. Main Problem to Solve ● It is computationally prohibitive to find the optimal architecture for every environment by training separately for each one. ● How, then, can a specialized model be found cost-efficiently for each platform? (Figure label: target latency = 20 ms)
  • 6. Suggested Approach ● Train a Once-for-All (OFA) network that can be served in various environments without additional training. ○ Sub-networks at many scales (about 10^19) can be drawn from one OFA network. ○ Each hardware platform can find the model specialized for its requirements (e.g., latency).
  • 7. Key Challenges for Once-for-All Network Requirements 1. The sub-network architecture should be part of the largest network. 2. Sub-networks should share parameters with larger networks. 3. The optimal model architecture for a given hardware platform should be easy to find.
  • 8. Key Challenges for Once-for-All Network Challenges (one per requirement on the previous slide) 1. How to design the sub-network architecture space based on an OFA network. 2. How to let sub-networks share parameters with larger networks. 3. How to select the optimal model for the hardware (in terms of latency and accuracy).
  • 9. Contents ● Problem and Approach ● Key Challenge ● How to Train Once-for-all Network: Challenges #1, #2 ● How to Deploy Once-for-all Network: Challenge #3 ● Evaluations ● Discussions ● Conclusion
  • 11. Training OFA Network - Network Architecture ● Assumption: follow the common practice of CNN models (e.g., ResNet). ○ A model consists of groups of layers (i.e., units). ● Architecture Search Space ○ # Layers (L): the depth of each unit is chosen from {2, 3, 4} ○ # Channels (C): the expansion ratio in each layer is chosen from {3, 4, 6} ○ Kernel Size (Ks): chosen from {3, 5, 7} ○ Input Dimension: ranges from 128 to 224 with a fixed stride ● Number of available sub-networks: with 3 kernel sizes × 3 expansion ratios per layer, 2 to 4 layers per unit, and 5 units, ((3 × 3)^2 + (3 × 3)^3 + (3 × 3)^4)^5 = about 10^19 (Figure: units with layers L1, L2, L3, channel width C, and kernel size Ks.)
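The sub-network count on this slide can be checked with a few lines of arithmetic. The sketch below only evaluates the slide's formula; the constant names are illustrative, and the choice of input resolution is not part of the count.

```python
# Per-layer and per-unit choices taken from the slide's search space.
KERNEL_SIZES = (3, 5, 7)
EXPAND_RATIOS = (3, 4, 6)
DEPTHS = (2, 3, 4)
NUM_UNITS = 5  # the outer exponent in the slide's formula

# Each layer picks one kernel size and one expansion ratio independently.
per_layer = len(KERNEL_SIZES) * len(EXPAND_RATIOS)

# A unit of depth d has per_layer ** d configurations; sum over allowed depths.
per_unit = sum(per_layer ** d for d in DEPTHS)

# Units are configured independently of each other.
num_subnets = per_unit ** NUM_UNITS

print(per_layer, per_unit)
print(f"{num_subnets:.2e}")
```

The result lands between 10^19 and 10^20, which matches the "about 10^19" quoted on the slide.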
  • 12. Training OFA Network - Sharing Parameters: Elastic Kernel Size ● Merely reusing the parameters of the larger kernel for a smaller kernel can degrade performance. ● When changing the kernel size, the shared weights pass through a transform matrix: ○ Each layer holds its own transform parameters for the elastic kernels: 25 × 25 parameters for 7x7 -> 5x5 and 9 × 9 parameters for 5x5 -> 3x3. ○ E.g., 5x5 kernel = (center of 7x7 kernel) × transform matrix
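A minimal sketch of the kernel transformation described on this slide, in plain Python for a single 2-D kernel. The function names are hypothetical, the identity matrices stand in for the learned transform matrices, and the multiplication order (matrix times flattened kernel) is an assumption; the slide only states that the cropped center is multiplied by the transform matrix.

```python
def center_crop(kernel, size):
    """Return the central size x size window of a square 2-D kernel."""
    offset = (len(kernel) - size) // 2
    return [row[offset:offset + size] for row in kernel[offset:offset + size]]


def transform_kernel(kernel, size, matrix):
    """Crop the center, flatten it, and apply a (size^2 x size^2) matrix."""
    flat = [w for row in center_crop(kernel, size) for w in row]
    n = size * size
    out = [sum(matrix[i][j] * flat[j] for j in range(n)) for i in range(n)]
    # Reshape the flat result back into a size x size kernel.
    return [out[r * size:(r + 1) * size] for r in range(size)]


def identity(n):
    """Placeholder for a learned transform matrix (assumption)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# One shared 7x7 kernel; smaller kernels are derived from it, not stored.
kernel_7x7 = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
kernel_5x5 = transform_kernel(kernel_7x7, 5, identity(25))  # 25 x 25 matrix
kernel_3x3 = transform_kernel(kernel_5x5, 3, identity(9))   # 9 x 9 matrix
```

With identity matrices the result is just the center crop; during training the matrices would be updated so that the smaller kernels are no longer forced to equal the raw center weights.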
  • 13. Training OFA Network - Sharing Parameters: Elastic Depth (= #Layers) ● In a unit with L layers, a sub-network of depth D keeps the first D layers and skips the remaining L - D. ● This gives simpler depth settings than selecting an arbitrary subset of the L layers.
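The "first D layers" rule can be sketched as a slice over a unit's layer list. The layers here are toy Python callables rather than real convolution blocks, and the function names are illustrative.

```python
def active_layers(unit_layers, depth):
    """Keep the first `depth` layers of a unit; the rest are skipped."""
    if not 1 <= depth <= len(unit_layers):
        raise ValueError("depth must be between 1 and the unit's layer count")
    return unit_layers[:depth]


def run_unit(x, unit_layers, depth):
    """Apply only the active layers of the unit, in order."""
    for layer in active_layers(unit_layers, depth):
        x = layer(x)
    return x


# Toy unit with L = 4 layers.
unit = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * 10]

shallow = run_unit(1, unit, depth=2)  # uses layers 1-2 only
full = run_unit(1, unit, depth=4)     # uses all four layers
```

Because every depth reuses a prefix of the same list, a shallower sub-network never owns weights that the deeper ones lack.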
  • 14. Training OFA Network - Sharing Parameters: Elastic Width (= #Channels) ● For the given expansion ratio, select channels with a channel sorting method: 1. Calculate the L1 norm of each channel’s weights. 2. Sort the channels by L1 norm, largest first. 3. Choose the top-K channels.
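The three channel-sorting steps above can be sketched as follows. Each channel is represented by a flat list of weights, and the function names and sample values are illustrative.

```python
def l1_norm(channel_weights):
    """Step 1: L1 norm of one channel's weights."""
    return sum(abs(w) for w in channel_weights)


def top_k_channels(weights, k):
    """Steps 2-3: sort channel indices by L1 norm (largest first), keep k."""
    order = sorted(range(len(weights)),
                   key=lambda i: l1_norm(weights[i]),
                   reverse=True)
    return order[:k]


# Four channels with L1 norms 0.3, 2.0, 1.2, and 0.05.
weights = [[0.1, -0.2], [1.5, 0.5], [-0.9, 0.3], [0.0, 0.05]]

kept_two = top_k_channels(weights, 2)
kept_three = top_k_channels(weights, 3)
```

A narrower sub-network keeps a subset of the channels chosen by a wider one, so the channels with the largest norms are shared by every width.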
  • 15. Training OFA Network - Training Process: Progressive Shrinking 1. Train the full model (i.e., the maximum value for each configuration dimension). ● The trained full-size model is then used for knowledge distillation. ● Note: the full model is not necessarily the best model. Note 1: The input image size is randomly chosen for each training batch.
  • 16. Training OFA Network - Training Process: Progressive Shrinking 1. Train the full model (i.e., the maximum value for each configuration dimension). 2. Sample sub-networks with varying kernel sizes and fine-tune. a. At each step, sample one sub-net with different kernel sizes. b. Calculate the loss: Loss = full-model loss * KD_ratio + sub-net loss. c. Update the weights; because weights are shared, updating the sub-net's weights also updates the full model's weights. Note 1: The input image size is randomly chosen for each training batch.
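One way to read the loss in step (b) is as a distillation term weighted by the KD ratio plus the ordinary label loss of the sub-net. The sketch below follows that reading in plain Python. Treating the "full model loss" as the cross-entropy between the sub-net's predictions and the full model's soft predictions is an assumption, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_probs):
    """Cross-entropy between softmax(logits) and a target distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def progressive_shrinking_loss(subnet_logits, full_logits, label, kd_ratio):
    """Loss = kd_ratio * distillation term + sub-net label loss."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(subnet_logits))]
    subnet_loss = cross_entropy(subnet_logits, one_hot)
    # Assumption: the full model acts as teacher through its soft predictions.
    kd_loss = cross_entropy(subnet_logits, softmax(full_logits))
    return kd_ratio * kd_loss + subnet_loss


loss_without_kd = progressive_shrinking_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], 0, 0.0)
loss_with_kd = progressive_shrinking_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], 0, 1.0)
```

Setting `kd_ratio` to 0 reduces the expression to the plain sub-net loss, which makes it easy to see how much the distillation term contributes.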
  • 17. Training OFA Network - Training Process: Progressive Shrinking 1. Train the full model (i.e., the maximum value for each configuration dimension). 2. Sample sub-networks with varying kernel sizes and fine-tune. 3. Sample sub-networks with varying depths and fine-tune. 4. Sample sub-networks with varying channel expansion ratios and fine-tune. Note 1: The input image size is randomly chosen for each training batch. Note 2: Refer to Appendix B of the paper for implementation details of progressive shrinking.
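The four steps can be viewed as a schedule that opens one more dimension of the search space per phase. The sketch below samples sub-network configurations under that view. It is a simplification: each dimension is opened in a single step, the stride of 4 for image sizes and the 5 units are assumptions, and all names are illustrative.

```python
import random

# Choices allowed in each phase: full model, then elastic kernel size,
# then elastic depth, then elastic width (expansion ratio).
PHASES = [
    {"kernel": [7], "depth": [4], "expand": [6]},
    {"kernel": [3, 5, 7], "depth": [4], "expand": [6]},
    {"kernel": [3, 5, 7], "depth": [2, 3, 4], "expand": [6]},
    {"kernel": [3, 5, 7], "depth": [2, 3, 4], "expand": [3, 4, 6]},
]
IMAGE_SIZES = list(range(128, 225, 4))  # assumption: stride of 4


def sample_subnet(phase, rng, num_units=5):
    """Sample one sub-network configuration for a training step."""
    choices = PHASES[phase]
    config = {"image_size": rng.choice(IMAGE_SIZES), "units": []}
    for _ in range(num_units):
        depth = rng.choice(choices["depth"])
        layers = [
            {"kernel": rng.choice(choices["kernel"]),
             "expand": rng.choice(choices["expand"])}
            for _ in range(depth)
        ]
        config["units"].append(layers)
    return config


rng = random.Random(0)
full_model = sample_subnet(0, rng)
kernel_phase = sample_subnet(1, rng)
last_phase = sample_subnet(3, rng)
```

In phase 0 every sample is the full model; later phases keep the earlier choices available, so large sub-networks continue to be trained while smaller ones are added.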
  • 18. Deploying Specialized Model w/ OFA Network Problem: ● Derive the specialized sub-network for a given deployment scenario (e.g., a latency constraint). Solution: ● Train an accuracy predictor (3-layer FFNN). ○ f(architecture, input image size) => accuracy ○ Randomly sample 16K sub-networks and measure their accuracy on 10K validation images. ● Latency lookup table (details in the ProxylessNAS paper). ○ Build a latency lookup table on each hardware platform. ● Conduct an evolutionary search using the predictor and the lookup table. ○ Mutate a known sub-network by sampling, and predict the performance of the result. ○ Add the mutated sub-network to the child pool if it satisfies the (latency) constraint.
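The search loop on this slide can be sketched with toy stand-ins for its two inputs. In the sketch below, `LATENCY_TABLE` and `predicted_accuracy` are hypothetical placeholders for the measured per-hardware lookup table and the trained accuracy predictor; the numbers in them are invented for illustration only. The loop uses mutation alone, as described on the slide.

```python
import random

KERNELS, EXPANDS, DEPTHS = (3, 5, 7), (3, 4, 6), (2, 3, 4)
NUM_UNITS, MAX_DEPTH = 5, 4

# Hypothetical per-layer latency (ms); a real table is measured on the device.
LATENCY_TABLE = {(k, e): 0.1 * k + 0.15 * e for k in KERNELS for e in EXPANDS}


def random_arch(rng):
    return {
        "depths": [rng.choice(DEPTHS) for _ in range(NUM_UNITS)],
        "layers": [(rng.choice(KERNELS), rng.choice(EXPANDS))
                   for _ in range(NUM_UNITS * MAX_DEPTH)],
    }


def active(arch):
    """Yield (kernel, expand) for the first `depth` layers of each unit."""
    for unit, depth in enumerate(arch["depths"]):
        start = unit * MAX_DEPTH
        yield from arch["layers"][start:start + depth]


def latency_ms(arch):
    return sum(LATENCY_TABLE[layer] for layer in active(arch))


def predicted_accuracy(arch):
    # Toy stand-in for the accuracy predictor: bigger layers score higher.
    return sum(0.3 * k + 0.5 * e for k, e in active(arch))


def mutate(arch, rng, prob=0.1):
    """Resample each depth and each layer setting with probability `prob`."""
    child = {"depths": list(arch["depths"]), "layers": list(arch["layers"])}
    for i in range(NUM_UNITS):
        if rng.random() < prob:
            child["depths"][i] = rng.choice(DEPTHS)
    for i in range(len(child["layers"])):
        if rng.random() < prob:
            child["layers"][i] = (rng.choice(KERNELS), rng.choice(EXPANDS))
    return child


def sample_feasible(rng, limit):
    while True:
        arch = random_arch(rng)
        if latency_ms(arch) <= limit:
            return arch


def evolutionary_search(limit, generations=20, population=40, parents=10, seed=0):
    rng = random.Random(seed)
    pool = [sample_feasible(rng, limit) for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=predicted_accuracy, reverse=True)
        elite = pool[:parents]
        children = []
        while len(children) < population - parents:
            child = mutate(rng.choice(elite), rng)
            # Only children that satisfy the latency constraint are kept.
            if latency_ms(child) <= limit:
                children.append(child)
        pool = elite + children
    return max(pool, key=predicted_accuracy)


best = evolutionary_search(limit=20.0)
```

No sub-network is trained or even run during the search: every candidate is scored by the predictor and filtered by the table, which is why the search itself is cheap compared with training.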
  • 20. Evaluation ● ImageNet dataset ● Evaluated on various hardware platforms: ○ Samsung S7 Edge, Note8, Note10, Google Pixel1, Pixel2, LG G8, NVIDIA 1080Ti, V100 GPUs, Jetson TX2, Intel Xeon CPU, Xilinx ZU9EG, and ZU3EG FPGAs ● Please refer to the paper for the detailed training configurations.
  • 21. Evaluation: Performance of sub-networks on ImageNet ● Top-1 accuracy at 224x224 resolution. ● Progressive shrinking (PS) yields higher accuracy. ○ 74.8% top-1 accuracy (D=4, W=3, K=3), which is on par with MobileNetV3-Large. ○ Without PS, the same architecture taken from the full model achieves 71.5%, which is 3.3 percentage points lower.
  • 22. Evaluation: Reduced Design Cost ● The paper compares OFA with hardware-aware NAS methods. ○ NAS: the design cost grows linearly with the number of deployment scenarios (N). ○ The total CO2 emissions of OFA are: ■ 16× lower than ProxylessNAS ■ 19× lower than FBNet ■ 1,300× lower than MnasNet
  • 23. Evaluation: OFA under Different Computational Resource Constraints ● Better accuracy under the same constraints: ○ (Left): MACs, (Right): latency ○ Achieves higher accuracy while requiring less computation. ○ Better than “OFA - Train from scratch”, where the same architecture is trained from scratch without pretraining.
  • 24. Discussions ● Would the same approach work for other models and tasks (e.g., Transformers, NLP)? ● The architecture search space is limited to certain models. ○ E.g., how can the method be applied to models such as HRNet?
  • 25. Conclusion ● The Once-for-All (OFA) network allows training one large model and deploying various sub-networks without additional training. ● OFA proposes the progressive shrinking algorithm to train sub-networks that share weights, which greatly reduces the design cost. ● The paper shows that OFA achieves higher accuracy on the ImageNet dataset. ● With a trained OFA network, optimal sub-networks can be found for various deployment environments.