SlideShare una empresa de Scribd logo
1 de 40
Descargar para leer sin conexión
You Only Look One-Level Feature
Qiang Chen et al. (CVPR 2021 accepted)
Presenter: Dongmin Choi
- Various FPN-based One-stage Detectors
- What is the main component for these networks to achieve high
performance?
- Not multi-scale feature fusion but divide-and-conquer
- Instead of adopting the complex feature pyramid, let’s utilize only
one-level feature
- You Only Look One-level Feature (YOLOF)
Dilated Encoder
Uniform Matching
2.5x faster than RetinaNet and 7x less training than DETR
Abstract
Revisiting Feature Pyramid Network (FPN)
T.Y Lin et al., Feature Pyramid Networks for Object Detection, CVPR 2017
- Essential component for both two-stage and one-stage detectors
- Two main benefits:
(1) multi-scale feature fusion: fusing multiple low- and high-res feature inputs
(2) divide-and-conquer: detecting objects on different levels
- Common belief: multi-scale feature fusion is important
- Designing complex fusion methods manually or via NAS have been
proposed actively
- However, few studies on the function of divide-and-conquer
Introduction
Feature Pyramid Network (FPN)
(1)
(2)
Introduction
Multi-scale Feature Fusion vs. Divide-and-Conquer
- Let’s decouple the multi-scale feature fusion and divide-and-conquer
functionalities with RetinaNet
Introduction
Multi-scale Feature Fusion vs. Divide-and-Conquer
FPN
SiMo can achieve comparable performance with MiMo
- Let’s decouple the multi-scale feature fusion and divide-and-conquer
functionalities with RetinaNet
Introduction
Multi-scale Feature Fusion vs. Divide-and-Conquer
Performance drops dramatically in MiSo and SiSo
- Let’s decouple the multi-scale feature fusion and divide-and-conquer
functionalities with RetinaNet
Introduction
Multi-scale Feature Fusion vs. Divide-and-Conquer
- Let’s decouple the multi-scale feature fusion and divide-and-conquer
functionalities with RetinaNet
- SiMO can achieve comparable performance with MiMo
- Performance drops dramatically in MiSo and SiSo
- Two facts:
(1) C5 feature carries sufficient context for detecting objects on
various scales
(2) divide-and-conquer >> multi-scale feature fusion
- Let’s decouple the multi-scale feature fusion and divide-and-conquer
functionalities with RetinaNet
Introduction
Divide-and-Conquer
- divides the complex detection problem into several sub-problems by
object scales; optimization problem
- But memory burden, slow and complex
- Can we solve this problem with SiSo encoder?
(1) extract the multi-scale contexts for objects on various scales only
with a single-level feature
(2) solve the imbalance problem of positive anchors raised by the
sparse anchors in the single feature
Introduction
Divide-and-Conquer
- divides the complex detection problem into several sub-problems by
object scales; optimization problem
- But memory burden, slow and complex
- Can we solve this problem with SiSo encoder?
(1) extract the multi-scale contexts for objects on various scales only
with a single-level feature → Dilated Encoder
(2) solve the imbalance problem of positive anchors raised by the
sparse anchors in the single feature → Uniform Matching
YOLOF
Introduction
Contributions
- show that the most significant benefits of FPS is its divide-and-
conquer solution rather than multi-scale feature fusion
- YOLOF, a simple and efficient baseline without using FPN
- On COCO benchmark, YOLOF can achieve comparable results with
a faster speed on GPUs with RetinaNet, DETR and YOLOv4
Related Works
Multi-level Feature Detectors
- A conventional technique to employ multiple features
- SSD1: first utilizes multi-scale features and performs detection on each
scale for different scales objects
- FPN2: semantic-riched feature pyramids by combining shallow features
and deep features (dominant)
1W Liu at al., SSD: Single Shot MultiBox Detector, ECCV 2016
2T.Y Lin et al., Feature Pyramid Networks for Object Detection, CVPR 2017
Related Works
Single-level Feature Detectors
- R-CNN series
- YOLO and YOLOv2
- CornerNet and CenterNet: high-resolution feature map brings
enormous memory cost
- DETR: needs a long training schedule due to the totally anchor-free
mechanism and transformer learning phase
W Liu at al., SSD: Single Shot MultiBox Detector, ECCV 2016
Super fast but low performance
Cost Analysis of MiMo Encoders
Quantitative Study on the Cost of MiMo Encoders
- Based on RetinaNet w/ ResNet-50
- Pipeline: Backbone + Encoder + Decoder
- The MiMo encoder and decoder bring enormous memory burdens
Method
Replacing MiMo encoder with SiSo encoder
- Two problems of SiSo encoders which cause performance drop:
(1) Limited Scale Range
→ the range of scales matching to the C5 feature’s receptive field is
limited
→ forbid the detection performance for objects across various scales
(2) Imbalance Problem on Positive Anchors
→ the imbalance problem on positive anchors
→ raised by sparse anchors in the single-level feature
Method
1. Limited Scale Range
- In SiSo encoders, the single-level feature setting has a receptive field
which is a constant.
C5
A
B
- A: enlarging the receptive field by stacking standard and dilated conv.
- B: combine the original scale range and A
Method
1. Limited Scale Range
- Dilated Encoder (SiSo)
: the Projector and the Residual Blocks
1. 1 × 1 conv layer to reduce the channel dimension
2. 3 × 3 conv layer to refine semantic contexts
Stacking four dilated residual blocks with different dilations rates
(2, 4, 6, 8) to generate output features with multiple receptive fields
Method
2. Imbalance Problem on Positive Anchors
- Max-IoU matching
: strategy to define positive anchors by measuring the IoUs between
anchors and ground-truth boxes
- In MiMo encoders, the anchors are
pre-defined on multiple levels. So,
Max-IoU matching generates a
sufficient number of positive anchors
- However, in SiSo encoder, the num-
ber of anchors diminish extensively
from about 100𝑘 to 5𝑘. Sparse an-
chors raise a matching problem.
X Zhou at al., Objects as Points, arXiv 2019
Method
2. Imbalance Problem on Positive Anchors
Large ground-truth boxes
dominates the positive anchors
Uniform Matching
Method
2. Imbalance Problem on Positive Anchors
- Uniform Matching
: adopting the 𝑘 𝑛𝑒𝑎𝑟𝑒𝑠𝑡 anchor as positive anchors for each ground-
truth box (top-𝑘 matching)
- IoU thresholds
: ignore large IoU (>0.7) negative anchors and small IoU (<0.15)
positive anchors
Method
3. YOLOF
- A fast and straightforward framework with single-level feature
- Three parts: the backbone, the encoder, and the decoder
Method
3. YOLOF: Backbone
- ResNet and ResNeXt series as the backbone
- Pre-trained on ImageNet
- Output: C5 feature map (2048 channels, output stride 32)
Method
3. YOLOF: Encoder
- Two projection layer (1x1 - 3x3) to reduce channels (2048 → 512)
- Stacked residual blocks (1x1 - 3x3 w/ dilation - 1x1)
Method
3. YOLOF: Decoder
- Two parallel task-specific heads: the clf head and the reg head
- Two minor modification
(1) follow the design of FFN in DETR & different # of conv.
(2) follow AutoAssign and implicit objectness prediction
B Zhu at al., AutoAssign: Differentiable Label Assignment for Dense Object Detection, arXiv 2020
Method
3. YOLOF: Decoder
- Two parallel task-specific heads: the clf head and the reg head
- Two minor modification
(1) follow the design of FFN in DETR & different # of conv.
(2) follow AutoAssign and implicit objectness prediction
B Zhu at al., AutoAssign: Differentiable Label Assignment for Dense Object Detection, arXiv 2020
Objectness (FG or BG) without direct supervision
Method
3. YOLOF: Decoder
- Two parallel task-specific heads: the clf head and the reg head
- Two minor modification
(1) follow the design of FFN in DETR & different # of conv.
(2) follow AutoAssign and implicit objectness prediction
B Zhu at al., AutoAssign: Differentiable Label Assignment for Dense Object Detection, arXiv 2020
Objectness (FG or BG) without direct supervision
# 3x3 Conv
Method
3. YOLOF: Other Details
Random Shift Operation
- Sparse pre-defined anchors in YOLOF
→ decrease the match quality between anchors and ground-truth boxes
- Random shift operation (max 32 pixels) to inject noises into the position
- Increase the prob. of GT boxes matching with high-quality anchors
- Restriction on the anchors’ center’s shift is also helpful (< 32 pixels)
Experiments
Implementation Details
- COCO benchmark
- Synchronized SGD over 8 GPUs with 64 batch-size
- Initial learning rate: 0.12
- Following DETR, a smaller learning rate for the backbone (1/3)
- Warmup iteration: 1500
- ‘1x’ schedule: a total of 22.5𝑘 iterations (lr decay at 15𝑘 and 20𝑘)
- Inference: NMS (threshold = 0.6)
- Other hyperparameters follow the settings of RetinaNet
Experiments
1. Comparison with Previous Works: RetinaNet
- RetinaNet+: implicit objectness and GN layer is adopted to RetinaNet
- ‘1x’ models: single scale training (short side as 800 px, longer side as at most 1333)
- A: multi-scale training
- B: multi-scale training + multi-scale testing
A
B
Experiments
1. Comparison with Previous Works: DETR
- DETR achieved amazing performance only adopting a single C5 features.
That means transformer layers can capture global dependencies.
- YOLOF shows that a conventional local convolution layers can also achieve
this goal!
Experiments
1. Comparison with Previous Works: YOLOv4
- YOLOF-DC5: add dilations on the last stage of the backbone
- YOLOF-DC5 can run 13% faster than YOLOv4 with a 0.8 mAP improvement
Experiments
2. Ablation Experiments: Dilated Encoder and Uniform Matching
Experiments
2. Ablation Experiments: Dilated Encoder and Uniform Matching
- Dilated Encoder has a significant impact on large objects
Experiments
2. Ablation Experiments: Dilated Encoder and Uniform Matching
- Dilated Encoder has a significant impact on large objects
- Uniform Matching has a significant impact on small and medium objects
Experiments
2. Ablation Experiments: Dilated Encoder and Uniform Matching
- Dilated Encoder has a significant impact on large objects
- Uniform Matching has a significant impact on small and medium objects
Experiments
2. Ablation Experiments: ResBlocks and Dilations
Experiments
2. Ablation Experiments: Add Short Cut of Not
Experiments
2. Ablation Experiments: Number of Positives
Experiments
2. Ablation Experiments: Uniform Matching vs. Others
Conclusion
- The success of FPN is due to its divide-and-conquer solution to the
optimization problem in dense object detection
- YOLOF: a simple but highly efficient method without FPN
- Comparable performance with RetinaNet and DETR only with a
single-level feature
Thank you

Más contenido relacionado

La actualidad más candente

Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationDongmin Choi
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Dongmin Choi
 
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]Dongmin Choi
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in VisionSangmin Woo
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationDat Nguyen
 
Review : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingReview : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingDongmin Choi
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep LearningSungjoon Choi
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...changedaeoh
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료taeseon ryu
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님taeseon ryu
 

La actualidad más candente (20)

Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
 
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
 
Tldr
TldrTldr
Tldr
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
Review : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingReview : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-training
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep Learning
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료
 
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
 
Class Weighted Convolutional Features for Image Retrieval
Class Weighted Convolutional Features for Image Retrieval Class Weighted Convolutional Features for Image Retrieval
Class Weighted Convolutional Features for Image Retrieval
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Convolutional Features for Instance Search
Convolutional Features for Instance SearchConvolutional Features for Instance Search
Convolutional Features for Instance Search
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님
 

Similar a Review: You Only Look One-level Feature

Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxfahmi324663
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012Jinwon Lee
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learningYu Huang
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryKenta Oono
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with TransformersDatabricks
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용홍배 김
 
Anomaly Detection with Azure and .net
Anomaly Detection with Azure and .netAnomaly Detection with Azure and .net
Anomaly Detection with Azure and .netMarco Parenzan
 
Anomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETAnomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETMarco Parenzan
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
 
Auro tripathy - Localizing with CNNs
Auro tripathy -  Localizing with CNNsAuro tripathy -  Localizing with CNNs
Auro tripathy - Localizing with CNNsAuro Tripathy
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Universität Rostock
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 

Similar a Review: You Only Look One-level Feature (20)

Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learning
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 
Lp seminar
Lp seminarLp seminar
Lp seminar
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 
Anomaly Detection with Azure and .net
Anomaly Detection with Azure and .netAnomaly Detection with Azure and .net
Anomaly Detection with Azure and .net
 
Anomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETAnomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NET
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
Auro tripathy - Localizing with CNNs
Auro tripathy -  Localizing with CNNsAuro tripathy -  Localizing with CNNs
Auro tripathy - Localizing with CNNs
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
Convolutional neural networks
Convolutional neural  networksConvolutional neural  networks
Convolutional neural networks
 

Más de Dongmin Choi

Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Dongmin Choi
 
Review : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image SegmentationReview : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image SegmentationDongmin Choi
 
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...Dongmin Choi
 
Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]Dongmin Choi
 
Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]Dongmin Choi
 
Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]Dongmin Choi
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Dongmin Choi
 
Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Dongmin Choi
 
Augmix review [cdm]
Augmix review [cdm]Augmix review [cdm]
Augmix review [cdm]Dongmin Choi
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...Dongmin Choi
 
ICCV 2019 REVIEW [CDM]
ICCV 2019 REVIEW [CDM]ICCV 2019 REVIEW [CDM]
ICCV 2019 REVIEW [CDM]Dongmin Choi
 
[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...Dongmin Choi
 

Más de Dongmin Choi (12)

Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
 
Review : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image SegmentationReview : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
 
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
 
Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]
 
Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]
 
Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]
 
Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]
 
Augmix review [cdm]
Augmix review [cdm]Augmix review [cdm]
Augmix review [cdm]
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 
ICCV 2019 REVIEW [CDM]
ICCV 2019 REVIEW [CDM]ICCV 2019 REVIEW [CDM]
ICCV 2019 REVIEW [CDM]
 
[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...
 

Último

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Último (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Review: You Only Look One-level Feature

  • 1. You Only Look One-Level Feature Qiang Chen et al. (CVPR 2021 accepted) Presenter: Dongmin Choi
  • 2. - Various FPN-based One-stage Detectors - What is the main component for these networks to achieve high performance? - Not multi-scale feature fusion but divide-and-conquer - Instead of adopting the complex feature pyramid, let’s utilize only one-level feature - You Only Look One-level Feature (YOLOF) Dilated Encoder Uniform Matching 2.5x faster than RetinaNet and 7x less training than DETR Abstract Revisiting Feature Pyramid Network (FPN) T.Y Lin et al., Feature Pyramid Networks for Object Detection, CVPR 2017
  • 3. - Essential component for both two-stage and one-stage detectors - Two main benefits: (1) multi-scale feature fusion: fusing multiple low- and high-res feature inputs (2) divide-and-conquer: detecting objects on different levels - Common belief: multi-scale feature fusion is important - Designing complex fusion methods manually or via NAS have been proposed actively - However, few studies on the function of divide-and-conquer Introduction Feature Pyramid Network (FPN) (1) (2)
  • 4. Introduction Multi-scale Feature Fusion vs. Divide-and-Conquer - Let’s decouple the multi-scale feature fusion and divide-and-conquer functionalities with RetinaNet
  • 5. Introduction Multi-scale Feature Fusion vs. Divide-and-Conquer FPN SiMo can achieve comparable performance with MiMo - Let’s decouple the multi-scale feature fusion and divide-and-conquer functionalities with RetinaNet
  • 6. Introduction Multi-scale Feature Fusion vs. Divide-and-Conquer Performance drops dramatically in MiSo and SiSo - Let’s decouple the multi-scale feature fusion and divide-and-conquer functionalities with RetinaNet
  • 7. Introduction Multi-scale Feature Fusion vs. Divide-and-Conquer - Let’s decouple the multi-scale feature fusion and divide-and-conquer functionalities with RetinaNet - SiMO can achieve comparable performance with MiMo - Performance drops dramatically in MiSo and SiSo - Two facts: (1) C5 feature carries sufficient context for detecting objects on various scales (2) divide-and-conquer >> multi-scale feature fusion - Let’s decouple the multi-scale feature fusion and divide-and-conquer functionalities with RetinaNet
  • 8. Introduction Divide-and-Conquer - divides the complex detection problem into several sub-problems by object scales; optimization problem - But memory burden, slow and complex - Can we solve this problem with SiSo encoder? (1) extract the multi-scale contexts for objects on various scales only with a single-level feature (2) solve the imbalance problem of positive anchors raised by the sparse anchors in the single feature
  • 9. Introduction Divide-and-Conquer - divides the complex detection problem into several sub-problems by object scales; optimization problem - But memory burden, slow and complex - Can we solve this problem with SiSo encoder? (1) extract the multi-scale contexts for objects on various scales only with a single-level feature → Dilated Encoder (2) solve the imbalance problem of positive anchors raised by the sparse anchors in the single feature → Uniform Matching YOLOF
  • 10. Introduction Contributions - show that the most significant benefits of FPS is its divide-and- conquer solution rather than multi-scale feature fusion - YOLOF, a simple and efficient baseline without using FPN - On COCO benchmark, YOLOF can achieve comparable results with a faster speed on GPUs with RetinaNet, DETR and YOLOv4
  • 11. Related Works Multi-level Feature Detectors - A conventional technique to employ multiple features - SSD1: first utilizes multi-scale features and performs detection on each scale for different scales objects - FPN2: semantic-riched feature pyramids by combining shallow features and deep features (dominant) 1W Liu at al., SSD: Single Shot MultiBox Detector, ECCV 2016 2T.Y Lin et al., Feature Pyramid Networks for Object Detection, CVPR 2017
  • 12. Related Works Single-level Feature Detectors - R-CNN series - YOLO and YOLOv2 - CornerNet and CenterNet: high-resolution feature map brings enormous memory cost - DETR: needs a long training schedule due to the totally anchor-free mechanism and transformer learning phase W Liu at al., SSD: Single Shot MultiBox Detector, ECCV 2016 Super fast but low performance
  • 13. Cost Analysis of MiMo Encoders Quantitative Study on the Cost of MiMo Encoders - Based on RetinaNet w/ ResNet-50 - Pipeline: Backbone + Encoder + Decoder - The MiMo encoder and decoder bring enormous memory burdens
  • 14. Method Replacing MiMo encoder with SiSo encoder - Two problems of SiSo encoders which cause performance drop: (1) Limited Scale Range → the range of scales matching to the C5 feature’s receptive field is limited → forbid the detection performance for objects across various scales (2) Imbalance Problem on Positive Anchors → the imbalance problem on positive anchors → raised by sparse anchors in the single-level feature
  • 15. Method 1. Limited Scale Range - In SiSo encoders, the single-level feature setting has a receptive field which is a constant. C5 A B - A: enlarging the receptive field by stacking standard and dilated conv. - B: combine the original scale range and A
  • 16. Method 1. Limited Scale Range - Dilated Encoder (SiSo) : the Projector and the Residual Blocks 1. 1 × 1 conv layer to reduce the channel dimension 2. 3 × 3 conv layer to refine semantic contexts Stacking four dilated residual blocks with different dilations rates (2, 4, 6, 8) to generate output features with multiple receptive fields
  • 17. Method 2. Imbalance Problem on Positive Anchors - Max-IoU matching : strategy to define positive anchors by measuring the IoUs between anchors and ground-truth boxes - In MiMo encoders, the anchors are pre-defined on multiple levels. So, Max-IoU matching generates a sufficient number of positive anchors - However, in SiSo encoder, the num- ber of anchors diminish extensively from about 100𝑘 to 5𝑘. Sparse an- chors raise a matching problem. X Zhou at al., Objects as Points, arXiv 2019
  • 18. Method 2. Imbalance Problem on Positive Anchors Large ground-truth boxes dominates the positive anchors Uniform Matching
  • 19. Method 2. Imbalance Problem on Positive Anchors - Uniform Matching : adopting the 𝑘 𝑛𝑒𝑎𝑟𝑒𝑠𝑡 anchor as positive anchors for each ground- truth box (top-𝑘 matching) - IoU thresholds : ignore large IoU (>0.7) negative anchors and small IoU (<0.15) positive anchors
  • 20. Method 3. YOLOF - A fast and straightforward framework with single-level feature - Three parts: the backbone, the encoder, and the decoder
  • 21. Method 3. YOLOF: Backbone - ResNet and ResNeXt series as the backbone - Pre-trained on ImageNet - Output: C5 feature map (2048 channels, output stride 32)
  • 22. Method 3. YOLOF: Encoder - Two projection layer (1x1 - 3x3) to reduce channels (2048 → 512) - Stacked residual blocks (1x1 - 3x3 w/ dilation - 1x1)
  • 23. Method 3. YOLOF: Decoder - Two parallel task-specific heads: the clf head and the reg head - Two minor modification (1) follow the design of FFN in DETR & different # of conv. (2) follow AutoAssign and implicit objectness prediction B Zhu at al., AutoAssign: Differentiable Label Assignment for Dense Object Detection, arXiv 2020
  • 24. Method 3. YOLOF: Decoder - Two parallel task-specific heads: the clf head and the reg head - Two minor modification (1) follow the design of FFN in DETR & different # of conv. (2) follow AutoAssign and implicit objectness prediction B Zhu at al., AutoAssign: Differentiable Label Assignment for Dense Object Detection, arXiv 2020 Objectness (FG or BG) without direct supervision
  • 25. Method 3. YOLOF: Decoder - Two parallel task-specific heads: the clf head and the reg head - Two minor modification (1) follow the design of FFN in DETR & different # of conv. (2) follow AutoAssign and implicit objectness prediction B Zhu at al., AutoAssign: Differentiable Label Assignment for Dense Object Detection, arXiv 2020 Objectness (FG or BG) without direct supervision # 3x3 Conv
  • 26. Method 3. YOLOF: Other Details Random Shift Operation - Sparse pre-defined anchors in YOLOF → decrease the match quality between anchors and ground-truth boxes - Random shift operation (max 32 pixels) to inject noises into the position - Increase the prob. of GT boxes matching with high-quality anchors - Restriction on the anchors’ center’s shift is also helpful (< 32 pixels)
  • 27. Experiments Implementation Details - COCO benchmark - Synchronized SGD over 8 GPUs with 64 batch-size - Initial learning rate: 0.12 - Following DETR, a smaller learning rate for the backbone (1/3) - Warmup iteration: 1500 - ‘1x’ schedule: a total of 22.5𝑘 iterations (lr decay at 15𝑘 and 20𝑘) - Inference: NMS (threshold = 0.6) - Other hyperparameters follow the settings of RetinaNet
  • 28. Experiments 1. Comparison with Previous Works: RetinaNet - RetinaNet+: implicit objectness and GN layer is adopted to RetinaNet - ‘1x’ models: single scale training (short side as 800 px, longer side as at most 1333) - A: multi-scale training - B: multi-scale training + multi-scale testing A B
  • 29. Experiments 1. Comparison with Previous Works: DETR - DETR achieved amazing performance only adopting a single C5 features. That means transformer layers can capture global dependencies. - YOLOF shows that a conventional local convolution layers can also achieve this goal!
  • 30. Experiments 1. Comparison with Previous Works: YOLOv4 - YOLOF-DC5: add dilations on the last stage of the backbone - YOLOF-DC5 can run 13% faster than YOLOv4 with a 0.8 mAP improvement
  • 31. Experiments 2. Ablation Experiments: Dilated Encoder and Uniform Matching
  • 32. Experiments 2. Ablation Experiments: Dilated Encoder and Uniform Matching - Dilated Encoder has a significant impact on large objects
  • 33. Experiments 2. Ablation Experiments: Dilated Encoder and Uniform Matching - Dilated Encoder has a significant impact on large objects - Uniform Matching has a significant impact on small and medium objects
  • 34. Experiments 2. Ablation Experiments: Dilated Encoder and Uniform Matching - Dilated Encoder has a significant impact on large objects - Uniform Matching has a significant impact on small and medium objects
  • 35. Experiments 2. Ablation Experiments: ResBlocks and Dilations
  • 36. Experiments 2. Ablation Experiments: Add Short Cut of Not
  • 38. Experiments 2. Ablation Experiments: Uniform Matching vs. Others
  • 39. Conclusion - The success of FPN is due to its divide-and-conquer solution to the optimization problem in dense object detection - YOLOF: a simple but highly efficient method without FPN - Comparable performance with RetinaNet and DETR only with a single-level feature