Gang Yu
Megvii Research
Object Detection in Recent 3 Years
Beyond RetinaNet and Mask R-CNN
Schedule of Tutorial
• Lecture 1: Beyond RetinaNet and Mask R-CNN (Gang Yu)
• Lecture 2: AutoML for Object Detection (Xiangyu Zhang)
• Lecture 3: Fine-Grained Visual Analysis (Xiu-shen Wei)
Outline
• Introduction to Object Detection
• Modern Object detectors
• One-stage detector vs two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
What is object detection?
Detection - Evaluation Criteria
Average Precision (AP) and mAP
Figures are from Wikipedia
Detection - Evaluation Criteria
mmAP
Figures are from http://cocodataset.org
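The AP metric named on these slides can be illustrated with a small sketch. This is a minimal version that assumes all-point interpolation of the precision-recall curve and boxes given as (x1, y1, x2, y2); the function names are hypothetical, and official benchmark tooling differs in details such as interpolation and matching rules. mmAP, as used for COCO, averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 and over categories.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(scores, is_true_positive, num_gt):
    """Area under the precision-recall curve for one class at one IoU threshold."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Precision envelope: make the curve non-increasing as recall grows.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum the rectangles under the envelope.
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))

# Three detections sorted by score: hit, miss, hit; two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2)
```

A detection counts as a true positive when its IoU with an unmatched ground-truth box exceeds the chosen threshold, which is where `box_iou` comes in.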
How to perform a detection?
• Sliding window: enumerate all the windows (up to millions of windows)
• VJ detector: cascade chain
• Fully Convolutional network
• shared computation
Robust Real-time Object Detection; Viola, Jones; IJCV 2001
http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
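To give a feel for why the sliding-window approach reaches millions of windows, here is a rough counting sketch. The 24-pixel base window, the 1.25 scale step, the stride of 1, and the 640x480 image size are illustrative assumptions, not values from the slides.

```python
def count_windows(image_w, image_h, window, stride):
    """Number of square windows of a given size that fit in the image."""
    if window > image_w or window > image_h:
        return 0
    return ((image_w - window) // stride + 1) * ((image_h - window) // stride + 1)

def count_multiscale(image_w, image_h, base_window=24, scale_step=1.25, stride=1):
    """Total windows when the window size grows geometrically (assumed schedule)."""
    total, window = 0, float(base_window)
    while window <= min(image_w, image_h):
        total += count_windows(image_w, image_h, int(window), stride)
        window *= scale_step
    return total

single_scale = count_windows(640, 480, 24, 1)   # 281969 windows at one scale
all_scales = count_multiscale(640, 480)         # exceeds one million
```

Every one of these windows would need its own classifier evaluation, which is what cascades and shared convolutional computation try to avoid.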
General Detection Before Deep Learning
• Feature + classifier
• Feature
• Haar Feature
• HOG (Histogram of Oriented Gradients)
• LBP (Local Binary Patterns)
• ACF (Aggregated Channel Features)
• …
• Classifier
• SVM
• Boosting
• Random Forest
Traditional Hand-crafted Feature: HoG
General Detection Before Deep Learning
Traditional Methods
• Pros
• Efficient to compute on CPU (e.g., Haar, ACF)
• Easy to debug and to analyze bad cases
• Reasonable performance with limited training data
• Cons
• Limited performance on large datasets
• Hard to accelerate on GPU
Deep Learning for Object Detection
Grouped by whether the detector follows the “propose and refine” paradigm
• One Stage
• Example: Densebox, YOLO (YOLO v2), SSD, RetinaNet
• Keyword: Anchor, Divide and conquer, loss sampling
• Two Stage
• Example: RCNN (Fast RCNN, Faster RCNN), RFCN, FPN, Mask RCNN
• Keyword: speed, performance
A bit of History
One-stage detector
• Pipeline: Image → Feature Extractor → classification + localization (bbox)
• Anchor free: Densebox (2015), YOLO (2015), UnitBox (2016), EAST (2017)
• Anchors introduced: SSD (2015), YOLOv2 (2016), RON (2017), RetinaNet (2017), DSSD (2017)
Two-stage detector
• Pipeline: Image → Feature Extractor → Proposal (classification + localization (bbox)) → Refine (classification + localization (bbox))
• OverFeat (2013), MultiBox (2014), RCNN (2014), Fast RCNN (2015), Faster RCNN (2015), RFCN (2016), RFCN++ (2017), FPN (2017), Mask RCNN (2017)
Modern Object detectors
Pipeline: Backbone → Head → Postprocess (NMS)
• Modern object detectors
• RetinaNet
• f1-f7 for backbone, f3-f7 with 4 convs for head
• FPN with ROIAlign
• f1-f6 for backbone, two fcs for head
• Recall vs localization
• One-stage detector: high recall, but at the cost of localization ability
• Two-stage detector: strong localization ability
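The NMS post-processing step in this pipeline can be sketched as a greedy loop. This is a minimal NumPy version, assuming boxes in (x1, y1, x2, y2) format and an IoU threshold of 0.5; neither detail is specified on the slide.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression. Returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the current best box too much.
        order = rest[iou <= iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
kept = nms(boxes, [0.9, 0.8, 0.7])  # the second box is suppressed by the first
```

The hard threshold is the design choice that matters here: any box overlapping a higher-scoring box beyond the threshold is discarded outright.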
One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin et al., ICCV 2017 Best Student Paper
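As a reminder of what focal loss computes, here is a small NumPy sketch of the binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults alpha = 0.25 and gamma = 2 are the commonly quoted settings and should be treated as assumptions here, since the slide does not list them.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-example binary focal loss.

    p: predicted probability of the positive class, y: labels in {0, 1}.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

easy = focal_loss([0.9], [1])[0]   # well-classified positive, strongly down-weighted
hard = focal_loss([0.1], [1])[0]   # misclassified positive, keeps most of its loss
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check.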
Two-Stage detector: FPN/Mask R-CNN
• FPN Structure
• ROIAlign
Mask R-CNN, He et al., ICCV 2017 Best Paper
What is next for object detection?
• The pipeline seems to be mature
• A large gap remains between the existing state of the art and product requirements
• The devil is in the details
Challenges Overview
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-grained
Pipeline: Backbone → Head → Postprocess (NMS)
Challenges - Backbone
• Backbone networks are designed for the classification task, not for the localization task
• Receptive field vs spatial resolution
• Only f1-f5 are pretrained; f6 and f7 (if applicable) are randomly initialized
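The tension between receptive field and spatial resolution can be made concrete with a short calculation. This sketch assumes a plain stack of convolutions described by (kernel, stride, dilation) tuples; the two layer configurations are hypothetical and are not taken from any specific backbone. Dilated convolution is shown as one common way to enlarge the receptive field without further downsampling.

```python
def receptive_field(layers):
    """Receptive field and total stride of a stack of conv layers.

    layers: list of (kernel, stride, dilation) tuples, input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump

# Two strided 3x3 convs: larger receptive field, but 4x coarser feature map.
strided = receptive_field([(3, 2, 1), (3, 2, 1)])   # (7, 4)
# Replace the second stride with dilation 2: bigger field, only 2x coarser.
dilated = receptive_field([(3, 2, 1), (3, 1, 2)])   # (11, 2)
```

The second configuration keeps twice the spatial resolution while seeing more context, which is the kind of trade a detection-oriented backbone has to make.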
Backbone - DetNet
• DetNet: A Backbone network for Object Detection, Li et al., 2018,
https://arxiv.org/pdf/1804.06215.pdf
Challenges - Head
• Speed has improved significantly for two-stage detectors
• RCNN -> Fast RCNN -> Faster RCNN -> RFCN
• How can a two-stage detector reach the speed of one-stage detectors such as YOLO and SSD?
• Small backbone
• Light head
Head – Light head RCNN
• Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017,
https://arxiv.org/pdf/1711.07264.pdf
Code: https://github.com/zengarden/light_head_rcnn
Head – Light head RCNN
• Backbone
• L: ResNet101
• S: Xception145
• Thin feature map
• L: C_{mid} = 256
• S: C_{mid} = 64
• C_{out} = 10 * 7 * 7
• R-CNN subnet
• An fc layer is connected to the PS ROI pool/Align output
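The C_{out} = 10 * 7 * 7 figure is easier to appreciate next to a class-specific score map. The comparison below is a back-of-the-envelope sketch: the 81-group case assumes R-FCN-style score maps for an 81-class setup (80 categories plus background), which is not stated on the slide.

```python
def score_map_channels(groups, pool_size):
    """Channels needed by position-sensitive pooling: one map per (group, bin)."""
    return groups * pool_size * pool_size

light_head = score_map_channels(groups=10, pool_size=7)      # C_out from the slide
class_specific = score_map_channels(groups=81, pool_size=7)  # assumed 81-class maps
print(light_head, class_specific)  # 490 3969
```

After position-sensitive pooling each region is a 10 x 7 x 7 tensor, so the fc layer in the R-CNN subnet sees 490 inputs per region under these numbers.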
Head – Light head RCNN
• Mobile Version
• ThunderNet: Towards Real-time Generic Object Detection, Qin et al., arXiv 2019
• https://arxiv.org/abs/1903.11752
Pretraining – Objects365
• ImageNet pretraining is usually employed for backbone training
• Training from scratch
• ScratchDet claims that normalization (GN/BN) is important
• Rethinking ImageNet Pre-training shows that training time is important
Pretraining – Objects365
• Objects365 Dataset
Pretraining – Objects365
• Pretraining with Objects365 vs ImageNet vs from scratch
Pretraining – Objects365
• Pretraining on Backbone or Pretraining on both backbone and head
Pretraining – Objects365
• Results on VOC Detection & VOC Segmentation
Pretraining – Objects365
• Summary
• Pretraining is important to reduce the training time
• Pretraining with a large dataset is beneficial for the performance
Challenges - Scale
• Scale variation is extremely large in object detection
• Previous works
• Divide and conquer: SSD, DSSD, RON, FPN, …
• Handles only limited scale variation
• Scale Normalization for Image Pyramids, Singh et al., CVPR 2018
• Slow inference speed
• How can extremely large scale variation be addressed without compromising inference speed?
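One concrete instance of the divide-and-conquer idea is the rule FPN uses to send each region to a pyramid level according to its size, k = floor(k0 + log2(sqrt(w * h) / 224)). The sketch below assumes k0 = 4 and levels clamped to 2 through 5; the function name is hypothetical.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pyramid level for a box of width w and height h (in input pixels)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to the available levels

levels = [fpn_level(s, s) for s in (32, 112, 224, 1000)]  # [2, 3, 4, 5]
```

The clamping shows the limitation noted above: every object smaller than a certain size lands on the finest level and every very large one on the coarsest, so the range of scales handled well is bounded by the number of levels.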
Scale - SFace
• SFace: An Efficient Network for Face Detection in Large Scale Variations,
2018, http://cn.arxiv.org/pdf/1804.06559.pdf
• Anchor-based:
• Good localization for the scales covered by anchors
• Difficult to cover the full scale range of faces
• Anchor-free:
• Able to cover various face scales
• Weaker localization ability
Scale - SFace
• Summary:
• Integrate anchor-based and anchor-free for the scale issue
• A new benchmark for face detection with large scale variations: 4K Face
Challenges - Batchsize
• Small mini-batch size for general object detection
• 2 for R-CNN, Faster RCNN
• 16 for RetinaNet, Mask RCNN
• Problems with a small mini-batch size
• Long training time
• Insufficient BN statistics
• Imbalanced pos/neg ratio
Batchsize – MegDet
• MegDet: A Large Mini-Batch Object Detector, CVPR2018,
https://arxiv.org/pdf/1711.07240.pdf
Batchsize – MegDet
• Techniques
• Learning rate warmup
• Cross-GPU Batch Normalization
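The two techniques can be sketched in a few lines. This is an illustrative version only: the linear learning-rate scaling rule, the reference batch size of 16, the 500 warmup steps, and the function names are assumptions, and the batch-norm part simulates the cross-device reduction with plain Python sums rather than a real multi-GPU all-reduce.

```python
import numpy as np

def warmup_lr(step, base_lr, batch_size, ref_batch_size=16, warmup_steps=500):
    """Linear warmup from base_lr to the linearly scaled large-batch rate."""
    target = base_lr * batch_size / ref_batch_size
    if step >= warmup_steps:
        return target
    return base_lr + (target - base_lr) * step / warmup_steps

def cross_device_bn_stats(per_device_activations):
    """Mean and variance over all devices, built from per-device partial sums."""
    count = sum(x.size for x in per_device_activations)
    total = sum(float(x.sum()) for x in per_device_activations)
    total_sq = sum(float((x ** 2).sum()) for x in per_device_activations)
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var

# Each device holds only a few samples; the statistics use all of them.
devices = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]
mean, var = cross_device_bn_stats(devices)
```

Aggregating sums and squared sums, rather than averaging per-device means and variances, gives the same statistics a single large batch would produce, which addresses the insufficient BN statistics problem of small per-device batches.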
Challenges - Crowd
• NMS is a post-processing step that eliminates multiple responses on one object instance
• Reasonable for mildly crowded datasets such as COCO and VOC
• Fails when objects are crowded together
Challenges - Crowd
• A few works have been devoted to this topic
• Soft-NMS, Bodla et al., ICCV 2017, http://www.cs.umd.edu/~bharat/snms.pdf
• Relation Networks, Hu et al., CVPR 2018,
https://arxiv.org/pdf/1711.11575.pdf
• The literature lacks a good benchmark for evaluation
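To show how Soft-NMS differs from hard suppression, here is a minimal sketch of its Gaussian variant, in which the score of an overlapping box is decayed by exp(-IoU^2 / sigma) instead of being removed. The sigma of 0.5, the score cutoff of 0.001, and the helper names are assumptions for illustration.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns (kept indices in pick order, rescored scores)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda j: scores[j])
        if scores[best] < score_threshold:
            break
        keep.append(best)
        remaining.remove(best)
        if remaining:
            rest = np.array(remaining)
            overlaps = iou_one_to_many(boxes[best], boxes[rest])
            # Decay the scores of overlapping boxes instead of deleting them.
            scores[rest] *= np.exp(-(overlaps ** 2) / sigma)
    return keep, scores

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
keep, new_scores = soft_nms(boxes, [0.9, 0.8, 0.7])
```

In this example the second box survives with a reduced score rather than disappearing, which is the behavior that helps when two real objects overlap heavily.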
Crowd - CrowdHuman
• CrowdHuman: A Benchmark for Detecting Human in a Crowd, 2018,
https://arxiv.org/pdf/1805.00123.pdf, http://www.crowdhuman.org/
• A benchmark with head, visible-body, and full-body bounding-box annotations
• Generalizes to other head and pedestrian datasets
• Crowdness
Crowd - CrowdHuman
• Generalization
• Head
• Pedestrian
• COCO
Conclusion
• The task of object detection is still far from solved
• Details are important to further improve the performance
• Backbone
• Head
• Pretraining
• Scale
• Batchsize
• Crowd
• Improvements in object detection will be a significant boost for the computer vision industry
Advertisement
• Megvii Detection Zhihu column
Email: yugang@megvii.com
Object Detection Beyond Mask R-CNN and RetinaNet I
