Advanced Deep Learning based Object
Detection Methods
Improving Object Detection With One Line of Code
● Non-Maximum Suppression is a greedy
process.
○ It worked well enough in 2007 but it doesn’t
anymore.
● High-scoring detections can be suppressed
just like low-scoring ones.
○ Overlap with a stronger detection is the only
criterion.
● Should one detection completely suppress
another detection, or simply reduce its
confidence?
Improving Object Detection With One Line of Code
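● NMS: $s_i \leftarrow s_i$ if $\mathrm{IoU}(M, b_i) < N_t$, otherwise $s_i \leftarrow 0$
○ $M$ is the current highest-scoring box, $b_i$ a remaining box with score $s_i$, and $N_t$ the overlap threshold.
● Linear Soft-NMS: $s_i \leftarrow s_i$ if $\mathrm{IoU}(M, b_i) < N_t$, otherwise $s_i \leftarrow s_i\,(1 - \mathrm{IoU}(M, b_i))$
○ Linear Soft-NMS is not continuous in the overlap: a sudden penalty is applied once the
NMS threshold $N_t$ is reached.
○ Instead we can use a continuous function:
● Gaussian Soft-NMS: $s_i \leftarrow s_i \exp\!\left(-\mathrm{IoU}(M, b_i)^2 / \sigma\right)$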
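The three rescoring rules can be compared side by side in a few lines. The following is a minimal sketch in plain Python, not the authors' released code: the function names, the `(x1, y1, x2, y2)` box format, the default values of `nt`, `sigma` and `min_score`, and the final pruning of near-zero scores are all assumptions made for illustration.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, method="gaussian", nt=0.3, sigma=0.5, min_score=0.001):
    """Greedy (Soft-)NMS. Returns (box, score) pairs in the order they were selected."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Pick the current highest-scoring box M.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        m_box, m_score = remaining.pop(best)
        kept.append((m_box, m_score))
        rescored = []
        for box, score in remaining:
            o = iou(m_box, box)
            if method == "hard":        # classic NMS: drop on overlap
                w = 0.0 if o >= nt else 1.0
            elif method == "linear":    # linear Soft-NMS: decay above the threshold
                w = 1.0 - o if o >= nt else 1.0
            else:                       # Gaussian Soft-NMS: continuous decay
                w = math.exp(-(o * o) / sigma)
            if score * w > min_score:   # assumed pruning of negligible scores
                rescored.append((box, score * w))
        remaining = rescored
    return kept
```

With two heavily overlapping boxes, `method="hard"` discards the weaker one, while the linear and Gaussian variants keep it with a reduced score, so a later confidence threshold decides its fate.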
Learning Non-Maximum Suppression
● Object detectors are mostly trained
end-to-end, except for the NMS.
○ NMS is still fully hand-crafted, and forces a
trade-off between recall and precision.
● The training loss is not the evaluation loss.
○ Training is performed without NMS.
○ During evaluation, multiple detections of the same
object count as false positives.
● Instead, train the network to include the
suppression process.
○ Only output one bounding box per object.
○ Learn how to handle close objects.
Learning Non-Maximum Suppression
● Additional blocks that:
○ Encode pairwise information.
○ For each detection, pool information from all
pairings.
○ Update feature vector.
○ Repeat.
● New loss:
○ Only one positive candidate per object.
○ This replaces the current practice of treating all
detections with IoU > 0.5 as positives.
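The "one positive per object" target can be illustrated with a small label-assignment routine. This is a sketch under stated assumptions rather than the paper's exact procedure: the greedy, score-ordered matching mirrors how detection benchmarks usually count true positives, and the names `box_iou` and `one_positive_per_object` are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def one_positive_per_object(detections, scores, gt_boxes, iou_thresh=0.5):
    """Return a 0/1 label per detection with at most one positive per ground-truth box."""
    # Visit detections from the most to the least confident (assumed ordering).
    order = sorted(range(len(detections)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(detections)
    taken = set()  # ground-truth boxes that already have their positive
    for i in order:
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(gt_boxes):
            if j in taken:
                continue
            o = box_iou(detections[i], gt)
            if o >= best_iou:
                best_j, best_iou = j, o
        if best_j is not None:
            labels[i] = 1
            taken.add(best_j)
    return labels
```

Under the conventional IoU > 0.5 rule, both overlapping detections of one object would be positives; here the second one becomes a negative, which is what pushes the network to suppress duplicates itself.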
Multi-Scale Object Detection
● Multi-scale object detection using an image pyramid
○ Predict different scales by applying the same model at different image resolutions.
● Classic method.
● Also used in OverFeat.
● Slow: requires multiple evaluations of the same model.
Multi-Scale Object Detection
● Predict multiple object scales using a single feature map.
● Same as Faster R-CNN.
● Fast.
● Single model (same in training as in testing).
● Poor feature resolution for small objects.
Multi-Scale Object Detection
● Predict different object sizes at different feature scales.
● Same as SSD.
● Good feature resolution for small objects.
● But features in the shallow, high-resolution layers are much weaker than those in deeper layers.
Feature Pyramid Network (FPN)
● Single model (same in training as in testing).
● Good feature resolution for small objects.
● Strong features in all layers.
● Almost no overhead over SSD (i.e., fast).
Feature Pyramid Network (FPN)
● How important is top-down enrichment?
● How important are lateral connections?
● How important are pyramid representations?
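To make the terms "top-down enrichment" and "lateral connections" concrete, here is a toy sketch of the merge step. It is heavily simplified and rests on assumptions: feature maps are single-channel 2-D lists, each level is exactly half the size of the previous one, the lateral 1x1 convolution is reduced to one scalar weight per level, and the smoothing convolution applied after merging is omitted. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_down_pyramid(bottom_up, lateral_weights):
    """Merge bottom-up maps (ordered fine -> coarse) into a top-down pyramid.

    Each output level is lateral(level) + upsample(coarser output level).
    """
    # Lateral connection: a scalar stand-in for the 1x1 convolution.
    laterals = [[[w * v for v in row] for row in fmap]
                for fmap, w in zip(bottom_up, lateral_weights)]
    pyramid = [laterals[-1]]  # the coarsest level has nothing above it
    for lat in reversed(laterals[:-1]):
        pyramid.insert(0, add_maps(lat, upsample2x(pyramid[0])))
    return pyramid
```

Dropping the `upsample2x` term corresponds to removing top-down enrichment, and dropping the `laterals` term corresponds to removing the lateral connections, which are the two variations the questions above refer to.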
Focal Loss for Dense Object Detection
● Can we train a single-stage detector to be as accurate as two-stage detectors?
● Contributions:
○ RetinaNet: a single-stage object detector based on an FPN backbone.
○ New loss.
Focal Loss for Dense Object Detection
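● Class imbalance is an important issue for object detection.
● Previous solutions:
○ Random resampling at a 1:3 ratio (positives to negatives).
○ Hard negative resampling at a 1:3 ratio.
● Both solutions mean that at each step only a few samples actually contribute
to the loss function.
● Instead, include all samples but use a different weight for each class.
○ Regular cross entropy: $\mathrm{CE}(p_t) = -\log(p_t)$, where $p_t$ is the predicted probability of the true class.
○ Weighted cross entropy: $\mathrm{CE}(p_t) = -\alpha_t \log(p_t)$, where $\alpha_t$ is the weight of the true class.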
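Focal Loss for Dense Object Detection
● Using weighted CE as the baseline:
○ Can we do better?
○ Can we use a different weight for each sample?
● Focal loss: $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class and $\alpha_t$ its class weight.
● Every sample is weighted according to its error.
○ We want to focus on samples that are
misclassified.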
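The effect of the modulating factor is easiest to see on single binary predictions. Below is a minimal sketch in plain Python; the function names are hypothetical, and `gamma=2.0` and `alpha=0.25` are used as illustrative defaults that are commonly cited for RetinaNet.

```python
import math

def cross_entropy(p, y):
    """Binary cross entropy for predicted foreground probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: class-weighted cross entropy scaled by (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # per-class weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
```

Setting `gamma=0` recovers the weighted cross entropy baseline, so the single exponent is what turns a per-class weight into a per-sample weight.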
● Different parameters for RetinaNet
● Comparison with online hard example mining (OHEM)
● Accuracy/speed trade-offs
● Benchmark results
Also Read:
Deformable Convolutional Networks
https://arxiv.org/abs/1703.06211
YouTube Videos
● CS231n
○ Lecture 11 - Detection and segmentation https://youtu.be/nDPWywWRIRo
● Deep Learning for Objects and Scenes (CVPR 2017 Workshop)
○ Lecture 1: Learning Deep Representations for Visual Recognition, by Kaiming He
https://youtu.be/jHv37mKAhV4
○ Lecture 2: Deep Learning for Instance-level Object Understanding, by Ross Girshick
https://youtu.be/jHv37mKAhV4?t=39m4s
Looking for brilliant researchers
cv@brodmann17.com /
amir@brodmann17.com
Computer Vision Tasks
Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf
Mask R-CNN
● Instance segmentation with pose
estimation for people.
● Extends Faster R-CNN by adding a new
branch for the instance mask task.
● Pose estimation can be added by simply
adding another branch.
● SOTA accuracy on detection, segmentation
and pose estimation at 5 FPS on GPU.
● https://arxiv.org/abs/1703.06870
● Girshick won young researcher award.
Mask R-CNN
● RoiPool
○ Quantization breaks pixel-to-pixel alignment
○ Too coarse and not good for fine spatial
information required for mask.
● RoiAlign
○ Bilinearly sample the proposal region and avoid
the quantization.
○ Smoothly normalize features and predictions
into a coordinate frame free of scale and aspect
ratio.
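The difference from RoiPool comes down to reading the feature map at fractional coordinates instead of rounding them. The sketch below is a simplified single-channel illustration, not the reference implementation: the `(y1, x1, y2, x2)` RoI format in feature-map units, the pixel-centre convention, and the choice of averaging a regular grid of samples per bin are assumptions, and the function names are hypothetical.

```python
import math

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2-D list at a continuous (y, x) location."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(fmap, roi, out_size=2, samples=2):
    """Pool a continuous RoI into out_size x out_size bins without any rounding."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # Average a regular samples x samples grid inside each bin.
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(fmap, yy, xx)
            row.append(total / (samples * samples))
        out.append(row)
    return out
```

On a feature map whose value equals its column index, an RoI starting at x = 0.5 yields bin values that reflect that half-pixel offset, whereas a quantizing pool would snap the RoI to integer columns and lose it.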
Mask R-CNN
● Backbone architecture
○ ResNet
○ ResNeXt
○ FPN
● Mask representation
○ FC vs. Convolutional
○ Multinomial vs. Independent Masks: softmax
vs. sigmoid
○ Class-Specific vs. Class-Agnostic Masks:
almost same accuracy
● Multi-task learning
○ Mask task improves object detection accuracy.
○ Keypoint task reduces object detection
accuracy.
Mask R-CNN
● Pose estimation
○ Simply add an additional branch.
○ Model a keypoint’s location as a one-hot mask,
and adopt Mask R-CNN to predict K masks.
○ Experiments are mainly to demonstrate the
generality of the Mask R-CNN framework.
○ RoiAlign improves this task’s accuracy as well.
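The one-hot keypoint target can be sketched as follows. This is an illustration under assumptions: a single keypoint on an m x m grid, a softmax taken over all m² locations, and hypothetical function names; a full model would repeat this for each of the K keypoint types.

```python
import math

def one_hot_keypoint_mask(m, ky, kx):
    """m x m training target with a single foreground pixel at (ky, kx)."""
    mask = [[0] * m for _ in range(m)]
    mask[ky][kx] = 1
    return mask

def keypoint_loss(logits, ky, kx):
    """Cross entropy of a softmax over all m*m locations against the one-hot target."""
    flat = [v for row in logits for v in row]
    mx = max(flat)  # subtract the maximum for numerical stability
    log_z = mx + math.log(sum(math.exp(v - mx) for v in flat))
    return log_z - logits[ky][kx]
```

Because the softmax runs over spatial locations rather than classes, the branch is encouraged to pick exactly one pixel per keypoint, which is why precise alignment matters for this task.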