Object Detection Using R-CNN Deep Learning Framework
Nader Karimi Bavandpour (nader.karimi.b@gmail.com)
Summer School of Intelligent Learning
IPM, 2019
Table of Contents
● Machine Learning Key Point: Inductive Bias
● From Classification to Instance Segmentation
● Region Proposal
● R-CNN Framework
Machine Learning Key Point: Inductive Bias
Definition of Inductive Bias
The kind of necessary assumptions about the nature of the target function are subsumed in the phrase
inductive bias.
- Wikipedia
Every machine learning algorithm with any ability to generalize beyond the training data that it sees has
some type of inductive bias.
- StackOverflow
Examples of Inductive Bias
● Maximum Margin: Maximize the width of the boundary between two classes
● Nearest Neighbors: Assume that most of the cases in a small neighborhood in feature space belong to the same class
● Minimum Cross-Validation Error: Select the hypothesis with the lowest cross-validation error
○ Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased.
● Locality of Receptive Field: Use convolutional layers instead of fully connected (fc) layers
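One way to see what the locality bias buys is to count parameters. The sketch below compares a fully connected layer with a convolutional layer on the same input; the 224×224×3 input, the 64 outputs, and the 3×3 kernel are illustrative assumptions, not values from the slides.

```python
def fc_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened input."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases of a convolutional layer (independent of image size)."""
    return kernel_h * kernel_w * in_c * out_c + out_c


# Hypothetical sizes chosen only for illustration.
fc = fc_params(224, 224, 3, 64)    # every output sees every input pixel
conv = conv_params(3, 3, 3, 64)    # every output sees a 3x3 neighborhood
print(fc, conv)
```

The convolutional layer assumes that nearby pixels matter most and that the same filter is useful everywhere, which is why it needs far fewer parameters.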
From Classification to Instance Segmentation
Object Classification
● Image Category Recognition
● Input: Image
● Output: Class label
● Types:
○ Binary Classification
○ Multiclass Classification
○ Multi-label Classification
Object Localization
● Object Bounding Box Recognition
● Input: Image
● Output: Box in the image (x, y, w, h)
Semantic Segmentation
● Pixel Category Recognition
● Input: Image
● Output: Category-aware pixel labels
Instance Segmentation
● Instance-Aware Pixel Category Recognition
● Input: Image
● Output: Instance-aware pixel labels
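The difference between category-aware and instance-aware labels can be illustrated with a toy label map. In this sketch the 2×5 image, the category ids, and the `(category, instance)` encoding are all assumptions made for illustration.

```python
# Toy image: two cats side by side, separated by one background column.
# Semantic segmentation: one category id per pixel (0 = background, 1 = cat).
semantic_labels = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Instance segmentation: each pixel is (category id, instance id);
# instance id 0 means "no object".
instance_labels = [
    [(1, 1), (1, 1), (0, 0), (1, 2), (1, 2)],
    [(1, 1), (1, 1), (0, 0), (1, 2), (1, 2)],
]


def to_semantic(labels):
    """Drop the instance id, keeping only the category of each pixel."""
    return [[category for category, _ in row] for row in labels]


def count_objects(labels):
    """Count distinct (category, instance) pairs, ignoring background."""
    return len({pixel for row in labels for pixel in row if pixel[1] != 0})
```

Dropping the instance ids recovers the semantic map, but the reverse is not possible in general: the semantic map alone cannot say that the two cat regions are different animals.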
Intersection Over Union (IoU)
● An important measure of object localization quality: the area of overlap between two boxes divided by the area of their union
● Used in both training and evaluation
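A minimal sketch of the IoU computation, assuming boxes are given in the (x, y, w, h) format used on the localization slide, with (x, y) taken to be the top-left corner (the slides do not state which corner convention they use).

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, (x, y) = top-left."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # overlap area 1, union area 7
```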
Datasets: ImageNet Challenge
● 1000 classes
● Each image has 1 class with at least one bounding box
● About 800 training images per class
● The algorithm produces 5 (class + bounding box) guesses
● A prediction is correct if at least one guess has the correct class and a bounding box with at least 50% intersection over union with the ground truth
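The correctness rule above can be sketched as a small check. The function name `is_correct`, the `(class, box)` tuple layout, and the (x, y, w, h) top-left box convention are assumptions for illustration, not the official evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, (x, y) = top-left."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(guesses, true_class, true_boxes, threshold=0.5):
    """True if any of the first 5 guesses has the right class and a box
    overlapping some ground-truth box with IoU >= threshold."""
    return any(
        cls == true_class and any(iou(box, gt) >= threshold for gt in true_boxes)
        for cls, box in guesses[:5]
    )


guesses = [("dog", (0, 0, 10, 10)), ("cat", (0, 0, 10, 10))]
print(is_correct(guesses, "cat", [(1, 1, 10, 10)]))
```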
Region Proposal
Selective Search for Region Proposal
● A region proposal algorithm used in object detection
● Designed to be fast with a very high recall
● Computes a hierarchical grouping of similar regions based on color, texture, size, and shape compatibility
● First takes an image as input
● Generates initial sub-segmentations
● Combines similar regions to form larger regions
○ based on color similarity, texture similarity, size similarity, and shape compatibility
● Finally, these regions produce the Regions of Interest (RoI)
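The grouping steps can be sketched as a greedy merge loop. This is a heavily simplified illustration, not the actual Selective Search algorithm: it assumes each region is summarized by a bounding box, a pixel count, and a single mean color value, it uses only a color term and a size term, and it ignores texture, shape compatibility, and the requirement that merged regions be neighbors. All names are hypothetical.

```python
from itertools import combinations


def similarity(a, b, image_size):
    """Simplified similarity: close colors and small regions merge first."""
    color_sim = 1.0 - abs(a["color"] - b["color"])
    size_sim = 1.0 - (a["size"] + b["size"]) / image_size
    return color_sim + size_sim


def merge(a, b):
    """Merge two regions: enclosing box, summed size, size-weighted color."""
    size = a["size"] + b["size"]
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a["box"], b["box"]
    return {
        "box": (min(ax1, bx1), min(ay1, by1), max(ax2, bx2), max(ay2, by2)),
        "size": size,
        "color": (a["color"] * a["size"] + b["color"] * b["size"]) / size,
    }


def propose_regions(regions, image_size):
    """Repeatedly merge the most similar pair; every region ever seen is a proposal."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        i, j = max(
            combinations(range(len(regions)), 2),
            key=lambda p: similarity(regions[p[0]], regions[p[1]], image_size),
        )
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals


# Three hypothetical initial segments of an 8x8 image; boxes are (x1, y1, x2, y2).
initial = [
    {"box": (0, 0, 4, 4), "size": 16, "color": 0.90},
    {"box": (4, 0, 8, 4), "size": 16, "color": 0.85},
    {"box": (0, 4, 8, 8), "size": 32, "color": 0.10},
]
proposals = propose_regions(initial, image_size=64)
```

Because every intermediate region is kept, the proposals cover objects at several scales, which is what gives the hierarchical approach its high recall.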
R-CNN Framework
R-CNN Family
● R-CNN: Selective search → Cropped image → CNN
● Fast R-CNN: Selective search → Crop feature map of CNN
● Faster R-CNN: CNN → Region Proposal Network → Crop feature map of CNN
● Mask R-CNN: Adds object mask prediction to Faster R-CNN
R-CNN
[Slides 22 to 26: figures only]
Problems with R-CNN
● Selective search extracts about 2,000 regions for each image
● The CNN extracts features separately for every image region. With N images, the CNN computes N × 2,000 feature vectors
● The entire process of object detection using R-CNN involves three models:
○ CNN for feature extraction
○ Linear SVM classifier for identifying objects
○ Regression model for tightening the bounding boxes
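The cost in the second bullet is simple arithmetic, sketched below. The 2,000 regions per image comes from the slide; the dataset size of 5,000 images is a made-up example, and the comparison assumes the shared-feature-map design of Fast R-CNN, where the CNN runs once per image.

```python
REGIONS_PER_IMAGE = 2000  # figure quoted on the slide


def rcnn_forward_passes(num_images, regions_per_image=REGIONS_PER_IMAGE):
    """R-CNN runs the CNN once per cropped region."""
    return num_images * regions_per_image


def shared_feature_map_forward_passes(num_images):
    """With a shared feature map, the CNN runs once per image."""
    return num_images


n = 5000  # hypothetical dataset size
print(rcnn_forward_passes(n), shared_feature_map_forward_passes(n))
```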
Fast R-CNN
● Still relies on selective search to find the Regions of Interest, which is slow
● Takes around 2 seconds per image to detect objects, which is much faster than R-CNN
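Cropping the feature map instead of the image is usually done with RoI pooling, which turns a region of any size into a fixed-size grid. The sketch below is a minimal single-channel version, assuming the RoI is already expressed in feature-map coordinates with exclusive end indices; the function name and the bin-rounding scheme are illustrative choices.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x1, y1, x2, y2) of a 2D feature map
    into an out_h x out_w grid. x2 and y2 are exclusive."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Row range of bin i; floor/ceil keeps every bin non-empty.
        r0 = y1 + math.floor(i * h / out_h)
        r1 = y1 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * w / out_w)
            c1 = x1 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map with values 0..15.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))
print(roi_max_pool(fmap, (1, 1, 4, 4), 2, 2))
```

Both regions yield a 2×2 output even though one is 4×4 and the other 3×3, so the layers after the crop can have a fixed input size.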
Faster R-CNN
● Region Proposal Network (RPN) for region proposal
○ Input: Image of any size
○ Output: A set of rectangular object proposals and objectness scores
○ Related to attention mechanisms
● Feature maps from the CNN are passed to the Region Proposal Network (RPN)
● k anchor boxes of different shapes are generated at each position of a sliding window in the RPN
● Anchor boxes are fixed-size bounding boxes that are placed throughout the image and have different shapes and sizes
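Anchor generation can be sketched as a loop over feature-map positions. The stride of 16 and the 3 scales × 3 aspect ratios (k = 9) below are assumed example settings, and the function name is hypothetical.

```python
import math


def generate_anchors(fm_h, fm_w, stride, scales, ratios):
    """Return (cx, cy, w, h) anchors in image coordinates,
    k = len(scales) * len(ratios) per feature-map position."""
    anchors = []
    for row in range(fm_h):
        for col in range(fm_w):
            # Center of this feature-map cell, mapped back to the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


anchors = generate_anchors(2, 3, 16, scales=[128, 256, 512], ratios=[0.5, 1, 2])
print(len(anchors))  # 2 * 3 positions, 9 anchors each
```

Dividing and multiplying by the square root of the ratio keeps each anchor's area equal to scale² while changing its shape.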
● For each anchor, the RPN predicts two things:
○ The probability that the anchor contains an object (without considering which class the object belongs to)
○ A bounding box regression for adjusting the anchor to better fit the object
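The second prediction is typically a set of four offsets applied to the anchor. The sketch below assumes the common center/size parameterization (shift the center in units of the anchor size, scale width and height exponentially); the function name and example numbers are illustrative.

```python
import math


def apply_deltas(anchor, deltas):
    """Adjust an anchor (cx, cy, w, h) with predicted offsets (tx, ty, tw, th)."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (
        xa + tx * wa,        # shift center x by a fraction of the anchor width
        ya + ty * ha,        # shift center y by a fraction of the anchor height
        wa * math.exp(tw),   # scale width
        ha * math.exp(th),   # scale height
    )


# Zero offsets leave the anchor unchanged.
print(apply_deltas((100, 100, 50, 50), (0, 0, 0, 0)))
# Move right and up, and double the width.
print(apply_deltas((100, 100, 50, 50), (0.1, -0.2, math.log(2), 0)))
```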
Mask R-CNN
● Extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition
● Defines a multi-task loss on each sampled RoI as: L = L_cls + L_box + L_mask
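A minimal sketch of how the three terms might be computed for a single RoI, assuming cross-entropy for L_cls, smooth L1 for L_box, and per-pixel binary cross-entropy averaged over the mask for L_mask. The function names and the toy inputs are illustrative, and details such as per-class mask selection and batching are omitted.

```python
import math


def cross_entropy(class_probs, true_class):
    """L_cls: negative log-probability of the true class."""
    return -math.log(class_probs[true_class])


def smooth_l1(pred_box, target_box):
    """L_box: smooth L1 distance summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, target_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def mask_bce(pred_mask, true_mask):
    """L_mask: binary cross-entropy averaged over all mask pixels."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for pred_row, true_row in zip(pred_mask, true_mask)
        for p, t in zip(pred_row, true_row)
    ]
    return sum(terms) / len(terms)


def multi_task_loss(class_probs, true_class, pred_box, target_box,
                    pred_mask, true_mask):
    """L = L_cls + L_box + L_mask for one RoI."""
    return (cross_entropy(class_probs, true_class)
            + smooth_l1(pred_box, target_box)
            + mask_bce(pred_mask, true_mask))


loss = multi_task_loss(
    class_probs=[0.5, 0.5], true_class=0,
    pred_box=[0.0, 0.0], target_box=[0.5, 2.0],
    pred_mask=[[0.5]], true_mask=[[1]],
)
```

Summing the terms with equal weight lets a single backward pass train the classification, box, and mask branches together.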
Thanks for Your Attention!
