Improving Region based CNN object detector
using Bayesian Optimization
AMGAD MUHAMMAD
Agenda
• Background
• Problem definition
• Proposed solution
• Baseline with an example
Background
Background: Deformable Parts Model
• Strong low-level features based on
histograms of oriented gradients (HOG)
• Efficient matching algorithms for deformable part-
based models (pictorial structures)
• Discriminative learning with latent variables (latent
SVM)
• Where to look? Everywhere (the sliding window
approach)
• mean Average Precision (mAP): 33.7% - 33.4%
P.F. Felzenszwalb et al., “Object Detection with Discriminatively Trained Part-Based Models”, PAMI 2010.
J.J. Lim et al., “Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection”, CVPR 2013.
X. Ren et al., “Histograms of Sparse Codes for Object Detection”, CVPR 2013.
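To make the cost of the sliding window approach concrete, here is a minimal sketch that enumerates window positions. The image size, window sizes, and stride are hypothetical values chosen for illustration; they do not come from the slides.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) for every window position that fits inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def count_windows(img_w, img_h, sizes, stride):
    """Count the windows over several (width, height) window sizes."""
    return sum(1 for (w, h) in sizes
               for _ in sliding_windows(img_w, img_h, w, h, stride))


# Hypothetical setting: a 500x375 image, three window shapes, stride of 8 px.
n = count_windows(500, 375, [(64, 128), (128, 128), (128, 64)], stride=8)
print(n, "windows to score at a single image scale")
```

Even at one scale and a coarse stride the count runs into the thousands, which is why scoring every window with an expensive classifier is impractical.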
Background: Selective search
• Alternative to exhaustive search
with sliding window.
• Starting with over-segmentation,
merge similar regions and produce region
proposals.
van de Sande et al., “Segmentation as Selective Search for Object Recognition”, ICCV 2011.
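The merge step can be sketched as a greedy loop. This is a heavily simplified illustration, not the published algorithm: the toy similarity (mean intensity plus how tightly the two boxes fill their union) and the region dictionaries are assumptions, and unlike the real method it compares all pairs rather than only neighbouring regions.

```python
from itertools import combinations


def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def similarity(r1, r2):
    """Toy similarity: close mean intensity and a tight union box score higher."""
    color = -abs(r1["mean"] - r2["mean"])
    fill = (area(r1["box"]) + area(r2["box"])) / area(union_box(r1["box"], r2["box"]))
    return color + fill


def merge(r1, r2):
    """Merge two regions, keeping a size-weighted mean intensity."""
    size = r1["size"] + r2["size"]
    mean = (r1["mean"] * r1["size"] + r2["mean"] * r2["size"]) / size
    return {"box": union_box(r1["box"], r2["box"]), "mean": mean, "size": size}


def propose_regions(regions):
    """Greedily merge the most similar pair; every region seen is a proposal."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        i, j = max(combinations(range(len(regions)), 2),
                   key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals


# Hypothetical over-segmentation: two similar top segments and one bright bottom segment.
segments = [
    {"box": (0, 0, 10, 10), "mean": 0.20, "size": 100},
    {"box": (10, 0, 20, 10), "mean": 0.25, "size": 100},
    {"box": (0, 10, 20, 20), "mean": 0.90, "size": 200},
]
proposals = propose_regions(segments)
print(proposals)
```

Starting from n segments the loop produces 2n - 1 boxes at different scales, far fewer than an exhaustive window scan.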
Deep Learning happened, again!
Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.
ImageNet 2012: whole-image classification with 1000 categories (error rates)
Model                  Top-1 (val)  Top-5 (val)  Top-5 (test)
1 CNN                  40.7%        18.2%        -
5 CNNs                 38.1%        16.4%        16.4%
1 CNN (pre-trained)    39.0%        16.6%        -
7 CNNs (pre-trained)   36.7%        15.4%        15.3%
• Can it be used in object recognition?
• Problems:
• localization: Where is the object?
• annotation: Labeled data is scarce.
• Expensive Computation for dense
search.
R-CNN: Region proposals + CNN
Approach summary:
• localization: selective search
• feature extraction: deep learning (CNN)
• classification: binary linear SVM
R-CNN
Girshick et al. CVPR14.
1. Input image
2. Regions of Interest (RoI) from a proposal method (~2k)
3. Warped image regions
4. Forward each region through ConvNet
5. Classify regions with SVMs
6. Apply bounding-box regressors
What’s wrong with R-CNN?
• Ad hoc training objectives
  • Fine-tune network with softmax classifier (log loss)
  • Train post-hoc linear SVMs (hinge loss)
  • Train post-hoc bounding-box regressors (squared loss)
• Training is slow (84h), takes a lot of disk space
• Inference (detection) is slow
  • 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
  • ~2000 ConvNet forward passes per image
  • Fixed by SPP-net [He et al. ECCV14]
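A quick back-of-the-envelope reading of the inference figures. It assumes, as a simplification not stated on the slide, that the whole 47 s is spent on the ~2000 forward passes, so the per-region number is an upper bound.

```python
seconds_per_image = 47.0   # from the slide: R-CNN with VGG16
passes_per_image = 2000    # from the slide: ~2000 ConvNet forward passes per image

# Assumption: all of the time goes to the forward passes.
ms_per_pass = 1000.0 * seconds_per_image / passes_per_image
images_per_hour = 3600.0 / seconds_per_image

print(f"{ms_per_pass:.1f} ms per region, {images_per_hour:.0f} images per hour")
```

Each pass is cheap on its own; the cost comes from repeating it for every region, which is what sharing the convolutional computation across regions avoids.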
SPP-net
He et al. ECCV14.
1. Input image
2. Forward whole image through ConvNet
3. “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method
4. Spatial Pyramid Pooling (SPP) layer
5. Fully-connected layers (FCs)
6. Classify regions with SVMs
7. Apply bounding-box regressors
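The pooling step can be sketched as follows: each RoI on the shared feature map is max-pooled into fixed grids at several resolutions, so regions of any size yield a feature vector of the same length. This is a single-channel sketch in plain Python; the pyramid levels (1, 2, 4) and the toy feature map are assumptions for illustration, not the configuration used by He et al.

```python
import math


def pool_region(fmap, x1, y1, x2, y2, bins):
    """Max-pool fmap rows y1..y2-1, columns x1..x2-1 into a bins x bins grid."""
    h, w = y2 - y1, x2 - x1
    out = []
    for by in range(bins):
        ys = y1 + math.floor(by * h / bins)
        ye = y1 + math.ceil((by + 1) * h / bins)
        for bx in range(bins):
            xs = x1 + math.floor(bx * w / bins)
            xe = x1 + math.ceil((bx + 1) * w / bins)
            out.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
    return out


def spp(fmap, roi, levels=(1, 2, 4)):
    """Concatenate the pooled grids of every pyramid level for one RoI."""
    x1, y1, x2, y2 = roi
    feats = []
    for bins in levels:
        feats.extend(pool_region(fmap, x1, y1, x2, y2, bins))
    return feats


# Toy 6x8 "conv5" map and two RoIs of different sizes (x1, y1, x2, y2).
fmap = [[y * 8 + x for x in range(8)] for y in range(6)]
whole = spp(fmap, (0, 0, 8, 6))
small = spp(fmap, (2, 1, 6, 4))
print(len(whole), len(small))  # both 1 + 4 + 16 values
```

Calling `spp` with a single level, for example `levels=(2,)`, gives the single-level variant.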
What’s good about SPP-net?
• Fixes one issue with R-CNN: makes testing fast
• Image-wise computation (shared): ConvNet
• Region-wise computation: FCs, SVMs, Bbox reg
What’s wrong with SPP-net?
• Inherits the rest of R-CNN’s problems
• Ad hoc training objectives
• Training is slow (25h), takes a lot of disk space
• Introduces a new problem: cannot update
parameters below SPP layer during training
SPP-net: the main limitation
He et al. ECCV14.
• Trainable (3 layers): FCs, with the SVMs and Bbox reg on top
• Frozen (13 layers): ConvNet
• SPP is not differentiable
Fast R-CNN
• Fast test-time, like SPP-net
• One network, trained in one stage
• Higher mean average precision than R-CNN and SPP-net
Fast R-CNN (test time)
1. Input image
2. Forward whole image through ConvNet
3. “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method
4. “RoI Pooling” (single-level SPP) layer
5. Fully-connected layers (FCs)
6. Linear + softmax: softmax classifier
7. Linear: bounding-box regressors
Fast R-CNN (training)
• ConvNet, then FCs, then two sibling heads: Linear + softmax, and Linear
• Multi-task loss: log loss + smooth L1 loss
• Trainable: the whole network
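The multi-task loss named on the training slides (log loss + smooth L1 loss) can be sketched for a single RoI as below. The slides only name the two terms; the balancing weight `lam`, the convention that class 0 is background, and the skipping of box regression for background RoIs follow the Fast R-CNN paper and are otherwise assumptions of this sketch.

```python
import math


def log_loss(class_probs, true_class):
    """Negative log-probability of the true class (softmax output assumed)."""
    return -math.log(class_probs[true_class])


def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1 (less sensitive to outliers)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Classification loss plus, for non-background RoIs, box regression loss."""
    cls = log_loss(class_probs, true_class)
    if true_class == 0:  # assumed convention: class 0 is background, no box term
        return cls
    loc = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return cls + lam * loc


# Hypothetical RoI: 3 classes, true class 1, four box-offset predictions.
loss = multi_task_loss([0.1, 0.8, 0.1], 1,
                       box_pred=[0.1, -0.2, 0.5, 2.0],
                       box_target=[0.0, 0.0, 0.0, 0.0])
print(round(loss, 4))
```

Because both terms are differentiable functions of the same network outputs, a single backward pass updates the classifier, the regressor, and the shared layers together.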
What is missing from the previous
architectures?
• All the previous architectures rely on an external region
proposal algorithm.
• Proposed regions are independent of the network loss.
• No control over the quality of the regions.
Faster R-CNN
• Fast test-time, like Fast R-CNN
• One network, trained in one stage
• Higher mean average precision than R-CNN, SPP-net, and Fast R-CNN
• Has a dedicated Region Proposal Network (RPN) trained to optimize the network loss.
Faster R-CNN
1. Input image
2. Forward whole image through ConvNet
3. “conv5” feature map of image
4. Forward whole image through RPN ConvNet (Linear + softmax: softmax classifier; Linear: bounding-box regressors)
5. “RoI Pooling” (single-level SPP) layer
6. Fully-connected layers (FCs)
7. Linear + softmax: softmax classifier
8. Linear: bounding-box regressors
• Trainable: both the detection network and the RPN
• Super efficient: shared weights between detection and Region Proposal Network
Problem definition
Problem definition
• All region-based CNN object detectors depend on the quality of
the region proposal algorithm.
• Although the region proposal network in Faster R-CNN is trained
to minimize a multi-task loss function (log loss and bounding-box
regression), in my experiments the best proposed regions are still
ill-localized.
Problem definition (example)
[Figure panels: Top 1 region, Top 3 regions, Top 5 regions, Top 100 regions]
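One common way to put a number on "ill-localized" is the intersection over union (IoU) between a proposed box and the ground-truth box. The slides do not name a metric, so IoU is an assumption here, and the boxes below are hypothetical.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes, in [0, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (50, 50, 150, 150)
top_proposal = (80, 60, 200, 170)   # hypothetical top-scoring region
print(f"IoU = {iou(ground_truth, top_proposal):.3f}")
```

A proposal can score highly with the classifier and still overlap the object only partially, which is the failure the example panels illustrate.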
Proposed Solution
Better regions with Bayesian
Optimization
Now the goal becomes sampling a new solution $y_{n+1}$ with
a high chance that it will maximize the value of $f_{n+1}$
Better regions with Bayesian
Optimization
Given the ability to query our CNN for region scores,
we can repeat the following:
1. Given existing regions/scores
2. We fit a model
3. Introduce the chance utility function
4. Locate the maximum of the utility
5. Observe the new region score
6. Update the model.
7. Repeat from step 2.
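The loop can be sketched in one dimension. Everything specific here is an assumption made for illustration: the model is a zero-mean Gaussian process with an RBF kernel, the utility is expected improvement, a "region" is reduced to a single scalar, and `score` is a stand-in for the CNN's region score. The slides only say "fit a model" and do not fix these choices.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected amount by which a new sample exceeds the best score so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def score(x):
    """Hypothetical stand-in for the CNN region score, peaked at x = 2."""
    return math.exp(-(x - 2.0) ** 2)


def bayes_opt(init_xs, grid, iters=5):
    xs = list(init_xs)
    ys = [score(x) for x in xs]                       # 1. existing regions/scores
    for _ in range(iters):                            # 7. repeat
        best = max(ys)
        candidates = [x for x in grid if x not in xs]

        def utility(x):
            mean, std = gp_posterior(xs, ys, x)       # 2. fit a model
            return expected_improvement(mean, std, best)  # 3. utility function

        x_new = max(candidates, key=utility)          # 4. maximum of the utility
        xs.append(x_new)                              # 5. observe the new score
        ys.append(score(x_new))                       # 6. model is refit next pass
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


grid = [i * 0.05 for i in range(101)]                 # candidate positions 0..5
init = [grid[10], grid[70], grid[90]]                 # three initial "regions"
best_x, best_y = bayes_opt(init, grid)
print(best_x, best_y)
```

In the detection setting the scalar would be a 4-coordinate box and the maximization in step 4 would run over a local search region rather than a fixed grid.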
Example of BO applied
to R-CNN
Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee.
Original image
Initial region proposals
Initial detection (local optima)
Initial detection & ground truth
Neither gives good localization
Iter 1: Boxes inside the local search region
Iter 1: Heat map of expected improvement (EI)
• A box has 4 coordinates:
(centerX, centerY, height, width)
• The height and width are marginalized
by max to visualize EI in 2D
Iter 1: Maximum of EI, the newly proposed box
Iter 1: Complete
Iteration 2: local optimum & search region
Iteration 2: EI heat map & new proposal
Iteration 2: Newly proposed box & its actual score
Iteration 3: local optimum & search region
Iteration 3: EI heat map & new proposal
Iteration 3: Newly proposed box & its actual score
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
Final results
Final results & ground truth
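The 2D heat maps in this example come from collapsing a 4-coordinate quantity: for each box centre, take the maximum EI over all heights and widths. A minimal sketch, where the dictionary layout and the EI values are made up for illustration:

```python
def marginalize_by_max(ei):
    """Map {(cx, cy, h, w): EI} to {(cx, cy): max EI over all h and w}."""
    heat = {}
    for (cx, cy, h, w), value in ei.items():
        heat[(cx, cy)] = max(heat.get((cx, cy), float("-inf")), value)
    return heat


# Hypothetical EI values for two centres, each with two box shapes.
ei = {
    (0, 0, 10, 10): 0.2,
    (0, 0, 20, 10): 0.7,
    (1, 0, 10, 10): 0.1,
    (1, 0, 10, 20): 0.4,
}
heat = marginalize_by_max(ei)
print(heat)
```

Taking the maximum rather than the mean keeps a centre bright whenever at least one box shape there looks promising.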
Baseline
Questions

Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 

Último (20)

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 

Improving Region based CNN object detector using Bayesian Optimization

  • 1. Improving Region based CNN object detector using Bayesian Optimization AMGAD MUHAMMAD
  • 2. Agenda • Background • Problem definition • Proposed solution • Baseline with an example
  • 4. Background: Deformable Parts Model • Strong low-level features based on histograms of oriented gradients (HOG) • Efficient matching algorithms for deformable part-based models (pictorial structures) • Discriminative learning with latent variables (latent SVM) • Where to look? Everywhere (the sliding window approach) • mean Average Precision (mAP): 33.7% - 33.4% P.F. Felzenszwalb et al., “Object Detection with Discriminatively Trained Part-Based Models”, PAMI 2010. J.J. Lim et al., “Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection”, CVPR 2013. X. Ren et al., “Histograms of Sparse Codes for Object Detection”, CVPR 2013.
  • 5. Background: Selective search • Alternative to exhaustive search with sliding window. • Starting with over-segmentation, merge similar regions and produce region proposals. van de Sande et al., “Segmentation as Selective Search for Object Recognition”, ICCV 2011.
  • 6. Deep Learning happened, again! Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012. ImageNet 2012: whole-image classification with 1000 categories
      Model                   Top-1 (val)   Top-5 (val)   Top-5 (test)
      1 CNN                   40.7%         18.2%         -
      5 CNNs                  38.1%         16.4%         16.4%
      1 CNN (pre-trained)     39.0%         16.6%         -
      7 CNNs (pre-trained)    36.7%         15.4%         15.3%
    • Can it be used in object recognition? • Problems: • localization: Where is the object? • annotation: Labeled data is scarce. • Expensive computation for dense search.
  • 7. R-CNN: Region proposals + CNN • Approach summary: localization (selective search), feature extraction (deep learning CNN), classification (binary linear SVM)
  • 9. R-CNN (Girshick et al. CVPR14) • Input image • Regions of Interest (RoI) from a proposal method (~2k)
  • 10. R-CNN (Girshick et al. CVPR14) • Input image • Regions of Interest (RoI) from a proposal method (~2k) • Warped image regions
  • 11. R-CNN (Girshick et al. CVPR14) • Input image • Regions of Interest (RoI) from a proposal method (~2k) • Warped image regions • Forward each region through ConvNet
  • 12. R-CNN (Girshick et al. CVPR14) • Input image • Regions of Interest (RoI) from a proposal method (~2k) • Warped image regions • Forward each region through ConvNet • Classify regions with SVMs
  • 13. R-CNN (Girshick et al. CVPR14) • Input image • Regions of Interest (RoI) from a proposal method (~2k) • Warped image regions • Forward each region through ConvNet • Classify regions with SVMs • Apply bounding-box regressors
  • 15. What’s wrong with R-CNN? • Ad hoc training objectives • Fine-tune network with softmax classifier (log loss) • Train post-hoc linear SVMs (hinge loss) • Train post-hoc bounding-box regressors (squared loss)
  • 16. What’s wrong with R-CNN? • Ad hoc training objectives • Fine-tune network with softmax classifier (log loss) • Train post-hoc linear SVMs (hinge loss) • Train post-hoc bounding-box regressors (squared loss) • Training is slow (84h), takes a lot of disk space
  • 17. What’s wrong with R-CNN? • Ad hoc training objectives • Fine-tune network with softmax classifier (log loss) • Train post-hoc linear SVMs (hinge loss) • Train post-hoc bounding-box regressions (least squares) • Training is slow (84h), takes a lot of disk space • Inference (detection) is slow • 47s / image with VGG16 [Simonyan & Zisserman. ICLR15] • Fixed by SPP-net [He et al. ECCV14] • ~2000 ConvNet forward passes per image
  • 19. SPP-net (He et al. ECCV14) • Input image • Forward whole image through ConvNet • “conv5” feature map of image
  • 20. SPP-net (He et al. ECCV14) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method
  • 21. SPP-net (He et al. ECCV14) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method • Spatial Pyramid Pooling (SPP) layer
  • 22. SPP-net (He et al. ECCV14) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method • Spatial Pyramid Pooling (SPP) layer • Fully-connected layers (FCs) • Classify regions with SVMs
  • 23. SPP-net (He et al. ECCV14) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method • Spatial Pyramid Pooling (SPP) layer • Fully-connected layers (FCs) • Classify regions with SVMs • Apply bounding-box regressors
  • 24. What’s good about SPP-net? • Fixes one issue with R-CNN: makes testing fast ConvNet SVMs FCs Bbox reg Region-wise computation Image-wise computation (shared)
  • 25. What’s wrong with SPP-net? • Inherits the rest of R-CNN’s problems • Ad hoc training objectives • Training is slow (25h), takes a lot of disk space • Introduces a new problem: cannot update parameters below SPP layer during training
  • 26. SPP-net: the main limitation ConvNet He et al. ECCV14. SVMs Trainable (3 layers) Frozen (13 layers) FCs Bbox reg SPP is not differentiable
  • 27. Fast R-CNN • Fast test-time, like SPP-net
  • 28. Fast R-CNN • Fast test-time, like SPP-net • One network, trained in one stage
  • 29. Fast R-CNN • Fast test-time, like SPP-net • One network, trained in one stage • Higher mean average precision than R-CNN and SPP-net
  • 30. Fast R-CNN (test time) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method
  • 31. Fast R-CNN (test time) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method • “RoI Pooling” (single-level SPP) layer
  • 32. Fast R-CNN (test time) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method • “RoI Pooling” (single-level SPP) layer • Fully-connected layers (FCs) • Linear + softmax (softmax classifier)
  • 33. Fast R-CNN (test time) • Input image • Forward whole image through ConvNet • “conv5” feature map of image • Regions of Interest (RoIs) from a proposal method • “RoI Pooling” (single-level SPP) layer • Fully-connected layers (FCs) • Linear + softmax (softmax classifier) • Linear (bounding-box regressors)
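
The “RoI Pooling” (single-level SPP) layer on slides 31-33 can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a list of lists, an RoI already expressed in feature-map cells, and a 2x2 output grid; the function name `roi_max_pool` and these conventions are illustrative choices, not taken from the slides.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a 2D feature map into a fixed out_h x out_w grid.

    feature_map: list of rows (H x W), single channel (assumption).
    roi: (x0, y0, x1, y1) in feature-map cells, end-exclusive (assumption).
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin start uses floor, bin end uses ceil, so bins cover the RoI
        # and are never empty as long as the RoI has at least one cell.
        r0 = y0 + (i * h) // out_h
        r1 = y0 + -(-((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = x0 + -(-((j + 1) * w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4x4 toy "conv5" map; RoIs of different sizes give the same output shape.
fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
whole = roi_max_pool(fmap, (0, 0, 4, 4))
corner = roi_max_pool(fmap, (1, 1, 4, 4))
```

Whatever the RoI size, the output grid has the same shape, which is what lets the fully-connected layers accept arbitrary proposals.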
  • 34. Fast R-CNN (training) • ConvNet • FCs • Linear + softmax • Linear
  • 35. Fast R-CNN (training) • ConvNet • FCs • Linear + softmax • Linear • Multi-task loss: log loss + smooth L1 loss
  • 36. Fast R-CNN (training) • ConvNet • FCs • Linear + softmax • Linear • Multi-task loss: log loss + smooth L1 loss • Trainable
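
The multi-task loss named on slides 35-36 (log loss + smooth L1 loss) can be written out as a small sketch. It assumes one RoI at a time, class index 0 meaning background (so no box term is added for it), and a weighting factor `lam`; the function names and argument layout are illustrative, not from the slides.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Log loss on the class plus smooth L1 on the 4 box offsets.

    class_probs: softmax output for one RoI (assumed layout).
    true_class: ground-truth class index; 0 is treated as background.
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box to regress to.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss


# One foreground RoI: class 1 predicted with probability 0.5.
loss = multitask_loss([0.5, 0.5], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
```

Because both terms are summed into a single scalar, one backward pass updates the classifier, the box regressor and the shared ConvNet together.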
  • 37. What is missing from the previous architectures? • All the previous architectures rely on an external region proposal algorithm. • Proposed regions are independent of the network loss. • No control over the quality of the regions.
  • 38. Faster R-CNN • Fast test-time, like Fast R-CNN
  • 39. Faster R-CNN • Fast test-time, like Fast R-CNN • One network, trained in one stage
  • 40. Faster R-CNN • Fast test-time, like Fast R-CNN • One network, trained in one stage • Higher mean average precision than R-CNN, SPP-net, Fast R-CNN
  • 41. Faster R-CNN • Fast test-time, like Fast R-CNN • One network, trained in one stage • Higher mean average precision than R-CNN, SPP-net, Fast R-CNN • Has a dedicated Region Proposal Network (RPN) trained to optimize the network loss.
  • 42. Faster R-CNN • Input image • Forward whole image through ConvNet
  • 43. Faster R-CNN • Input image • Forward whole image through ConvNet • Forward whole image through RPN ConvNet
  • 44. Faster R-CNN • Input image • Forward whole image through ConvNet • Forward whole image through RPN ConvNet • Linear + softmax • Linear
  • 45. Faster R-CNN • Input image • Forward whole image through ConvNet • Forward whole image through RPN ConvNet • Linear + softmax (softmax classifier) • Linear (bounding-box regressors)
  • 46. Faster R-CNN • Input image • Forward whole image through ConvNet • Forward whole image through RPN ConvNet • “conv5” feature map of image • Linear + softmax (softmax classifier) • Linear (bounding-box regressors)
  • 47. Faster R-CNN • Input image • Forward whole image through ConvNet • Forward whole image through RPN ConvNet • “conv5” feature map of image • “RoI Pooling” (single-level SPP) layer • Fully-connected layers (FCs) • Linear + softmax (softmax classifier) • Linear (bounding-box regressors)
  • 48. Faster R-CNN • Input image • Forward whole image through ConvNet • Forward whole image through RPN ConvNet • “conv5” feature map of image • “RoI Pooling” (single-level SPP) layer • Fully-connected layers (FCs) • Linear + softmax (softmax classifier) • Linear (bounding-box regressors) • Linear + softmax (softmax classifier) • Linear (bounding-box regressors)
  • 49. Faster R-CNN • ConvNet • FCs • Linear + softmax • Linear • Linear + softmax • Linear • Trainable • Super efficient: shared weights between detection and Region Proposal network
  • 51. Problem definition • All region-based CNN object detectors depend on the quality of the region proposal algorithm. • Although in Faster R-CNN the region proposal network is trained to minimize a multi-task loss function (log loss and bounding-box regression), in my experiments the best proposed regions are still ill-localized.
  • 53. Problem definition (example) Top 1 region Top 3 regions
  • 54. Problem definition (example) Top 1 region Top 3 regions Top 5 regions
  • 55. Problem definition (example) Top 1 region Top 3 regions Top 5 regions Top 100 regions
  • 57. Better regions with Bayesian Optimization • Now the goal becomes sampling a new solution $y_{n+1}$ with a high chance that it will maximize the value of $f_{n+1}$
  • 58. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following:
  • 59. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores
  • 60. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model
  • 61. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model 3. Introduce the chance utility function
  • 62. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model 3. Introduce the chance utility function 4. Locate the maximum of the utility
  • 63. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model 3. Introduce the chance utility function 4. Locate the maximum of the utility 5. Observe the new region score
  • 64. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model 3. Introduce the chance utility function 4. Locate the maximum of the utility 5. Observe the new region score 6. Update the model.
  • 65. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model 3. Introduce the chance utility function 4. Locate the maximum of the utility 5. Observe the new region score 6. Update the model. 7. Repeat step 2.
  • 66. Better regions with Bayesian Optimization • Given the ability to query our CNN for region scores, we can repeat the following: 1. Given existing regions/scores 2. We fit a model 3. Introduce the chance utility function 4. Locate the maximum of the utility 5. Observe the new region score 6. Update the model. 7. Repeat step 2.
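
The seven-step loop on slides 58-66 can be sketched in a few lines. This is only a toy under stated assumptions: the search is over a single box coordinate in [0, 1] instead of a full box, the model is a zero-mean Gaussian process with an RBF kernel, the utility is expected improvement, and `toy_score` is a hypothetical stand-in for querying the CNN for a region score. None of these choices is confirmed by the slides.

```python
import math


def rbf(a, b, length=0.15):
    """RBF kernel between two scalar inputs (assumed model choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Step 2: fit the model and return posterior mean and std at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Step 3: utility of sampling a point, given the best score so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def toy_score(x):
    """Hypothetical stand-in for the CNN region score; peaks at x = 0.7."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def bayes_opt(score, xs, n_iter=8, grid_size=101):
    xs = list(xs)                      # step 1: existing regions ...
    ys = [score(x) for x in xs]        # ... and their scores
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_iter):            # step 7: repeat
        best = max(ys)
        cand = [g for g in grid if all(abs(g - x) > 1e-9 for x in xs)]
        ei = [expected_improvement(*gp_posterior(xs, ys, g), best) for g in cand]
        x_next = cand[max(range(len(ei)), key=ei.__getitem__)]  # step 4
        xs.append(x_next)              # steps 5-6: observe the new score,
        ys.append(score(x_next))       # the model is refit on the next pass
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

Calling `bayes_opt(toy_score, [0.1, 0.5, 0.9])` starts from three scored positions and returns the best position and score seen after the extra queries; the returned score can never be lower than the best starting score.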
  • 67. Example of BO applied to R-CNN Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee.
  • 68. Original image Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee.
  • 73. Iter 1: Heat map of expected improvement (EI) • A box has 4 coordinates: (centerX, centerY, height, width) • The height and width are marginalized by max to visualize EI in 2D
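
The max-marginalization described on slide 73 can be sketched as follows. It assumes EI values are stored in a dictionary keyed by the box tuple (centerX, centerY, height, width); the function name `ei_heatmap` and that storage layout are illustrative assumptions.

```python
def ei_heatmap(ei_by_box):
    """Collapse EI over (height, width) by taking the max per box center.

    ei_by_box: dict mapping (center_x, center_y, height, width) -> EI value.
    Returns a dict mapping (center_x, center_y) -> max EI over all sizes.
    """
    heat = {}
    for (cx, cy, _h, _w), ei in ei_by_box.items():
        key = (cx, cy)
        heat[key] = max(ei, heat.get(key, float("-inf")))
    return heat


# Two box sizes share the center (0, 0); the larger EI is kept for the 2D map.
heat = ei_heatmap({
    (0, 0, 1, 1): 0.1,
    (0, 0, 2, 1): 0.4,
    (1, 0, 1, 1): 0.2,
})
```

Each center then carries the EI of its most promising box size, which is what makes a 4D utility viewable as a 2D heat map.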
  • 77. Iteration 2: local optimum & search region
  • 80. Iteration 3: local optimum & search region