SlideShare a Scribd company logo
1 of 89
Lecture 2 Overview on object recognition and  one practical example 6.870 Object Recognition and Scene Understanding  http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
Wednesday ,[object Object],[object Object]
Object recognition Is it really so hard? Output of normalized correlation This is a chair Find the chair in this image
Object recognition Is it really so hard? My biggest concern while making this slide was: how do I justify 50 years of research, and this course, if this experiment did work? Pretty much garbage Simple template matching is not going to make it Find the chair in this image
Object recognition Is it really so hard? A “popular method is that of template matching, by point to point correlation of a model pattern with the image pattern. These techniques are inadequate for three-dimensional scene analysis for many reasons, such as occlusion, changes in viewing angle, and articulation of parts.” Nivatia & Binford, 1977. Find the chair in this image
So, let’s make the problem simpler: Block world Nice framework to develop fancy math, but too far from reality… Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
Binford and generalized cylinders Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
Binford and generalized cylinders
Recognition by components Irving Biederman Recognition-by-Components: A Theory of Human Image Understanding.  Psychological Review, 1987.
Recognition by components ,[object Object],[object Object]
A do-it-yourself example ,[object Object],[object Object],[object Object],[object Object]
Hypothesis ,[object Object],[object Object],[object Object]
Constraints on possible models of recognition ,[object Object],[object Object],[object Object]
Stages of processing “ Parsing is performed, primarily at concave regions, simultaneously with a detection of nonaccidental properties.”
Non accidental properties Certain properties of edges in a two-dimensional image are taken by the visual system as strong evidence that the edges in the three-dimensional world contain those same properties. Non accidental properties, (Witkin  &  Tenenbaum,1983): Rarely be produced by accidental alignments of viewpoint and object features and consequently are generally unaffected by slight variations in viewpoint. e.g., Freeman, “the generic viewpoint assumption”, 1994 ? image
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The high speed and accuracy of determining a given nonaccidental relation {e.g., whether some pattern is symmetrical) should be contrasted with performance in making absolute quantitative judgments of variations in a single physical attribute, such as length of a segment or degree of tilt or curvature. Object recognition is performed by humans in around 100ms.
“If contours are deleted at a vertex they can be restored, as long as there is no accidental filling-in. The greater disruption from vertex deletion is expected on the basis of their importance as diagnostic image features for the components.” Recoverable Unrecoverable
From generalized cylinders to GEONS “ From variation over only two or three levels in the nonaccidental relations of four attributes of generalized cylinders, a set of 36 GEONS can be generated.” Geons represent a restricted form of generalized cylinders.
More GEONS
Objects and their geons
Scenes and geons Mezzanotte & Biederman
The importance of spatial arrangement
Supercuadrics Introduced in computer vision by A. Pentland, 1986.
What is missing? ,[object Object],[object Object]
Parts and Structure approaches ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Figure from [Fischler  & Elschlager  73]
Representation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],We will discuss these models more in depth next week
[object Object],[object Object],[object Object]
A simple object detector ,[object Object],[object Object],Most of the slides are from the ICCV 05 short course http://people.csail.mit.edu/torralba/shortCourseRLOC/
Discriminative vs. generative x = data ,[object Object],(The lousy painter) 0 10 20 30 40 50 60 70 0 0.05 0.1 0 10 20 30 40 50 60 70 0 0.5 1 x = data ,[object Object],0 10 20 30 40 50 60 70 80 -1 1 x = data ,[object Object],( The artist )
Discriminative methods Object detection and recognition is formulated as a classification problem.  Decision boundary … and a decision is taken at each window about if it contains a target object or not. Where are the screens? The image is partitioned into a set of overlapping windows Bag of image patches Computer screen Background In some feature space
Discriminative methods 10 6  examples Nearest neighbor Shakhnarovich, Viola, Darrell 2003 Berg, Berg, Malik 2005 … Neural networks LeCun, Bottou, Bengio, Haffner 1998 Rowley, Baluja, Kanade 1998 … Support Vector Machines and Kernels Conditional Random Fields McCallum, Freitag, Pereira 2000 Kumar, Hebert 2003 … Guyon, Vapnik Heisele, Serre, Poggio, 2001 …
[object Object],Formulation +1 -1 x 1 x 2 x 3 x N … … x N+1 x N+2 x N+M -1 -1 ? ? ? … Training data: each image patch is labeled as containing the object or background Test data Features  x = Labels y = ,[object Object],[object Object],Where  belongs to some family of functions ,[object Object]
Overview of section ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
A simple object detector with Boosting  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://people.csail.mit.edu/torralba/iccv2005/
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Why boosting?
Boosting ,[object Object],Strong  classifier Weak classifier Weight Features vector
[object Object],Boosting ,[object Object],Strong  classifier Weak classifier Weight Features vector from a family of weak classifiers
Boosting ,[object Object],Each data point has a class label: w t  =1 and a weight: x t=1 x t=2 x t +1 (  ) -1 (  ) y t  =
Toy example Weak learners from the family of lines h => p(error) = 0.5  it is at chance Each data point has a class label: w t  =1 and a weight: +1 (  ) -1 (  ) y t  =
Toy example This one seems to be the best Each data point has a class label: w t  =1 and a weight: This is a ‘ weak classifier ’: It performs slightly better than chance. +1 (  ) -1 (  ) y t  =
Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t  w t  exp{-y t  H t } We update the weights: +1 (  ) -1 (  ) y t  =
Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t  w t  exp{-y t  H t } We update the weights: +1 (  ) -1 (  ) y t  =
Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t  w t  exp{-y t  H t } We update the weights: +1 (  ) -1 (  ) y t  =
Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t  w t  exp{-y t  H t } We update the weights: +1 (  ) -1 (  ) y t  =
Toy example The strong (non- linear) classifier is built as the combination of all the weak (linear) classifiers. f 1 f 2 f 3 f 4
Boosting  ,[object Object],[object Object]
Overview of section ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Boosting  ,[object Object],by minimizing the exponential loss Training samples The exponential loss is a differentiable upper bound to the misclassification error.
Exponential loss -1.5 -1 -0.5 0 0.5 1 1.5 2 0 0.5 1 1.5 2 2.5 3 3.5 4 Squared error Exponential loss yF(x) = margin Misclassification error Loss Squared error Exponential loss
Boosting  Sequential procedure. At each step we add For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998) to minimize the residual loss  input Desired output Parameters weak classifier
gentleBoosting  For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998) We chose  that minimizes the cost: At each iterations we just need to solve a weighted least squares problem Weights at this iteration ,[object Object],Instead of doing exact optimization, gentle Boosting minimizes a Taylor approximation of the error:
Weak classifiers  ,[object Object],[object Object],Four parameters: b=E w (y [x>   ]) a=E w (y [x<   ]) x f m (x)  fitRegressionStump.m
gentleBoosting.m function  classifier =  gentleBoost (x, y, Nrounds) … for  m = 1:Nrounds fm =  selectBestWeakClassifier (x, y, w); w = w .* exp(- y .* fm); % store parameters of  fm  in classifier … end Solve weighted least-squares Re-weight training samples Initialize weights w = 1
Demo  gentleBoosting > demoGentleBoost.m Demo using Gentle boost and stumps with hand selected 2D data:
Flavors of boosting ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview of section ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
From images to features: Weak detectors ,[object Object],Takes image as input and the output is binary response. The output is a weak detector.
Object recognition Is it really so hard? But what if we use smaller patches? Just a part of the chair? Find the chair in this image
Parts But what if we use smaller patches? Just a part of the chair? Seems to fire on legs… not so bad Find a chair in this image
Weak detectors ,[object Object],[object Object],Every combination of three filters generates a different feature This gives thousands of features. Boosting selects a sparse subset, so computations on test time are very efficient. Boosting also avoids overfitting to some extend.
Weak detectors ,[object Object],[object Object],The average intensity in the block is computed with four sums independently of the block size.
Haar wavelets Papageorgiou & Poggio (2000) Polynomial SVM
Edges and chamfer distance Gavrila, Philomin, ICCV 1999
Edge fragments Weak detector = k edge fragments and threshold. Chamfer distance uses 8 orientation planes Opelt, Pinz, Zisserman, ECCV 2006
Histograms of oriented gradients ,[object Object],[object Object],[object Object],[object Object]
Weak detectors ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Weak detectors ,[object Object],Car model Screen model These features are used for the detector on the course web site.
Weak detectors First we collect a set of part templates from a set of training objects. Vidal-Naquet, Ullman (2003) …
Weak detectors We now define a family of “weak detectors” as: = = Better than chance *
Weak detectors We can do a better job using filtered images Still a weak detector but better than before * * = = =
Training First we evaluate all the N features on all the training images. Then, we sample the feature outputs on the object center and at random locations in the background:
Representation and object model Selected features for the screen detector Lousy painter  … 4 10 1 2 3 … 100
Representation and object model Selected features for the car detector 1 2 3 4 10 100 … …
Overview of section ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example: screen detection Feature  output
Example: screen detection Feature  output Thresholded  output Weak ‘detector’ Produces many false alarms.
Example: screen detection Feature  output Thresholded  output Strong classifier  at iteration 1
Example: screen detection Feature  output Thresholded  output Strong classifier Second weak ‘detector’ Produces a different set of false alarms.
Example: screen detection + Feature  output Thresholded  output Strong classifier Strong classifier  at iteration 2
Example: screen detection + … Feature  output Thresholded  output Strong classifier Strong classifier  at iteration 10
Example: screen detection + … Feature  output Thresholded  output Strong classifier Adding  features Final classification Strong classifier  at iteration 200
Maximal suppression Detect local maximum of the response. We are only allowed detecting each object once. The rest will be considered false alarms. This post-processing stage can have a very strong impact in the final performance.
Evaluation ,[object Object],[object Object],When do we have a correct  detection? Is this correct? Area intersection Area union > 0.5
Demo > runDetector.m Demo of screen and car detectors using parts, Gentle boost, and stumps:
Probabilistic interpretation ,[object Object],[object Object],[object Object],It can be a set of arbitrary functions of the image This provides a great flexibility, difficult to beat by current generative models. But also there is the danger of not understanding what are they really doing.
Weak detectors ,[object Object],f i , P i g i Image Feature Part template Relative position wrt object center ,[object Object],[object Object]
Object models  ,[object Object],[object Object],Here, invariance in translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps). The search cost can be reduced using a cascade. f i , P i g i
Wednesday ,[object Object],[object Object]

More Related Content

What's hot

iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...
iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...
iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...Tecnick.com LTD
 
Final year University Dissertation.
Final year University Dissertation.Final year University Dissertation.
Final year University Dissertation.Jason Hobman
 
Object classification in far field and low resolution videos
Object classification in far field and low resolution videosObject classification in far field and low resolution videos
Object classification in far field and low resolution videosInsaf Setitra
 
Multiple representations and visual mental imagery in artificial cognitive sy...
Multiple representations and visual mental imagery in artificial cognitive sy...Multiple representations and visual mental imagery in artificial cognitive sy...
Multiple representations and visual mental imagery in artificial cognitive sy...University of Huddersfield
 
Mit6870 orsu lecture11
Mit6870 orsu lecture11Mit6870 orsu lecture11
Mit6870 orsu lecture11zukun
 
Computational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti OulasvirtaComputational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti OulasvirtaAalto University
 
An assistive model of obstacle detection based on deep learning: YOLOv3 for v...
An assistive model of obstacle detection based on deep learning: YOLOv3 for v...An assistive model of obstacle detection based on deep learning: YOLOv3 for v...
An assistive model of obstacle detection based on deep learning: YOLOv3 for v...IJECEIAES
 

What's hot (8)

iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...
iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...
iNEDI - Accuracy Improvements and Artifacts Removal in Edge Based Image Inter...
 
Final year University Dissertation.
Final year University Dissertation.Final year University Dissertation.
Final year University Dissertation.
 
Object classification in far field and low resolution videos
Object classification in far field and low resolution videosObject classification in far field and low resolution videos
Object classification in far field and low resolution videos
 
Multiple representations and visual mental imagery in artificial cognitive sy...
Multiple representations and visual mental imagery in artificial cognitive sy...Multiple representations and visual mental imagery in artificial cognitive sy...
Multiple representations and visual mental imagery in artificial cognitive sy...
 
A04570106
A04570106A04570106
A04570106
 
Mit6870 orsu lecture11
Mit6870 orsu lecture11Mit6870 orsu lecture11
Mit6870 orsu lecture11
 
Computational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti OulasvirtaComputational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
 
An assistive model of obstacle detection based on deep learning: YOLOv3 for v...
An assistive model of obstacle detection based on deep learning: YOLOv3 for v...An assistive model of obstacle detection based on deep learning: YOLOv3 for v...
An assistive model of obstacle detection based on deep learning: YOLOv3 for v...
 

Viewers also liked

Dynamics of perception
Dynamics of perceptionDynamics of perception
Dynamics of perceptionSandesh Bhat
 
Constellation Models and Unsupervised Learning for Object Class Recognition
Constellation Models and Unsupervised Learning for Object Class RecognitionConstellation Models and Unsupervised Learning for Object Class Recognition
Constellation Models and Unsupervised Learning for Object Class Recognitionwolf
 
cvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introductioncvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introductionzukun
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)npinto
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)nikhilus85
 
Object recognition
Object recognitionObject recognition
Object recognitionsaniacorreya
 
Anil Thomas - Object recognition
Anil Thomas - Object recognitionAnil Thomas - Object recognition
Anil Thomas - Object recognitionIntel Nervana
 
AAAI08 tutorial: visual object recognition
AAAI08 tutorial: visual object recognitionAAAI08 tutorial: visual object recognition
AAAI08 tutorial: visual object recognitionzukun
 
Human activity recognition
Human activity recognitionHuman activity recognition
Human activity recognitionRandhir Gupta
 
Human activity recognition
Human activity recognition Human activity recognition
Human activity recognition srikanthgadam
 
Lbp based edge-texture features for object recoginition
Lbp based edge-texture features for object recoginitionLbp based edge-texture features for object recoginition
Lbp based edge-texture features for object recoginitionIGEEKS TECHNOLOGIES
 

Viewers also liked (15)

Pc Seminar Jordi
Pc Seminar JordiPc Seminar Jordi
Pc Seminar Jordi
 
Dynamics of perception
Dynamics of perceptionDynamics of perception
Dynamics of perception
 
Constellation Models and Unsupervised Learning for Object Class Recognition
Constellation Models and Unsupervised Learning for Object Class RecognitionConstellation Models and Unsupervised Learning for Object Class Recognition
Constellation Models and Unsupervised Learning for Object Class Recognition
 
cvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introductioncvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introduction
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
 
Part1
Part1Part1
Part1
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
Object recognition
Object recognitionObject recognition
Object recognition
 
Anil Thomas - Object recognition
Anil Thomas - Object recognitionAnil Thomas - Object recognition
Anil Thomas - Object recognition
 
AAAI08 tutorial: visual object recognition
AAAI08 tutorial: visual object recognitionAAAI08 tutorial: visual object recognition
AAAI08 tutorial: visual object recognition
 
Human activity recognition
Human activity recognitionHuman activity recognition
Human activity recognition
 
Human activity recognition
Human activity recognition Human activity recognition
Human activity recognition
 
Lbp based edge-texture features for object recoginition
Lbp based edge-texture features for object recoginitionLbp based edge-texture features for object recoginition
Lbp based edge-texture features for object recoginition
 
Local binary pattern
Local binary patternLocal binary pattern
Local binary pattern
 
5. perception
5. perception5. perception
5. perception
 

Similar to Mit6870 orsu lecture2

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
GROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCEGROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCEijaia
 
3 d recognition via 2-d stage associative memory kunal
3 d recognition via 2-d stage associative memory kunal3 d recognition via 2-d stage associative memory kunal
3 d recognition via 2-d stage associative memory kunalKunal Kishor Nirala
 
Survey of The Problem of Object Detection In Real Images
Survey of The Problem of Object Detection In Real ImagesSurvey of The Problem of Object Detection In Real Images
Survey of The Problem of Object Detection In Real ImagesCSCJournals
 
OBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEYOBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEYJournal For Research
 
17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptx17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptxssuser2023c6
 
Cvpr2007 object category recognition p3 - discriminative models
Cvpr2007 object category recognition   p3 - discriminative modelsCvpr2007 object category recognition   p3 - discriminative models
Cvpr2007 object category recognition p3 - discriminative modelszukun
 
MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1
MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1
MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1zukun
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxK Manjunath
 
WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...
WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...
WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...Nexgen Technology
 
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES IAEME Publication
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Aalto University
 
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...AnuragVijayAgrawal
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired personshahsamkit73
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptAbdullah Gubbi
 
The deep learning technology on coco framework full report
The deep learning technology on coco framework full reportThe deep learning technology on coco framework full report
The deep learning technology on coco framework full reportJIEMS Akkalkuwa
 

Similar to Mit6870 orsu lecture2 (20)

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
GROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCEGROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCE
 
3 d recognition via 2-d stage associative memory kunal
3 d recognition via 2-d stage associative memory kunal3 d recognition via 2-d stage associative memory kunal
3 d recognition via 2-d stage associative memory kunal
 
Survey of The Problem of Object Detection In Real Images
Survey of The Problem of Object Detection In Real ImagesSurvey of The Problem of Object Detection In Real Images
Survey of The Problem of Object Detection In Real Images
 
OBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEYOBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEY
 
160713
160713160713
160713
 
Clusterix at VDS 2016
Clusterix at VDS 2016Clusterix at VDS 2016
Clusterix at VDS 2016
 
Object Recognition
Object RecognitionObject Recognition
Object Recognition
 
17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptx17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptx
 
Pca seminar final report
Pca seminar final reportPca seminar final report
Pca seminar final report
 
Cvpr2007 object category recognition p3 - discriminative models
Cvpr2007 object category recognition   p3 - discriminative modelsCvpr2007 object category recognition   p3 - discriminative models
Cvpr2007 object category recognition p3 - discriminative models
 
MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1
MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1
MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...
WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...
WEAKLY SUPERVISED FINE-GRAINED CATEGORIZATION WITH PART-BASED IMAGE REPRESENT...
 
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"
 
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired person
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition ppt
 
The deep learning technology on coco framework full report
The deep learning technology on coco framework full reportThe deep learning technology on coco framework full report
The deep learning technology on coco framework full report
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Mit6870 orsu lecture2

  • 1. Lecture 2 Overview on object recognition and one practical example 6.870 Object Recognition and Scene Understanding http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
  • 2.
  • 3. Object recognition Is it really so hard? Output of normalized correlation This is a chair Find the chair in this image
  • 4. Object recognition Is it really so hard? My biggest concern while making this slide was: how do I justify 50 years of research, and this course, if this experiment did work? Pretty much garbage Simple template matching is not going to make it Find the chair in this image
  • 5. Object recognition Is it really so hard? A “popular method is that of template matching, by point to point correlation of a model pattern with the image pattern. These techniques are inadequate for three-dimensional scene analysis for many reasons, such as occlusion, changes in viewing angle, and articulation of parts.” Nivatia & Binford, 1977. Find the chair in this image
  • 6. So, let’s make the problem simpler: Block world Nice framework to develop fancy math, but too far from reality… Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
  • 7. Binford and generalized cylinders Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
  • 9. Recognition by components Irving Biederman Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 1987.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. Stages of processing “ Parsing is performed, primarily at concave regions, simultaneously with a detection of nonaccidental properties.”
  • 15. Non accidental properties Certain properties of edges in a two-dimensional image are taken by the visual system as strong evidence that the edges in the three-dimensional world contain those same properties. Non accidental properties, (Witkin & Tenenbaum,1983): Rarely be produced by accidental alignments of viewpoint and object features and consequently are generally unaffected by slight variations in viewpoint. e.g., Freeman, “the generic viewpoint assumption”, 1994 ? image
  • 16.
  • 17. The high speed and accuracy of determining a given nonaccidental relation {e.g., whether some pattern is symmetrical) should be contrasted with performance in making absolute quantitative judgments of variations in a single physical attribute, such as length of a segment or degree of tilt or curvature. Object recognition is performed by humans in around 100ms.
  • 18. “If contours are deleted at a vertex they can be restored, as long as there is no accidental filling-in. The greater disruption from vertex deletion is expected on the basis of their importance as diagnostic image features for the components.” Recoverable Unrecoverable
  • 19. From generalized cylinders to GEONS “ From variation over only two or three levels in the nonaccidental relations of four attributes of generalized cylinders, a set of 36 GEONS can be generated.” Geons represent a restricted form of generalized cylinders.
  • 22. Scenes and geons Mezzanotte & Biederman
  • 23. The importance of spatial arrangement
  • 24. Supercuadrics Introduced in computer vision by A. Pentland, 1986.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31. Discriminative methods Object detection and recognition is formulated as a classification problem. Decision boundary … and a decision is taken at each window about if it contains a target object or not. Where are the screens? The image is partitioned into a set of overlapping windows Bag of image patches Computer screen Background In some feature space
  • 32. Discriminative methods 10 6 examples Nearest neighbor Shakhnarovich, Viola, Darrell 2003 Berg, Berg, Malik 2005 … Neural networks LeCun, Bottou, Bengio, Haffner 1998 Rowley, Baluja, Kanade 1998 … Support Vector Machines and Kernels Conditional Random Fields McCallum, Freitag, Pereira 2000 Kumar, Hebert 2003 … Guyon, Vapnik Heisele, Serre, Poggio, 2001 …
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. Toy example Weak learners from the family of lines h => p(error) = 0.5 it is at chance Each data point has a class label: w t =1 and a weight: +1 ( ) -1 ( ) y t =
  • 41. Toy example This one seems to be the best Each data point has a class label: w t =1 and a weight: This is a ‘ weak classifier ’: It performs slightly better than chance. +1 ( ) -1 ( ) y t =
  • 42. Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t w t exp{-y t H t } We update the weights: +1 ( ) -1 ( ) y t =
  • 43. Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t w t exp{-y t H t } We update the weights: +1 ( ) -1 ( ) y t =
  • 44. Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t w t exp{-y t H t } We update the weights: +1 ( ) -1 ( ) y t =
  • 45. Toy example We set a new problem for which the previous weak classifier performs at chance again Each data point has a class label: w t w t exp{-y t H t } We update the weights: +1 ( ) -1 ( ) y t =
  • 46. Toy example The strong (non- linear) classifier is built as the combination of all the weak (linear) classifiers. f 1 f 2 f 3 f 4
  • 47.
  • 48.
  • 49.
  • 50. Exponential loss -1.5 -1 -0.5 0 0.5 1 1.5 2 0 0.5 1 1.5 2 2.5 3 3.5 4 Squared error Exponential loss yF(x) = margin Misclassification error Loss Squared error Exponential loss
  • 51. Boosting Sequential procedure. At each step we add For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998) to minimize the residual loss input Desired output Parameters weak classifier
  • 52.
  • 53.
  • 54. gentleBoosting.m function classifier = gentleBoost (x, y, Nrounds) … for m = 1:Nrounds fm = selectBestWeakClassifier (x, y, w); w = w .* exp(- y .* fm); % store parameters of fm in classifier … end Solve weighted least-squares Re-weight training samples Initialize weights w = 1
  • 55. Demo gentleBoosting > demoGentleBoost.m Demo using Gentle boost and stumps with hand selected 2D data:
  • 56.
  • 57.
  • 58.
  • 59. Object recognition Is it really so hard? But what if we use smaller patches? Just a part of the chair? Find the chair in this image
  • 60. Parts But what if we use smaller patches? Just a part of the chair? Seems to fire on legs… not so bad Find a chair in this image
  • 61.
  • 62.
  • 63. Haar wavelets Papageorgiou & Poggio (2000) Polynomial SVM
  • 64. Edges and chamfer distance Gavrila, Philomin, ICCV 1999
  • 65. Edge fragments Weak detector = k edge fragments and threshold. Chamfer distance uses 8 orientation planes Opelt, Pinz, Zisserman, ECCV 2006
  • 66.
  • 67.
  • 68.
  • 69. Weak detectors First we collect a set of part templates from a set of training objects. Vidal-Naquet, Ullman (2003) …
  • 70. Weak detectors We now define a family of “weak detectors” as: = = Better than chance *
  • 71. Weak detectors We can do a better job using filtered images Still a weak detector but better than before * * = = =
  • 72. Training First we evaluate all the N features on all the training images. Then, we sample the feature outputs on the object center and at random locations in the background:
  • 73. Representation and object model Selected features for the screen detector Lousy painter … 4 10 1 2 3 … 100
  • 74. Representation and object model Selected features for the car detector 1 2 3 4 10 100 … …
  • 75.
  • 76. Example: screen detection Feature output
  • 77. Example: screen detection Feature output Thresholded output Weak ‘detector’ Produces many false alarms.
  • 78. Example: screen detection Feature output Thresholded output Strong classifier at iteration 1
  • 79. Example: screen detection Feature output Thresholded output Strong classifier Second weak ‘detector’ Produces a different set of false alarms.
  • 80. Example: screen detection + Feature output Thresholded output Strong classifier Strong classifier at iteration 2
  • 81. Example: screen detection + … Feature output Thresholded output Strong classifier Strong classifier at iteration 10
  • 82. Example: screen detection + … Feature output Thresholded output Strong classifier Adding features Final classification Strong classifier at iteration 200
  • 83. Maximal suppression Detect local maximum of the response. We are only allowed detecting each object once. The rest will be considered false alarms. This post-processing stage can have a very strong impact in the final performance.
  • 84.
  • 85. Demo > runDetector.m Demo of screen and car detectors using parts, Gentle boost, and stumps:
  • 86.
  • 87.
  • 88.
  • 89.

Editor's Notes

  1. The first question that we might have, specially after seeing the other parts of this course, is why to use a discriminative approach? This slide shows one of the arguments some times used to illustrate the advantage that a discriminative framework might have over a generative framework. It is important to not that most of the time, generative models are also lousy painters. Generative models do not try to have a full generative model up to the pixel level. In fact, those cases, as the ones reviewed before in which the generative model is applied only to a set of sparse features are in fact a mixture of a discriminative method (for the low level vision part) and a generative method (for the high level vision part). Even if there is no discriminative training, the choice of the representation is given by our experience as vision researchers exploring alternatives until one works: just as a discriminative method will do.
  2. Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is taken at each window about if it contains a target object or not. Each window is represented by extracting a large number of features that encode information such as boundaries, textures, color, spatial structure. The classification function, that maps an image window into a binary decision, is learnt using methods such as SVMs, boosting or neural networks.
  3. In many cases, F(x) will give a real value and the classification is perform with a Threshold. If F(x)&gt;0 then y =+1
  4. In the next part I will focus on particular technique, and I will provide a detailed description of things you need to know to make it work. This part is accompanied by a Matlab toolbox that you can find on the web site of the class. Is entirely self contained and written in matlab. It has an implementation of gentle boosting, and an object detector. The demo shows results on detecting screens and cars and can be easily trained to detect other objects.
  5. One of the first papers to use boosting for a vision task. Take one filter, filter the image, take the absolute value and downsample. Take another filter, take absolute value of the output and downsample. Do this once more. Then, for each color channel independently, sum the value of all the pixels and the resulting number will be one of the features. If you do this for all triples of filters, you’ll end up with 46875 features.
  6. No need of image pyramid in order to achieve scale invariance. Very efficient. Works well for faces which are objects well defined by the intensity pattern. Combined with a cascade, achieves real time performance for face detection even on large images.
  7. This will be better than a full object template because it allows for some flexibility in the location of the parts. It is still easy to compute.
  8. The simplest weak detector will be using template matching. In this case we use template matching to detect the presence of a part of the object. Once the presence is detected it will vote for the object in the center of the object coordinates frame. In this example, we are trying to detect a computer screen. We use the top left corner of the screen. The first thing we do is compute a normalize correlation between the image and the template. This will give a pick in the response near the top-left corner of the screen and, of course, many other matches that do not correspond to screen. Now we displace the output of the correlation taking into account the relative positions between the top left corner of the screen and the center of the screen. We also blur the output. Now, if we apply a threshold we can see that the center of the screen has a value above the threshold while most of the background gets suppressed. There are a lot of other locations that also are above the threshold. Therefore, this is not a good detector. But it is better than chance!
  9. The simplest weak detector will be using template matching. In this case we use template matching to detect the presence of a part of the object. Once the presence is detected it will vote for the object in the center of the object coordinates frame. In this example, we are trying to detect a computer screen. We use the top left corner of the screen. The first thing we do is compute a normalize correlation between the image and the template. This will give a pick in the response near the top-left corner of the screen and, of course, many other matches that do not correspond to screen. Now we displace the output of the correlation taking into account the relative positions between the top left corner of the screen and the center of the screen. We also blur the output. Now, if we apply a threshold we can see that the center of the screen has a value above the threshold while most of the background gets suppressed. There are a lot of other locations that also are above the threshold. Therefore, this is not a good detector. But it is better than chance!
  10. So, now we have to train. We need positive and negative examples. So we do the next thing: get positive samples by taking the values of all the features in the center of the target. Then take negative samples from the background.
  11. In a probabilistic framework, opposed to the generative model, here we fit a conditional probability. Boosting is a way of fitting a logistic regression.
  12. Finally, the object model is very similar to the one built with the generative model. Here, invariance in translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by reescaling the image in small steps).