Visual Object Recognition
Perceptual Computing Seminar
Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol
BCN Perceptual Computing Lab
Index
1. Introduction
2. Recognition with Local Features: Basics
3. Invariant representations: SIFT
4. Recognition as a Classification Problem: FERNS
5. Very large databases: Hashing




           Visual Object Recognition                 Perceptual Computing Seminar                        Page 2
Introduction
The recognition of object categories in images is one of the most challenging problems in computer vision, especially when the number of categories is large.

Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few.

Introduction




Invariance to viewpoint, illumination, “shape”, color, scale, texture, etc.

Introduction
Why do we care about recognition? (theoretical question)

Perception of function: we can perceive the 3D shape, texture, and material properties without knowing about objects. But the concept of a category also encapsulates information about what we can do with those objects.


Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories, ICCV 2009 Short Course, Kyoto, September 24, 2009.
Introduction
Why is it hard?
                       Find the chair in this image                                 Output of correlation




 This is a chair




Introduction
Why is it hard?




Find the chair in this image: pretty much garbage; simple template matching is not going to make it.

Introduction
Why do we care about recognition? (practical question)




Introduction
Why do we care about recognition? (practical question)




Introduction
Why do we care about recognition (practical question)?




        Query     Results from 5k Flickr images (demo available for 100k set)


James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR 2007.
Recognition with Local Features

It is known that the visual system can use local, informative image «fragments» of a given object, rather than the whole object, to classify it into a familiar category.

This approach has some advantages over holistic methods...


Recognition with Local Features




    Holistic                                                        Fragment‐based
Recognition with Local Features




      Jay Hegde, Evgeniy Bart, and Daniel Kersten, "Fragment‐based learning of visual object categories", Current
      Biology, 2008.
Recognition with Local Features
The most basic approach is called the “bag of words” approach (it was inspired by techniques used by the natural language processing community).




Recognition with Local Features
Assumptions:
• Independent features.
• Histogram representation.

Fragments vocabulary (generic/class‐based, etc.)

Image = fragments histogram
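As a minimal sketch of the representation above, assuming local descriptors have already been extracted and a visual-word vocabulary is available (random centroids stand in for a learned vocabulary here, purely for illustration), an image reduces to a normalized histogram of fragment assignments:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Map each local descriptor to its nearest visual word and return
    the normalized histogram of word counts (image = fragments histogram)."""
    # Squared Euclidean distance from every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                 # nearest visual word per fragment
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 128))            # toy 50-word vocabulary (hypothetical)
desc = rng.normal(size=(300, 128))            # toy descriptors from one image
h = bow_histogram(desc, vocab)
```

In practice the vocabulary would be learned (e.g., by clustering descriptors from training images), but the histogram step is the same.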




Recognition with Local Features
A more advanced approach involves several steps:
• Stage 0: Find image locations where we can reliably find correspondences with other images.
• Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).
• Stage 2: Verify if they belong to a consistent configuration.


          Slide credit: David Lowe
SIFT
A wonderful example of these stages can be found in David Lowe’s (2004) “Distinctive image features from scale‐invariant keypoints” paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT).




Local Features, e.g. SIFT


Recognition with Local Features
                 Which local features?




                                    ?


                                                Slide credit: A. Efros
SIFT
Stage 0: How can we find image locations where we can reliably find
correspondences with other images?



A “good” location has one stable sharp extremum.


(Sketch: three 1D profiles f(x); two are “bad”, one is “Good!” with a single stable, sharp extremum.)
SIFT




SIFT
Stage 0: How can we find image locations where we can reliably find
correspondences with other images?
How to compute extrema at a given scale:

1) We apply a Gaussian filter:



2) We compute a difference‐of‐Gaussians




3) We look for 3D extrema in the resulting structure. 
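The three steps above can be sketched as follows; the scale ladder, thresholds, and the synthetic blob image are illustrative choices for this sketch, not Lowe's actual parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigma=1.6, k=2 ** 0.5, n_scales=5):
    """Toy difference-of-Gaussians detector: blur at successive scales,
    subtract adjacent blurs, and keep 3D (x, y, scale) extrema."""
    # 1) Gaussian filtering at a ladder of scales
    blurred = [gaussian_filter(image, sigma * k ** i) for i in range(n_scales)]
    # 2) Difference-of-Gaussians between adjacent scales
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(n_scales - 1)])
    # 3) 3D extrema: a sample larger (or smaller) than its 26 neighbors
    maxima = (dog == maximum_filter(dog, size=3)) & (dog > 0.01)
    minima = (dog == minimum_filter(dog, size=3)) & (dog < -0.01)
    s, y, x = np.nonzero(maxima | minima)
    return list(zip(x, y, s))  # candidate keypoints (x, y, scale index)

# Synthetic blob centered at (32, 32) should yield an extremum near the center
img = np.zeros((64, 64))
img[32, 32] = 100.0
img = gaussian_filter(img, 3.0)
pts = dog_extrema(img)
```

A real implementation would also use octaves (downsampling), sub-pixel refinement, and edge-response rejection.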

SIFT




SIFT
These features are invariant to location and scale




SIFT
Stage 1: Image content is transformed into local features (that are invariant
to translation, rotation, and scale).


In addition to dealing with scale changes, we need to
deal with (at least) in‐plane image rotation.

One way to deal with this problem is to design
descriptors that are rotationally invariant, but such
descriptors have poor discriminability, i.e. they map
different looking patches to the same descriptor.



SIFT

A better method is to estimate a dominant
orientation at each detected keypoint.

1. Calculate a histogram of local gradients in the window.

2. Take the dominant gradient orientation as “up”.

3. Rotate the local area for computing the descriptor.



SIFT
Lowe:
• computes a 36‐bin histogram of edge orientations
weighted by both gradient magnitude and Gaussian
distance to the center,

• finds all peaks within 80% of the global maximum,
and then

• computes a more accurate orientation estimate
using a 3‐bin parabolic fit.
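These three steps can be sketched as follows; the Gaussian window width and the synthetic ramp patch used for checking are illustrative assumptions, not Lowe's exact parameters:

```python
import numpy as np

def dominant_orientations(patch, sigma=1.5):
    """Toy orientation assignment: 36-bin histogram of gradient orientations,
    weighted by magnitude and a Gaussian window; peaks within 80% of the
    global maximum are refined with a parabolic fit over 3 bins."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Gaussian distance weighting around the patch center
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    g = np.exp(-(((xx - w / 2) ** 2 + (yy - h / 2) ** 2)
                 / (2 * (sigma * w / 3) ** 2)))
    hist, _ = np.histogram(ang, bins=36, range=(0, 2 * np.pi), weights=mag * g)
    peaks = []
    for i in range(36):
        l, c, r = hist[i - 1], hist[i], hist[(i + 1) % 36]
        if c >= 0.8 * hist.max() and c > l and c > r:
            # parabolic interpolation over the 3 bins around the peak
            denom = l - 2 * c + r
            offset = 0.5 * (l - r) / denom if denom != 0 else 0.0
            peaks.append((i + 0.5 + offset) * (2 * np.pi / 36))
    return peaks

# A horizontal intensity ramp has all gradients pointing along +x (angle 0)
patch = np.tile(np.arange(32.0), (32, 1))
peaks = dominant_orientations(patch)
```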


SIFT




SIFT




Local patch around descriptor (from Gaussian pyramid)             Gradient magnitude                             Gradient orientation




SIFT




SIFT




SIFT
Even after compensating for translation, rotation, and scale changes, the local appearance of image patches will usually still vary from image to image.

How can we make the descriptor that we match more invariant to such changes, while still preserving discriminability between different (non‐corresponding) patches?


SIFT
SIFT features are formed by computing the gradient at each pixel in a 16x16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid at which the keypoint was detected.

The gradient magnitudes are downweighted by a Gaussian fall‐off function in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.

SIFT
In each 4x4 quadrant, a gradient orientation histogram is formed by (conceptually) adding the weighted gradient value to one of 8 orientation histogram bins.




SIFT
The resulting 128 non‐negative values form a raw version of the SIFT descriptor vector.

To reduce the effects of contrast/gain (additive variations are already removed by the gradient), the 128‐D vector is normalized to unit length.
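The descriptor construction described over the last few slides can be sketched as follows; the Gaussian width and the plain (non-interpolated) binning are simplifying assumptions:

```python
import numpy as np

def sift_descriptor(patch):
    """Toy 128-D SIFT-style descriptor from a 16x16 patch: 4x4 cells with
    8 orientation bins each, Gaussian-weighted, normalized to unit length."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Gaussian fall-off to downweight gradients far from the center
    yy, xx = np.mgrid[0:16, 0:16]
    mag = mag * np.exp(-((xx - 7.5) ** 2 + (yy - 7.5) ** 2) / (2 * 8.0 ** 2))
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8      # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell = np.s_[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            # accumulate weighted gradient magnitude per orientation bin
            desc[cy, cx] = np.bincount(bins[cell].ravel(),
                                       weights=mag[cell].ravel(), minlength=8)
    # (a real implementation would also interpolate across bins and cells)
    desc = desc.ravel()                                  # 4 x 4 x 8 = 128 values
    return desc / (np.linalg.norm(desc) + 1e-12)         # contrast normalization

rng = np.random.default_rng(1)
d = sift_descriptor(rng.normal(size=(16, 16)))
```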



SIFT
Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.




SIFT
Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.

SIFT uses a nearest neighbor classifier with a distance ratio matching criterion. We can define this nearest neighbor distance ratio as

  r = d1 / d2 = ||DA − DB|| / ||DA − DC||

where d1 and d2 are the nearest and second nearest neighbor distances, and DA, DB, DC are the target descriptor along with its closest two neighbors.
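A small sketch of the distance ratio criterion, assuming descriptors are rows of numpy arrays; the 0.8 threshold follows Lowe's commonly cited value:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, max_ratio=0.8):
    """Match each descriptor in desc1 to its nearest neighbor in desc2,
    keeping only matches with a low nearest/second-nearest distance ratio."""
    # pairwise Euclidean distances between all descriptor pairs
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    nn, nn2 = order[:, 0], order[:, 1]
    d1 = d[np.arange(len(desc1)), nn]        # nearest-neighbor distance
    d2 = d[np.arange(len(desc1)), nn2]       # second-nearest distance
    keep = d1 / (d2 + 1e-12) < max_ratio     # distance ratio criterion
    return [(i, int(nn[i])) for i in np.nonzero(keep)[0]]

# Matching descriptors are near-copies; distractors are random
rng = np.random.default_rng(2)
desc1 = rng.normal(size=(5, 16))
desc2 = np.vstack([desc1 + 0.01 * rng.normal(size=(5, 16)),
                   rng.normal(size=(20, 16))])
matches = ratio_test_matches(desc1, desc2)
```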
SIFT




SIFT
Linear method:
The simplest way to find all corresponding feature points is to compare all features against all other features in each pair of potentially matching images.
Unfortunately, this is quadratic in the number of extracted features, which makes it impractical for some applications.

SIFT
Nearest‐neighbor matching is the major
computational bottleneck:

  • Linear search performs dn² operations for n feature points and d dimensions
  • No exact NN methods are faster than linear search for d > 10
  • Approximate methods can be much faster, but at the cost of missing some correct matches. Failure rate gets worse for large datasets.

SIFT
A better approach is to devise an indexing structure
such as a multi‐dimensional search tree or a hash
table to rapidly search for features near a given
feature.

For extremely large databases (millions of images or
more), even more efficient structures based on
ideas from document retrieval (e.g., vocabulary
trees) can be used.



SIFT
Stage 2: Verify if they belong to a consistent
  configuration.
The first step is to establish a set of putative
  correspondences.




SIFT




How can we discard erroneous correspondences?



SIFT
Stage 2: Verify if they belong to a consistent
  configuration.
Once we have some hypothetical (putative) matches, we can use geometric alignment to verify which matches are inliers and which ones are outliers.




SIFT
Stage 2: Verify if they belong to a consistent
  configuration.




•   Extract features
•   Compute putative matches

SIFT
Stage 2: Verify if they belong to a consistent
  configuration.




•   Loop:
    – Hypothesize transformation T (using a small group of putative matches that are related by T)

SIFT
Stage 2: Verify if they belong to a consistent
  configuration.




•   Loop:
    – Hypothesize transformation T (small group of putative matches that 
      are related by T)
    – Verify transformation (search for other matches consistent with T)
SIFT
Stage 2: Verify if they belong to a consistent
  configuration.




SIFT
Stage 2: Verify if they belong to a consistent
  configuration.
2D transformation models:
• Similarity (translation, scale, rotation)

• Affine

• Projective
  (homography)

SIFT
Stage 2: Verify if they belong to a consistent
  configuration.
Fitting an affine transformation (given the point
   correspondences):
(xi, yi)  →  (xi', yi')




                                                    Slide credit: S. Lazebnik
SIFT
Stage 2: Verify if they belong to a consistent
  configuration.
Fitting an affine transformation (given the point
   correspondences):

  [xi']   [m1  m2] [xi]   [t1]
  [yi'] = [m3  m4] [yi] + [t2]

which can be rewritten, for each match, as

  [xi  yi  0   0   1  0] [m1 m2 m3 m4 t1 t2]^T = [xi']
  [0   0   xi  yi  0  1]                         [yi']

                                                          Slide credit: S. Lazebnik
SIFT
Stage 2: Verify if they belong to a consistent
  configuration.
Fitting an affine transformation (given the point
  correspondences):

• Linear system with six unknowns
• Each match gives us two linearly independent equations: we need at least three matches to solve for the transformation parameters
• Can solve Ax = b using the pseudo‐inverse:
                      x = (A^T A)^-1 A^T b
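The least-squares solution above can be sketched directly; the synthetic transform used for checking is an arbitrary choice:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit from point correspondences src -> dst.
    Each match contributes two rows of A; solve A p = b for the six
    parameters p = (m1, m2, m3, m4, t1, t2) via the pseudo-inverse."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(xp)
        A.append([0, 0, x, y, 0, 1]); b.append(yp)
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    m1, m2, m3, m4, t1, t2 = p
    return np.array([[m1, m2], [m3, m4]]), np.array([t1, t2])

# Recover a known affine transform from three synthetic matches
M_true, t_true = np.array([[2.0, 0.5], [-0.5, 2.0]]), np.array([10.0, -3.0])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = src @ M_true.T + t_true
M, t = fit_affine(src, dst)
```

With exactly three non-collinear matches the system is square and the fit is exact; with more matches it is the least-squares solution.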
                                                      Slide credit: S. Lazebnik
SIFT
Stage 2: Verify if they belong to a consistent
  configuration.

The process of selecting a small set of seed matches and then verifying a larger set is often called random sampling or RANSAC.




RANSAC
RANSAC was originally formulated in Martin A. Fischler and Robert C. Bolles (June 1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Comm. of the ACM 24: 381–395.




RANSAC
“We approached the fitting problem in the opposite way from most previous techniques. Instead of averaging all the measurements and then trying to throw out bad ones, we used the smallest number of measurements to compute a model’s unknown parameters and then evaluated the instantiated model by counting the number of consistent samples”




                  From “RANSAC: An Historical Perspective” Bob Bolles & Marty Fischler, 2006.
RANSAC
It’s easy to understand and it’s effective

• It helps solve a common problem (i.e., filter out gross errors
  introduced by automatic techniques)

• The number of trials to “guarantee” a high level of success (e.g., 99.99% probability) is surprisingly small

• The dramatic increase in computation speed made it possible
  to do a large number of trials (100s or 1000s)

• The algorithm can stop as soon as a good match is computed
  (unlike Hough techniques that typically compute a large
  number of examples and then identify matches)
RANSAC
The basic idea is to repeat M times the following process:
1. A model is fitted to the hypothetical inliers, i.e. all free parameters of the model are reconstructed from the data set.
2. All other data are then tested against the fitted model and, if a point fits well to the estimated model, it is also considered a hypothetical inlier.
3. The estimated model is reasonably good if sufficiently many points have been classified as hypothetical inliers.
4. The model is reestimated from all hypothetical inliers, because it has only been estimated from the initial set of hypothetical inliers.
5. Finally, the model is evaluated by estimating the error of the inliers relative to the model.
This procedure is repeated a fixed number of times, each time producing either a model which is rejected because too few points are classified as inliers, or a refined model together with a corresponding error measure. In the latter case, we keep the refined model if its error is lower than that of the last saved model.
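The five numbered steps can be sketched for the simple line-fitting case; the threshold, iteration count, and synthetic data below are illustrative choices:

```python
import numpy as np

def ransac_line(points, n_iters=100, threshold=0.1, min_inliers=10, seed=0):
    """RANSAC line fit: repeatedly sample two points, fit a line, count
    inliers within a distance threshold, and keep the best refined model."""
    rng = np.random.default_rng(seed)
    best_line, best_err = None, np.inf
    for _ in range(n_iters):
        # 1. Fit a line to a minimal sample (two points)
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        n = np.array([-(q - p)[1], (q - p)[0]])          # line normal
        n = n / (np.linalg.norm(n) + 1e-12)
        # 2. Test all points against the model: distance to the line
        dist = np.abs((points - p) @ n)
        inliers = points[dist < threshold]
        # 3. Reject models with too few hypothetical inliers
        if len(inliers) < min_inliers:
            continue
        # 4. Reestimate from all inliers (least-squares via SVD)
        c = inliers.mean(axis=0)
        _, _, vt = np.linalg.svd(inliers - c)
        n_ref = vt[-1]                                   # refined normal
        # 5. Evaluate the refined model by its inlier error; keep the best
        err = np.mean(np.abs((inliers - c) @ n_ref))
        if err < best_err:
            best_line, best_err = (c, n_ref), err
    return best_line

# Noisy points on y = 0.5 x + 1, plus gross outliers
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
line_pts = np.c_[x, 0.5 * x + 1 + 0.02 * rng.normal(size=50)]
outliers = rng.uniform(0, 10, size=(20, 2))
line = ransac_line(np.vstack([line_pts, outliers]), min_inliers=30)
```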
RANSAC




RANSAC
Line fitting example:




                                                               Task:
                                                         Estimate best line
RANSAC
Line fitting example:




                                                        Sample two points

RANSAC
Line fitting example:




                                                                       Fit Line

RANSAC
Line fitting example:




                                                     Total number of points 
                                                    within a threshold of line.

RANSAC
Line fitting example:




Repeat until we get a good result.
RANSAC




RANSAC example: translation




                        Putative matches


                                              Slide credit: A. Efros
RANSAC example: translation




              Select one match, count inliers


                                              Slide credit: A. Efros
RANSAC example: translation




            Find “average” translation vector


                                              Slide credit: A. Efros
RANSAC
                                                  Interest points
                                                  (500/image)




                                                  Putative correspondences 
                                                  (268)

                                                  Outliers (117)

                                                  Inliers (151)

                                                  Final inliers (262)



SIFT Applications




SIFT Applications




SIFT Applications




                                                       HDRSoft
SIFT Applications




Matching and Classification
SIFT allows reliable real‐time recognition but at a computational cost that severely limits the number of points that can be handled.

A standard implementation requires 1 ms per feature point, which limits the number of feature points to 50 per frame if one requires frame‐rate performance.


Matching and Classification

An alternative is to rely on statistical learning
  techniques to model the set of possible
  appearances of a patch.

The major challenge is to use simple models to allow for real‐time, efficient recognition.




Matching and Classification

Can we match keypoints using simpler
  features without intensive preprocessing?


We will assume that we have the possibility to train a classifier for each keypoint class.

Matching and Classification
Simple binary features

(Figure: two pixel intensities I(m_i,1) and I(m_i,2) sampled around the keypoint.)

The test compares the intensities of two
  pixels around the keypoint:

        f_i = 1 if I(m_i,1) < I(m_i,2), and 0 otherwise
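The binary test above can be sketched as follows; the patch size, number of tests, and random pixel-pair locations are illustrative assumptions:

```python
import numpy as np

def binary_features(patch, pairs):
    """Evaluate the simple binary tests f_i = [I(m_i,1) < I(m_i,2)]
    for a list of pixel-location pairs around the keypoint."""
    return np.array([int(patch[y1, x1] < patch[y2, x2])
                     for (y1, x1), (y2, x2) in pairs])

rng = np.random.default_rng(3)
patch = rng.integers(0, 256, size=(32, 32))      # toy grayscale patch
# random test locations, fixed once and reused for all patches
pairs = [((rng.integers(32), rng.integers(32)),
          (rng.integers(32), rng.integers(32))) for _ in range(8)]
f = binary_features(patch, pairs)
```

Because the tests only compare raw intensities, no descriptor computation or normalization is needed at run time.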
Matching and Classification
     Without intensive preprocessing
We can synthetically generate the set of
 keypoint’s possible appearances under
 various perspective, lighting, noise, etc.




Matching and Classification
                  FERN Formulation

We model the class-conditional probabilities of a large number of binary features, which are estimated in a training phase.

At run time, these probabilities are used to select the best match for a given image patch.


Matching and Classification
                          FERN Formulation

fi : Binary feature.
Nf : Total number of features in the model.
Ck : Class representing all views of an image patch
   around a keypoint.

Given f_1, ..., f_Nf, select the class k such that

    k* = argmax_k P(C_k | f_1, f_2, ..., f_Nf) = argmax_k P(f_1, f_2, ..., f_Nf | C_k)

(the two are equivalent under a uniform prior over classes).


Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, Pascal Fua, "Fast Keypoint Recognition Using Random Ferns," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
Matching and Classification
                      FERN Formulation

However, it is not practical to model the joint distribution of all features. We group the features into small sets (ferns) and assume independence between these sets (a Semi-Naïve Bayesian classifier):

F_j : A fern is defined to be a set of S binary features {f_r, ..., f_{r+S-1}}.

M is the number of ferns, so N_f = S × M.

Matching and Classification
                       FERN Formulation

    P(f_1, f_2, ..., f_Nf | C_k)                                →  2^Nf parameters (full joint)!

    P(f_1, f_2, ..., f_Nf | C_k) = ∏_{i=1}^{Nf} P(f_i | C_k)    →  N_f parameters, but too simple (Naïve Bayes).

    P(f_1, f_2, ..., f_Nf | C_k) = ∏_{j=1}^{M} P(F_j | C_k)     →  M × 2^S parameters.




Matching and Classification
FERN Implementation

We generate a random set of binary features. A single binary feature outputs one bit: 2 possibilities.

(Figure: stacking S binary tests; with S = 3 tests there are 8 possibilities.)

A fern with S nodes outputs a number between 0 and 2^S − 1.
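A sketch of how S binary test outcomes are packed into a single fern output in [0, 2^S − 1]:

```python
def fern_index(bits):
    """Concatenate S binary test outcomes into one integer by bit-shifting."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

assert fern_index([0, 0, 0]) == 0
assert fern_index([1, 1, 0]) == 6
assert fern_index([1, 1, 1]) == 2 ** 3 - 1   # S = 3 tests -> at most 7
```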

Matching and Classification
        FERN Implementation




When we have multiple patches of the same class, we can model the output of a fern with a multinomial distribution (one probability for each possible output).




Matching and Classification

(Figures, slide credit V. Lepetit: step-by-step evaluation of three ferns on a patch. Each fern applies its binary tests to the patch, and each resulting bit string is read as a number — here the outputs 6, 1 and 5 for the three ferns.)
Matching and Classification




Normalize so that, for each class c_i, the probabilities over all 2^S fern outputs (000, 001, ..., 111) sum to one:

    Σ_{f_1, ..., f_n} P(f_1, f_2, ..., f_n | C = c_i) = 1




    Slide Credit: V.Lepetit
Matching and Classification
             FERN Implementation

At the end of the training we have distributions over the possible fern outputs for each class.




Matching and Classification
             FERN Implementation

To recognize a new patch, each fern's output selects a row of its class distributions, and these rows are then combined assuming independence between the ferns.




Matching and Classification
             FERN Implementation
             …in 10 lines of code….

// P: H class scores; M ferns of S tests each; D: pixel-offset pairs;
// K: patch data; PF: per-fern class distributions (fern stride shift2,
// output stride shift1).
for (int i = 0; i < H; i++) P[i] = 0.;
for (int k = 0; k < M; k++) {
  int index = 0, *d = D + k * 2 * S;
  for (int j = 0; j < S; j++) {
    index <<= 1;                         // shift in the next test bit
    if (*(K + d[0]) < *(K + d[1]))       // binary intensity comparison
      index++;
    d += 2;
  }
  float *p = PF + k * shift2 + index * shift1;   // row for this fern output
  for (int i = 0; i < H; i++) P[i] += p[i];      // accumulate class scores
}
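The same accumulation can be transliterated into Python; the two toy ferns and their distribution values below are illustrative, not trained:

```python
def classify(fern_outputs, distributions, H):
    """Sum, over ferns, the class-score row selected by each fern's output."""
    P = [0.0] * H
    for k, index in enumerate(fern_outputs):
        row = distributions[k][index]     # one row per possible fern output
        for i in range(H):
            P[i] += row[i]
    return P

# Two ferns with a single test each (2 possible outputs), H = 2 classes.
dists = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
scores = classify([0, 1], dists, H=2)
assert scores == [0.9 + 0.3, 0.1 + 0.7]
assert max(range(2), key=lambda i: scores[i]) == 0   # class 0 wins
```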


Matching and Classification

The FERN technique speeds up keypoint matching, but the training is slow and performed offline.

Hence, it is not suited for applications that require real-time online learning or the incremental addition of arbitrary numbers of keypoints (e.g. SLAM).


Matching and Classification

This limitation can be removed if we train a FERN classifier to recognize a number of keypoints extracted from a reference database; all other keypoints are then characterized in terms of their responses to these classification ferns (their signature).




Matching and Classification




    M. Calonder, V. Lepetit, and P. Fua, Keypoint Signatures for Fast Learning and Recognition. 
    In Proceedings of European Conference on Computer Vision, 2008.
Matching and Classification
It can be empirically shown that these signatures are stable under changes in viewing conditions.

Signatures are sparse in nature if we apply a threshold function.

Signatures do not need a training phase and scale well with the number of classes (nearest-neighbor matching).

Matching and Classification
However, matching signatures still involves many more elementary operations than absolutely necessary.

Moreover, evaluating the signatures requires storing many distributions of the same size as the signatures themselves and, therefore, large amounts of memory.


Matching and Classification
The full response vector r(p) for all J Ferns combines the per-fern vectors, each storing the probability that p is one of the N reference points, scaled by a normalizer Z such that its elements sum to one.

In practice, when p truly corresponds to one of the reference keypoints, r(p) contains one element that is close to one while all others are close to zero. Otherwise, it contains a few relatively large values that correspond to reference keypoints that are similar in appearance, and small values elsewhere.

Matching and Classification
We can compute a sparse signature by applying a point-wise threshold function with a value θ.

The result is an N-dimensional vector with only a few non-zero elements that is mostly invariant to different imaging conditions, and therefore a useful descriptor for matching purposes.
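A minimal sketch of this thresholding step; the value of θ and the response values below are illustrative:

```python
def sparsify(r, theta):
    """Zero out every element of the response vector not exceeding theta."""
    return [x if x > theta else 0.0 for x in r]

r = [0.02, 0.91, 0.0, 0.05, 0.44]
assert sparsify(r, theta=0.1) == [0.0, 0.91, 0.0, 0.0, 0.44]
assert sum(1 for x in sparsify(r, theta=0.1) if x) == 2   # few non-zero entries
```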


Matching and Classification

(Figure: the patch is passed through the J Ferns; each fern outputs a vector storing the probability that p is one of the N reference points, and these vectors are combined into the response. Typical parameters: J = 50; d = 10; N = 500.)
Matching and Classification




Typical parameters: J = 50; d = 10; N = 500.

We need, for each of the 2^d leaves in each of the J Ferns, an N-dimensional vector of floats.
The total memory requirement is M = b·J·2^d·N bytes, where b is the number of bytes needed to store a float (4). In practice, about 100MB!
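The arithmetic for the typical parameters, assuming 4-byte floats:

```python
b, J, d, N = 4, 50, 10, 500      # bytes per float, ferns, fern depth, classes
M_bytes = b * J * 2 ** d * N
assert M_bytes == 102_400_000    # roughly 100 MB
```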

Matching and Classification
Compressive Sensing literature:
• High-dimensional sparse vectors can be reconstructed from their linear projections into much lower-dimensional spaces.
• The Johnson–Lindenstrauss lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.

Matching and Classification
Many kinds of matrices can be used for this purpose.

Random Ortho-Projection (ROP) matrices are a good choice and can be easily constructed by applying a Gram–Schmidt orthonormalization process to a random matrix.


Matching and Classification

In mathematics, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space R^n.

The Gram–Schmidt process takes a finite, linearly independent set S = {v_1, ..., v_k} for k ≤ n and generates an orthogonal set S' = {u_1, ..., u_k} that spans the same k-dimensional subspace of R^n as S.


Matching and Classification




    M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐
    speed Interest Point Description and Matching. In Proceedings of International Conference on Computer 
    Vision, 2009.
Matching and Classification




This approach reduces the memory requirement when storing the models: for N=512, M=176, the requirements drop from 93.75MB to 175B!
The CPU time is 6.3ms for an exhaustive NN matching of 256 points (256×256 comparisons).
Internet-scale image databases




Min HASH
How can we find similar images in very large datasets?

Can we get clusters from these images?




Min HASH
Let's suppose that we choose a LARGE bag-of-words representation of our images and that we use a binary histogram.




Min HASH
Given two different images, we can
compute their histogram intersection:




Min HASH
…and their histogram union:




Min HASH
Then we can define a set-similarity measure in the following way:

    sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|

That is, the number of keypoints the two images have in common divided by the total number of keypoints present in either image.
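This measure is the Jaccard similarity between the two visual-word sets; a minimal sketch:

```python
def jaccard(A, B):
    """|A ∩ B| / |A ∪ B| for two sets of visual-word indices."""
    A, B = set(A), set(B)
    return len(A & B) / len(A | B)

assert jaccard({1, 2, 3}, {2, 3, 4}) == 2 / 4
assert jaccard({1, 2}, {1, 2}) == 1.0
assert jaccard({1}, {2}) == 0.0
```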


Min HASH
We can perform clustering or matching of an unordered set of images with this measure, but this can be used only with a limited amount of data!

The method requires

    Σ_{i=1}^{w} C(d_i, 2)

similarity evaluations, where w is the size of the vocabulary and d_i is the number of regions assigned to the i-th visual word. A commonly used vocabulary size is w = 1,000,000.

Min HASH
We can perform clustering or matching of an unordered set of images with this measure, but this can be used only with a limited amount of data!

Observation: the histograms for an image are highly sparse!



Min HASH
The key idea of min-hash is to map (“hash”) each row/histogram to a small amount of data Sig(A) (the signature) such that:

• Sig(A) is small enough.
• Rows A1 and A2 are highly similar if Sig(A1) is highly similar to Sig(A2).


Min HASH
Useful convention: we will refer to columns as being of four types, according to the pair of values (A1, A2) they hold: a = (1,1), b = (1,0), c = (0,1), d = (0,0).
       A1:    1 0 1 0
       A2:    1 1 0 0
       Type:  a c b d
We will also use “a” as the number of columns of type a (and likewise for b, c, d).
Notes:
• Sim(A1, A2) = a / (a + b + c)
• Most columns are of type d.


Min HASH
• Imagine the columns permuted randomly in order.
• Hash each row A to h(A), the number of the first column in which row A has a 1.

    A1:  1 0 0 1 0   --π-->  0 1 0 0 1   h(A1) = 2
    A2:  1 0 0 0 0   --π-->  0 1 0 0 0   h(A2) = 2

The probability that h(A1) = h(A2) is a/(a+b+c) = Sim(A1, A2): the hashes agree if the first column holding a 1 is of type a, and disagree if it is of type b or c.
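A small Monte Carlo check of this claim; the rows and trial count below are illustrative (and agreement is tested on column identity, which here coincides with equality of the hash positions):

```python
import random

A1 = [1, 0, 0, 1, 0]
A2 = [1, 0, 0, 0, 0]
# a = 1 (column 0), b = 1 (column 3), c = 0 -> Sim(A1, A2) = 1/2

random.seed(0)
cols = list(range(len(A1)))
hits, trials = 0, 20000
for _ in range(trials):
    random.shuffle(cols)                          # random column permutation
    h1 = next(c for c in cols if A1[c] == 1)      # first 1-column of A1
    h2 = next(c for c in cols if A2[c] == 1)      # first 1-column of A2
    hits += (h1 == h2)

assert abs(hits / trials - 0.5) < 0.02            # close to Sim(A1, A2)
```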

Min HASH
If we repeat the experiment with a new permutation of the columns a large number of times, say 512, we get a signature consisting of 512 column numbers for each row.
The “similarity” of these lists (the fraction of positions in which they agree) will be very close to the similarity of the rows: similar signatures mean similar rows!

Min HASH
In fact, it is not necessary to permute the columns: we can hash each original column with 512 different hash functions and keep, for each row, the lowest hash value of a column in which that row has a 1, independently for each of the 512 hash functions. Then we look for the coincidences.

                                                      signature
    row   1   0   0   1   0
    h1    5   1   3   2   4                         h1(row) = 2
    h2    1   2   5   3   4                         h2(row) = 1
    h3    3   4   1   5   2                         h3(row) = 3
    h4    2   5   4   1   3                         h4(row) = 1
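A sketch of this hashing scheme, reusing the table's values:

```python
def min_hash(row, hash_values):
    """Minimum hash value over the columns where the row has a 1."""
    return min(h for h, bit in zip(hash_values, row) if bit == 1)

row = [1, 0, 0, 1, 0]
h1 = [5, 1, 3, 2, 4]
h2 = [1, 2, 5, 3, 4]
h3 = [3, 4, 1, 5, 2]
h4 = [2, 5, 4, 1, 3]
assert [min_hash(row, h) for h in (h1, h2, h3, h4)] == [2, 1, 3, 1]
```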



Min HASH

    Row 1   1   0   1   1   0
    Row 2   0   1   0   0   1
    Row 3   1   1   0   1   0
    h1      1   2   3   4   5       h1(row) = 1, 2, 1
    h2      5   4   3   2   1       h2(row) = 2, 1, 2
    h3      3   4   5   1   2       h3(row) = 1, 2, 1

    Similarities:     Row–Row     Sig–Sig
    1–2:              0/5         0/3
    1–3:              2/4         3/3
    2–3:              1/4         0/3

Min Hash
For efficient retrieval, the min-hashes are grouped into n-tuples. In this example, we can form the following 2-tuples:

    h1(row) = 1, 2, 1
    h2(row) = 2, 1, 2
    h3(row) = 1, 2, 1
    h4(row) = 3, 2, 3

The retrieval procedure then estimates the full similarity only for those image pairs that have at least h identical tuples out of k tuples.
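A sketch of the tuple-grouping step, reusing the per-image signatures from the example above (the values of n and h are illustrative):

```python
def sketches(signature, n):
    """Group a min-hash signature into consecutive n-tuples."""
    return [tuple(signature[i:i + n]) for i in range(0, len(signature), n)]

def is_candidate(sig_a, sig_b, n, h):
    """True if at least h of the n-tuples coincide."""
    matches = sum(ta == tb
                  for ta, tb in zip(sketches(sig_a, n), sketches(sig_b, n)))
    return matches >= h

sig1 = [1, 2, 1, 3]   # image 1's column of h1..h4 values
sig2 = [2, 1, 2, 2]
sig3 = [1, 2, 1, 3]
assert is_candidate(sig1, sig3, n=2, h=1)       # images 1 and 3 collide
assert not is_candidate(sig1, sig2, n=2, h=1)   # images 1 and 2 do not
```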

Min Hash
From 100k images....




Min Hash
From 100k images....




Min Hash
From 100k images....




  Representatives of the largest clusters




Min Hash




  Automatic localization of different buildings



 
Human activity recognition
Human activity recognitionHuman activity recognition
Human activity recognition
 
Human activity recognition
Human activity recognition Human activity recognition
Human activity recognition
 
Lbp based edge-texture features for object recoginition
Lbp based edge-texture features for object recoginitionLbp based edge-texture features for object recoginition
Lbp based edge-texture features for object recoginition
 
Local binary pattern
Local binary patternLocal binary pattern
Local binary pattern
 

Similar a Pc Seminar Jordi

Iccv2009 recognition and learning object categories p0 c00 - introduction
Iccv2009 recognition and learning object categories   p0 c00 - introductionIccv2009 recognition and learning object categories   p0 c00 - introduction
Iccv2009 recognition and learning object categories p0 c00 - introductionzukun
 
Machine Learning in Computer Vision
Machine Learning in Computer VisionMachine Learning in Computer Vision
Machine Learning in Computer Visionbutest
 
Machine Learning in Computer Vision
Machine Learning in Computer VisionMachine Learning in Computer Vision
Machine Learning in Computer Visionbutest
 
Fcv cross hebert
Fcv cross hebertFcv cross hebert
Fcv cross hebertzukun
 
Mit6870 template matching and histograms
Mit6870 template matching and histogramsMit6870 template matching and histograms
Mit6870 template matching and histogramszukun
 
Fcv taxo zisserman
Fcv taxo zissermanFcv taxo zisserman
Fcv taxo zissermanzukun
 
NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2zukun
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015Jia-Bin Huang
 
Perception, representation, structure, and recognition
Perception, representation, structure, and recognitionPerception, representation, structure, and recognition
Perception, representation, structure, and recognitionZahra Sadeghi
 
426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in AR426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in ARMark Billinghurst
 
Discovering Thematic Object in a Video
Discovering Thematic Object in a VideoDiscovering Thematic Object in a Video
Discovering Thematic Object in a VideoIOSR Journals
 
E Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object ValidityE Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object ValidityTrimble Geospatial Munich
 
Common Understanding about YOLO
Common Understanding about YOLOCommon Understanding about YOLO
Common Understanding about YOLO재민 임
 
Wits presentation 2_19052015
Wits presentation 2_19052015Wits presentation 2_19052015
Wits presentation 2_19052015Beatrice van Eden
 
Aj2418721874
Aj2418721874Aj2418721874
Aj2418721874IJMER
 
cvpr2011: human activity recognition - part 6: applications
cvpr2011: human activity recognition - part 6: applicationscvpr2011: human activity recognition - part 6: applications
cvpr2011: human activity recognition - part 6: applicationszukun
 
Defence session
Defence sessionDefence session
Defence sessionAli Borji
 
Natural Interaction for Augmented Reality Applications
Natural Interaction for Augmented Reality ApplicationsNatural Interaction for Augmented Reality Applications
Natural Interaction for Augmented Reality ApplicationsMark Billinghurst
 

Similar a Pc Seminar Jordi (20)

Iccv2009 recognition and learning object categories p0 c00 - introduction
Iccv2009 recognition and learning object categories   p0 c00 - introductionIccv2009 recognition and learning object categories   p0 c00 - introduction
Iccv2009 recognition and learning object categories p0 c00 - introduction
 
Machine Learning in Computer Vision
Machine Learning in Computer VisionMachine Learning in Computer Vision
Machine Learning in Computer Vision
 
Machine Learning in Computer Vision
Machine Learning in Computer VisionMachine Learning in Computer Vision
Machine Learning in Computer Vision
 
Promising avenues for interdisciplinary research in vision
Promising avenues for interdisciplinary research in visionPromising avenues for interdisciplinary research in vision
Promising avenues for interdisciplinary research in vision
 
Fcv cross hebert
Fcv cross hebertFcv cross hebert
Fcv cross hebert
 
Mit6870 template matching and histograms
Mit6870 template matching and histogramsMit6870 template matching and histograms
Mit6870 template matching and histograms
 
Fcv taxo zisserman
Fcv taxo zissermanFcv taxo zisserman
Fcv taxo zisserman
 
NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Perception, representation, structure, and recognition
Perception, representation, structure, and recognitionPerception, representation, structure, and recognition
Perception, representation, structure, and recognition
 
426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in AR426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in AR
 
Discovering Thematic Object in a Video
Discovering Thematic Object in a VideoDiscovering Thematic Object in a Video
Discovering Thematic Object in a Video
 
E Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object ValidityE Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object Validity
 
Common Understanding about YOLO
Common Understanding about YOLOCommon Understanding about YOLO
Common Understanding about YOLO
 
Wits presentation 2_19052015
Wits presentation 2_19052015Wits presentation 2_19052015
Wits presentation 2_19052015
 
Aj2418721874
Aj2418721874Aj2418721874
Aj2418721874
 
1.pdf
1.pdf1.pdf
1.pdf
 
cvpr2011: human activity recognition - part 6: applications
cvpr2011: human activity recognition - part 6: applicationscvpr2011: human activity recognition - part 6: applications
cvpr2011: human activity recognition - part 6: applications
 
Defence session
Defence sessionDefence session
Defence session
 
Natural Interaction for Augmented Reality Applications
Natural Interaction for Augmented Reality ApplicationsNatural Interaction for Augmented Reality Applications
Natural Interaction for Augmented Reality Applications
 

Más de Universitat de Barcelona (14)

Els informàtics o val la pena estudiar informàtica?
Els informàtics o val la pena estudiar informàtica?Els informàtics o val la pena estudiar informàtica?
Els informàtics o val la pena estudiar informàtica?
 
Classe 7 Visió
Classe 7  VisióClasse 7  Visió
Classe 7 Visió
 
Classe 10 Visió
Classe 10 VisióClasse 10 Visió
Classe 10 Visió
 
Classe 9 Visió
Classe 9 VisióClasse 9 Visió
Classe 9 Visió
 
Classe 8 Visió
Classe 8 VisióClasse 8 Visió
Classe 8 Visió
 
Classe 11 Visió
Classe 11 VisióClasse 11 Visió
Classe 11 Visió
 
Classe 5 Visió
Classe 5 VisióClasse 5 Visió
Classe 5 Visió
 
Classe 4 Visió
Classe 4 VisióClasse 4 Visió
Classe 4 Visió
 
Classe 6 Visió
Classe 6 VisióClasse 6 Visió
Classe 6 Visió
 
Classe 2 Visió
Classe 2 VisióClasse 2 Visió
Classe 2 Visió
 
Classe 3 Visió
Classe 3 VisióClasse 3 Visió
Classe 3 Visió
 
The Last Frontier
The Last FrontierThe Last Frontier
The Last Frontier
 
Bits, àtoms i màquines virtuals
Bits, àtoms i màquines virtualsBits, àtoms i màquines virtuals
Bits, àtoms i màquines virtuals
 
Computación y señales sociales
Computación y señales socialesComputación y señales sociales
Computación y señales sociales
 

Último

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Último (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Pc Seminar Jordi

  • 1. Visual Object Recognition. Perceptual Computing Seminar. Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol. BCN Perceptual Computing Lab
  • 2. Index. 1. Introduction. 2. Recognition with Local Features: Basics. 3. Invariant representations: SIFT. 4. Recognition as a Classification Problem: FERNS. 5. Very large databases: Hashing. Visual Object Recognition, Perceptual Computing Seminar, Page 2
  • 3. Introduction. The recognition of object categories in images is one of the most challenging problems in computer vision, especially when the number of categories is large. Humans are able to recognize thousands of object types, whereas most existing object recognition systems are trained to recognize only a few.
  • 4. Introduction. Invariance to viewpoint, illumination, “shape”, color, scale, texture, etc.
  • 5. Introduction. Why do we care about recognition? (theoretical question) Perception of function: we can perceive the 3D shape, texture, and material properties without knowing about objects. But the concept of category also encapsulates information about what we can do with those objects. Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories, ICCV 2009 Kyoto, Short Course, September 24, 2009.
  • 6. Introduction. Why is it hard? Find the chair in this image. Output of correlation. This is a chair.
  • 7. Introduction. Why is it hard? Find the chair in this image. Pretty much garbage; simple template matching is not going to make it.
  • 8. Introduction. Why do we care about recognition? (practical question)
  • 9. Introduction. Why do we care about recognition? (practical question)
  • 10. Introduction. Why do we care about recognition? (practical question) Query. Results from 5k Flickr images (demo available for 100k set). James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman: Object retrieval with large vocabularies and fast spatial matching. CVPR 2007.
  • 11. Recognition with Local Features. It is known that the visual system can use local, informative image «fragments» of a given object, rather than the whole object, to classify it into a familiar category. This approach has some advantages over holistic methods...
  • 12. Recognition with Local Features. Holistic vs. fragment-based.
  • 13. Recognition with Local Features. Jay Hegde, Evgeniy Bart, and Daniel Kersten, "Fragment-based learning of visual object categories", Current Biology, 2008.
  • 14. Recognition with Local Features. The most basic approach is called the “bag of words” approach (inspired by techniques used by the natural language processing community).
  • 15. Recognition with Local Features. Assumptions: independent features; histogram representation. Fragments are quantized against a fragments vocabulary (generic, class-based, etc.), and an image is represented as a histogram of fragments.
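The histogram representation on slide 15 can be sketched in a few lines of NumPy. This is a toy illustration, not code from the seminar: `bow_histogram`, the 3-word vocabulary, and the fragment vectors are all made up for the example.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest vocabulary word
    and return a normalized histogram of word counts."""
    # Squared Euclidean distance between every descriptor and every word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                      # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                       # histogram sums to 1

# Toy example: a 3-word vocabulary in 2-D and four image fragments.
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
frags = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.9]])
print(bow_histogram(frags, vocab))   # -> [0.25 0.5  0.25]
```

In a real system the descriptors would be SIFT vectors and the vocabulary would come from clustering (e.g. k-means) over a training set; the histogram then feeds an ordinary classifier.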
  • 16. Recognition with Local Features. A more advanced approach involves several steps. Stage 0: find image locations where we can reliably find correspondences with other images. Stage 1: image content is transformed into local features (that are invariant to translation, rotation, and scale). Stage 2: verify if they belong to a consistent configuration. Slide credit: David Lowe.
  • 17. SIFT. A wonderful example of these stages can be found in David Lowe’s (2004) “Distinctive image features from scale-invariant keypoints” paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT).
  • 18. Recognition with Local Features. Which local features? Slide credit: A. Efros.
  • 19. SIFT. Stage 0: How can we find image locations where we can reliably find correspondences with other images? A “good” location has one stable, sharp extremum. (Figure: three 1D profiles f(x); a single sharp extremum is good, flat or ambiguous profiles are bad.)
  • 21. SIFT. Stage 0, continued. How to compute extrema at a given scale: 1) apply a Gaussian filter; 2) compute a difference-of-Gaussians; 3) look for 3D extrema in the resulting structure.
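The three steps on slide 21 can be sketched as a minimal difference-of-Gaussians detector, assuming SciPy. This is a toy version: real SIFT additionally uses octaves, sub-pixel refinement, and edge-response rejection, and the sigma values and contrast threshold below are illustrative, not Lowe's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    """Build a difference-of-Gaussians stack and return (x, y, level)
    of local 3D extrema whose |DoG| exceeds a contrast threshold."""
    blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    # A point is a keypoint if it is the max (or min) of its 3x3x3 neighborhood.
    is_max = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
    is_min = (dog == minimum_filter(dog, size=3)) & (dog < -thresh)
    lvl, y, x = np.nonzero(is_max | is_min)
    return list(zip(x, y, lvl))

# A single bright blob should yield at least one extremum near its center.
img = np.zeros((32, 32))
img[16, 16] = 1.0
kps = dog_keypoints(gaussian_filter(img, 1.0))
print(len(kps) > 0)   # -> True
```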
  • 24. SIFT. Stage 1: image content is transformed into local features (that are invariant to translation, rotation, and scale). In addition to dealing with scale changes, we need to deal with (at least) in-plane image rotation. One way to deal with this problem is to design descriptors that are rotationally invariant, but such descriptors have poor discriminability, i.e. they map different-looking patches to the same descriptor.
  • 25. SIFT. A better method is to estimate a dominant orientation at each detected keypoint. 1. Calculate a histogram of local gradients in the window. 2. Take the dominant gradient orientation as “up”. 3. Rotate the local area before computing the descriptor.
  • 26. SIFT. Lowe: computes a 36-bin histogram of edge orientations weighted by both gradient magnitude and Gaussian distance to the center; finds all peaks within 80% of the global maximum; and then computes a more accurate orientation estimate using a 3-bin parabolic fit.
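The orientation estimate of slides 25-26 can be sketched as follows. This is a simplified version: Lowe also weights gradients by a Gaussian window, smooths the histogram, and keeps every peak within 80% of the maximum, which this toy omits.

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    """Estimate a keypoint's dominant orientation from a 36-bin
    histogram of gradient angles weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, _ = np.histogram(ang, bins=nbins, range=(0, 360), weights=mag)
    peak = hist.argmax()
    # Refine the peak with a parabolic fit over the 3 neighboring bins.
    l, c, r = hist[(peak - 1) % nbins], hist[peak], hist[(peak + 1) % nbins]
    offset = 0.5 * (l - r) / (l - 2 * c + r) if (l - 2 * c + r) != 0 else 0.0
    return ((peak + 0.5 + offset) * 360.0 / nbins) % 360.0

# A horizontal ramp has gradients pointing along +x (0 degrees).
ramp = np.tile(np.arange(16, dtype=float), (16, 1))
print(round(dominant_orientation(ramp)))   # -> 5 (center of the 0-10 degree bin)
```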
  • 28. SIFT. (Figure: the local patch around a keypoint, taken from the Gaussian pyramid, together with its gradient magnitude and gradient orientation.)
  • 31. SIFT. Even after compensating for translation, rotation, and scale changes, the local appearance of image patches will usually still vary from image to image. How can we make the descriptor that we match more invariant to such changes, while still preserving discriminability between different (non-corresponding) patches?
  • 32. SIFT. SIFT features are formed by computing the gradient at each pixel in a 16x16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid at which the keypoint was detected. The gradient magnitudes are downweighted by a Gaussian fall-off function in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.
  • 33. SIFT. In each 4x4 quadrant, a gradient orientation histogram is formed by (conceptually) adding the weighted gradient value to one of 8 orientation histogram bins.
  • 34. SIFT. The resulting 128 non-negative values form a raw version of the SIFT descriptor vector. To reduce the effects of contrast/gain (additive variations are already removed by the gradient), the 128-D vector is normalized to unit length.
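A minimal sketch of the 4x4x8 = 128-D descriptor described on slides 32-34. It uses hard bin assignment instead of Lowe's trilinear interpolation and skips orientation normalization; the function name and the Gaussian width are illustrative choices, not the paper's exact values.

```python
import numpy as np

def sift_descriptor(patch):
    """Sketch of the SIFT descriptor: 16x16 gradient window split into
    4x4 cells, 8-bin orientation histogram per cell, then unit-normalized."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # Gaussian fall-off so gradients far from the center count less.
    yy, xx = np.mgrid[0:16, 0:16] - 7.5
    mag *= np.exp(-(xx**2 + yy**2) / (2 * 8.0**2))
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            b = int(ang[i, j] // 45) % 8          # one of 8 orientation bins
            desc[i // 4, j // 4, b] += mag[i, j]  # hard assignment (Lowe interpolates)
    desc = desc.ravel()                            # 128 non-negative values
    return desc / (np.linalg.norm(desc) + 1e-12)   # unit length -> contrast invariant

d = sift_descriptor(np.random.default_rng(0).random((16, 16)))
print(d.shape, round(float(np.linalg.norm(d)), 3))   # -> (128,) 1.0
```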
  • 35. SIFT. Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.
  • 36. SIFT. Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images. SIFT uses a nearest neighbor classifier with a distance ratio matching criterion. We can define this nearest neighbor distance ratio as r = d1/d2 = ||DA − DB|| / ||DA − DC||, where d1 and d2 are the nearest and second nearest neighbor distances, DA is the target descriptor, and DB and DC are its closest two neighbors.
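The distance ratio criterion can be sketched as below (brute-force search for clarity; the 0.8 threshold is a commonly used value, not something fixed by the slide, and the 2-D "descriptors" are toy data).

```python
import numpy as np

def ratio_match(desc1, desc2, max_ratio=0.8):
    """Match each descriptor in desc1 to its nearest neighbor in desc2,
    keeping the match only if d1/d2 (nearest over second nearest) is small."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        nn = np.argsort(dists)[:2]                 # two closest candidates
        if dists[nn[0]] / (dists[nn[1]] + 1e-12) < max_ratio:
            matches.append((i, int(nn[0])))
    return matches

a = np.array([[0.0, 0.0], [5.0, 5.0]])
b = np.array([[0.1, 0.0], [5.1, 5.0], [5.0, 5.1]])
print(ratio_match(a, b))   # -> [(0, 0)]  (second point is ambiguous, so rejected)
```

The rejected match illustrates the point of the test: when two candidates are nearly equidistant, the match is unreliable and is discarded rather than guessed.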
  • 38. SIFT. Linear method: the simplest way to find all corresponding feature points is to compare all features against all other features in each pair of potentially matching images. Unfortunately, this is quadratic in the number of extracted features, which makes it impractical for some applications.
  • 39. SIFT. Nearest-neighbor matching is the major computational bottleneck: linear search performs dn^2 operations for n feature points and d dimensions; no exact NN methods are faster than linear search for d > 10; approximate methods can be much faster, but at the cost of missing some correct matches. The failure rate gets worse for large datasets.
  • 40. SIFT. A better approach is to devise an indexing structure such as a multi-dimensional search tree or a hash table to rapidly search for features near a given feature. For extremely large databases (millions of images or more), even more efficient structures based on ideas from document retrieval (e.g., vocabulary trees) can be used.
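A sketch of such an indexing structure using SciPy's kd-tree. The database size and dimensionality are illustrative; for 128-D SIFT descriptors one would use approximate methods (randomized kd-trees, LSH) instead, since exact kd-trees degrade toward linear search in high dimensions, as slide 39 notes.

```python
import numpy as np
from scipy.spatial import cKDTree

# A kd-tree answers nearest-neighbor queries in roughly O(log n) for low
# dimensions, versus O(n) per query for linear search.
rng = np.random.default_rng(1)
db = rng.random((10_000, 8))        # 10k database descriptors (8-D toy case)
tree = cKDTree(db)

query = db[42] + 0.001              # slightly perturbed copy of entry 42
dist, idx = tree.query(query, k=2)  # nearest and second-nearest neighbors
print(idx[0])                       # -> 42
```

The second-nearest distance returned by `query(..., k=2)` is exactly what the ratio test of slide 36 needs, so indexing and matching combine naturally.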
• 41. SIFT Stage 2: Verify if they belong to a consistent configuration. The first step is to establish a set of putative correspondences.
• 42. SIFT How can we discard erroneous correspondences?
• 43. SIFT Stage 2: Verify if they belong to a consistent configuration. Once we have some hypothetical (putative) matches, we can use geometric alignment to verify which matches are inliers and which ones are outliers.
• 44. SIFT Stage 2: Verify if they belong to a consistent configuration. • Extract features • Compute putative matches
• 45. SIFT Stage 2: Verify if they belong to a consistent configuration. • Loop: – Hypothesize transformation T (using a small group of putative matches that are related by T)
• 46. SIFT Stage 2: Verify if they belong to a consistent configuration. • Loop: – Hypothesize transformation T (small group of putative matches that are related by T) – Verify transformation (search for other matches consistent with T)
• 47. SIFT Stage 2: Verify if they belong to a consistent configuration.
• 48. SIFT Stage 2: Verify if they belong to a consistent configuration. 2D transformation models: • Similarity (translation, scale, rotation) • Affine • Projective (homography)
• 49. SIFT Stage 2: Verify if they belong to a consistent configuration. Fitting an affine transformation (given the point correspondences): (xi, yi) → (xi′, yi′). Slide credit: S. Lazebnik
• 50. SIFT Stage 2: Verify if they belong to a consistent configuration. Fitting an affine transformation (given the point correspondences):
  [xi′]   [m1 m2] [xi]   [t1]
  [yi′] = [m3 m4] [yi] + [t2]
which, for each match, gives two rows of the linear system:
  [xi yi 0 0 1 0] [m1 m2 m3 m4 t1 t2]ᵀ = [xi′]
  [0 0 xi yi 0 1]                        [yi′]
Slide credit: S. Lazebnik
• 51. SIFT Stage 2: Verify if they belong to a consistent configuration. Fitting an affine transformation (given the point correspondences): • Linear system with six unknowns • Each match gives us two linearly independent equations: we need at least three matches to solve for the transformation parameters • Can solve Ax = b using the pseudo‐inverse: x = (AᵀA)⁻¹Aᵀb. Slide credit: S. Lazebnik
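A small sketch of this least‐squares affine fit (pure Python; the normal equations are solved with a hand‐rolled Gauss–Jordan elimination so the example stays dependency‐free):

```python
def fit_affine(src, dst):
    """Least-squares affine fit (m1..m4, t1, t2) from point matches,
    via the normal equations x = (A^T A)^{-1} A^T b."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(xp)
        A.append([0, 0, x, y, 0, 1]); b.append(yp)
    n = 6
    # Form the normal equations (A^T A) x = A^T b
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * v for r, v in zip(A, b)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    M = [row[:] + [v] for row, v in zip(AtA, Atb)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * bb for a, bb in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]
```

With three or more non‑degenerate matches this recovers [m1, m2, m3, m4, t1, t2].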
• 53. SIFT Stage 2: Verify if they belong to a consistent configuration. The process of selecting a small set of seed matches and then verifying a larger set is often called random sampling or RANSAC.
• 54. RANSAC RANSAC was originally formulated in Martin A. Fischler and Robert C. Bolles (June 1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Comm. of the ACM 24: 381–395.
• 55. RANSAC "We approached the fitting problem in the opposite way from most previous techniques. Instead of averaging all the measurements and then trying to throw out bad ones, we used the smallest number of measurements to compute a model's unknown parameters and then evaluated the instantiated model by counting the number of consistent samples." From "RANSAC: An Historical Perspective", Bob Bolles & Marty Fischler, 2006.
• 56. RANSAC It's easy to understand and it's effective: • It helps solve a common problem (i.e., filter out gross errors introduced by automatic techniques) • The number of trials to "guarantee" a high level of success (e.g., 99.99% probability) is surprisingly small • The dramatic increase in computation speed made it possible to do a large number of trials (100s or 1000s) • The algorithm can stop as soon as a good match is computed (unlike Hough techniques, which typically compute a large number of examples and then identify matches) From "RANSAC: An Historical Perspective", Bob Bolles & Marty Fischler, 2006.
• 57. RANSAC The basic idea is to repeat M times the following process:
1. A model is fitted to the hypothetical inliers, i.e., all free parameters of the model are reconstructed from the data set.
2. All other data are then tested against the fitted model and, if a point fits the estimated model well, it is also considered a hypothetical inlier.
3. The estimated model is reasonably good if sufficiently many points have been classified as hypothetical inliers.
4. The model is re‐estimated from all hypothetical inliers, because it has only been estimated from the initial set of hypothetical inliers.
5. Finally, the model is evaluated by estimating the error of the inliers relative to the model.
This procedure is repeated a fixed number of times, each time producing either a model which is rejected because too few points are classified as inliers, or a refined model together with a corresponding error measure. In the latter case, we keep the refined model if its error is lower than that of the last saved model. From "RANSAC: An Historical Perspective", Bob Bolles & Marty Fischler, 2006.
• 59. RANSAC Line fitting example: Task: estimate the best line
• 60. RANSAC Line fitting example: Sample two points
• 61. RANSAC Line fitting example: Fit line
• 62. RANSAC Line fitting example: Count the total number of points within a threshold of the line.
• 63. RANSAC Line fitting example: Repeat until we get a good result
• 64. RANSAC Line fitting example: Repeat until we get a good result
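The line‐fitting loop above can be sketched directly (a minimal RANSAC in plain Python; the iteration count and inlier threshold are illustrative choices, not values from the slides):

```python
import random

def ransac_line(points, iters=200, thresh=0.2, seed=0):
    """RANSAC line fit: repeatedly sample 2 points, fit the line through
    them, and keep the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two samples: a*x + b*y + c = 0, normalized
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue
        a, b, c = a / norm, b / norm, c / norm
        # Inliers are points within `thresh` of the hypothesized line
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) < thresh]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers
```

In practice one would re‑fit the line to all inliers at the end, as in step 4 of the procedure.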
• 66. RANSAC example: translation. Putative matches. Slide credit: A. Efros
• 67. RANSAC example: translation. Select one match, count inliers. Slide credit: A. Efros
• 68. RANSAC example: translation. Find "average" translation vector. Slide credit: A. Efros
• 69. RANSAC Interest points (500/image). Putative correspondences (268). Outliers (117). Inliers (151). Final inliers (262).
• 70. SIFT Applications
• 71. SIFT Applications
• 72. SIFT Applications: HDRSoft
• 73. SIFT Applications
• 74. Matching and Classification SIFT allows reliable real‐time recognition, but at a computational cost that severely limits the number of points that can be handled. A standard implementation requires 1 ms per feature point, which limits the number of feature points to 50 per frame if one requires frame‐rate performance.
• 75. Matching and Classification An alternative is to rely on statistical learning techniques to model the set of possible appearances of a patch. The major challenge is to use simple models that allow for real‐time, efficient recognition.
• 76. Matching and Classification Can we match keypoints using simpler features without intensive preprocessing? We will assume that we have the possibility to train a classifier for each keypoint class.
• 77. Matching and Classification Simple binary features: the test compares the intensities of two pixels around the keypoint, I(mi,1) and I(mi,2): fi = 1 if I(mi,1) < I(mi,2), and 0 otherwise.
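This pixel test is a one‐liner; a sketch in plain Python (the patch is a 2‑D list of intensities, and the comparison direction follows the formula above):

```python
def binary_test(patch, p1, p2):
    """FERN-style binary feature: 1 if the intensity at position p1
    is smaller than at p2, else 0 (positions are (row, col))."""
    return 1 if patch[p1[0]][p1[1]] < patch[p2[0]][p2[1]] else 0
```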
• 78. Matching and Classification Without intensive preprocessing: we can synthetically generate the set of a keypoint's possible appearances under various perspective, lighting, noise, etc.
• 79. Matching and Classification FERN Formulation We model the class conditional probabilities of a large number of binary features, which are estimated in a training phase. At run time, these probabilities are used to select the best match for a given image patch.
• 80. Matching and Classification FERN Formulation fi: binary feature. Nf: total number of features in the model. Ck: class representing all views of an image patch around a keypoint. Given f1, ..., fNf, select the class k̂ such that k̂ = argmax_k P(Ck | f1, f2, ..., fNf) = argmax_k P(f1, f2, ..., fNf | Ck) (assuming a uniform prior over classes). Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, Pascal Fua, "Fast Keypoint Recognition Using Random Ferns," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
• 81. Matching and Classification FERN Formulation However, it is not practical to model the joint distribution of all features. We group the features into small sets (ferns) and assume independence between these sets (semi‐naïve Bayesian classifier): Fj: a fern is defined to be a set of S binary features {fr, ..., fr+S−1}. M is the number of ferns, Nf = S × M.
• 82. Matching and Classification FERN Formulation
P(f1, f2, ..., fNf | Ck): 2^Nf parameters (full joint)!
P(f1, f2, ..., fNf | Ck) = ∏_{i=1}^{Nf} P(fi | Ck): Nf parameters, but too simple.
P(f1, f2, ..., fNf | Ck) = ∏_{j=1}^{M} P(Fj | Ck): M × 2^S parameters.
• 83. Matching and Classification FERN Implementation We generate a random set of binary features. A binary feature outputs a binary number (2 possibilities); a fern with S nodes outputs a number between 0 and 2^S − 1 (e.g., S = 3 gives 8 possibilities).
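Concatenating the S test bits into one integer in [0, 2^S − 1] can be sketched as (plain Python; the patch and test positions are illustrative):

```python
def fern_index(patch, tests):
    """Concatenate S binary pixel tests into a single integer; each
    test is a pair of (row, col) positions compared by intensity."""
    index = 0
    for (r1, c1), (r2, c2) in tests:
        index = (index << 1) | (1 if patch[r1][c1] < patch[r2][c2] else 0)
    return index
```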
• 84. Matching and Classification FERN Implementation When we have multiple patches of the same class, we can model the output of a fern with a multinomial distribution (one probability for each possible output).
• 85. Matching and Classification Slide Credit: V. Lepetit
• 86. Matching and Classification (fern evaluation example, output 6) Slide Credit: V. Lepetit
• 87. Matching and Classification (fern evaluation example, output 6) Slide Credit: V. Lepetit
• 88. Matching and Classification (fern evaluation example, outputs 5 and 6) Slide Credit: V. Lepetit
• 89. Matching and Classification Slide Credit: V. Lepetit
• 90. Matching and Classification Normalize so that the probabilities over all fern outputs (000, 001, ..., 111) sum to one: Σ P(f1, f2, ..., fn | C = ci) = 1. Slide Credit: V. Lepetit
• 91. Matching and Classification FERN Implementation At the end of the training we have distributions over the possible fern outputs for each class.
• 92. Matching and Classification FERN Implementation To recognize a new patch, the fern outputs select rows of the distributions for each fern, and these are then combined assuming independence between the distributions.
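The combination step can be sketched as a log‐probability sum (plain Python; `distributions[j][o][k]` is an assumed layout meaning P(fern j outputs o | class k)):

```python
import math

def fern_posteriors(fern_outputs, distributions):
    """Combine per-fern multinomial rows under the independence
    assumption: score(k) = sum_j log P(F_j = output_j | C_k);
    returns the index of the best-scoring class."""
    n_classes = len(distributions[0][0])
    scores = [sum(math.log(distributions[j][o][k])
                  for j, o in enumerate(fern_outputs))
              for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: scores[k])
```

Summing logs instead of multiplying probabilities avoids numerical underflow for many ferns.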
• 93. Matching and Classification
• 94. Matching and Classification FERN Implementation …in 10 lines of code….
// H: number of classes, M: number of ferns, S: tests per fern
// K: patch intensities, D: 2*S pixel offsets per fern,
// PF: per-fern (log-)probability tables, P: accumulated class scores
1: for(int i = 0; i < H; i++) P[i] = 0.;
2: for(int k = 0; k < M; k++) {
3:   int index = 0, *d = D + k * 2 * S;
4:   for(int j = 0; j < S; j++) {
5:     index <<= 1;                        // shift in the next test bit
6:     if (*(K + d[0]) < *(K + d[1]))      // binary intensity comparison
7:       index++;
8:     d += 2; }
9:   p = PF + k * shift2 + index * shift1; // row for this fern's output
10:  for(int i = 0; i < H; i++) P[i] += p[i]; } // accumulate class scores
• 95. Matching and Classification
• 96. Matching and Classification
• 97. Matching and Classification
• 98. Matching and Classification
• 99. Matching and Classification The FERN technique speeds up keypoint matching, but the training is slow and performed offline. Hence, it is not suited for applications that require real‐time, online, incremental addition of arbitrary numbers of keypoints (e.g., SLAM).
• 100. Matching and Classification This limitation can be removed if we train a FERN classifier to recognize a number of keypoints extracted from a reference database; all other keypoints are then characterized in terms of their response to these classification ferns (their signature).
• 101. Matching and Classification M. Calonder, V. Lepetit, and P. Fua, Keypoint Signatures for Fast Learning and Recognition. In Proceedings of European Conference on Computer Vision, 2008.
• 102. Matching and Classification It can be empirically shown that these signatures are stable under changes in viewing conditions. Signatures are sparse in nature if we apply a threshold function. Signatures do not need a training phase and scale well with the number of classes (nearest neighbor).
• 103. Matching and Classification However, matching signatures still involves many more elementary operations than absolutely necessary. Moreover, evaluating the signatures requires storing many distributions of the same size as the signatures themselves and, therefore, large amounts of memory.
• 104. Matching and Classification The full response vector r(p) for all J ferns is the concatenation of the vectors storing the probability that p is one of the N reference points, scaled by a normalizer Z s.t. its elements sum to one. In practice, when p truly corresponds to one of the reference keypoints, r(p) contains one element that is close to one while all others are close to zero. Otherwise, it contains a few relatively large values that correspond to reference keypoints that are similar in appearance, and small values elsewhere.
• 105. Matching and Classification We can compute a sparse signature by applying a point‐wise threshold function with a θ value. It is an N‐dimensional vector with only a few non‐zero elements that is mostly invariant to different imaging conditions and therefore makes a useful descriptor for matching purposes.
• 106. Matching and Classification The patch → J ferns → vectors storing the probability that p is one of the N reference points. Typical parameters: J = 50; d = 10; N = 500.
• 107. Matching and Classification Typical parameters: J = 50; d = 10; N = 500. We need, for each of the 2^d leaves in each of the J ferns, an N‐dimensional vector of floats. The total memory requirement is M = b·J·2^d·N bytes, where b is the number of bytes needed to store a float. In practice, ~100MB!
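As an arithmetic check (note the slide's figures are only mutually consistent for 4‑byte floats; 8‑byte values would double the total):

```python
J, d, N = 50, 10, 500
for b in (4, 8):                  # bytes per stored float
    M = b * J * 2**d * N          # total size of the probability tables
    print(f"b={b}: {M / 1e6:.1f} MB")   # b=4: 102.4 MB, b=8: 204.8 MB
```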
• 108. Matching and Classification Compressive sensing literature: • High‐dimensional sparse vectors can be reconstructed from their linear projections into much lower‐dimensional spaces. • The Johnson–Lindenstrauss lemma states that a small set of points in a high‐dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.
• 109. Matching and Classification Many kinds of matrices can be used for this purpose. Random Ortho‐Projection (ROP) matrices are a good choice and can be easily constructed by applying a Gram–Schmidt orthonormalization process to a random matrix.
• 110. Matching and Classification In mathematics, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space Rⁿ. The Gram–Schmidt process takes a finite, linearly independent set S = {v1, …, vk} for k ≤ n and generates an orthogonal set S′ = {u1, …, uk} that spans the same k‐dimensional subspace of Rⁿ as S.
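The classical Gram–Schmidt process can be sketched in a few lines (plain Python, vectors as lists; numerically, the modified variant is usually preferred, but the classical form matches the definition above):

```python
def gram_schmidt(vectors):
    """Orthonormalize a linearly independent set of vectors:
    subtract projections onto the basis built so far, then normalize."""
    basis = []
    for v in vectors:
        w = v[:]
        for u in basis:
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = sum(a * a for a in w) ** 0.5
        basis.append([a / norm for a in w])
    return basis
```

Applying this to the rows of a random matrix yields the orthonormal rows of a random ortho‑projection.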
• 111. Matching and Classification M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐speed Interest Point Description and Matching. In Proceedings of International Conference on Computer Vision, 2009.
• 112. Matching and Classification (results, Calonder et al. 2009)
• 113. Matching and Classification (results, Calonder et al. 2009)
• 114. Matching and Classification This approach reduces the memory requirement when storing the models: for N = 512, M = 176, the requirements change from 93.75MB to 175B! The CPU time is 6.3 ms for an exhaustive NN matching of 256 points (256×256).
• 115. Internet‐scale image databases
• 116. Min HASH How can we find similar images in very large datasets? Can we get clusters from these images?
• 117. Min HASH Let's suppose that we choose a LARGE bag‐of‐words representation of our images and that we use a binary histogram.
• 118. Min HASH Given two different images, we can compute their histogram intersection:
• 119. Min HASH …and their histogram union:
• 120. Min HASH Then we can define a set similarity measure in the following way: sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|. That is, the number of times both images have a given keypoint in common, divided by the total number of keypoints that are present in either image.
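On binary histograms this is the Jaccard similarity; a minimal sketch:

```python
def jaccard(h1, h2):
    """Set similarity of two binary visual-word histograms:
    |intersection| / |union|."""
    inter = sum(1 for a, b in zip(h1, h2) if a and b)
    union = sum(1 for a, b in zip(h1, h2) if a or b)
    return inter / union if union else 0.0
```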
• 122. Min HASH We can perform clustering or matching of an unordered set of images with this measure, but this can be used only with a limited amount of data! The method requires Σ_{i=1}^{w} di1·di2 similarity evaluations, where w is the size of the vocabulary and di is the number of regions assigned to the i‐th visual word. A commonly used vocabulary size is w = 1,000,000.
• 123. Min HASH We can perform clustering or matching of an unordered set of images with this measure, but this can be used only with a limited amount of data! Observation: histograms for an image are highly sparse!
• 124. Min HASH The key idea of min‐hash is to map ("hash") each row/histogram to a small amount of data Sig(A) (the signature) such that: • Sig(A) is small enough. • Rows A1 and A2 are highly similar if Sig(A1) is highly similar to Sig(A2).
• 125. Min HASH Useful convention: we will refer to columns as being of four types (by the pair of bits they hold): A1: 1010, A2: 1100, Type: abcd. We will also use "a" as the number of columns of type a. Notes: • Sim(A1, A2) = a/(a+b+c) • Most columns are of type d.
• 126. Min HASH • Imagine the columns permuted randomly in order. • Hash each row A to h(A), the number of the first column in which row A has a 1. (In the slide's example, after a random permutation π both rows have their first 1 in column 2, so h(A1) = h(A2) = 2.) The probability that h(A1) = h(A2) is a/(a+b+c) = Sim(A1, A2): the hashes agree if the first column with a 1 is of type a, and disagree if it is of type b or c.
• 127. Min HASH If we repeat the experiment with a new permutation of columns a large number of times, say 512, we get a signature consisting of 512 column numbers for each row. The "similarity" of these lists (the fraction of positions in which they agree) will be very close to the similarity of the rows (similar signatures mean similar rows!).
• 128. Min HASH In fact, it is not necessary to permute the columns: we can hash each original column with 512 different hash functions and keep, for each row, the lowest hash value over the columns in which that row has a 1, independently for each of the 512 hash functions. Then we look for coincidences.
  row: 1 0 0 1 0
  h1:  5 1 3 2 4  →  h1(row) = 2
  h2:  1 2 5 3 4  →  h2(row) = 1
  h3:  3 4 1 5 2  →  h3(row) = 3
  h4:  2 5 4 1 3  →  h4(row) = 1
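The worked example above can be reproduced directly (plain Python; hash functions given as per‑column value tables):

```python
def minhash_signature(row, hash_funcs):
    """For each hash function, keep the minimum hash value over the
    columns where the binary row has a 1."""
    cols = [j for j, bit in enumerate(row) if bit]
    return [min(h[j] for j in cols) for h in hash_funcs]
```

With the table above, `minhash_signature([1,0,0,1,0], [[5,1,3,2,4], [1,2,5,3,4], [3,4,1,5,2], [2,5,4,1,3]])` yields `[2, 1, 3, 1]`, matching the slide.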
• 129. Min HASH
  Row 1: 1 0 1 1 0
  Row 2: 0 1 0 0 1
  Row 3: 1 1 0 1 0
  h1: 1 2 3 4 5  →  h1(rows) = 1, 2, 1
  h2: 5 4 3 2 1  →  h2(rows) = 2, 1, 2
  h3: 3 4 5 1 2  →  h3(rows) = 1, 2, 1
  Similarities:  Row‐Row  Sig‐Sig
  1‐2:           0/5      0/3
  1‐3:           2/4      3/3
  2‐3:           1/4      0/3
• 130. Min Hash For efficient retrieval, the min‐hashes are grouped into n‐tuples. In this example, we can form the following 2‐tuples: h1(row) = 1, 2, 1; h2(row) = 2, 1, 2; h3(row) = 1, 2, 1; h4(row) = 3, 2, 3. The retrieval procedure then estimates the full similarity only for those image pairs that have at least h identical tuples out of k tuples.
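Grouping a signature into sketches and testing the collision condition can be sketched as (plain Python; n and h are the tuple size and match threshold from the slide):

```python
def sketches(signature, n=2):
    """Split a min-hash signature into consecutive n-tuples (sketches)."""
    return [tuple(signature[i:i + n]) for i in range(0, len(signature), n)]

def candidate_pair(sig_a, sig_b, n=2, h=1):
    """Two images become a candidate pair when at least h of their
    position-aligned sketches are identical."""
    hits = sum(sa == sb for sa, sb in
               zip(sketches(sig_a, n), sketches(sig_b, n)))
    return hits >= h
```

With the signatures from this example, rows 1 and 3 ([1,2,1,3] vs [1,2,1,3]) collide, while rows 1 and 2 ([1,2,1,3] vs [2,1,2,2]) do not.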
• 131. Min Hash From 100k images…
• 132. Min Hash From 100k images…
• 133. Min Hash From 100k images… Representatives of the largest clusters