ICVSS 2011 Steven Seitz Lorenzo Torresani Guillermo Sapiro Shmuel Peleg




                    ICVSS 2011: Selected Presentations

                                Angel Cruz and Andrea Rueda

                 BioIngenium Research Group, Universidad Nacional de Colombia


                                           August 25, 2011




                        Angel Cruz and Andrea Rueda — ICVSS 2011: Selected Presentations

    Outline


    1 ICVSS 2011


    2 A Trillion Photos - Steven Seitz

    3 Efficient Novel Class Recognition and Search - Lorenzo
       Torresani

    4 The Life of Structured Learned Dictionaries - Guillermo Sapiro


    5 Image Rearrangement & Video Synopsis - Shmuel Peleg





    ICVSS 2011
    International Computer Vision Summer School




         15 speakers, from the USA, France, the UK, Italy, the Czech Republic, and Israel


A Trillion Photos

         Steve Seitz
  University of Washington
           Google

Sicily Computer Vision Summer School
            July 11, 2011
Facebook: >3 billion photos uploaded each month

~1 trillion photos taken each year
What do you do with a trillion photos?



         Digital Shoebox
        (hard drives, iPhoto, Facebook, ...)
Comparing images




    Detect features using SIFT [Lowe, IJCV 2004]
Comparing images




Extraordinarily robust image matching
  – Across viewpoint (~60 degree out-of-plane rotations)
  – Varying illumination
  – Real-time implementations
Edges

Scale Invariant Feature Transform

[Figure: gradient angle histogram over 0 to 2π; adapted from a slide by David Lowe]
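The angle histogram above is the core of a SIFT descriptor cell: gradient orientations in an image patch are binned over [0, 2π), weighted by gradient magnitude, then normalized. A minimal numpy sketch of that one building block (illustrative only; Lowe's full method adds scale-space keypoint detection and a 4x4 grid of such cells):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Histogram of gradient orientations over [0, 2*pi), weighted by
    gradient magnitude -- one cell of a SIFT-style descriptor.
    `patch` is a 2-D grayscale array; all names here are illustrative."""
    gy, gx = np.gradient(patch.astype(float))        # row-, then column-gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)           # map angles into [0, 2*pi)
    bins = np.floor(ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)               # normalize for illumination robustness

# A vertical step edge: all gradients point along +x, so one bin dominates.
patch = np.tile(np.r_[np.zeros(4), np.ones(4)], (8, 1))
h = orientation_histogram(patch)
```

With the step-edge patch, all gradient weight falls into the bin for angle 0, which is what makes the descriptor discriminative for edge orientation.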
NASA Mars Rover images
NASA Mars Rover images, with SIFT feature matches
[Figure by Noah Snavely]
[Figure: matched photo clusters of Rome landmarks — Colosseum (inside/outside), St. Peter's (inside/outside), Il Vittoriano, Trevi Fountain, Forum]
Structure from motion




   Matched photos → 3D structure
Structure from motion
aka “bundle adjustment” (texts: Zisserman; Faugeras)

[Figure: cameras 1–3 with poses (R1,t1), (R2,t2), (R3,t3) observing 3-D points p1, ..., p7]

minimize f(R, T, P) over all camera rotations R, translations T, and 3-D point positions P
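The objective f(R, T, P) is the total squared reprojection error. A hedged numpy sketch that only *evaluates* this objective for a single camera (a real bundle adjuster minimizes it jointly over all cameras and points, and models camera intrinsics, omitted here for simplicity):

```python
import numpy as np

def reprojection_error(R, t, P, obs):
    """Bundle-adjustment objective for one camera: project 3-D points
    P (N x 3) through pose (R, t), compare against observed 2-D
    keypoints obs (N x 2), and sum the squared errors."""
    X_cam = P @ R.T + t                    # world -> camera coordinates
    proj = X_cam[:, :2] / X_cam[:, 2:3]    # perspective divide
    return float(np.sum((proj - obs) ** 2))

# Toy check: with the true pose, noise-free observations give zero error.
rng = np.random.default_rng(0)
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])  # points end up ~5 units in front of the camera
P = rng.normal(size=(7, 3))
X = P @ R.T + t
obs = X[:, :2] / X[:, 2:3]
err_true = reprojection_error(R, t, P, obs)
err_wrong = reprojection_error(R, t + np.array([0.1, 0.0, 0.0]), P, obs)
```

Minimizing this quantity over the pose and point parameters (e.g. with Levenberg–Marquardt) is what "bundle adjustment" refers to.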
Reconstructing Rome
In a day...

From ~1M images
Using ~1000 cores

Sameer Agarwal, Noah Snavely, Rick Szeliski, Steve Seitz
http://grail.cs.washington.edu/rome
Rome 150K: Colosseum
Rome: St. Peters
Venice (250K images)
Venice: Canal
Dubrovnik
From Sparse to Dense




      Sparse output from the SfM system
From Sparse to Dense




   Furukawa, Curless, Seitz, Szeliski, CVPR 2010
Most of our photos don’t look like this
recognition + alignment
Your Life in 30 Seconds




    path optimization
Picasa Integration
• As “Face Movies” feature in v3.8
 – Rahul Garg, Ira Kemelmacher
Conclusion

trillions of photos
         +    computer vision breakthroughs

      = new ways to see the world
Efficient Novel-Class
Recognition and Search
    Lorenzo Torresani
Problem statement: novel object-class search

• Given: an image database (e.g., 1 million photos) + user-provided images of an object class
• Want: database images of this class
  - no text/tags available
  - query images may represent a novel class
Application: Web-powered visual search in unlabeled personal photos

Goal: find “soccer camp” pictures on my computer
1 Search the Web for images of “soccer camp”
2 Find images of this visual class on my computer
Application: product search

• Search for aesthetic products
[Figures: image-retrieval examples — RBM-code retrieval with predicted labels, from Torralba et al. ’08; vocabulary-tree retrieval, from Nistér and Stewénius ’07; landmark retrieval, from Philbin et al. ’07]
Relation to other tasks: novel-class search

Image retrieval
  analogies:
  - large databases
  - efficient indexing
  - compact representation
  differences:
  - simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout)

Object classification
  analogies:
  - recognition of object classes from a few examples
  differences:
  - classes to recognize are defined a priori
  - training and recognition time is unimportant
  - storage of features is not an issue
Technical requirements of novel-class search
• The object classifier must be learned on the fly from
  few examples


• Recognition in the database must have low
  computational cost


• Image descriptors must be compact to allow
  storage in memory
State-of-the-art in object classification

Winning recipe: many features + non-linear classifiers
(e.g. [Gehler and Nowozin, CVPR’09])

[Figure: many feature channels combined through a non-linear decision boundary]
Model evaluation on Caltech256

[Plot: accuracy (%) vs. number of training examples (0–30) for linear models on individual features: gist, phog, phog2pi, ssim, bow5000]
Model evaluation on Caltech256

[Plot: a linear combination of the features outperforms each linear model on an individual feature]
Model evaluation on Caltech256

[Plot: a non-linear feature combination (multiple-kernel learning, Gehler & Nowozin ’09) outperforms both the linear combination and the individual features]
Multiple kernel combiners

Classification output is obtained by combining many features via non-linear kernels:

    h(x) = Σ_{f=1}^{F} β_f ( Σ_{n=1}^{N} k_f(x, x_n) α_n + b )

where the outer sum runs over features and the inner sum over training examples.
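Once the kernels, weights β_f, and SVM parameters (α, b) are given, h(x) is straightforward to evaluate. A toy numpy sketch with two illustrative base kernels (all data and names below are made up for illustration):

```python
import numpy as np

def mkl_decision(x, X_train, alpha, b, betas, kernels):
    """h(x) = sum_f beta_f * (sum_n k_f(x, x_n) * alpha_n + b): an SVM
    decision function whose kernel is a weighted combination of F base
    kernels, as on the slide."""
    score = 0.0
    for beta_f, k_f in zip(betas, kernels):
        K_row = np.array([k_f(x, xn) for xn in X_train])  # kernel values vs. training set
        score += beta_f * (K_row @ alpha + b)
    return score

# Two illustrative base kernels: linear and Gaussian (RBF).
lin = lambda a, c: float(a @ c)
rbf = lambda a, c: float(np.exp(-np.sum((a - c) ** 2)))

X_train = np.array([[1.0, 0.0], [0.0, 1.0]])
alpha = np.array([1.0, -1.0])      # signed dual coefficients
x = np.array([1.0, 0.0])
score = mkl_decision(x, X_train, alpha, b=0.0, betas=[0.5, 0.5], kernels=[lin, rbf])
```

Note the cost: evaluating h(x) needs a kernel computation against every training example for every feature, which is exactly the expense the classeme approach later avoids.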
Methods: multiple kernel learning (MKL)
[Bach et al., 2004; Sonnenburg et al., 2006; Varma and Ray, 2007]

Another approach is to perform kernel selection/combination during the training phase: learn a non-linear SVM by jointly optimizing over a linear combination of kernels

    k*(x, x′) = Σ_{f=1}^{F} β_f k_f(x, x′)

together with the SVM parameters α ∈ R^N and b ∈ R:

    min_{α,β,b}  (1/2) Σ_{f=1}^{F} β_f α^T K_f α + C Σ_{n=1}^{N} L( y_n, b + Σ_{f=1}^{F} β_f K_f(x_n)^T α )

    subject to  Σ_{f=1}^{F} β_f = 1,   β_f ≥ 0,   f = 1, ..., F

where L(y, t) = max(0, 1 − y t) and K_f(x) = [k_f(x, x_1), k_f(x, x_2), ..., k_f(x, x_N)]^T.
LP-β: a two-stage approach to MKL
[Gehler and Nowozin, 2009]

• Classification output of traditional MKL:

    h_MKL(x) = Σ_{f=1}^{F} β_f ( Σ_{n=1}^{N} k_f(x, x_n) α_n + b )

• Classification function of LP-β:

    h(x) = Σ_{f=1}^{F} β_f h_f(x),   with   h_f(x) = Σ_{n=1}^{N} k_f(x, x_n) α_{f,n} + b_f

Two-stage training procedure:
1. train each h_f(x) independently → traditional SVM learning
2. optimize over β → a simple linear program
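The two-stage idea can be sketched as follows. This is a deliberate simplification: stage 2 below weights each per-feature classifier by its training accuracy instead of solving the actual linear program of LP-β, and stage 1's SVMs are represented only by their decision scores:

```python
import numpy as np

def lp_beta_two_stage(scores_per_feature, y):
    """Two-stage combination in the spirit of LP-beta [Gehler & Nowozin,
    2009]. Stage 1 would train one SVM per feature; here their decision
    scores h_f(x) are taken as given. Stage 2 picks mixing weights beta
    on the simplex; the real method solves a linear program -- as a
    stand-in we weight each h_f by its training accuracy."""
    accs = np.array([np.mean(np.sign(s) == y) for s in scores_per_feature])
    beta = accs / accs.sum()                      # beta_f >= 0, sums to 1
    h = sum(b * s for b, s in zip(beta, scores_per_feature))
    return beta, np.sign(h)

y = np.array([1, 1, -1, -1])
good = np.array([0.9, 0.8, -0.7, -0.6])   # a feature whose SVM scores are accurate
bad = np.array([0.1, -0.2, 0.3, -0.1])    # a feature whose SVM scores are noisy
beta, pred = lp_beta_two_stage([good, bad], y)
```

The accurate feature receives the larger weight, and the combined prediction recovers all the labels; the real linear program additionally trades off a margin/loss objective when choosing β.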
LP-β for novel-class search?

The LP-β classifier:

    h(x) = Σ_{f=1}^{F} β_f ( Σ_{n=1}^{N} k_f(x, x_n) α_{f,n} + b_f )

(outer sum over features, inner sum over training examples)

Unsuitable for our needs due to:
• large storage requirements (typically over 20K bytes/image)
• costly evaluation (requires query-time kernel-distance computation for each test image)
• costly training (1+ minute for O(10) training examples)
Classemes: a compact descriptor for efficient recognition
[Torresani et al., 2010]

Key idea: represent each image x in terms of its “closeness” to a set of basis classes (“classemes”):

    Φ(x) = [φ_1(x), ..., φ_C(x)]^T

    φ_c(x) = h_classeme_c(x) = Σ_{f=1}^{F} β_f^c ( Σ_{n=1}^{N} k_f(x, x_n^c) α_n^c + b^c )

i.e. the output of a pre-learned LP-β classifier for the c-th basis class.

Query-time learning: given training examples Φ(x_1), ..., Φ(x_N) of the novel class, train a linear classifier on Φ(x):

    g_duck(Φ(x); w_duck) = Φ(x)^T w_duck = Σ_{c=1}^{C} w_c^duck [ Σ_{f=1}^{F} β_f^c ( Σ_{n=1}^{N} k_f(x, x_n^c) α_n^c + b^c ) ]

where w_duck is trained at query time, while the inner LP-β classifiers were trained before the creation of the database.
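The pipeline above — pre-learned classeme classifiers producing Φ(x), then a linear classifier learned at query time — can be sketched with stand-ins: toy linear scorers replace the LP-β classemes, and least squares replaces the linear SVM (all sizes and names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def classeme_vector(x, classeme_classifiers):
    """Phi(x) = [phi_1(x), ..., phi_C(x)]^T: describe an image by the
    outputs of C pre-learned base classifiers. Real classemes are LP-beta
    combiners over low-level features; here each is a toy linear scorer."""
    return np.array([w @ x for w in classeme_classifiers])

# C = 50 toy classemes over a 10-D raw feature space (illustrative sizes).
C, D = 50, 10
classemes = [rng.normal(size=D) for _ in range(C)]

# Query time: a few examples of the novel class arrive; train a linear
# classifier on Phi(x). Least squares stands in for the linear SVM.
pos = rng.normal(loc=1.0, size=(5, D))
neg = rng.normal(loc=-1.0, size=(5, D))
Phi = np.array([classeme_vector(x, classemes) for x in np.vstack([pos, neg])])
y = np.r_[np.ones(5), -np.ones(5)]
w_novel, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Scoring any database image is now a single dot product in classeme space.
score_pos = classeme_vector(pos[0], classemes) @ w_novel
score_neg = classeme_vector(neg[0], classemes) @ w_novel
```

This captures the efficiency argument: all kernel evaluations happen once, offline, when Φ is computed; query-time training and database scoring touch only C-dimensional vectors.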
How this works...

• Highly weighted classemes: Table 1 of [Torresani et al., ECCV 2010] lists the five classemes with the highest LP-β weights in the retrieval experiment, for a selection of Caltech 256 categories.
• Highly semantic labels are not required: classeme classifiers may simply detect specific patterns of texture, color, shape, etc. The goal is a useful feature vector, not semantic labels; the somewhat peculiar classeme labels reflect the ontology used as a source of base categories.

Large-scale recognition benefits from a compact descriptor for each image, for example allowing databases to be stored in memory rather than on disk.
Related work

• Attribute-based recognition [Lampert et al., CVPR’09; Farhadi et al., CVPR’09]:
  describe each class by high-level attributes (e.g. otter: black yes, white no, brown yes, stripes no, water yes, eats fish yes; polar bear: black no, white yes, brown no, stripes no, water yes, eats fish yes; zebra: black yes, white yes, brown no, stripes yes, water no, eats fish no). After learning the visual appearance of attributes from classes with training examples, one can detect object classes that have no training images at all, based on which attribute description a test image fits best.
  - requires hand-specified attribute-class associations
  - attribute classifiers must be trained with human-labeled examples
Method overview
1. Classeme learning: train one classifier per basis concept

       φ”body of water” (x),   ...,   φ”walking” (x)

2. Using the classemes for recognition and retrieval:
   from the classeme vectors Φ(x1), ..., Φ(xN) of the training examples
   of a novel class (e.g. “duck”), learn a linear classifier

       g_duck(Φ(x)) = Σ_{c=1}^{C} w_c^{duck} φ_c(x)
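The two steps can be sketched numerically. In this sketch the classeme bank Φ is simulated by a fixed random projection (a hypothetical stand-in for the 2659 trained attribute classifiers), and the novel-class weights w are fit by plain least squares rather than the linear SVM used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 2659  # number of classeme attribute classifiers

# Hypothetical stand-in for the bank of pre-trained classeme classifiers:
# in the real system each phi_c is an LP-beta kernel combiner; here we
# simulate their outputs with a fixed random projection of x.
proj = rng.normal(size=(C, 128))

def classeme_vector(x):
    """Map a low-level feature vector x to the classeme descriptor Phi(x)."""
    return proj @ x

# Toy training examples of a novel class ("duck") vs. negatives.
pos = rng.normal(loc=0.5, size=(20, 128))
neg = rng.normal(loc=-0.5, size=(20, 128))
X = np.array([classeme_vector(x) for x in np.vstack([pos, neg])])
y = np.array([1.0] * 20 + [-1.0] * 20)

# Learn the linear weights of g_duck(Phi(x)) = sum_c w_c phi_c(x)
# via least squares (a simple stand-in for the linear SVM in the talk).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

score = classeme_vector(pos[0]) @ w  # g_duck score for one positive example
```

The key point of the method survives even in this toy form: once Φ is fixed, recognizing a new class only requires fitting a C-dimensional linear model.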
Classeme learning:
         choosing the basis classes
•   Classeme labels desiderata:

     -   must be visual concepts

     -   should span the entire space of visual classes

•   Our selection:
    concepts defined in the Large Scale Ontology for Multimedia
    [LSCOM] to be “useful, observable and feasible for automatic
    detection”.
              2659 classeme labels, after manual elimination of
              plurals, near-duplicates, and inappropriate concepts
Classeme learning:
      gathering the training data
•   We downloaded the top 150 images returned by
    Bing Images for each classeme label
• For each of the 2659 classemes, a one-versus-the-rest
    training set was formed to learn a binary classifier
                     φ”walking” (x)

    [Figure: example positive (“yes”) and negative (“no”) training images]
Classeme learning:
          training the classifiers
• Each classeme classifier is an LP-β kernel combiner
 [Gehler and Nowozin, 2009]:
         φ(x) = Σ_{f=1}^{F} β_f ( Σ_{n=1}^{N} k_f(x, x_n) α_{f,n} + b_f )
                 linear combination of feature-specific SVMs

• We use 13 kernels based on spatial pyramid histograms
 computed from the following features:
  - color GIST [Oliva and Torralba, 2001]
  - oriented gradients [Dalal and Triggs, 2005]
  - self-similarity descriptors [Shechtman and Irani, 2007]
  - SIFT [Lowe, 2004]
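The combiner formula can be illustrated on toy data. The β_f, α_{f,n} and b_f below are random stand-ins (not learned values), and only two feature channels are used instead of the 13 in the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training data: two hypothetical "feature channels" per example
# (stand-ins for e.g. a GIST descriptor and a SIFT histogram).
N = 30
feats = [rng.normal(size=(N, 16)), rng.normal(size=(N, 8))]
alpha = [rng.normal(size=N), rng.normal(size=N)]  # per-channel dual coefficients
bias = [0.1, -0.2]                                # per-channel SVM biases
beta = [0.6, 0.4]                                 # learned mixing weights

def rbf(a, B, gamma=0.5):
    """RBF kernel values k(a, b_n) between one point and N training points."""
    d2 = ((B - a) ** 2).sum(axis=1)
    return np.exp(-gamma * d2)

def lp_beta_score(x_channels):
    """phi(x) = sum_f beta_f * (sum_n k_f(x, x_n) alpha_{f,n} + b_f)."""
    score = 0.0
    for f, x in enumerate(x_channels):
        score += beta[f] * (rbf(x, feats[f]) @ alpha[f] + bias[f])
    return score

s = lp_beta_score([rng.normal(size=16), rng.normal(size=8)])
```

Each channel contributes a kernel-SVM decision value; the β weights simply mix these per-feature scores linearly.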
A dimensionality reduction view of classemes

The low-level feature vector x (color GIST, self-similarity descriptors,
oriented gradients, SIFT) is mapped by Φ to the classeme vector
[φ_1(x), ..., φ_2659(x)].

Low-level features x:                 Classemes Φ(x):
• non-linear kernels are needed       • near state-of-the-art accuracy
  for good classification               with linear classifiers
• 23K bytes/image                     • can be quantized down to
                                        200 bytes/image with almost
                                        no recognition loss
Experiment 1: multiclass recognition on Caltech256

[Plot: accuracy (%) vs. number of training examples (0–50), for:
 - LPbeta: LP-β in [Gehler & Nowozin, 2009] using 39 kernels
 - LPbeta13: LP-β with our 13 kernels on x
 - MKL
 - Csvm: our approach, a linear SVM with classemes Φ(x)
 - Cq1svm: linear SVM with binarized classemes, i.e. (Φ(x) > 0)
 - Xsvm: linear SVM with x]
Computational cost comparison

[Bar charts comparing LPbeta and Csvm:
 - training time: 23 hours (LPbeta) vs. 9 minutes (Csvm)
 - testing time (ms): LPbeta is substantially slower than Csvm]
Accuracy vs. compactness

[Plot: compactness (images per MB, log scale) vs. accuracy (%), for LPbeta13,
Csvm, Cq1svm, nbnn [Boiman et al., 2008], emk [Bo and Sminchisescu, 2008],
and Xsvm; storage annotations range from 188 bytes/image through
2.5K and 23K bytes/image to 128K bytes/image]

Lines link performance at 15 and 30 training examples
Experiment 2: object class retrieval

[Plot: Precision (%) @ 25 vs. number of training images (0–50), for Csvm,
Cq1Rocchio (β=1, γ=0), Cq1Rocchio (β=0.75, γ=0.15), Bowsvm,
BowRocchio (β=1, γ=0), and BowRocchio (β=0.75, γ=0.15)]

• training Csvm takes 0.6 sec with 5*256 training examples

Fig. 4. Retrieval. Percentage of the top 25 in a 6400-document set which match
the query class. Random performance is 0.4%.
Analogies with text retrieval
• Classeme representation of an image:
                        presence/absence of visual attributes




• Bag-of-word representation of a text-document:
                           presence/absence of words
Related work
•    Prior work (e.g., [Sivic & Zisserman, 2003; Nister & Stewenius, 2006;
     Philbin et al., 2007]) has exploited a similar analogy for
     object-instance retrieval by representing images as bags of visual words

     [Pipeline: detect interest patches → compute SIFT descriptors [Lowe, 2004]
      → quantize descriptors → represent the image as a sparse histogram of
      visual words (frequency over codewords)]
    •    To extend this methodology to object-class retrieval we need:
         - to use a representation more suited to object class recognition
           (e.g. classemes as opposed to bag of visual words)
         - to train the ranking/retrieval function for every new query-class
Data structures for
                            efficient retrieval
             Incidence matrix                       Inverted index:
             (documents × features):
                 f0 f1 f2 f3 f4 f5 f6 f7            f0 f1 f2 f3 f4 f5 f6 f7
             I0:  1  0  1  0  0  1  0  0            I0 I2 I0 I2 I1 I0 I4 I6
             I1:  0  0  1  0  1  0  0  0            I2 I7 I1 I3 I4 I6 I5 I9
             I2:  1  1  0  1  0  0  0  0            I3 I8 I3 I9 I5 I8
             I3:  1  0  1  1  0  0  0  0            I4          I7 I9
             I4:  1  0  0  0  1  0  1  0            I6          I9
             I5:  0  0  0  0  1  0  1  0            I8
             I6:  1  0  0  0  0  1  0  1
             I7:  0  1  0  0  1  0  0  0
             I8:  1  1  0  0  0  1  0  0
             I9:  0  0  0  1  1  1  0  1
             • very compact: only one bit per feature entry
             • enables efficient calculation of w^T Φ for every Φ in the
               database, as   Σ_{i : Φ_i ≠ 0} w_i Φ_i
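Both structures can be sketched directly over the toy database on this slide (the incidence rows are copied from the matrix above):

```python
import numpy as np

# Binary classeme database (rows = images I0..I9, cols = features f0..f7),
# copied from the incidence matrix on the slide.
incidence = np.array([
    [1, 0, 1, 0, 0, 1, 0, 0],  # I0
    [0, 0, 1, 0, 1, 0, 0, 0],  # I1
    [1, 1, 0, 1, 0, 0, 0, 0],  # I2
    [1, 0, 1, 1, 0, 0, 0, 0],  # I3
    [1, 0, 0, 0, 1, 0, 1, 0],  # I4
    [0, 0, 0, 0, 1, 0, 1, 0],  # I5
    [1, 0, 0, 0, 0, 1, 0, 1],  # I6
    [0, 1, 0, 0, 1, 0, 0, 0],  # I7
    [1, 1, 0, 0, 0, 1, 0, 0],  # I8
    [0, 0, 0, 1, 1, 1, 0, 1],  # I9
], dtype=np.uint8)

# One bit per entry: the whole 10x8 matrix packs into 10 bytes.
packed = np.packbits(incidence)

# Inverted index: feature -> sorted list of images containing that feature.
inverted = {f: np.flatnonzero(incidence[:, f]).tolist()
            for f in range(incidence.shape[1])}
```

The inverted lists reproduce the columns shown on the slide, e.g. feature f0 maps to images I0, I2, I3, I4, I6, I8.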
Efficient retrieval via
            inverted index
                                    Inverted index:
                                 w: [1.5 -2   0 -5   0    3 -2   0 ]
                                      f0 f1 f2 f3 f4 f5 f6 f7

                                      I0 I2 I0 I2 I1     I0 I4 I6
                                      I2 I7 I1 I3 I4     I6 I5 I9
                                      I3 I8 I3 I9 I5     I8
                                      I4          I7     I9
                                      I6          I9
                                      I8



Goal:
compute the score w^T Φ for all binary vectors Φ in the database
Efficient retrieval via
         inverted index
                          Inverted index:
                       w: [1.5 -2   0 -5   0    3 -2   0 ]
                            f0 f1 f2 f3 f4 f5 f6 f7

                            I0 I2 I0 I2 I1     I0 I4 I6
                            I2 I7 I1 I3 I4     I6 I5 I9
                            I3 I8 I3 I9 I5     I8
                            I4          I7     I9
                            I6          I9
                            I8




Scoring:
           I0 I1 I2 I3 I4 I5 I6 I7 I8 I9
Efficient retrieval via
             inverted index
                                    Inverted index:
                                 w: [1.5 -2   0 -5   0    3 -2   0 ]
                                      f0 f1 f2 f3 f4 f5 f6 f7

                                      I0 I2 I0 I2 I1     I0 I4 I6
                                      I2 I7 I1 I3 I4     I6 I5 I9
                                      I3 I8 I3 I9 I5     I8
                                      I4          I7     I9
                                      I6          I9
                                      I8




Cost of scoring is linear in the sum of the lengths of the inverted
lists associated with non-zero weights
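A minimal sketch of inverted-index scoring with the slide's toy weights and lists; only the lists of features with non-zero weight are touched, so the total work is the sum of those list lengths:

```python
# Weight vector and inverted lists (feature -> images) from the slide.
w = [1.5, -2.0, 0.0, -5.0, 0.0, 3.0, -2.0, 0.0]

inverted = {
    0: [0, 2, 3, 4, 6, 8],
    1: [2, 7, 8],
    2: [0, 1, 3],
    3: [2, 3, 9],
    4: [1, 4, 5, 7, 9],
    5: [0, 6, 8, 9],
    6: [4, 5],
    7: [6, 9],
}

scores = [0.0] * 10
for f, wf in enumerate(w):
    if wf == 0.0:          # zero weights contribute nothing: skip their lists
        continue
    for img in inverted[f]:
        scores[img] += wf  # Phi_i is binary, so the increment is just w_f
```

For example, image I0 contains features f0, f2 and f5, giving the score 1.5 + 0 + 3 = 4.5.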
Improve efficiency via sparse weight vectors

Key idea: force w to contain as many zeros as possible

Learning objective (Φ_n = classeme vector of example n, y_n = label of example n):

                 E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φ_n, y_n)

                 R(w): regularizer,   L: loss function

•   L2-SVM:  R(w) = w^T w ,   L(w; Φ_n, y_n) = max(0, 1 − y_n (w^T Φ_n))

•   Since |w_i| > w_i² for small w_i
    and |w_i| < w_i² for large w_i,
    choosing R(w) = Σ_i |w_i| will tend to
    produce a small number of larger
    weights and more zero weights

    [Figure: ℓ2-ball w_1² + w_2² = constant vs. ℓ1-ball |w_1| + |w_2| = constant]
Improve efficiency via
              sparse weight vectors
Key-idea: force w to contain as many zeros as possible
Learning objective (Φ_n = classeme vector of example n, y_n = label of example n):

                 E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φ_n, y_n)

                 R(w): regularizer,   L: loss function

•   L2-SVM:  R(w) = w^T w ,       L(w; Φ_n, y_n) = max(0, 1 − y_n (w^T Φ_n))

•   L1-LR:   R(w) = Σ_i |w_i| ,   L(w; Φ_n, y_n) = log(1 + exp(−y_n w^T Φ_n))

•   FGM (Feature Generating Machine) [Tan et al., 2010]:
             R(w) = w^T w ,       L(w; Φ_n, y_n) = max(0, 1 − y_n ((w ⊙ d)^T Φ_n))
             s.t. 1^T d ≤ B ,  d ∈ {0,1}^D      (⊙ = elementwise product)
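The sparsifying effect of the ℓ1 regularizer can be demonstrated with a proximal-gradient (ISTA) solver on a toy regression problem. This is a generic illustration of ℓ1 sparsity, not the talk's L1-LR trainer:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem: y depends on only 3 of 50 features; an L1 penalty should
# recover a weight vector with most entries exactly zero.
N, D = 200, 50
X = rng.normal(size=(N, D))
w_true = np.zeros(D)
w_true[[3, 17, 40]] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.normal(size=N)

# ISTA (proximal gradient) for  min_w  0.5/N ||Xw - y||^2 + lam ||w||_1
lam = 0.05
step = N / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth part
w = np.zeros(D)
for _ in range(500):
    grad = X.T @ (X @ w - y) / N
    w = w - step * grad
    # soft-thresholding: the proximal operator of the L1 norm
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

sparsity = int((np.abs(w) < 1e-8).sum())   # number of exact zeros
```

The soft-thresholding step sets small coordinates exactly to zero, which is what makes the inverted-index scoring above cheap: zero weights mean untouched inverted lists.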
Performance evaluation on ImageNet (10M images)
                                                  [Rastegari et al., 2011]

[Plot: Precision @ 10 (%) vs. search time per query (seconds, 20–140), for
full inner product evaluation with L2-SVM and L1-LR, and inverted-index
retrieval with L2-SVM and L1-LR]

• Performance averaged over 400 object classes used as queries
• 10 training examples per query class
• Database includes 450 images of the query class and 9.7M images of other classes
• Prec@10 of a random classifier is 0.005%

Each curve is obtained by varying sparsity through C in the training objective

                 E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φ_n, y_n)

                 R(w): regularizer,   L: loss function
Top-k ranking
• Do we need to rank the entire database?
  - users only care about the top-ranked images




• Key idea:
  - for each image iteratively update an upper-bound and
     a lower-bound on the score
  - gradually prune images that cannot rank in the top-k
Top-k pruning                                   [Rastegari et al., 2011]

w: [   3 -2   0 -6   0   3 -2   0 ]

     f0  f1  f2  f3  f4  f5  f6  f7
 I0:  1   0   1   0   0   1   0   0
 I1:  0   0   1   0   1   0   0   0
 I2:  1   1   0   1   0   0   0   0
 I3:  1   0   1   1   0   0   0   0
 I4:  1   0   0   0   1   0   1   0
 I5:  0   0   0   0   1   0   1   0
 I6:  1   0   0   0   0   1   0   1
 I7:  0   1   0   0   1   0   0   0
 I8:  1   1   0   0   0   1   0   0
 I9:  0   0   0   1   1   1   0   1

• Highest possible score: for the binary vector ΦU s.t. ΦU_i = 1 iff w_i > 0
  → initial upper bound u* = w^T · ΦU (6 in this case)

• Lowest possible score: for the binary vector ΦL s.t. ΦL_i = 1 iff w_i < 0
  → initial lower bound l* = w^T · ΦL (−10 in this case)
Top-k pruning                                   [Rastegari et al., 2011]

• Initialization: u*, l* for all images

[Bar chart: identical initial upper and lower bounds for images I0 ... I9]
Top-k pruning                                   [Rastegari et al., 2011]

• Load feature i
• Since w_i = +3 (> 0), for each image n:
  - subtract 3 from the upper bound if φ_{n,i} = 0
  - add 3 to the lower bound if φ_{n,i} = 1

[Bar chart: updated upper and lower bounds for images I0 ... I9]
Top-k pruning                                   [Rastegari et al., 2011]

• Load feature i
• Since w_i = −2 (< 0), for each image n:
  - decrement the upper bound by 2 if φ_{n,i} = 1
  - increment the lower bound by 2 if φ_{n,i} = 0

[Bar chart: updated upper and lower bounds for images I0 ... I9]
Top-k pruning                                   [Rastegari et al., 2011]

• Load feature i
• Since w_i = −6 (< 0), for each image n:
  - decrement the upper bound by 6 if φ_{n,i} = 1
  - increment the lower bound by 6 if φ_{n,i} = 0

[Bar chart: updated upper and lower bounds for images I0 ... I9]
Top-k pruning                                   [Rastegari et al., 2011]

• Suppose k = 4:
  we can prune I2 and I9, since they cannot rank in the top-k
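The whole procedure can be sketched over the slide's toy data. This is a simplified rendering of the TkP idea (pruning after every feature); with k = 4 it keeps exactly the four highest-scoring images:

```python
# Top-k pruning over binary classeme vectors: maintain per-image upper/lower
# score bounds, tighten them one feature at a time, and drop images whose
# upper bound falls below the k-th largest lower bound.
w = [3, -2, 0, -6, 0, 3, -2, 0]            # weights from the slide
incidence = [                               # images I0..I9 over features f0..f7
    [1, 0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 0], [0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1], [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1, 0, 1],
]
n = len(incidence)
upper = [sum(wi for wi in w if wi > 0)] * n   # u* = 6 for every image
lower = [sum(wi for wi in w if wi < 0)] * n   # l* = -10 for every image
alive = set(range(n))
k = 4

for f, wf in enumerate(w):
    if wf == 0:
        continue
    for i in list(alive):
        present = incidence[i][f]
        if wf > 0:
            if present: lower[i] += wf   # positive weight confirmed present
            else:       upper[i] -= wf   # ... or confirmed absent
        else:
            if present: upper[i] += wf   # negative weight confirmed present
            else:       lower[i] -= wf   # ... or confirmed absent
    # An image cannot make the top-k if its upper bound is below the
    # k-th largest lower bound among the surviving images.
    kth = sorted((lower[i] for i in alive), reverse=True)[k - 1]
    alive = {i for i in alive if upper[i] >= kth}
```

Because the bounds are always valid, no image that truly belongs to the top-k is ever pruned; pruning only discards images that provably cannot qualify.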
Distribution of weights and pruning rate

[Plots: (a) normalized distribution of absolute weight values vs. dimension,
for L1-LR, L2-SVM, and FGM; (b) % of images pruned vs. number of
iterations (d), for TkP with L1-LR, L2-SVM, and FGM, at k=10 and k=3000]

Figure 2. (a) Distribution of weight absolute values for different classifiers
(after sorting the weight magnitudes). TkP runs faster with
57
557
                                                         Features considered in descending order of |wi |
       sparse, highly skewed weight values. (b) Pruning rate of TkP for various classification model and different values ofof k (k = 10, 3000).
        sparse, highly skewed weight values. (b) Pruning rate of TkP for various classification model and different values k (k = 10, 3000).
58
558
59
559
60
560    aa smaller value of kk allows the method to eliminate more
           smaller value of allows the method to eliminate more
61     images from consideration at aavery early stage.                                                                 20
                                                                                                                         20             v=128
561     images from consideration at very early stage.                                                                                   v=128
                                                                                                                                            8
                                                                                                                                                                  v=256
                                                                                                                                                                    v=256
62                                                                                                                                      w=2 8              v=256
                                                                                                                                                             v=256 w=28 8
562                                                                                                                                       w=2                  6
                                                                                                                               v=64
                                                                                                                                v=64                       w=2 6 w=2
                                                                                                                                                             w=2
Performance evaluation on ImageNet (10M images)            [Rastegari et al., 2011]

[Plot: Precision @ 10 (%) vs. search time per query (seconds), curves for TkP L1-LR, TkP L2-SVM, Inverted index L1-LR, and Inverted index L2-SVM]

•   k = 10
•   Performance averaged over 400 object classes used as queries
•   10 training examples per query class
•   Database includes 450 images of the query class and 9.7M images of other classes
•   Prec@10 of a random classifier is 0.005%

Each curve is obtained by varying sparsity through C in the training objective

        E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φ_n, y_n)

where R(w) is the regularizer and L(w; Φ_n, y_n) is the loss function.
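The role of C can be illustrated with a minimal sketch: minimizing an L1-regularized objective of this form with proximal gradient descent (ISTA), a small C makes the regularizer dominate and drives most weights to zero, while a large C yields a denser solution. The synthetic data, the squared loss, and the step-size choice are assumptions of this sketch, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 20
w_true = np.zeros(D)
w_true[:5] = rng.standard_normal(5)            # sparse ground-truth weights
X = rng.standard_normal((N, D))
y = X @ w_true

def fit_l1(C, iters=500):
    """Minimize ||w||_1 + (C/N) * sum_n (x_n^T w - y_n)^2 via ISTA."""
    w = np.zeros(D)
    lip = 2 * C / N * np.linalg.norm(X, 2) ** 2  # Lipschitz const. of the loss gradient
    t = 0.5 / lip                                # step size
    for _ in range(iters):
        g = (2 * C / N) * X.T @ (X @ w - y)      # gradient of the loss term
        w = w - t * g
        # soft-thresholding: proximal operator of the L1 regularizer
        w = np.sign(w) * np.maximum(np.abs(w) - t, 0.0)
    return w

nnz_small_C = np.count_nonzero(fit_l1(C=0.01))   # strong regularization -> sparser
nnz_large_C = np.count_nonzero(fit_l1(C=100.0))  # weak regularization -> denser
```

Sweeping C along this axis is exactly what produces each precision-vs-time curve: sparser weight vectors prune faster but may lose accuracy.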
Alternative search strategy: approximate ranking

•   Key idea: approximate the score function with a measure that can be
    computed (more) efficiently (related to approximate NN search:
    [Shakhnarovich et al., 2006; Grauman and Darrell, 2007; Chum et al., 2008])

•   Approximate ranking via vector quantization:

        w^T Φ ≈ w^T q(Φ)

    where q(.) is a quantizer returning the cluster centroid nearest to Φ

•   Problem:
    - to approximate the score well we need a fine quantization
    - the dimensionality of our space is D = 2659:
      too large to enable a fine quantization using k-means clustering
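A minimal numpy sketch of the quantized-scoring idea (with random data and, for brevity, centroids sampled from the database rather than refined by full k-means, both assumptions of this sketch): the exact score w^T Φ of each image is replaced by w^T q(Φ), which only needs one dot product per centroid instead of one per image.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 1000, 16, 64
db = rng.standard_normal((N, D))     # database descriptors Φ
w = rng.standard_normal(D)           # linear classifier weights

# crude codebook: K centroids sampled from the data (k-means would refine them)
centroids = db[rng.choice(N, K, replace=False)]

# q(Φ): index of the nearest centroid for each database vector
d2 = ((db[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
assign = d2.argmin(axis=1)

exact = db @ w                       # w^T Φ: one dot product per image
centroid_scores = centroids @ w      # w^T c: computed once per centroid
approx = centroid_scores[assign]     # w^T q(Φ): a single look-up per image
```

The coarseness of the codebook (K) is exactly the problem flagged above: in D = 2659 dimensions, far too many centroids would be needed for the approximation to be faithful.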
Product quantization for nearest neighbor search            [Jegou et al., 2011]

•   Split the feature vector Φ into v subvectors:  Φ → [ Φ1 | Φ2 | ... | Φv ]

•   Subvectors are quantized separately by quantizers

        q(Φ) = [ q1(Φ1) | q2(Φ2) | ... | qv(Φv) ]

    where each qi(.) is learned by k-means, with a limited number of
    centroids, in a space of dimensionality D/v

•   Example from [Jegou et al., 2011]: Φ is a 128-dimensional vector split
    into 8 subvectors of dimension 16; each subvector (16 components) is
    quantized with 2^8 = 256 centroids (8 bits), so q(Φ) is represented by a
    64-bit quantization index
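The example above can be sketched directly in numpy (assumptions of this sketch: random data, and codebooks built by sampling r training vectors per sub-block instead of running full k-means):

```python
import numpy as np

rng = np.random.default_rng(0)
D, v, r = 128, 8, 256            # as in the example: 8 sub-blocks, 2^8 centroids
d = D // v                       # sub-vector dimensionality: 16

train = rng.standard_normal((5000, D))
# one codebook per sub-block; real PQ learns each with k-means on that sub-block
codebooks = [train[rng.choice(len(train), r, replace=False), j * d:(j + 1) * d]
             for j in range(v)]

def pq_encode(x):
    """Map x to v nearest-centroid indices: 8 bits each -> a 64-bit code."""
    return [int(((cb - x[j * d:(j + 1) * d]) ** 2).sum(axis=1).argmin())
            for j, cb in enumerate(codebooks)]

def pq_decode(codes):
    """Reconstruct q(x) by concatenating the selected centroids."""
    return np.concatenate([codebooks[j][c] for j, c in enumerate(codes)])

codes = pq_encode(rng.standard_normal(D))
```

Because each sub-quantizer only covers a 16-dimensional space, 256 centroids per block give an effective codebook of 256^8 composite cells, which a single k-means could never train directly.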
Efficient approximate scoring

        w^T q(Φ) = Σ_{j=1}^{v} w_j^T q_j(Φ_j)

1. Fill the look-up table: the inner product between each sub-block w_j of
   the classifier and each of the r centroids of sub-block j can be
   precomputed and stored in a v × r look-up table:

        s11  s12  s13  ...  s1r
        s21  s22  s23  ...  s2r
        ...  ...  ...  ...  ...
        sv1  sv2  sv3  ...  svr

2. Score each quantized vector q(Φ) in the database using the look-up table:

        w^T q(Φ) = w1^T q1(Φ1) + w2^T q2(Φ2) + ... + wv^T qv(Φv)

   Only v additions per image!
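The two steps above can be sketched in numpy (random codebooks and codes stand in for learned ones; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
D, v, r = 128, 8, 256
d = D // v
w = rng.standard_normal(D)                   # classifier weights
codebooks = rng.standard_normal((v, r, d))   # stand-in for learned PQ codebooks
codes = rng.integers(0, r, size=(10000, v))  # database: one index per sub-block

# 1. Fill the v x r look-up table: table[j, i] = w_j^T (i-th centroid of block j)
table = np.einsum('jrd,jd->jr', codebooks, w.reshape(v, d))

# 2. Score every database image with v table look-ups and additions,
#    no per-image dot products
scores = table[np.arange(v), codes].sum(axis=1)

# sanity check: explicit reconstruction q(Φ) for one image gives the same score
recon = np.concatenate([codebooks[j, codes[0, j]] for j in range(v)])
```

The v × r table is tiny (8 × 256 entries here) and is filled once per query, after which ranking 10M images costs only 8 additions each.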
Choice of parameters                                    [Rastegari et al., 2011]

•   Dimensionality is first reduced with PCA from D = 2659 to D' << D
•   How do we choose D', v (number of sub-blocks),
    r (number of centroids per sub-block)?
•   Effect of parameter choices on a database of 150K images:

[Plot: Precision @ 10 (%) vs. search time per query (seconds) for different
(v, r) settings, e.g. (16,2^8), (32,2^8), (64,2^6), (128,2^8), (256,2^8),
at D' = 128, 256, 512]
ICVSS2011 Selected Presentations

  • 1. ICVSS 2011 Steven Seitz Lorenzo Torresani Guillermo Sapiro Shmuel Peleg ICVSS 2011: Selected Presentations Angel Cruz and Andrea Rueda BioIngenium Research Group, Universidad Nacional de Colombia August 25, 2011 Angel Cruz and Andrea Rueda — ICVSS 2011: Selected Presentations
  • 2. Outline: 1 ICVSS 2011; 2 A Trillion Photos - Steven Seitz; 3 Efficient Novel Class Recognition and Search - Lorenzo Torresani; 4 The Life of Structured Learned Dictionaries - Guillermo Sapiro; 5 Image Rearrangement & Video Synopsis - Shmuel Peleg
  • 3. Outline: 1 ICVSS 2011; 2 A Trillion Photos - Steven Seitz; 3 Efficient Novel Class Recognition and Search - Lorenzo Torresani; 4 The Life of Structured Learned Dictionaries - Guillermo Sapiro; 5 Image Rearrangement & Video Synopsis - Shmuel Peleg
  • 4. ICVSS 2011 - International Computer Vision Summer School. 15 speakers, from USA, France, UK, Italy, Prague and Israel
  • 5. ICVSS 2011 - International Computer Vision Summer School
  • 6. ICVSS 2011 - International Computer Vision Summer School
  • 7. Outline: 1 ICVSS 2011; 2 A Trillion Photos - Steven Seitz; 3 Efficient Novel Class Recognition and Search - Lorenzo Torresani; 4 The Life of Structured Learned Dictionaries - Guillermo Sapiro; 5 Image Rearrangement & Video Synopsis - Shmuel Peleg
  • 8. A Trillion Photos. Steve Seitz, University of Washington / Google. Sicily Computer Vision Summer School, July 11, 2011
  • 9. Facebook: >3 billion photos uploaded each month; ~1 trillion photos taken each year
  • 10. What do you do with a trillion photos? Digital Shoebox (hard drives, iPhoto, Facebook...)
  • 18. Comparing images Detect features using SIFT [Lowe, IJCV 2004]
  • 19. Comparing images Extraordinarily robust image matching – Across viewpoint (~60 degree out-of-plane rotations) – Varying illumination – Real-time implementations
  • 20. Edges
  • 21. Scale Invariant Feature Transform: gradient orientations are binned into an angle histogram over [0, 2π). Adapted from slide by David Lowe
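The angle histogram on this slide is the building block of a SIFT descriptor cell. A minimal NumPy sketch, illustrative only: it omits Lowe's Gaussian weighting, trilinear interpolation, and keypoint-relative rotation, and the function name is chosen here, not taken from the talk:

```python
import numpy as np

def orientation_histogram(patch, nbins=8):
    """Magnitude-weighted histogram of gradient orientations over [0, 2*pi)
    for a 2-D grayscale patch -- the quantity binned in the slide's figure."""
    gy, gx = np.gradient(patch.astype(float))       # image gradients
    angles = np.arctan2(gy, gx) % (2 * np.pi)       # orientations in [0, 2*pi)
    weights = np.hypot(gx, gy)                      # gradient magnitudes
    hist, _ = np.histogram(angles, bins=nbins,
                           range=(0.0, 2 * np.pi), weights=weights)
    return hist

# A vertical step edge: all gradient energy points along +x (angle 0),
# so the first orientation bin should dominate.
patch = np.tile(np.concatenate([np.zeros(8), np.ones(8)]), (16, 1))
hist = orientation_histogram(patch)
```

In a full SIFT descriptor, 4x4 such cell histograms are concatenated into a 128-D vector.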
  • 22. NASA Mars Rover images
  • 23. NASA Mars Rover images with SIFT feature matches Figure by Noah Snavely
  • 25. Rome landmarks: Coliseum (outside and inside), St. Peter's (outside and inside), Il Vittoriano, Trevi Fountain, Forum
  • 26. Structure from motion: matched photos → 3D structure
  • 27. Structure from motion, aka “bundle adjustment” (texts: Zisserman; Faugeras): given 2-D observations of points p1...p7 in cameras with poses (R1,t1), (R2,t2), (R3,t3), minimize the reprojection error f(R, T, P) jointly over all camera poses and 3-D point positions
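The objective f(R, T, P) can be made concrete. A hedged NumPy sketch of the bundle-adjustment cost, reprojection error under a unit-focal pinhole camera; the function names and toy data are illustrative, not from the talk:

```python
import numpy as np

def project(R, t, X):
    """Project 3-D points X (N x 3) with camera rotation R (3 x 3) and
    translation t (3,), assuming unit focal length and no distortion."""
    Xc = X @ R.T + t                  # world -> camera frame
    return Xc[:, :2] / Xc[:, 2:3]     # perspective divide

def f(cameras, points, observations):
    """Bundle-adjustment objective: sum of squared differences between
    observed 2-D features and reprojections of the current 3-D points.
    observations[c] = (point_indices, uv) gives camera c's measurements.
    A real SfM system minimizes this over all (R, t) and the points,
    e.g. with Levenberg-Marquardt; here we only evaluate it."""
    total = 0.0
    for c, (idx, uv) in observations.items():
        R, t = cameras[c]
        total += np.sum((project(R, t, points[idx]) - uv) ** 2)
    return total

# Toy scene: points in front of an identity camera, exact observations.
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, (7, 3)) + np.array([0.0, 0.0, 5.0])
cams = {0: (np.eye(3), np.zeros(3))}
obs = {0: (np.arange(7), project(np.eye(3), np.zeros(3), P))}
err_true = f(cams, P, obs)        # zero at the true geometry
err_bad = f(cams, P + 0.1, obs)   # grows when the points are perturbed
```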
  • 29. Reconstructing Rome In a day... From ~1M images Using ~1000 cores Sameer Agarwal, Noah Snavely, Rick Szeliski, Steve Seitz http://grail.cs.washington.edu/rome
  • 35. From Sparse to Dense Sparse output from the SfM system
  • 36. From Sparse to Dense Furukawa, Curless, Seitz, Szeliski, CVPR 2010
  • 41. Most of our photos don’t look like this
  • 44. Your Life in 30 Seconds path optimization
  • 45. Picasa Integration • As “Face Movies” feature in v3.8 – Rahul Garg, Ira Kemelmacher
  • 46. Conclusion trillions of photos + computer vision breakthroughs = new ways to see the world
  • 47. Outline: 1 ICVSS 2011; 2 A Trillion Photos - Steven Seitz; 3 Efficient Novel Class Recognition and Search - Lorenzo Torresani; 4 The Life of Structured Learned Dictionaries - Guillermo Sapiro; 5 Image Rearrangement & Video Synopsis - Shmuel Peleg
  • 48. Efficient Novel-Class Recognition and Search Lorenzo Torresani
  • 49. Problem statement: novel object-class search. Given: an image database (e.g., 1 million photos) with no text/tags available, plus a few user-provided images of an object class; the query images may represent a novel class. Want: the database images of this class
  • 50. Application: Web-powered visual search in unlabeled personal photos. Goal: find “soccer camp” pictures on my computer. 1) Search the Web for images of “soccer camp”; 2) find images of this visual class on my computer
  • 51. Application: product search • Search of aesthetic products
  • 52. Relation to other tasks: image retrieval [Nister and Stewenius, ’07; Philbin et al., ’07; Torralba et al., ’08]. Analogies: large databases, efficient indexing, compact representation. Differences: retrieval relies on simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout). (Slide figures show retrieval examples and performance curves from the cited papers.)
  • 53. Relation to other tasks. Vs. image retrieval, analogies: large databases, efficient indexing, compact representation; differences: retrieval uses simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout). Vs. object classification, analogies: recognition of object classes from a few examples; differences: in classification the classes to recognize are defined a priori, training and recognition time is unimportant, and storage of features is not an issue
  • 54. Technical requirements of novel-class search • The object classifier must be learned on the fly from few examples • Recognition in the database must have low computational cost • Image descriptors must be compact to allow storage in memory
  • 55. State-of-the-art in object classification. Winning recipe: many features + non-linear classifiers, i.e., a non-linear decision boundary (e.g., [Gehler and Nowozin, CVPR’09])
  • 56. Model evaluation on Caltech256: accuracy (%) vs. number of training examples for linear models on individual features (gist, phog, phog2pi, ssim, bow5000)
  • 57. Model evaluation on Caltech256: a linear model on a linear combination of the features outperforms each individual feature
  • 58. Model evaluation on Caltech256: a non-linear model on the feature combination (a.k.a. Multiple Kernel Learning [Gehler & Nowozin ’09]) performs best
  • 59. Multiple kernel combiners. Classification output is obtained by combining many features via non-linear kernels: h(x) = Σ_{f=1..F} β_f Σ_{n=1..N} k_f(x, x_n) α_n + b, where the outer sum runs over features f and the inner sum over training examples n
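A small NumPy sketch of this decision function. The Gaussian (RBF) kernels and all the toy values below are assumptions for illustration; the slide leaves each k_f generic:

```python
import numpy as np

def rbf(x, X, gamma):
    """Gaussian kernel values k(x, x_n) for one query x against rows of X."""
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))

def h(x, X, alpha, beta, gammas, b=0.0):
    """Multiple-kernel combiner: h(x) = sum_f beta_f sum_n k_f(x, x_n) alpha_n + b,
    with one shared set of example weights alpha, as on the slide."""
    return sum(bf * np.dot(rbf(x, X, g), alpha)
               for bf, g in zip(beta, gammas)) + b

# Two training examples, two kernels differing only in bandwidth.
X = np.array([[0.0], [1.0]])
alpha = np.array([1.0, -1.0])
score = h(np.array([0.0]), X, alpha, beta=[0.5, 0.5], gammas=[1.0, 2.0])
```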
  • 60. Methods: Multiple Kernel Learning (MKL) [Bach et al., 2004; Sonnenburg et al., 2006; Varma and Ray, 2007]. MKL performs kernel selection during the training phase, learning a non-linear SVM with a linear combination of kernels k*(x, x′) = Σ_{f=1..F} β_f k_f(x, x′) by jointly optimizing the weights β and the SVM parameters α ∈ R^N, b ∈ R: min_{α,β,b} (1/2) Σ_f β_f αᵀ K_f α + C Σ_{n=1..N} L(y_n, b + Σ_f β_f K_f(x_n)ᵀ α), subject to Σ_f β_f = 1, β_f ≥ 0 for f = 1..F, where L(y, t) = max(0, 1 − yt) is the hinge loss and K_f(x) = [k_f(x, x_1), …, k_f(x, x_N)]ᵀ. The simplex constraint on β yields sparse, interpretable coefficients
  • 61. LP-β: a two-stage approach to MKL [Gehler and Nowozin, 2009]. Classification output of traditional MKL: h_MKL(x) = Σ_f β_f Σ_n k_f(x, x_n) α_n + b. Classification function of LP-β: h(x) = Σ_f β_f h_f(x), with per-feature classifiers h_f(x) = Σ_n k_f(x, x_n) α_{f,n} + b_f. Two-stage training procedure: 1) train each h_f(x) independently (traditional SVM learning); 2) optimize over β (a simple linear program)
  • 62. LP-β for novel-class search? The LP-β classifier h(x) = Σ_f β_f (Σ_n k_f(x, x_n) α_{f,n} + b_f) sums over features and training examples. It is unsuitable for our needs due to: large storage requirements (typically over 20K bytes/image); costly evaluation (requires query-time kernel distance computation for each test image); costly training (1+ minute for O(10) training examples)
  • 63. Classemes: a compact descriptor for efficient recognition [Torresani et al., 2010]. Key idea: represent each image x in terms of its “closeness” to a set of basis classes (“classemes”): Φ(x) = [φ_1(x), …, φ_C(x)]ᵀ, where φ_c(x) = Σ_f β_f^c Σ_n k_f(x, x_n^c) α_n^c + b^c is the output of an LP-β classifier for the c-th basis class, pre-learned before the creation of the database. Query-time learning: given training examples Φ(x_1), …, Φ(x_N) of a novel class, train a linear classifier on Φ(x), e.g., g_duck(Φ(x); w_duck) = Φ(x)ᵀ w_duck = Σ_c w_c^duck φ_c(x)
  • 64. How this works... Table 1: the five classemes with the highest LP-β weights, for a selection of Caltech 256 categories in the retrieval experiment. Some weighted classemes appear to make semantic sense, but highly semantic labels are not required: the goal is simply a useful feature vector, not semantic labels, and detectors may respond to specific patterns of texture, color, shape, etc. The somewhat peculiar classeme labels reflect the ontology used as a source of base categories. Large-scale recognition benefits from a compact per-image descriptor, for example allowing databases to be stored in memory rather than on disk
  • 65. Related work: attribute-based recognition, e.g. “Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer” [Lampert et al., CVPR’09] and [Farhadi et al., CVPR’09]. Knowledge transfer between object categories via high-level attributes (e.g., otter: black yes, water yes; zebra: stripes yes) allows detecting classes with no training examples. Limitations: requires hand-specified attribute-class associations, and attribute classifiers must be trained with human-labeled examples
• 66. Method overview. 1. Classeme learning: train one classifier per classeme, e.g. φ"body of water"(x), ..., φ"walking"(x). 2. Using the classemes for recognition and retrieval: given training examples Φ(x1), ..., Φ(xN) of a novel class (e.g. duck), learn g_duck(Φ(x)) = Σ_c w_c φ_c(x).
  • 67. Classeme learning: choosing the basis classes • Classeme labels desiderata: - must be visual concepts - should span the entire space of visual classes • Our selection: concepts defined in the Large Scale Ontology for Multimedia [LSCOM] to be “useful, observable and feasible for automatic detection”. 2659 classeme labels, after manual elimination of plurals, near-duplicates, and inappropriate concepts
  • 68. Classeme learning: gathering the training data • We downloaded the top 150 images returned by Bing Images for each classeme label • For each of the 2659 classemes, a one-versus-the-rest training set was formed to learn a binary classifier φ”walking” (x) yes no
• 69. Classeme learning: training the classifiers. Each classeme classifier is an LP-β kernel combiner [Gehler and Nowozin, 2009]: φ(x) = Σ_{f=1..F} β_f ( Σ_{n=1..N} k_f(x, x_n) α_{f,n} + b_f ), a linear combination of feature-specific SVMs. We use 13 kernels based on spatial pyramid histograms computed from the following features: color GIST [Oliva and Torralba, 2001]; oriented gradients [Dalal and Triggs, 2005]; self-similarity descriptors [Shechtman and Irani, 2007]; SIFT [Lowe, 2004].
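The combiner's scoring rule can be sketched numerically. Everything below (dimensions, the RBF kernels, and the β, α, b coefficients) is a random stand-in for illustration, not trained values from the talk:

```python
import numpy as np

# Toy stand-in for an LP-beta kernel combiner: F feature channels, each with
# its own kernel and a pre-trained SVM (alpha, b); per-channel SVM scores are
# mixed with weights beta. All values here are random placeholders.
rng = np.random.default_rng(0)
F, N, dim = 3, 5, 4
train = rng.normal(size=(F, N, dim))   # training points, one "view" per channel
beta = np.array([0.5, 0.3, 0.2])       # channel mixing weights
alpha = rng.normal(size=(F, N))        # per-channel SVM dual coefficients
b = rng.normal(size=F)                 # per-channel biases

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def lp_beta_score(x):
    # phi(x) = sum_f beta_f * ( sum_n k_f(x, x_n) * alpha_{f,n} + b_f )
    total = 0.0
    for f in range(F):
        svm_f = sum(rbf(x, train[f, n]) * alpha[f, n] for n in range(N)) + b[f]
        total += beta[f] * svm_f
    return total

x = rng.normal(size=dim)
print(lp_beta_score(x))
```

Each classeme score is thus a weighted sum of F kernel-SVM outputs, evaluated against the same training points seen through different feature channels.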
• 70. A dimensionality reduction view of classemes. Φ: x → [φ_1(x), ..., φ_2659(x)], computed from GIST, self-similarity descriptors, oriented gradients, and SIFT. Low-level features: non-linear kernels are needed for good classification; 23K bytes/image. Classemes: near state-of-the-art accuracy with linear classifiers; can be quantized down to 200 bytes/image with almost no recognition loss.
• 71. Experiment 1: multiclass recognition on Caltech256. Plot of accuracy (%) vs. number of training examples (0-50), comparing: LP-β as in [Gehler and Nowozin, 2009] using 39 kernels; LP-β with our 13 kernels; MKL; our approach, a linear SVM on classemes Φ(x); a linear SVM on binarized classemes, i.e. (Φ(x) > 0); and a linear SVM on the raw features x.
• 72. Computational cost comparison. Training time: LP-β 23 hours vs. Csvm (linear SVM on classemes) 9 minutes. Testing time per image (ms) is likewise far lower for Csvm.
• 73. Accuracy vs. compactness. Plot of compactness (images per MB) vs. accuracy (%): LPbeta13 at 23K bytes/image; Csvm at 2.5K bytes/image; Cq1svm at 188 bytes/image; also nbnn [Boiman et al., 2008] at 128K bytes/image, emk [Bo and Sminchisescu, 2008], and Xsvm. Lines link performance at 15 and 30 training examples.
• 74. Experiment 2: object class retrieval. Fig. 4: precision (%) at 25 vs. number of training images, comparing Csvm, Cq1Rocchio (β=1, γ=0), Cq1Rocchio (β=0.75, γ=0.15), Bowsvm, BowRocchio (β=1, γ=0), and BowRocchio (β=0.75, γ=0.15); percentage of the top 25 in a 6400-document set which match the query class. Random performance is 0.4%. Training Csvm takes 0.6 sec with 5×256 training examples.
  • 75. Analogies with text retrieval • Classeme representation of an image: presence/absence of visual attributes • Bag-of-word representation of a text-document: presence/absence of words
• 76. Related work. Prior work (e.g., [Sivic and Zisserman, 2003; Nister and Stewenius, 2006; Philbin et al., 2007]) has exploited a similar analogy for object-instance retrieval by representing images as bags of visual words: detect interest patches, compute SIFT descriptors [Lowe, 2004], quantize the descriptors against a codebook, and represent the image as a sparse histogram of visual-word frequencies. To extend this methodology to object-class retrieval we need: a representation more suited to object-class recognition (e.g. classemes as opposed to bags of visual words), and to train the ranking/retrieval function for every new query class.
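The quantize-and-histogram step of the bag-of-visual-words pipeline can be sketched in a few lines; the codebook and descriptors below are random stand-ins (a real pipeline would learn the codewords by k-means over SIFT descriptors):

```python
import numpy as np

# Bag-of-visual-words sketch: quantize local descriptors against a codebook
# and represent the image as a normalized histogram of visual-word counts.
rng = np.random.default_rng(0)
K, dim = 8, 16                              # codebook size, descriptor dim
codebook = rng.normal(size=(K, dim))        # stand-in for k-means centroids
descriptors = rng.normal(size=(50, dim))    # e.g. SIFT patches of one image

# Assign each descriptor to its nearest codeword (squared Euclidean distance)
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
words = d2.argmin(axis=1)

hist = np.bincount(words, minlength=K).astype(float)
hist /= hist.sum()                          # histogram of visual-word frequencies
print(hist)
```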
• 77. Data structures for efficient retrieval.
Incidence matrix (documents I0..I9 as rows, features f0..f7 as columns):
I0: 1 0 1 0 0 1 0 0
I1: 0 0 1 0 1 0 0 0
I2: 1 1 0 1 0 0 0 0
I3: 1 0 1 1 0 0 0 0
I4: 1 0 0 0 1 0 1 0
I5: 0 0 0 0 1 0 1 0
I6: 1 0 0 0 0 1 0 1
I7: 0 1 0 0 1 0 0 0
I8: 1 1 0 0 0 1 0 0
I9: 0 0 0 1 1 1 0 1
Inverted index (for each feature, the documents containing it):
f0: I0 I2 I3 I4 I6 I8; f1: I2 I7 I8; f2: I0 I1 I3; f3: I2 I3 I9; f4: I1 I4 I5 I7 I9; f5: I0 I6 I8 I9; f6: I4 I5; f7: I6 I9
• enables efficient calculation of w^T Φ for all Φ, as Σ_{i s.t. Φ_i ≠ 0} w_i Φ_i
• very compact: only one bit per feature entry
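The two structures on this slide can be built directly from the same toy data:

```python
# The slide's toy data: rows are documents I0..I9, columns binary features f0..f7.
incidence = [
    [1, 0, 1, 0, 0, 1, 0, 0],  # I0
    [0, 0, 1, 0, 1, 0, 0, 0],  # I1
    [1, 1, 0, 1, 0, 0, 0, 0],  # I2
    [1, 0, 1, 1, 0, 0, 0, 0],  # I3
    [1, 0, 0, 0, 1, 0, 1, 0],  # I4
    [0, 0, 0, 0, 1, 0, 1, 0],  # I5
    [1, 0, 0, 0, 0, 1, 0, 1],  # I6
    [0, 1, 0, 0, 1, 0, 0, 0],  # I7
    [1, 1, 0, 0, 0, 1, 0, 0],  # I8
    [0, 0, 0, 1, 1, 1, 0, 1],  # I9
]

# Inverted index: for each feature, the list of documents that contain it
n_features = len(incidence[0])
inverted = {f: [d for d, row in enumerate(incidence) if row[f]]
            for f in range(n_features)}

print(inverted[0])   # documents with f0 set -> [0, 2, 3, 4, 6, 8]
```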
• 78. Efficient retrieval via inverted index. w = [1.5 -2 0 -5 0 3 -2 0]; inverted index as on slide 77. Goal: compute the score w^T Φ for all binary vectors Φ in the database.
• 79-83. Efficient retrieval via inverted index (animation). Scoring: for each feature with non-zero weight, walk its inverted list and add the weight to the score accumulator of every document in that list (accumulators shown for I0..I9).
• 84. Efficient retrieval via inverted index. The cost of scoring is linear in the sum of the lengths of the inverted lists associated with non-zero weights.
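A minimal sketch of this scoring scheme for the slides' example (w = [1.5 -2 0 -5 0 3 -2 0]); inverted lists of zero-weight features are never touched, so the cost is exactly the sum of the traversed list lengths:

```python
# Score all documents via the inverted index for the slide's weight vector.
w = [1.5, -2, 0, -5, 0, 3, -2, 0]
inverted = {              # inverted index from the earlier slide
    0: [0, 2, 3, 4, 6, 8],
    1: [2, 7, 8],
    2: [0, 1, 3],
    3: [2, 3, 9],
    4: [1, 4, 5, 7, 9],
    5: [0, 6, 8, 9],
    6: [4, 5],
    7: [6, 9],
}

def score_all(w, inverted, n_docs):
    scores = [0.0] * n_docs
    for f, wf in enumerate(w):
        if wf == 0:
            continue               # zero-weight lists are skipped entirely
        for d in inverted[f]:
            scores[d] += wf        # Phi_{d,f} = 1 for every d in the list
    return scores

print(score_all(w, inverted, 10))
```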
• 85. Improve efficiency via sparse weight vectors. Key idea: force w to contain as many zeros as possible. Learning objective: E(w) = R(w) + (C/N) Σ_{n=1..N} L(w; Φ_n, y_n), where R is a regularizer, L a loss function, Φ_n the classeme vector of example n and y_n its label. L2-SVM: R(w) = w^T w, L(w; Φ_n, y_n) = max(0, 1 - y_n (w^T Φ_n)). Since |w_i| ≥ w_i² for small w_i and |w_i| ≤ w_i² for large w_i, choosing R(w) = Σ_i |w_i| will tend to produce a small number of larger weights and more zero weights (the ℓ1-ball |w_1| + |w_2| = constant has corners on the axes, unlike the ℓ2-ball w_1² + w_2² = constant).
• 86. Improve efficiency via sparse weight vectors. Key idea: force w to contain as many zeros as possible. Learning objective: E(w) = R(w) + (C/N) Σ_{n=1..N} L(w; Φ_n, y_n) (regularizer + loss).
- L2-SVM: R(w) = w^T w, L(w; Φ_n, y_n) = max(0, 1 - y_n (w^T Φ_n))
- L1-LR: R(w) = Σ_i |w_i|, L(w; Φ_n, y_n) = log(1 + exp(-y_n w^T Φ_n))
- FGM (Feature Generating Machine) [Tan et al., 2010]: R(w) = w^T w, L(w; Φ_n, y_n) = max(0, 1 - y_n (w ⊙ d)^T Φ_n) s.t. 1^T d ≤ B, d ∈ {0,1}^D, where ⊙ is the elementwise product
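To see the ℓ1 penalty produce exact zeros, here is a toy proximal-gradient (ISTA) solver for L1-regularized logistic regression; this is an illustrative stand-in, not the solver used in the talk, and the synthetic data has only 3 informative dimensions out of 50:

```python
import numpy as np

# ISTA for L1-regularized logistic regression: gradient step on the smooth
# loss, then soft-thresholding (the proximal operator of the l1 norm), which
# sets small coordinates exactly to zero.
rng = np.random.default_rng(0)
N, D = 200, 50
X = rng.normal(size=(N, D))
w_true = np.zeros(D)
w_true[:3] = [2.0, -1.5, 1.0]                        # informative dimensions
y = np.sign(X @ w_true + 0.1 * rng.normal(size=N))   # labels in {-1, +1}

lam, step = 0.1, 0.01
w = np.zeros(D)
for _ in range(2000):
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-margin)) with respect to w
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w = w - step * grad
    # proximal step: soft-thresholding, the source of exact zeros
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

print("non-zero weights:", np.count_nonzero(w), "of", D)
```

Most of the 47 noise dimensions end up exactly zero, which is what makes the inverted-index scoring above cheap.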
• 87. Performance evaluation on ImageNet (10M images) [Rastegari et al., 2011]. Plot of Precision @ 10 (%) vs. search time per query (seconds), for full inner-product evaluation and inverted-index retrieval, each with L2-SVM and L1-LR. Performance averaged over 400 object classes used as queries; 10 training examples per query class; the database includes 450 images of the query class and 9.7M images of other classes; Prec@10 of a random classifier is 0.005%. Each curve is obtained by varying sparsity through C in the training objective E(w) = R(w) + (C/N) Σ_{n=1..N} L(w; Φ_n, y_n) (regularizer + loss).
  • 88. Top-k ranking • Do we need to rank the entire database? - users only care about the top-ranked images • Key idea: - for each image iteratively update an upper-bound and a lower-bound on the score - gradually prune images that cannot rank in the top-k
• 89. Top-k pruning [Rastegari et al., 2011]. w = [3 -2 0 -6 0 3 -2 0]; incidence matrix I0..I9 as before. Highest possible score: for the binary vector Φ_U with Φ_U,i = 1 iff w_i > 0, the initial upper bound is u* = w^T Φ_U (6 in this case). Lowest possible score: for Φ_L with Φ_L,i = 1 iff w_i < 0, the initial lower bound is l* = w^T Φ_L (-10 in this case).
• 90. Top-k pruning [Rastegari et al., 2011]. Initialization: every image starts with the same upper bound u* and lower bound l* (bars shown for I0..I9).
• 91. Top-k pruning [Rastegari et al., 2011]. Load feature i. Since w_i = +3 (> 0), for each image n: subtract 3 from the upper bound if φ_{n,i} = 0; add 3 to the lower bound if φ_{n,i} = 1.
• 92. Top-k pruning [Rastegari et al., 2011]. Load feature i. Since w_i = -2 (< 0), for each image n: decrement the upper bound by 2 if φ_{n,i} = 1; increment the lower bound by 2 if φ_{n,i} = 0.
• 93. Top-k pruning [Rastegari et al., 2011]. Load feature i. Since w_i = -6 (< 0), for each image n: decrement the upper bound by 6 if φ_{n,i} = 1; increment the lower bound by 6 if φ_{n,i} = 0.
• 94. Top-k pruning [Rastegari et al., 2011]. Suppose k = 4: we can prune I2 and I9, since their upper bounds show they cannot rank in the top-k.
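The procedure on slides 89-94 can be sketched as follows (an illustrative reimplementation of the idea, not the authors' code), using the slides' weight vector and incidence matrix:

```python
# Top-k pruning over binary vectors: per-image upper/lower score bounds are
# tightened one feature at a time (in descending |w_i|), and images whose
# upper bound drops below the k-th largest lower bound are pruned.
w = [3, -2, 0, -6, 0, 3, -2, 0]
docs = [  # incidence matrix, rows I0..I9
    [1, 0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 0], [0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1], [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1, 0, 1],
]
k = 4

upper0 = sum(wi for wi in w if wi > 0)   # assume every positive feature fires
lower0 = sum(wi for wi in w if wi < 0)   # assume every negative feature fires
bounds = {d: [upper0, lower0] for d in range(len(docs))}

for f in sorted(range(len(w)), key=lambda i: -abs(w[i])):
    if w[f] == 0:
        break                            # zero weights never change any bound
    for d, (u, l) in list(bounds.items()):
        if w[f] > 0:
            if docs[d][f]: l += w[f]     # positive contribution now certain
            else:          u -= w[f]     # optimistic assumption removed
        else:
            if docs[d][f]: u += w[f]     # negative contribution now certain
            else:          l -= w[f]     # pessimistic assumption removed
        bounds[d] = [u, l]
    if len(bounds) > k:
        kth = sorted((b[1] for b in bounds.values()), reverse=True)[k - 1]
        bounds = {d: b for d, b in bounds.items() if b[0] >= kth}   # prune

topk = sorted(bounds, key=lambda d: -bounds[d][0])[:k]
print(sorted(topk))
```

Pruning is safe because the k-th largest lower bound never exceeds the k-th largest true score, so any image whose upper bound falls below it cannot appear in the top-k.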
• 95. Distribution of weights and pruning rate. Figure 2. (a) Distribution of weight absolute values for the different classifiers (L1-LR, L2-SVM, FGM), after sorting the weight magnitudes: TkP runs faster with sparse, highly skewed weight values. (b) Pruning rate of TkP (% of images pruned vs. number of iterations, with features considered in descending order of |w_i|) for the various classification models and different values of k (k = 10, 3000). A smaller value of k allows the method to eliminate more images from consideration at a very early stage.
• 96. Performance evaluation on ImageNet (10M images) [Rastegari et al., 2011]. Plot of Precision @ 10 (%) vs. search time per query (seconds) for TkP L1-LR, TkP L2-SVM, inverted index L1-LR, and inverted index L2-SVM; k = 10. Performance averaged over 400 object classes used as queries; 10 training examples per query class; the database includes 450 images of the query class and 9.7M images of other classes; Prec@10 of a random classifier is 0.005%. Each curve is obtained by varying sparsity through C in the training objective E(w) = R(w) + (C/N) Σ_{n=1..N} L(w; Φ_n, y_n) (regularizer + loss).
• 97. Alternative search strategy: approximate ranking. Key idea: approximate the score function with a measure that can be computed (more) efficiently (related to approximate NN search: [Shakhnarovich et al., 2006; Grauman and Darrell, 2007; Chum et al., 2008]). Approximate ranking via vector quantization: w^T Φ ≈ w^T q(Φ), where q(·) is a quantizer returning the cluster centroid nearest to Φ. Problem: to approximate the score well we need a fine quantization, but the dimensionality of our space is D = 2659, too large to enable a fine quantization using k-means clustering.
• 98. Product quantization for nearest neighbor search [Jegou et al., 2011]. Split the feature vector Φ into v subvectors: Φ = [Φ1 | Φ2 | ... | Φv]. The subvectors are quantized separately, q(Φ) = [q1(Φ1) | q2(Φ2) | ... | qv(Φv)], where each qi(·) is learned by k-means in a space of dimensionality D/v, with a limited number of centroids. Example from [Jegou et al., 2011]: a 128-dimensional vector split into 8 subvectors of dimension 16; with 2^8 = 256 centroids per quantizer, the code is a 64-bit quantization index (8 bits per subvector).
• 99. Efficient approximate scoring. w^T Φ ≈ w^T q(Φ) = Σ_{j=1..v} w_j^T q_j(Φ_j). Splitting w into sub-blocks [w1; w2; ...; wv] to match the v quantizers (r centroids per sub-block), the inner products between each w_j and each centroid can be precomputed and stored in a look-up table. Step 1: filling the look-up table.
• 100-103. Efficient approximate scoring (animation). Filling the look-up table: for sub-block 1, compute the inner products s11, s12, s13, ..., s1r between w1 and each of the r centroids of q1; then s21, ..., s2r for sub-block 2, and so on up to sv1, ..., svr.
• 104. Efficient approximate scoring. Step 2: score each quantized vector q(Φ) in the database using the look-up table: w^T q(Φ) = w1^T q1(Φ1) + w2^T q2(Φ2) + ... + wv^T qv(Φv). Only v additions per image!
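The full encode / precompute / score pipeline can be sketched as follows; the codebooks here are random stand-ins for k-means-trained centroids, on toy dimensions:

```python
import numpy as np

# Product-quantization scoring sketch: encode a vector as one centroid id per
# sub-block, precompute each sub-block-of-w vs. centroid inner product, then
# score with one table lookup and one addition per sub-block.
rng = np.random.default_rng(0)
D, v, r = 32, 4, 16                 # dims, sub-blocks, centroids per sub-block
d_sub = D // v
codebooks = rng.normal(size=(v, r, d_sub))   # stand-in for k-means centroids
w = rng.normal(size=D)

def encode(phi):
    """Nearest-centroid id for each sub-block of phi."""
    code = []
    for j in range(v):
        sub = phi[j * d_sub:(j + 1) * d_sub]
        code.append(int(np.argmin(np.sum((codebooks[j] - sub) ** 2, axis=1))))
    return code

# Look-up table: table[j, c] = w_j^T (c-th centroid of sub-block j)
table = np.stack([codebooks[j] @ w[j * d_sub:(j + 1) * d_sub] for j in range(v)])

phi = rng.normal(size=D)
code = encode(phi)
approx = sum(table[j, code[j]] for j in range(v))    # v additions per image

# Sanity check against the score of the explicitly reconstructed q(phi)
q_phi = np.concatenate([codebooks[j][code[j]] for j in range(v)])
print(bool(np.isclose(approx, w @ q_phi)))
```

The table has only v × r entries, so it is filled once per query and amortized over the whole database.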
• 105. Choice of parameters [Rastegari et al., 2011]. Dimensionality is first reduced with PCA from D = 2659 to D' ≪ D. How do we choose D', v (number of sub-blocks), and r (number of centroids per sub-block)? Effect of the parameter choices on a database of 150K images: plot of Precision @ 10 (%) vs. search time per query (seconds), for D' ∈ {128, 256, 512} and (v, r) combinations with v ∈ {16, 32, 64, 128, 256} and r ∈ {2^6, 2^8}.