Large Scale Online Learning of Image Similarity Through Ranking
1. Large Scale Online Learning of Image Similarity
Through Ranking
from G. Chechik, V. Sharma, U. Shalit, S. Bengio – JMLR 2010
by Lukas Tencer
2. Motivation
• Needed for applications that compare any kind of data:
– image, video, web-page, document
• Two levels of similarity:
– Features (visual for images)
– Semantic
• Large-scale learning: limited by computational cost, not by
availability of data
• Which similarity does the user want to express: visual or semantic?
• The presented approach deals with semantic similarity once we
have visual similarity
• Similarity learning requires pairwise distances, which are not always available
• Instead of pairwise distances, use relative distances; two images are
close:
– if they are returned by the same query
– if they have the same label
3. Example of query
• Especially problem in QVE (Query by Visual Example)
• Query: “mount royal park”
• Images retrieved for the query vs. visually similar images
4. Motivation II
• Relationship to classification:
– A similarity measure can be used as a metric for
classification
– Good classification infers labels, which induce
similarity across images
• Constraint of a positive semidefinite
similarity matrix:
– for small data, prevents overfitting
– for big data with enough samples, it could
be removed to reduce computational cost
5. Problem Statement
• Learn a pairwise similarity function S on the given data
from relative pairs of image similarities
• Given data P and relative similarities r_ij = r(p_i, p_j)
• We do not have access to all values of r; where it is
not available, it equals 0
• Then S(p_i, p_j) is required to satisfy:
S(p_i, p_i^+) > S(p_i, p_i^-) for all p_i, p_i^+, p_i^- ∈ P such that r(p_i, p_i^+) > r(p_i, p_i^-)
• Parametric form: S_W(p_i, p_j) = p_i^T W p_j, where W ∈ R^(d×d)
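As a quick sketch of the parametric form above (NumPy, illustrative names; not code from the paper): with W equal to the identity, S_W reduces to a plain dot product.

```python
import numpy as np

# Sketch of the bilinear similarity S_W(p_i, p_j) = p_i^T W p_j.
# W is a general d x d matrix; it need not be symmetric or PSD.
def similarity(W, p_i, p_j):
    return p_i @ W @ p_j

d = 4
p_i, p_j = np.ones(d), np.arange(float(d))
# With W = I the score is just the dot product p_i . p_j
assert np.isclose(similarity(np.eye(d), p_i, p_j), p_i @ p_j)
```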
6. Online Algorithm
• Passive-Aggressive family of learning
algorithms, online learning algorithm (iterative)
– PA-I:
w_(t+1) = argmin_(w ∈ R^n) (1/2) ||w − w_t||^2, such that l(w; (x_t, y_t)) = 0
– Passive, if the loss function is 0
– Aggressive, if the loss is positive: enforces
l(w; (x_t, y_t)) = 0 regardless of the step size
– PA-II: trade-off between proximity and the desired
margin – a constrained optimization problem
7. Online Algorithm II
• So we are searching for S with a safety margin of 1:
S_W(p_i, p_i^+) > S_W(p_i, p_i^-) + 1
• The hinge loss function is defined as:
l_W(p_i, p_i^+, p_i^-) = max{0, 1 − S_W(p_i, p_i^+) + S_W(p_i, p_i^-)}
L_W = Σ_((p_i, p_i^+, p_i^-) ∈ P) l_W(p_i, p_i^+, p_i^-)
• Then the PA-II constrained optimization problem is:
W^i = argmin_W (1/2) ||W − W^(i−1)||^2_Fro + C·ξ
such that l_W(p_i, p_i^+, p_i^-) ≤ ξ and ξ ≥ 0
where C is the parameter that controls the trade-off
between margin enforcement and proximity of the solution
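One plausible closed-form per-triplet step for this kind of constrained problem can be sketched in NumPy (a minimal illustration, not the authors' code): if the hinge loss is positive, W moves along the gradient V = p_i (p_i^+ − p_i^-)^T with a step size capped by C.

```python
import numpy as np

def pa_update(W, p_i, p_plus, p_minus, C=0.1):
    """One passive-aggressive update of W on the triplet (p_i, p_plus, p_minus)."""
    loss = max(0.0, 1.0 - p_i @ W @ p_plus + p_i @ W @ p_minus)
    if loss == 0.0:                       # passive: margin already satisfied
        return W
    V = np.outer(p_i, p_plus - p_minus)   # gradient of the hinge loss w.r.t. W
    tau = min(C, loss / np.linalg.norm(V, 'fro') ** 2)  # capped step size
    return W + tau * V
```

With a large enough C, a single update drives the loss on the current triplet to zero, which is the "aggressive" case from the previous slide.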
8. Online Algorithm III
• A loss bound can be derived by rewriting the problem
as a linear classification problem
9. Sampling strategy
• Uniformly sample p_i from P
• Uniformly sample p_i^+ from images with the same category
• Uniformly sample p_i^- from images that do not share a
category with p_i
– p_i^- could be chosen at random from all images if the number
of categories and queries is very large
• If relevance feedback r(pi,pj) is not just binary function,
then sampling of positive examples could be changed
to prioritize samples with higher relevance
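The category-based sampling above can be sketched as follows (hypothetical helper; `labels` maps image id to category):

```python
import random

# Draw a triplet (p_i, p_i_plus, p_i_minus) so that p_i_plus shares
# p_i's category and p_i_minus does not, each drawn uniformly.
def sample_triplet(labels):
    ids = list(labels)
    while True:
        p_i = random.choice(ids)
        positives = [x for x in ids if x != p_i and labels[x] == labels[p_i]]
        negatives = [x for x in ids if labels[x] != labels[p_i]]
        if positives and negatives:
            return p_i, random.choice(positives), random.choice(negatives)

labels = {"img1": "dog", "img2": "dog", "img3": "cat", "img4": "cat"}
p_i, p_plus, p_minus = sample_triplet(labels)
assert labels[p_i] == labels[p_plus] and labels[p_i] != labels[p_minus]
```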
10. Image representation
• bag-of-word approach (bag-of-local-descriptors)
– get regions of interest
– calculate local descriptors
– treat them independently
• Divide image into overlapping square blocks
• Extract color and edge descriptors
– Edge: uniform Local Binary Patterns – differences of intensities
in a circular neighborhood
• 2^8 possible sequences = 256-bin histogram
• Non-uniform sequences could be merged into one bin, giving a 59-bin histogram
– Color: histograms from k-means clustering
• Train a color codebook and map each block pixel to the closest value in the codebook
– Concatenate in the end
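The 59-bin figure follows from the "uniform pattern" rule: an 8-bit circular pattern is uniform if it has at most two 0/1 transitions. A small sketch (hypothetical helper, not the authors' code) verifying the count:

```python
# Count bitwise transitions around an 8-bit circular LBP pattern.
def transitions(pattern):
    bits = [(pattern >> k) & 1 for k in range(8)]
    return sum(bits[k] != bits[(k + 1) % 8] for k in range(8))

# There are 58 uniform patterns; merging all non-uniform patterns
# into one extra bin yields the 59-bin histogram mentioned above.
uniform = [p for p in range(256) if transitions(p) <= 2]
assert len(uniform) == 58
```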
11. Image representation II
• Aim for high dimensional sparse vector representation
• Thus each local descriptor is treated as a visual term, and an
image is represented as a binary vector indicating the
presence/absence of visual terms
• Visual terms are weighted according to term frequency and
inverse document frequency
• Parameters of setup:
– 20 bins for colors
– 10,000-visterm vocabulary size (approx. 70 non-zero values per image)
– Blocks of 64×64 pixels, overlapping every 32 pixels
– Blocks extracted at different scales, by downscaling images by a
factor of 1.25 until fewer than 10 blocks remain
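A toy sketch of the tf-idf weighting over visterms (illustrative helper names; each image is a list of visterm ids):

```python
import math
from collections import Counter

# Weight each visterm by term frequency times inverse document
# frequency over the whole image collection; output is one sparse
# dict {visterm_id: weight} per image.
def tfidf(images):
    n = len(images)
    df = Counter(t for img in images for t in set(img))
    out = []
    for img in images:
        tf = Counter(img)
        out.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return out

docs = [[1, 1, 2], [2, 3], [3, 3, 4]]
vecs = tfidf(docs)
# Visterm 1 appears in fewer images than visterm 2, so it gets
# a higher weight in the first image.
assert vecs[0][1] > vecs[0][2]
```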
12. Experiments and evaluation
• Tested in 2 settings
– Caltech256 dataset (30k images)
– Web-Scale experiment (2.7 M images)
– (other databases for image retrieval testing: MIRFLICKR-1M,
Corel5k, Corel30k, UCID)
• Web-Scale Experiment:
– Queries from Google Image Search and relevance feedback
– The stopping condition for training is the value of mean average
precision (~160M iterations, ~4000 min on a single CPU)
– Evaluation Criterion: mAP and precision at top k
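The two evaluation criteria can be sketched as follows (hypothetical helper names; assumes a ranked result list and a set of relevant ids):

```python
# Fraction of the top-k retrieved ids that are relevant.
def precision_at_k(ranked, relevant, k):
    return sum(1 for r in ranked[:k] if r in relevant) / k

# Average of precision values at the rank of each relevant hit;
# mAP is this quantity averaged over all queries.
def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, r in enumerate(ranked, 1):
        if r in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

ranked = ["a", "x", "b", "y"]
relevant = {"a", "b"}
assert precision_at_k(ranked, relevant, 2) == 0.5
assert average_precision(ranked, relevant) == (1 / 1 + 2 / 3) / 2
```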
18. Discussion
• Metric learning could help to capture semantic relationships, once
visual similarity is available
• Relevance feedback or semantic similarity measure (class
modeling) is required to capture semantic similarity
• Compared to raw visual similarity, precision at top k
and mAP increase
• Recall is hard to measure for databases that are not fully
annotated
• Online metric learning is an ongoing research problem (Davis 2007,
Jain 2008, Chechik 2010) and, even though applied to images here,
could be used in other fields to capture semantic similarity
• Images: object semantics vs. visual features
• Documents: topics vs. textual features (dtf,tf-idf)
• SBIR: relative object mapping vs. sketch features
19. Thank you for your attention
Available at: http://www.slideshare.net/lukastencer