Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel
1. Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel. Authors: Jianxin Wu and James Rehg, Georgia Institute of Technology. Presenter: Shao-Chuan Wang.
2. Beyond the Euclidean distance. Key ideas: use the histogram intersection kernel (HIK) to create the visual codebook, since most descriptors are histogram-based features; kernel k-means (using HIK); one-class SVM (using HIK). Conclusions: one-class SVM with HIK performs best; k-median is a compromise (comparable with HIK k-means).
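The HIK mentioned above compares two histograms by summing their element-wise minima. A minimal sketch (toy histograms chosen for illustration, not from the paper):

```python
import numpy as np

def hik(x, y):
    """Histogram intersection kernel: K_HI(x, y) = sum_i min(x_i, y_i)."""
    return np.minimum(x, y).sum()

h1 = np.array([3, 0, 2, 5])
h2 = np.array([1, 4, 2, 2])
print(hik(h1, h2))  # min(3,1) + min(0,4) + min(2,2) + min(5,2) = 5
```

Unlike the Euclidean distance, this kernel rewards overlapping histogram mass directly, which is why it is a natural fit for histogram-based descriptors.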
3. Background: bag of visual words. Codebook construction (find D): clustering-based, e.g. k-means ← focus of this paper. Assignment of descriptors to visual words (find α), which induces a Voronoi diagram. Pooling (sum pooling to construct histograms). Subject to some constraints.
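The sum-pooling step above can be sketched in a few lines: once each local descriptor has been assigned a visual-word index, the image representation is just a count histogram over the K words (word indices here are made up for illustration):

```python
import numpy as np

K = 5  # codebook size (toy value; the paper uses K = 200)
assignments = np.array([0, 2, 2, 4, 0, 2])  # hypothetical word index per descriptor
hist = np.bincount(assignments, minlength=K)  # sum pooling over descriptors
print(hist)  # [2 0 3 0 1]
```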
4. Kernel k-means (1/2). At iteration t: find the nearest centroid from the K centroids, then update each centroid by averaging the newly assigned atoms. In the kernel setting the centroid is never formed explicitly; the squared distance of a point x to the implicit mean of cluster c is evaluated through the kernel alone: ||φ(x) − m_c||² = K(x, x) − (2/|c|) Σ_{x_i∈c} K(x, x_i) + (1/|c|²) Σ_{x_i,x_j∈c} K(x_i, x_j).
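A minimal kernel k-means sketch over a precomputed kernel matrix, using the implicit-centroid distance above (the deterministic initialization and toy data are my own simplifications, not the paper's setup):

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50):
    """Kernel k-means on a precomputed n x n kernel matrix K.

    Squared distance of point i to the implicit mean of cluster c:
      K[i,i] - 2/|c| * sum_{j in c} K[i,j] + 1/|c|^2 * sum_{j,l in c} K[j,l]
    """
    n = K.shape[0]
    labels = np.arange(n) % k  # simple deterministic initialization
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its distances at infinity
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Two well-separated groups of toy histograms, clustered with the HIK.
X = np.vstack([np.tile([10., 0, 0, 0], (5, 1)),
               np.tile([0., 0, 10, 0], (5, 1))])
K_hik = np.minimum(X[:, None, :], X[None, :, :]).sum(axis=2)
print(kernel_kmeans(K_hik, 2))
```

Plugging in the HIK matrix is what turns ordinary k-means into the paper's HIK codebook construction; any positive-definite kernel matrix would work in its place.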
6. Contribution 1: fast evaluation of HIK. Based on (Maji et al. 2008): after quantizing the features from R^d_+ into N^d, the evaluation of (1) can be reduced to O(d) by pre-computing a lookup table.
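The lookup-table idea can be sketched as follows: for integer-valued histograms, the sum of HIK values between a query and all points in a set depends, per dimension, only on the query's integer value in that dimension, so it can be tabulated once. This is a toy illustration of the trick under that assumption, not the paper's implementation:

```python
import numpy as np

def build_table(X, vmax):
    """T[j, v] = sum_i min(X[i, j], v) for integer-valued histograms X (n x d)."""
    n, d = X.shape
    T = np.zeros((d, vmax + 1))
    for v in range(vmax + 1):
        T[:, v] = np.minimum(X, v).sum(axis=0)
    return T

def hik_sum_fast(T, q):
    """sum_i K_HI(x_i, q) in O(d): one table lookup per dimension."""
    d = T.shape[0]
    return T[np.arange(d), q].sum()

X = np.array([[3, 1], [2, 4]])          # toy quantized descriptors
T = build_table(X, vmax=4)
q = np.array([2, 3])                    # toy query
print(hik_sum_fast(T, q))               # equals the naive O(n*d) sum
```

The table costs O(n·d·vmax) once, after which each evaluation against the whole set is O(d) instead of O(n·d).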
7. Contribution 2: encoding via one-class SVM. [Figure: example one-class SVM decision boundaries in 2D using a Gaussian kernel, for γ = 0.01, C = 2000 and γ = 0.1, C = 2000.]
8. Contribution 2: encoding via one-class SVM. Use kernel k-means (with HIK) to create a codebook of size K. Train one one-class SVM per cluster (K in total). Assign a descriptor to the word whose SVM gives the maximum response among the K machines. (α: the Lagrange multipliers of the SVM dual.)
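The encoding scheme above can be sketched with scikit-learn's `OneClassSVM` and a precomputed HIK matrix (the toy clusters, the `nu` value, and the helper names are my own illustration; the paper's formulation uses a C-parameterized one-class SVM):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def hik_matrix(A, B):
    """Pairwise histogram intersection kernel between rows of A and rows of B."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def train_word_models(clusters, nu=0.2):
    """One one-class SVM per cluster, each with a precomputed HIK Gram matrix."""
    models = []
    for Xc in clusters:
        svm = OneClassSVM(kernel="precomputed", nu=nu)
        svm.fit(hik_matrix(Xc, Xc))
        models.append((svm, Xc))
    return models

def encode(models, x):
    """Assign descriptor x to the word with the largest SVM response."""
    scores = [svm.decision_function(hik_matrix(x[None, :], Xc))[0]
              for svm, Xc in models]
    return int(np.argmax(scores))

# Two toy clusters of histograms with mass in different bins.
rng = np.random.default_rng(0)
X0 = rng.uniform(3, 6, size=(20, 4)); X0[:, 2:] = 0
X1 = rng.uniform(3, 6, size=(20, 4)); X1[:, :2] = 0
models = train_word_models([X0, X1])
print(encode(models, np.array([5., 5., 0., 0.])))
```

Compared with nearest-centroid assignment, each word's SVM response depends on the cluster's support vectors rather than a single mean, which is what makes the resulting encoding more flexible.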
9. Contribution 3: comparison with a k-median codebook. K-median clustering: find the nearest centroid using the L1 distance; update each centroid to the median of the newly assigned atoms. The 'median' is the minimizer of the optimization problem m* = argmin_m Σ_i ||x_i − m||_1.
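A minimal k-median sketch following the two steps above (L1 assignment, per-dimension median update); the deterministic initialization and toy data are illustrative assumptions:

```python
import numpy as np

def kmedian(X, k, n_iter=20):
    """K-median: assign by L1 distance, update centroids to per-dimension medians."""
    C = X[:k].astype(float).copy()  # simple deterministic initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.abs(X[:, None, :] - C[None, :, :]).sum(axis=2)  # L1 distances
        labels = d.argmin(axis=1)
        # The per-dimension median minimizes sum_i ||x_i - m||_1 over the cluster.
        newC = np.array([np.median(X[labels == c], axis=0)
                         if np.any(labels == c) else C[c]
                         for c in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return C, labels

X = np.array([[0., 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
C, labels = kmedian(X, 2)
print(labels)
```

Because only the update rule changes (median instead of mean), k-median keeps k-means' cost while being more robust to outliers, which is why the slides call it a compromise.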
11. Some engineering details. Concatenate features from the original image and its Sobel gradient image: 31K × 2 = 62K-dimensional image representation.
12. Some engineering details. SIFT descriptors for Caltech 101, CENTRIST for the other datasets. Codebook size K = 200. Spatial pyramid levels L = 0, 1, 2. One-vs-one SVM for the smaller datasets, BSVM for Caltech 101. Random train/test splitting is repeated 5 times.
13. Results: Caltech 101. Legend: B / not B: with/without Sobel concatenation; s: grid step size of dense SIFT extraction; oc_svm: one-class SVM encoding; k_HI: histogram intersection kernel.
14. Results: Scene 15. Legend: B / not B: with/without Sobel concatenation; s: grid step size of dense SIFT extraction; oc_svm: one-class SVM encoding; k_HI: histogram intersection kernel.
15. Conclusions. The HIK visual codebook improves classification accuracy. K-median is a compromise between Euclidean k-means and HIK k-means. One-class SVM encoding helps build a more compact representation. Is a smaller step size always better?