ICCV 2009: Max-Margin Additive Classifiers for Detection
1. Max-Margin Additive Classifiers for Detection. Subhransu Maji & Alexander Berg. University of California at Berkeley; Columbia University. ICCV 2009, Kyoto, Japan.
2-6. Accuracy vs. Evaluation Time for SVM Classifiers [figure, built up over slides 2-6: accuracy vs. evaluation time for the linear kernel, non-linear kernels, and additive kernels (our CVPR 08)]. Made it possible to use SVMs with additive kernels for detection.
7. Additive Classifiers. Much work already uses them! SVMs with additive kernels are additive classifiers. Histogram-based kernels: histogram intersection, chi-squared kernel; Pyramid Match Kernel (Grauman & Darrell, ICCV’05); Spatial Pyramid Match Kernel (Lazebnik et al., CVPR’06); …
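What makes these kernels "additive" is that each is a one-dimensional kernel applied to every histogram bin and then summed. The sketch below shows this for the two histogram kernels named on the slide; it assumes non-negative histogram entries and the common additive chi-squared form 2ab/(a+b), and all function names are hypothetical.

```python
from typing import Callable, Sequence


def additive_kernel(k_dim: Callable[[float, float], float],
                    x: Sequence[float], z: Sequence[float]) -> float:
    """K(x, z) = sum_i k_dim(x_i, z_i): one 1-D kernel applied per dimension, then summed."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    return sum(k_dim(a, b) for a, b in zip(x, z))


def min_1d(a: float, b: float) -> float:
    """Histogram intersection (min) kernel for a single dimension."""
    return min(a, b)


def chi2_1d(a: float, b: float) -> float:
    """Additive chi-squared kernel for a single dimension: 2ab / (a + b), taken as 0 when a + b == 0."""
    s = a + b
    return 0.0 if s == 0 else 2.0 * a * b / s


def intersection_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    return additive_kernel(min_1d, x, z)


def chi2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    return additive_kernel(chi2_1d, x, z)


if __name__ == "__main__":
    x = [0.2, 0.5, 0.3]  # two toy normalized histograms
    z = [0.4, 0.1, 0.5]
    print(intersection_kernel(x, z), chi2_kernel(x, z))
```

For these two histograms the intersection kernel is about 0.6. For any normalized histogram, both kernels give 1.0 when a histogram is compared with itself.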
8-14. Accuracy vs. Training Time for SVM Classifiers [figure, built up over slides 8-14: accuracy vs. training time for linear, non-linear, and additive classifiers; stages: <=1990s; today (e.g. cutting plane, stochastic gradient descent, dual coordinate descent for linear SVMs); our CVPR 08 (✗); this paper]. Makes it possible to train additive classifiers very fast.
15. Summary. Additive classifiers are widely used and can provide better accuracy than linear classifiers. Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim), the same as linear. This work: additive classifiers can be trained directly, as efficiently (up to a small constant) as the best approaches for training linear classifiers. An example:
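The O(#dim) evaluation claim can be illustrated for an intersection-kernel SVM with a minimal sketch. It is written in the spirit of the summary and does not reproduce the CVPR 08 code. It assumes features in [0, 1] and a uniform grid, and the names `exact_iksvm`, `LookupIKSVM`, and `bins` are hypothetical. The decision function is regrouped into one function per dimension, and each of these functions is tabulated once. Evaluating an example then costs one table lookup per dimension, whatever the number of support vectors. Table interpolation is an approximation in general. It is exact here only because the toy support-vector values fall on grid points.

```python
from typing import List, Sequence


def exact_iksvm(x: Sequence[float], svs: Sequence[Sequence[float]],
                coefs: Sequence[float], bias: float) -> float:
    """Exact decision value: bias + sum_j coefs[j] * sum_i min(x_i, svs[j][i]).
    coefs[j] plays the role of alpha_j * y_j. Cost: O(#SV * #dim)."""
    return bias + sum(c * sum(min(a, s) for a, s in zip(x, sv))
                      for c, sv in zip(coefs, svs))


class LookupIKSVM:
    """Approximate O(#dim) evaluator: one precomputed table per dimension.
    Assumes feature values lie in [lo, hi]; values outside are clipped."""

    def __init__(self, svs, coefs, bias, lo=0.0, hi=1.0, bins=100):
        self.lo, self.hi, self.bins, self.bias = lo, hi, bins, bias
        self.step = (hi - lo) / bins
        grid = [lo + k * self.step for k in range(bins + 1)]
        dims = len(svs[0])
        # tables[i][k] = h_i(grid[k]) = sum_j coefs[j] * min(grid[k], svs[j][i])
        self.tables: List[List[float]] = [
            [sum(c * min(t, sv[i]) for c, sv in zip(coefs, svs)) for t in grid]
            for i in range(dims)
        ]

    def decision(self, x: Sequence[float]) -> float:
        total = self.bias
        for table, t in zip(self.tables, x):
            t = min(max(t, self.lo), self.hi)
            pos = (t - self.lo) / self.step       # fractional grid index in [0, bins]
            k = min(int(pos), self.bins - 1)      # left grid point of the cell
            frac = pos - k
            # Linear interpolation between neighbouring table entries.
            total += (1.0 - frac) * table[k] + frac * table[k + 1]
        return total
```

Building the tables costs O(#dim × bins × #SV) once. After that, the per-example cost no longer depends on the number of support vectors.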
17. Kernel SVMs can learn non-linear boundaries in input space. Classification function; kernel trick.
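For reference, here is the standard kernel SVM decision function and the kernel trick. This is the textbook form, not a transcription of the slide's equation. The symbols $\alpha_j$, $y_j$, $\mathbf{x}_j$ denote the weights, labels, and vectors of the $m$ support vectors, and $\phi$ is the (possibly infinite-dimensional) feature map:

```latex
f(\mathbf{x}) = \sum_{j=1}^{m} \alpha_j y_j \, K(\mathbf{x}, \mathbf{x}_j) + b,
\qquad K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^{\top}\phi(\mathbf{z}),
\qquad\text{so}\qquad
f(\mathbf{x}) = \mathbf{w}^{\top}\phi(\mathbf{x}) + b,
\quad \mathbf{w} = \sum_{j=1}^{m} \alpha_j y_j \, \phi(\mathbf{x}_j).
```

The first form costs $m$ kernel evaluations per example. The second form is only usable when $\phi$ is explicit and of manageable size.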
18. Embeddings… These embeddings can be high-dimensional (even infinite). Our approach is based on embeddings that approximate kernels. We’d like this approximation to be as accurate as possible. We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important.
19. Key Idea: Embedding an Additive Kernel. Additive kernels are easy to embed: just embed each dimension independently. Linear embedding for the min kernel on integers; for non-integers, approximate by quantizing.
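The embedding on this slide can be checked directly. Write each integer in unary, so that n becomes n ones followed by zeros. The dot product of two unary codes then equals the min of the two integers. Real-valued features are first quantized to integer levels, and the per-dimension codes are concatenated. A minimal sketch, assuming uniform quantization of values in [0, 1] (helper names are hypothetical):

```python
from typing import List, Sequence


def unary_encode(n: int, max_value: int) -> List[int]:
    """Unary code of an integer 0 <= n <= max_value: n ones followed by (max_value - n) zeros."""
    if not 0 <= n <= max_value:
        raise ValueError("n must lie in [0, max_value]")
    return [1] * n + [0] * (max_value - n)


def quantize(x: float, bins: int, lo: float = 0.0, hi: float = 1.0) -> int:
    """Uniformly quantize a real value in [lo, hi] to an integer level in [0, bins] (values are clipped)."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * bins)


def encode(x: Sequence[float], bins: int) -> List[int]:
    """Embed a vector by concatenating its per-dimension unary codes (length len(x) * bins)."""
    out: List[int] = []
    for xi in x:
        out.extend(unary_encode(quantize(xi, bins), bins))
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))
```

For integers, `dot(unary_encode(a, N), unary_encode(b, N))` equals `min(a, b)`. For real vectors, `dot(encode(x, B), encode(z, B)) / B` approximates the intersection kernel, and the approximation improves as `B` grows.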
22. Linear vs. Encoded SVMs. Linear SVM objective (solve with LIBLINEAR); encoded SVM objective (not practical).
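Below is a sketch of the two primal objectives in the standard L2-regularized hinge-loss form. The regularization weight $\lambda$, the sample count $n$, and the notation are assumptions and are not copied from the slide. The training pairs are $(\mathbf{x}_k, y_k)$ and the encoding is $\phi$:

```latex
\begin{aligned}
\text{Linear:}\quad & \min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2
  + \frac{1}{n}\sum_{k=1}^{n} \max\bigl(0,\; 1 - y_k\, \mathbf{w}^{\top}\mathbf{x}_k\bigr) \\
\text{Encoded:}\quad & \min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2
  + \frac{1}{n}\sum_{k=1}^{n} \max\bigl(0,\; 1 - y_k\, \mathbf{w}^{\top}\phi(\mathbf{x}_k)\bigr)
\end{aligned}
```

One plausible reading of "not practical" concerns density. With a unary $\phi$ and $B$ bins per dimension, each input dimension can contribute up to $B$ non-zeros. That is far more than the small multiple of input non-zeros that the speaker notes aim for.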
23. Linear vs. Encoded SVMs. Linear SVM objective (solve with LIBLINEAR); encoded SVM, modified (custom solver): encourages smooth functions and closely approximates the min kernel SVM. Custom solver: PWLSGD (see paper).
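One property of the unary code shows how a regularizer can "encourage smooth functions". On a single dimension's unary code, a linear weight vector is exactly the list of increments of a piecewise-constant function of that dimension. The usual $\|\mathbf{w}\|^2$ penalty is therefore the sum of squared increments, which is a roughness penalty. This sketch only illustrates that identity. It is not the slide's modified objective or the PWLSGD solver, and the helper names are hypothetical.

```python
from typing import List, Sequence


def unary(level: int, bins: int) -> List[float]:
    """Unary code: `level` ones followed by zeros, length `bins`."""
    return [1.0] * level + [0.0] * (bins - level)


def weights_for(g: Sequence[float]) -> List[float]:
    """Weights w (length len(g) - 1) with w . unary(q) == g[q] - g[0] for every level q.
    Each weight is the increment of g between neighbouring levels."""
    return [g[k + 1] - g[k] for k in range(len(g) - 1)]


def score(w: Sequence[float], level: int) -> float:
    """Linear score of one unary-encoded value: the sum of the first `level` weights."""
    return sum(wk * uk for wk, uk in zip(w, unary(level, len(w))))


def roughness(g: Sequence[float]) -> float:
    """Sum of squared differences between neighbouring values of g."""
    return sum((g[k + 1] - g[k]) ** 2 for k in range(len(g) - 1))
```

Even a non-monotonic target such as `g = [0, 1, 3, 2, -1]` is reproduced exactly, up to the offset `g[0]`, which a bias term can absorb. In addition, `sum(wk ** 2 for wk in weights_for(g))` equals `roughness(g)`.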
24. Linear vs. Encoded SVMs. Linear SVM objective (solve with LIBLINEAR); encoded SVM objective (solve with LIBLINEAR).
32. Experiment: DC Pedestrians. (Training time, accuracy) for the compared methods: (3.18s, 89.25%) (1.86s, 88.80%) (363s, 89.05%) (2.98s, 85.71%) (1.89s, 72.98%). 100x faster; training time ~ linear SVM, accuracy ~ kernel SVM. 20,000 features, 656-dimensional; 100 bins for encoding; 6-fold cross-validation.
33. Experiment: Caltech 101. (Training time, accuracy) for the compared methods: (291s, 55.35%) (2687s, 56.49%) (102s, 54.8%) (90s, 51.64%) (41s, 46.15%). 10x faster, small loss in accuracy. 30 training examples per category; 100 bins for encoding; Pyramid HOG + Spatial Pyramid Match Kernel.
34. Experiment: INRIA Pedestrians. (Training time, performance) for the compared methods: (140 mins, 0.95) (76s, 0.94) (27s, 0.88) (122s, 0.85) (20s, 0.82). 300x faster; training time ~ linear SVM, accuracy ~ kernel SVM; trains the detector in < 2 mins. SPHOG: 39,000 features, 2268-dimensional; 100 bins for encoding. Cross-validation plots.
35. Experiment: INRIA Pedestrians [figure: cross-validation plots].
36. Take Home Messages. Additive models are practical for large-scale data. They can be trained discriminatively. Poor man’s version: encode + linear SVM solver. Middle man’s version: encode + custom solver. Rich man’s version: min kernel SVM. Embedding only approximates kernels: it leads to a small loss in accuracy but up to 100x speedup in training time. Everyone should use them: see code on our websites (Fast IKSVM from CVPR’08, Encoded SVMs, etc.).
Thank you. Good afternoon, everybody. I am going to present ways to train additive classifiers efficiently. This work is part of an ongoing collaboration with Alex Berg.
For any classification task, the two main things we care about are accuracy and evaluation time. Evaluation time matters especially in object detection, where one evaluates a classifier on thousands of windows per image. In the past, linear SVMs, though relatively less accurate, were preferred over kernel SVMs for real-time applications.
In our CVPR 08 paper…
We identified a subset of non-linear kernels, called additive kernels, that are used in many current object recognition tasks. These kernels have the special form that they decompose as a sum of kernels over individual dimensions.
And we showed that they can be evaluated efficiently. This makes it possible to use more accurate classifiers with almost no loss in speed. In fact, more than half of this year’s submissions to the PASCAL VOC object detection challenge use variants of additive kernels.
In this talk we are going to discuss additive models in general, where the classifier decomposes over dimensions. This may seem restrictive, but it is a useful class of classifiers that is strictly more general than linear classifiers. In fact, if the underlying kernel of an SVM is additive, then the classifier is also additive.
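The claim that an additive kernel yields an additive classifier follows by swapping the two sums in the standard kernel SVM decision function. The notation is standard and assumed here: support vectors $\mathbf{x}_j$ with coefficients $\alpha_j y_j$, coordinates $x_{j,i}$, and per-dimension kernels $K_i$:

```latex
f(\mathbf{x})
= \sum_{j=1}^{m} \alpha_j y_j \sum_{i=1}^{d} K_i\bigl(x_i, x_{j,i}\bigr) + b
= \sum_{i=1}^{d} \underbrace{\sum_{j=1}^{m} \alpha_j y_j \, K_i\bigl(x_i, x_{j,i}\bigr)}_{h_i(x_i)} + b
= \sum_{i=1}^{d} h_i(x_i) + b.
```

Each $h_i$ depends on a single coordinate, so it can be tabulated once. A linear classifier is the special case $h_i(x_i) = w_i x_i$.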
The picture looks similar to the one for evaluation time… It is important to note that this was not the case even fairly recently…
Maybe put some refs on this…
As mentioned before, our previous work identified a subset of non-linear classifiers with an additive structure and showed that they could be evaluated efficiently. Unfortunately, it did not address efficiency in training…
This paper addresses efficient training for additive classifiers, developing training methods that are about as efficient as the best methods for training linear classifiers. We also demonstrate the accuracy advantages on some popular datasets.
Should we change the wording? Drop SVM?
(finish this by 5 mins)
The idea of support vector machines is to find a separating hyperplane after mapping the data into a high-dimensional space using a kernel. The final classifier is of course a hyperplane in a very high-dimensional space. It can nonetheless be expressed using only the kernel function, via the so-called kernel trick. If the embedded space is low-dimensional, one can take advantage of the very fast linear SVM training algorithms. These scale linearly with the amount of training data, as opposed to the quadratic growth for kernel SVMs.
Unfortunately, these embeddings are often high-dimensional. Our approach can be seen as finding embeddings that are both sparse and accurate, so that we can use the very best linear SVM training algorithms to train the classifier. In fact, we would ideally like the number of non-zero entries in the embedded features to be a small multiple of the number of non-zero entries in the input features.
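One way to keep at most two non-zeros per input dimension is to split each value between its two neighbouring grid points. A linear weight vector on this encoding then evaluates a piecewise-linear function per dimension. This encoding is an assumption for illustration and is not necessarily the one used in the paper. The grid over [0, 1] and the function names are hypothetical.

```python
from typing import Dict, Sequence


def interp_encode(x: Sequence[float], bins: int,
                  lo: float = 0.0, hi: float = 1.0) -> Dict[int, float]:
    """Sparse encoding: each value splits unit weight between its two neighbouring grid points.

    Dimension i owns indices i*(bins+1) .. i*(bins+1)+bins, so every input dimension
    produces at most 2 non-zeros (versus up to `bins` for a unary code)."""
    out: Dict[int, float] = {}
    for i, xi in enumerate(x):
        pos = (min(max(xi, lo), hi) - lo) / (hi - lo) * bins  # fractional grid index
        k = min(int(pos), bins - 1)                           # left grid point of the cell
        frac = pos - k
        base = i * (bins + 1)
        if frac < 1.0:
            out[base + k] = 1.0 - frac
        if frac > 0.0:
            out[base + k + 1] = frac
    return out


def sparse_dot(w: Sequence[float], enc: Dict[int, float]) -> float:
    """w . phi(x) over the non-zeros only: piecewise-linear interpolation of the
    per-dimension values stored in w."""
    return sum(w[j] * v for j, v in enc.items())
```

For example, with 10 bins, `interp_encode([0.25, 0.0, 1.0], 10)` has 4 non-zeros. A unary code of the same vector with 10 levels would have up to 30.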
A key idea of the paper is to realize that additive kernels are easy to embed, since the final embedding is just a concatenation of the individual per-dimension embeddings. As an example, take the min kernel, or histogram intersection kernel, defined as K(x, z) = Σ_i min(x_i, z_i). A well-known embedding of the min kernel for integers is the unary encoding, where each number is represented in unary. Example … For non-integers, one may simply approximate this by quantization.