
Max-Margin Additive Classifiers for Detection

Subhransu Maji & Alexander Berg. University of California at Berkeley; Columbia University. ICCV 2009, Kyoto, Japan.


Presentation Transcript


  1. Max-Margin Additive Classifiers for Detection • Subhransu Maji & Alexander Berg • University of California at Berkeley; Columbia University • ICCV 2009, Kyoto, Japan

  2. Accuracy vs. Evaluation Time for SVM Classifiers [Plot: axes Accuracy and Evaluation time; points: Linear Kernel, Non-linear Kernel]

  3. Accuracy vs. Evaluation Time for SVM Classifiers [Plot: axes Accuracy and Evaluation time; points: Linear Kernel, Non-linear Kernel; annotation: Our CVPR 08]

  4. Accuracy vs. Evaluation Time for SVM Classifiers [Plot: axes Accuracy and Evaluation time; points: Linear Kernel, Non-linear Kernel, Additive Kernel; annotation: Our CVPR 08]

  5. Accuracy vs. Evaluation Time for SVM Classifiers [Plot: axes Accuracy and Evaluation time; points: Linear Kernel, Non-linear Kernel, Additive Kernel (shown at two positions); annotation: Our CVPR 08]

  6. Accuracy vs. Evaluation Time for SVM Classifiers [Plot: axes Accuracy and Evaluation time; points: Linear Kernel, Non-linear Kernel, Additive Kernel (shown at two positions); annotation: Our CVPR 08] • Made it possible to use SVMs with additive kernels for detection.

  7. Additive Classifiers • Much work already uses them! • SVMs with additive kernels are additive classifiers • Histogram-based kernels • Histogram intersection, chi-squared kernel • Pyramid Match Kernel (Grauman & Darrell, ICCV’05) • Spatial Pyramid Match Kernel (Lazebnik et al., CVPR’06) • …
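
The claim that SVMs with additive kernels are additive classifiers follows from the form of the SVM decision function. A short derivation, using standard SVM notation that does not appear on the slide (support vectors $x_i$, labels $y_i$, dual coefficients $\alpha_i$, bias $b$, and $D$ feature dimensions):

```latex
K(x, z) = \sum_{d=1}^{D} K_d(x_d, z_d)
\quad\Longrightarrow\quad
f(x) = \sum_{i} \alpha_i y_i K(x, x_i) + b
     = \sum_{d=1}^{D} \underbrace{\sum_{i} \alpha_i y_i K_d(x_d, x_{i,d})}_{f_d(x_d)} + b
```

Each $f_d$ depends on a single coordinate of $x$, so the classifier is a sum of one-dimensional functions.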

  8. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear Kernel, Non-linear]

  9. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear, Non-linear; annotation: <=1990s]

  10. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear, Non-linear; annotation: Today] • E.g., Cutting Plane, Stochastic Gradient Descent, Dual Coordinate Descent

  11. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear, Non-linear, Additive; annotation: Our CVPR 08]

  12. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear, Non-linear, Additive; annotation: Our CVPR 08 ✗]

  13. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear, Non-linear, Additive; annotation: This Paper]

  14. Accuracy vs. Training Time for SVM Classifiers [Plot: axes Accuracy and Training time; points: Linear, Non-linear, Additive; annotation: This Paper] • Makes it possible to train additive classifiers very fast.

  15. Summary • Additive classifiers are widely used and can provide better accuracy than linear classifiers • Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim) time, the same as linear classifiers. • This work: additive classifiers can be trained directly as efficiently (up to a small constant) as the best approaches for training linear classifiers. An example
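
To make the O(#dim) evaluation claim concrete, here is a minimal sketch, assuming a min-kernel SVM with features in [0, 1]. Each per-dimension function is tabulated once at uniformly spaced points, and a test point is then scored with one interpolated lookup per dimension, independent of the number of support vectors. The function names, the table layout, and the toy numbers are illustrative assumptions, not the authors' code.

```python
def min_kernel(x, z):
    """Min (histogram intersection) kernel between two vectors."""
    return sum(min(a, b) for a, b in zip(x, z))

def exact_decision(x, support, coef, bias):
    """Standard kernel SVM evaluation: cost grows with the number of support vectors."""
    return sum(c * min_kernel(x, s) for c, s in zip(coef, support)) + bias

def build_tables(support, coef, bins):
    """Tabulate f_d(t) = sum_i coef_i * min(t, s_i[d]) at t = 0, 1/bins, ..., 1."""
    dims = len(support[0])
    tables = []
    for d in range(dims):
        row = []
        for k in range(bins + 1):
            t = k / bins
            row.append(sum(c * min(t, s[d]) for c, s in zip(coef, support)))
        tables.append(row)
    return tables

def table_decision(x, tables, bias, bins):
    """One linear interpolation per dimension: O(#dim) work."""
    total = bias
    for d, v in enumerate(x):
        pos = min(max(v, 0.0), 1.0) * bins
        k = min(int(pos), bins - 1)
        frac = pos - k
        total += (1.0 - frac) * tables[d][k] + frac * tables[d][k + 1]
    return total

# Toy model (hypothetical values); coef_i stands for alpha_i * y_i.
support = [[0.25, 0.75], [0.5, 0.5], [1.0, 0.0]]
coef = [0.8, -1.1, 0.3]
bias = 0.1
tables = build_tables(support, coef, bins=4)
x = [0.6, 0.3]
slow = exact_decision(x, support, coef, bias)
fast = table_decision(x, tables, bias, bins=4)
```

In this toy case the support-vector coordinates lie on the table grid, so the two scores agree up to rounding. In general the table is an approximation whose quality depends on the number of bins.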

  16. Support Vector Machines [Figure: Input Space mapped to Embedded Space] • Kernel Function • Inner product in the embedded space • Can learn non-linear boundaries in input space • Classification Function • Kernel Trick

  17. Embeddings… • These embeddings can be high dimensional (even infinite) • Our approach is based on embeddings that approximate kernels. • We’d like this approximation to be as accurate as possible • We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important.

  18. Key Idea: Embedding an Additive Kernel • Additive kernels are easy to embed: just embed each dimension independently • Linear embedding for the min kernel for integers • For non-integers, approximate by quantizing
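
The integer embedding can be illustrated with a unary code: an integer n becomes n ones followed by zeros, and the dot product of two such codes equals min(n, m). The sketch below also shows the quantized version for a value in [0, 1]. The function names and the 1/sqrt(bins) scaling are assumptions chosen for the illustration.

```python
def unary(n, size):
    """Unary code of a non-negative integer n <= size: n ones, then zeros."""
    return [1.0] * n + [0.0] * (size - n)

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def quantized_embed(x, bins):
    """Embed x in [0, 1] by rounding to one of `bins` levels and unary-coding it.

    The 1/sqrt(bins) scale makes dot products approximate min(x, y)
    on the original [0, 1] scale.
    """
    n = int(round(x * bins))
    scale = bins ** -0.5
    return [scale * v for v in unary(n, bins)]

# Integers: the linear kernel on unary codes is exactly the min kernel.
exact = dot(unary(3, 8), unary(5, 8))

# Non-integers: quantize first, then embed.
approx = dot(quantized_embed(0.30, 10), quantized_embed(0.72, 10))
```

For a multi-dimensional feature vector, the per-dimension codes are simply concatenated, since an additive kernel is a sum over dimensions.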

  19. Issues: Embedding Error • Quantization leads to large errors • Better encoding [Figure: encodings of x and y]
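
One way to reduce the quantization error is to keep the fractional part of the value in the first unused slot of the unary code instead of rounding it away. The sketch below compares the two. It is an assumption that this is the kind of "better encoding" the slide refers to, and the function names are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def rounded_embed(x, bins):
    """Unary code of x in [0, 1] after rounding to the nearest level."""
    n = int(round(x * bins))
    scale = bins ** -0.5
    return [scale] * n + [0.0] * (bins - n)

def fractional_embed(x, bins):
    """Unary code that stores the leftover fraction in the next slot."""
    pos = x * bins
    n = min(int(pos), bins - 1)  # keep x = 1.0 inside the last bin
    frac = pos - n
    scale = bins ** -0.5
    return [scale] * n + [scale * frac] + [0.0] * (bins - n - 1)

x, y, bins = 0.34, 0.78, 10
target = min(x, y)
rounded = dot(rounded_embed(x, bins), rounded_embed(y, bins))
fractional = dot(fractional_embed(x, bins), fractional_embed(y, bins))
```

When x and y fall in different bins, the fractional code reproduces min(x, y) exactly. When they share a bin, the dot product contains the product of the two fractions rather than their minimum, so a small error remains that shrinks as the number of bins grows.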

  20. Issues: Sparsity • Represent with sparse values
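
A minimal sketch of a sparse code, assuming it takes the form of linear-interpolation weights over bins + 1 knots: each value activates at most two neighbouring entries, and a weight vector applied to the code yields a piecewise-linear function of the input. The dictionary representation and function names are illustrative assumptions, not the paper's definition.

```python
def sparse_embed(x, bins):
    """Return {index: value} with at most two non-zero entries for x in [0, 1]."""
    pos = min(max(x, 0.0), 1.0) * bins
    k = min(int(pos), bins - 1)
    frac = pos - k
    code = {k: 1.0 - frac, k + 1: frac}
    # Drop explicit zeros so the code stays as sparse as possible.
    return {i: v for i, v in code.items() if v != 0.0}

def evaluate(weights, code):
    """Dot product between a dense weight list and a sparse code."""
    return sum(weights[i] * v for i, v in code.items())

bins = 10
code = sparse_embed(0.34, bins)
on_knot = sparse_embed(0.5, bins)

# With weights equal to the knot index, the output interpolates between knots 3 and 4.
weights = [float(k) for k in range(bins + 1)]
value = evaluate(weights, code)
```

Because only two entries are touched per dimension, each update in a linear solver costs a constant amount of work per feature dimension regardless of the number of bins.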

  21. Linear vs. Encoded SVMs • Linear SVM objective (solve with LIBLINEAR): • Encoded SVM objective (not practical):

  22. Linear vs. Encoded SVMs • Linear SVM objective (solve with LIBLINEAR): • Encoded SVM modified (custom solver): • Encourages smooth functions • Closely approximates min kernel SVM • Custom solver: PWLSGD (see paper)

  23. Linear vs. Encoded SVMs • Linear SVM objective (solve with LIBLINEAR): • Encoded SVM objective (solve with LIBLINEAR):
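
The formulas for these objectives did not survive in the transcript. As a point of reference only, the standard L2-regularized hinge-loss objective that LIBLINEAR can solve, and the same objective applied to encoded features, can be written as below. The symbols $w$, $C$, $(x_i, y_i)$, and the encoding map $\phi$ are assumed notation, and the modified regularizer used with the custom solver is not reproduced here.

```latex
\begin{align}
\text{Linear SVM:}  \quad & \min_{w}\; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, w^\top x_i\bigr) \\
\text{Encoded SVM:} \quad & \min_{w}\; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, w^\top \phi(x_i)\bigr)
\end{align}
```

The second line differs from the first only in replacing $x_i$ with $\phi(x_i)$, which is why an unmodified linear solver can be reused once the features are encoded.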

  24. Additive Classifier Choices [Table: Regularization × Encoding]

  25. Additive Classifier Choices [Table: Regularization × Encoding] • Accuracy increases • Evaluation times are similar

  26. Additive Classifier Choices [Table: Regularization × Encoding] • Accuracy increases (marked on both axes) • Evaluation times are similar

  27. Additive Classifier Choices [Table: Regularization × Encoding] • Accuracy increases (marked on both axes) • Standard solver, e.g. LIBSVM • Few lines of code + standard solver, e.g. LIBLINEAR

  28. Additive Classifier Choices [Table: Regularization × Encoding] • Accuracy increases (marked on both axes) • Custom solver

  29. Additive Classifier Choices [Table: Regularization × Encoding] • Accuracy increases (marked on both axes) • Classifier Notations

  30. Experiments • “Small” Scale: Caltech 101 (Fei-Fei et al.) • “Medium” Scale: DC Pedestrians (Munder & Gavrila) • “Large” Scale: INRIA Pedestrians (Dalal & Triggs)

  31. Experiment: DC Pedestrians [Plot points: (3.18s, 89.25%), (1.86s, 88.80%), (363s, 89.05%), (2.98s, 85.71%), (1.89s, 72.98%)] • 100x faster • Training time ~ linear SVM • Accuracy ~ kernel SVM • 20,000 features, 656 dimensional • 100 bins for encoding • 6-fold cross validation

  32. Experiment: Caltech 101 [Plot points: (291s, 55.35%), (2687s, 56.49%), (102s, 54.8%), (90s, 51.64%), (41s, 46.15%)] • 10x faster • Small loss in accuracy • 30 training examples per category • 100 bins for encoding • Pyramid HOG + Spatial Pyramid Match Kernel

  33. Experiment: INRIA Pedestrians [Plot points: (140 mins, 0.95), (76s, 0.94), (27s, 0.88), (122s, 0.85), (20s, 0.82)] • 300x faster • Training time ~ linear SVM • Accuracy ~ kernel SVM • Trains the detector in < 2 mins • SPHOG: 39,000 features, 2268 dimensional • 100 bins for encoding • Cross Validation Plots

  34. Experiment: INRIA Pedestrians • 300x faster • Training time ~ linear SVM • Accuracy ~ kernel SVM • Trains the detector in < 2 mins • SPHOG: 39,000 features, 2268 dimensional • 100 bins for encoding • Cross Validation Plots

  35. Take Home Messages • Additive models are practical for large scale data • Can be trained discriminatively: • Poor man’s version: encode + Linear SVM Solver • Middle man’s version: encode + Custom Solver • Rich man’s version: Min Kernel SVM • Embedding only approximates kernels, which leads to a small loss in accuracy but up to 100x speedup in training time • Everyone should use: see code on our websites • Fast IKSVM from CVPR’08, Encoded SVMs, etc.
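
The "poor man's version" can be sketched end to end on a toy one-dimensional problem: encode each value, then hand the codes to any linear trainer. To stay dependency-free, the sketch below swaps in a plain perceptron for the linear SVM solver, so it illustrates the encode-then-train-linearly pipeline rather than max-margin training. The encoding, data, and function names are all assumptions made for the example.

```python
def encode(x, bins):
    """Fractional unary code of x in [0, 1], plus a constant bias feature."""
    pos = min(max(x, 0.0), 1.0) * bins
    n = min(int(pos), bins - 1)
    frac = pos - n
    return [1.0] * n + [frac] + [0.0] * (bins - n - 1) + [1.0]

def train_perceptron(features, labels, max_epochs=1000):
    """Perceptron stand-in for a linear SVM solver (not a max-margin method)."""
    w = [0.0] * len(features[0])
    for _ in range(max_epochs):
        mistakes = 0
        for f, y in zip(features, labels):
            score = sum(wi * fi for wi, fi in zip(w, f))
            if y * score <= 0:
                w = [wi + y * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:
            break  # every training point is on the correct side
    return w

def predict(w, f):
    """Sign of the linear score on an encoded feature vector."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else -1

# Toy task: positive only in the middle of the range, which no linear
# function of the raw value x can separate.
bins = 10
xs = [(k + 0.5) / bins for k in range(bins)]
ys = [1 if 0.3 < x < 0.7 else -1 for x in xs]
feats = [encode(x, bins) for x in xs]
w = train_perceptron(feats, ys)
preds = [predict(w, f) for f in feats]
```

After encoding, the learned weights define a piecewise-linear function of x, which is what lets a linear trainer fit the band-shaped decision rule. In practice the perceptron would be replaced by a linear SVM solver such as LIBLINEAR, as the slide suggests.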

  36. Thank You
