
Learning Mid-Level Features For Recognition


Presentation Transcript


  1. Learning Mid-Level Features For Recognition. Y-Lan Boureau, Francis Bach, Yann LeCun and Jean Ponce. Published in CVPR 2010. Presented by Bo Chen, 8/20/2010

  2. Outline • 1. Classification System • 2. Brief introduction of each step • 3. Systematic evaluation of unsupervised mid-level features • 4. Learning discriminative dictionaries • 5. Average and max pooling • 6. Conclusions

  3. System Flow Chart: Patches → SIFT → Coding → Pooling → Classifier, with a spatial pyramid (SPM). Subsampling: slide patches densely to cover all details. SIFT: robust low-level features, invariant to several imaging conditions. Coding: generate mid-level features, e.g., sparse coding, vector quantization, or a deep network. Pooling: max pooling or average pooling over the spatial pyramid. Classifier: linear or nonlinear SVM. (A toy end-to-end sketch follows.)
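To make the flow chart concrete, here is a minimal end-to-end sketch in numpy. Everything in it is a stand-in: the descriptors, positions, and dictionary are random, hard vector quantization plays the coding step, and a two-level pyramid plays SPM; all sizes are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 "SIFT" descriptors (128-D) with normalized image
# coordinates, and a dictionary of K = 64 codewords.
descriptors = rng.standard_normal((200, 128))
positions = rng.uniform(0.0, 1.0, size=(200, 2))
dictionary = rng.standard_normal((64, 128))

# Coding: hard vector quantization (one-hot code for the nearest codeword).
dists = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
codes = np.eye(64)[dists.argmin(axis=1)]           # (200, 64) binary codes

# Pooling: max-pool the codes inside each cell of a two-level spatial pyramid.
pooled = []
for level in (1, 2):                               # 1x1 and 2x2 grids
    cell = (positions * level).astype(int).clip(0, level - 1)
    for i in range(level):
        for j in range(level):
            mask = (cell[:, 0] == i) & (cell[:, 1] == j)
            pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(64))
feature = np.concatenate(pooled)                   # (1 + 4) * 64 = 320-D
print(feature.shape)                               # this vector goes to the SVM
```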

  4. Scale-Invariant Feature Transform (D. Lowe, IJCV, 2004). Motivations: image matching with invariance to scale, rotation, illumination, and viewpoint. Figures from David Lee's ppt

  5. Calculate SIFT Descriptors. Divide a 16x16 patch into a 4x4 grid of subregions, with an 8-bin orientation histogram in each subregion, which leads to a 4x4x8 = 128-dimensional vector (a low-level feature). Figures from Jason Clemons's ppt
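The computation on this slide can be sketched in a few lines of numpy. This reproduces only the histogram layout (4x4 spatial cells times 8 orientation bins); Lowe's full descriptor additionally applies Gaussian weighting, trilinear interpolation, rotation to the dominant orientation, and clipping before renormalization.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Minimal 4x4x8 = 128-D histogram sketch for a 16x16 patch."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))       # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)     # orientation in [0, 2pi)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            b = int(ori[i, j] / (2 * np.pi) * 8) % 8   # 8 orientation bins
            desc[i // 4, j // 4, b] += mag[i, j]       # 4x4 spatial grid
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-8)        # L2-normalize

print(sift_like_descriptor(np.random.rand(16, 16)).shape)  # (128,)
```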

  6. Notations. Question: how can we represent each region? Figure from S. Lazebnik et al., CVPR 2006

  7. Coding and Pooling • Vector quantization (bag-of-features), or • Sparse coding (a sketch of both follows)
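In code, the two coding schemes can be sketched as follows. The sparse code is computed here with plain ISTA, a generic proximal-gradient solver, not necessarily the solver used in the paper; lambda and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((128, 64))              # dictionary, columns = codewords
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
x = rng.standard_normal(128)                    # one low-level descriptor

# Hard vector quantization: a one-hot code selecting the nearest codeword.
k = np.argmin(((x[:, None] - D) ** 2).sum(axis=0))
alpha_vq = np.zeros(64)
alpha_vq[k] = 1.0

# Sparse coding: alpha = argmin_a 0.5 * ||x - D a||^2 + lam * ||a||_1,
# solved with ISTA (gradient step + soft-thresholding).
lam = 0.2
step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant
alpha = np.zeros(64)
for _ in range(200):
    z = alpha - step * (D.T @ (D @ alpha - x))  # gradient step on the quadratic
    alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print(int(alpha_vq.sum()), np.count_nonzero(alpha))   # 1 active atom vs. a few
```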

  8. Systematic evaluation of unsupervised mid-level features

  9. Macrofeatures and Denser SIFT Sampling • Parameters: SIFT sampling density, macrofeature side length, subsampling parameter • Results: Caltech-101: 75.7% with (4, 2, 4); Scenes: 84.3% with (8, 2, 1). (A sketch of the construction follows.)
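A sketch of the macrofeature construction as described: concatenate small neighborhoods of adjacent dense-SIFT descriptors into single vectors before coding. The grid size, side length, and subsampling below are illustrative, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.standard_normal((20, 20, 128))     # dense SIFT on a 20x20 grid
side, sub = 2, 1                              # macrofeature side, subsampling

# Each macrofeature stacks a side x side block of neighboring descriptors.
macro = []
for i in range(0, 20 - side + 1, sub):
    for j in range(0, 20 - side + 1, sub):
        macro.append(grid[i:i + side, j:j + side].ravel())  # 2x2x128 = 512-D
macro = np.stack(macro)
print(macro.shape)       # (361, 512): these vectors are then coded jointly
```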

  10. Results

  11. Discriminative Dictionaries. Algorithm: stochastic gradient descent. Drawback: high computational complexity. Speed-ups: 1. approximate z(n) by pooling over a random sample of ten locations in the image; 2. update only a random subset of coordinates at each iteration. (Results on the Scenes dataset; a simplified sketch follows.)
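A heavily simplified sketch of the idea: learn the dictionary and a linear classifier jointly by SGD on a classification loss. To keep the gradient simple, it substitutes differentiable soft-assignment coding for the paper's sparse coding, and it mimics speed-up 1 by pooling over a small random sample of locations; the data, sizes, and learning rate are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, lr = 16, 32, 0.1
D = 0.1 * rng.standard_normal((K, d))   # dictionary (K codewords)
w = np.zeros(K)                         # linear classifier weights

def soft_codes(X, D):
    # Differentiable stand-in for sparse coding: softmax over similarities.
    s = X @ D.T
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(2000):
    y = 2 * int(rng.integers(0, 2)) - 1            # label in {-1, +1}
    X = rng.standard_normal((10, d)) + 0.3 * y     # ten sampled locations
    A = soft_codes(X, D)                           # (10, K) codes
    z = A.mean(axis=0)                             # average pooling -> z(n)
    g = -y / (1.0 + np.exp(y * (w @ z)))           # d logistic loss / d (w . z)
    G = A * (w[None, :] - (A @ w)[:, None])        # softmax Jacobian, chain rule
    D -= lr * g * (G.T @ X) / len(X)               # supervised dictionary update
    w -= lr * g * z                                # classifier update

# Scores on fresh samples should tend to match the sign of the label.
for y in (-1, 1):
    X = rng.standard_normal((10, d)) + 0.3 * y
    print(y, float(w @ soft_codes(X, D).mean(axis=0)))
```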

  12. Average and Max Pooling
  • Why pooling? Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. The aim is to preserve important information while discarding irrelevant detail; the crux of the matter is determining what falls into which category.
  • Max pooling vs. average pooling: the authors show that using max pooling on hard vector-quantized features in a spatial pyramid brings the performance of linear classification to the level obtained by Lazebnik et al. (2006) with an intersection kernel, even though the resulting feature is binary.
  • Our feeling: pooling helps keep the learned codes sparse, which mirrors human visual function. For convolutional deep networks in particular, pooling appears necessary because neighboring contents are correlated. (Part of this conclusion is from Y-Lan Boureau et al., ICML 2010.)

  13. Theoretical Comparison of Average and Max Pooling Experimental methodology: Binary classification (positive and negative)
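The slide's setting can be imitated with a small simulation: pool N binary codeword activations per image, with a class-dependent activation probability, and see how separable the two classes are under each pooling operator. The probabilities and pool size below are arbitrary; the point of the analysis is precisely that the better operator depends on this regime.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                  # pooled locations per image
p_pos, p_neg = 0.05, 0.02                # activation probability per class

def pooled_features(p, n_images=2000):
    acts = rng.random((n_images, N)) < p          # binary codes at N locations
    return acts.mean(axis=1), acts.max(axis=1)    # average- and max-pooled

avg_pos, max_pos = pooled_features(p_pos)
avg_neg, max_neg = pooled_features(p_neg)

# Separate the classes with a simple midpoint threshold on the pooled value.
for name, pos, neg in (("avg", avg_pos, avg_neg), ("max", max_pos, max_neg)):
    thr = (pos.mean() + neg.mean()) / 2
    acc = ((pos > thr).mean() + (neg <= thr).mean()) / 2
    print(name, round(float(acc), 3))
```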

  14. Conclusions • 1. A comprehensive and systematic comparison across each step of mid-level feature extraction, spanning several coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (average or maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. • 2. A supervised dictionary-learning method for sparse coding. • 3. Theoretical and empirical insight into the remarkable performance of max pooling.
