The Role of Learning in Vision

3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights - Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End
Overview
• Feature / Deep Learning
• Compositional Models
• Learning Representations
• Low-level Representations
• Learning on the fly
An Overview of Hierarchical Feature Learning and Relations to Other Models Rob Fergus Dept. of Computer Science, Courant Institute, New York University
Motivation
• Multitude of hand-designed features currently in use: SIFT, HOG, LBP, MSER, Color-SIFT, …
• Maybe some way of learning the features instead?
• Also, these mostly just capture low-level edge gradients

Yan & Huang (winner of the PASCAL 2010 classification competition)
Felzenszwalb, Girshick, McAllester and Ramanan [PAMI 2009]
Beyond Edges?
• Mid-level cues: continuation, parallelism, junctions, corners
(“Tokens” from Vision by D. Marr)
• High-level object parts: difficult to hand-engineer. What about learning them?
Deep/Feature Learning Goal
• Build a hierarchy of feature extractors (≥ 1 layers)
• All the way from pixels to classifier
• Homogeneous structure per layer
• Unsupervised training

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

• Numerous approaches:
• Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
• Sparse coding (Yu, Fergus, LeCun)
• Auto-encoders (LeCun, Bengio)
• ICA variants (Ng, Cottrell)
• & many more…
Single Layer Architecture

Input: Image Pixels / Features → Filter → Normalize → Pool → Output: Features / Classifier

Details in the boxes matter (especially in a hierarchy). Links to neuroscience.
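The filter → normalize → pool stage above can be sketched in numpy. This is a minimal illustrative implementation, not any particular published architecture: the choice of a ReLU-style non-linearity, divisive normalization across features, and max pooling is one instantiation of the slots in the diagram.

```python
import numpy as np

def single_layer(image, filters, pool=4, eps=1e-5):
    """One filter -> normalize -> pool stage (illustrative numpy sketch).

    image:   (H, W) grayscale array
    filters: (K, fh, fw) bank of linear filters (learned or hand-designed)
    """
    K, fh, fw = filters.shape
    H, W = image.shape
    oh, ow = H - fh + 1, W - fw + 1

    # Filter: valid correlation with each filter in the bank
    maps = np.empty((K, oh, ow))
    for k in range(K):
        for i in range(oh):
            for j in range(ow):
                maps[k, i, j] = np.sum(image[i:i+fh, j:j+fw] * filters[k])
    maps = np.maximum(maps, 0.0)  # rectifying non-linearity

    # Normalize: divisive normalization across the feature dimension,
    # so features at each location compete to explain the input
    norm = np.sqrt(np.sum(maps ** 2, axis=0, keepdims=True)) + eps
    maps = maps / norm

    # Pool: non-overlapping max pooling over pool x pool spatial windows
    oh2, ow2 = oh // pool, ow // pool
    maps = maps[:, :oh2 * pool, :ow2 * pool]
    return maps.reshape(K, oh2, pool, ow2, pool).max(axis=(2, 4))
```

Stacking this function, feeding each layer's pooled output into the next bank of filters, gives the pixels → Layer 1 → Layer 2 → Layer 3 hierarchy of the previous slide.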
Example Feature Learning Architectures

Pixels / Features
→ Filter with Dictionary (patch / tiled / convolutional) + Non-linearity
→ Normalization between feature responses: (Group) Sparsity, Max / Softmax, Local Contrast Normalization (Subtractive / Divisive)
→ Spatial / Feature Pooling (Sum or Max)
→ Features
SIFT Descriptor

Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
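Viewed through the filter / pool / normalize template, a SIFT-style descriptor looks like the sketch below. This is a simplified stand-in, not the actual SIFT implementation: it uses plain image gradients as the oriented filtering step and omits Gaussian weighting and trilinear interpolation.

```python
import numpy as np

def siftlike_descriptor(patch, grid=4, bins=8, eps=1e-7):
    """SIFT-style descriptor sketch: oriented gradient filtering,
    spatial sum-pooling over a grid x grid cell layout, then
    normalization of the concatenated histograms to unit length."""
    # Filter: image gradients -> orientation and magnitude per pixel
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)

    # Spatial pool (sum): accumulate gradient magnitude into
    # orientation histograms, one histogram per spatial cell
    H, W = patch.shape
    cell_h, cell_w = H // grid, W // grid
    desc = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            m = mag[i*cell_h:(i+1)*cell_h, j*cell_w:(j+1)*cell_w]
            o = ori[i*cell_h:(i+1)*cell_h, j*cell_w:(j+1)*cell_w]
            b = (o / (2 * np.pi) * bins).astype(int) % bins
            for k in range(bins):
                desc[i, j, k] = m[b == k].sum()

    # Normalize: unit-length feature vector (4 x 4 x 8 = 128-D)
    v = desc.ravel()
    return v / (np.linalg.norm(v) + eps)
```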
Spatial Pyramid Matching
Lazebnik, Schmid, Ponce [CVPR 2006]

SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier
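The multi-scale pooling step can be sketched as pooling one feature response map over grids of increasing resolution and concatenating the results. This is an illustrative sketch only; it uses max pooling per cell for simplicity, whereas the pipeline above sum-pools visual-word responses.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Pool a single feature response map over 1x1, 2x2 and 4x4 grids
    and concatenate the per-cell maxima (spatial-pyramid-style sketch)."""
    H, W = fmap.shape
    feats = []
    for g in levels:
        h, w = H // g, W // g
        # reshape into g x g blocks of size h x w, reduce each block
        pooled = fmap[:h * g, :w * g].reshape(g, h, g, w).max(axis=(1, 3))
        feats.append(pooled.ravel())
    return np.concatenate(feats)  # length 1 + 4 + 16 = 21 per feature map
```

Running this per visual word and concatenating across words gives the fixed-length vector fed to the classifier.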
Role of Normalization
• Lots of different mechanisms (max, sparsity, LCN, etc.)
• All induce local competition between features to explain the input
• “Explaining away”, just like in top-down models, but via a more local mechanism

Example: Convolutional Sparse Coding (an ℓ1 penalty |·|₁ on each convolutional feature map)
Zeiler et al. [CVPR ’10 / ICCV ’11], Kavukcuoglu et al. [NIPS ’10], Yang et al. [CVPR ’10]
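The "explaining away" effect of a sparsity-based normalizer can be seen in a small sparse coding example. The sketch below uses plain (non-convolutional) ISTA as a minimal stand-in for the convolutional sparse coding of the papers above: dictionary elements compete for coefficients, and the ℓ1 penalty drives weak competitors to exactly zero.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: shrink small entries to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA for  min_z 0.5*||x - D z||^2 + lam*||z||_1.
    Columns of D compete to explain x: once one column accounts for a
    component of x, the gradient step suppresses correlated columns."""
    L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z
```

With an orthonormal dictionary the solution reduces to soft-thresholding the correlations, which makes the local, feed-forward flavor of this competition easy to check.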
Role of Pooling
• Spatial pooling
• Invariance to small transformations
• Larger receptive fields
• Pooling across feature groups
• Gives AND/OR-type behavior
• Compositional models of Zhu, Yuille: Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
• Zeiler, Taylor, Fergus [ICCV 2011]
• Pooling with latent variables (& springs)
• Pictorial structures models: Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
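The invariance claim for spatial pooling is concrete: if an activation shifts by less than the pooling window, the pooled output is unchanged. A minimal numpy sketch:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping spatial max pooling. A response that moves within
    its size x size window leaves the pooled map identical, giving local
    translation invariance; each output also covers a larger receptive field."""
    H, W = fmap.shape
    h, w = H // size, W // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```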
Object Detection with Discriminatively Trained Part-Based Models
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]

HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score
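The "latent variables & springs" pooling step amounts to a generalized distance transform over each part's filter responses: every root location takes the best nearby part response minus a quadratic deformation ("spring") cost. The brute-force sketch below illustrates the operation; the actual system uses a linear-time distance transform and anisotropic, learned deformation weights rather than this single parameter.

```python
import numpy as np

def part_score(resp, a=0.05):
    """Brute-force generalized distance transform: out[i, j] is the best
    part response at any displaced location (y, x), penalized by a
    quadratic spring cost a * ((y - i)^2 + (x - j)^2)."""
    H, W = resp.shape
    ys, xs = np.mgrid[0:H, 0:W]
    out = np.empty_like(resp)
    for i in range(H):
        for j in range(W):
            cost = a * ((ys - i) ** 2 + (xs - j) ** 2)
            out[i, j] = (resp - cost).max()
    return out
```

Summing these transformed part scores with the root filter response (over the HOG pyramid) gives the detection score that non-max suppression is then applied to.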