
Semi-Supervised Learning in Gigantic Image Collections


Presentation Transcript


  1. Semi-Supervised Learning in Gigantic Image Collections Rob Fergus (New York University), Yair Weiss (Hebrew University), Antonio Torralba (MIT)

  2. What does the world look like? • Gigantic image collections • High-level image statistics • Object recognition for large-scale image search

  3. Spectrum of Label Information: unlabeled data, noisy labels, human annotations

  4. Semi-Supervised Learning • Classification function should be smooth with respect to the data density (figure panels: Data, Supervised, Semi-Supervised)

  5. Semi-Supervised Learning using Graph Laplacian [Zhu03, Zhou04] • W is the n x n affinity matrix (n = # of points), W_ij = exp(-||x_i - x_j||^2 / 2ε^2) • Graph Laplacian: L = D - W, where D is the diagonal degree matrix with D_ii = Σ_j W_ij

  6. SSL using Graph Laplacian • Want to find the label function f that minimizes: J(f) = f^T L f + (f - y)^T Λ (f - y) • y = labels • Λ is diagonal with Λ_ii = λ if point i is labeled, 0 otherwise • The first term enforces smoothness, the second agreement with the labels • Solution: f = (L + Λ)^(-1) Λ y, an n x n system (n = # points)
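A minimal NumPy sketch of this exact solve (slides 5-6). The kernel width eps and label weight lam are assumed illustration values, not numbers from the talk:

```python
import numpy as np

def exact_ssl(X, y, labeled_mask, eps=1.0, lam=100.0):
    """X: (n, d) data; y: (n,) labels (0 where unlabeled);
    labeled_mask: (n,) bool. Returns the label function f (length n)."""
    # Affinity matrix W_ij = exp(-||x_i - x_j||^2 / (2 eps^2))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * eps ** 2))
    # Graph Laplacian L = D - W, with D the diagonal degree matrix
    L = np.diag(W.sum(axis=1)) - W
    # Lambda: weight lam on labeled points, 0 on unlabeled points
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    # Minimize f'Lf + (f - y)'Lam(f - y)  =>  (L + Lam) f = Lam y
    return np.linalg.solve(L + Lam, Lam @ y)
```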

  7. Eigenvectors of Laplacian • Smooth vectors will be linear combinations of the eigenvectors U of L with small eigenvalues: f = U α = Σ_k α_k u_k [Belkin & Niyogi 06, Schoelkopf & Smola 02, Zhu et al 03, 08]

  8. Rewrite System • Let f = U α • U = smallest k eigenvectors of L • α = coefficients • k is a user parameter (typically ~100) • The optimal α is now the solution to a k x k system: (Σ + U^T Λ U) α = U^T Λ y, where Σ is the diagonal matrix of the k smallest eigenvalues
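A companion sketch of the reduced solve. It assumes L, Lam and y are built as in the previous sketch, with the per-point label weights on the diagonal of Lam:

```python
import numpy as np

def reduced_ssl(L, Lam, y, k=100):
    """Solve the k x k system from slide 8 given the full Laplacian L,
    the diagonal label-weight matrix Lam, and labels y."""
    # U = smallest-k eigenvectors of L, Sigma = their eigenvalues
    evals, evecs = np.linalg.eigh(L)
    U, Sigma = evecs[:, :k], np.diag(evals[:k])
    # Minimize a'Sigma a + (Ua - y)'Lam(Ua - y)
    #   =>  (Sigma + U'Lam U) a = U'Lam y   (a k x k system)
    alpha = np.linalg.solve(Sigma + U.T @ Lam @ U, U.T @ (Lam @ y))
    return U @ alpha
```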

  9. Computational Bottleneck • Consider a dataset of 80 million images • Inverting L • Inverting 80 million x 80 million matrix • Finding eigenvectors of L • Diagonalizing 80 million x 80 million matrix

  10. Large Scale SSL - Related work • Nystrom method: pick a small set of landmark points • Compute exact eigenvectors on these • Interpolate the solution to the rest of the data • Other approaches include [see Zhu '08 survey]: mixture models (Zhu and Lafferty '05), sparse grids (Garcke and Griebel '05), sparse graphs (Tsang and Kwok '06)
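For comparison, a rough sketch of a generic Nystrom extension, not necessarily the exact baseline used in the talk: eigenvectors of the affinity matrix are computed on m landmark points and extended to all n points; the extra normalization needed to turn these into Laplacian eigenvectors is omitted.

```python
import numpy as np

def nystrom_eigenvectors(X, m=500, k=100, eps=1.0, seed=0):
    """Approximate top-k eigenvectors of the n x n affinity matrix
    from an m-landmark subsample (m << n)."""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]

    def affinity(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * eps ** 2))

    # Exact eigendecomposition on the m x m landmark affinity
    evals, evecs = np.linalg.eigh(affinity(landmarks, landmarks))
    evals, evecs = evals[::-1][:k], evecs[:, ::-1][:, :k]   # largest k
    # Nystrom extension to all n points: U ~ W_nm V diag(1/lambda)
    return affinity(X, landmarks) @ evecs / evals
```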

  11. Our Approach

  12. Overview of Our Approach • Compute approximate eigenvectors • Nystrom: reduce n to a set of landmark points; cost polynomial in the number of landmarks • Ours: take the limit as n → ∞ and work with the data density; cost linear in the number of data points

  13. Consider Limit as n → ∞ • Consider x to be drawn from a 2D distribution p(x) • Let L_p(F) be a smoothness operator on p(x), for a function F(x): L_p(F) = ½ ∫∫ (F(x1) - F(x2))^2 W(x1, x2) p(x1) p(x2) dx1 dx2 • The smoothness operator penalizes functions that vary in areas of high density • Analyze the eigenfunctions of L_p(F)

  14. Eigenvectors & Eigenfunctions

  15. Key Assumption: Separability of Input Data • Claim: if p is separable, i.e. p(x1, x2) = p(x1) p(x2), then the eigenfunctions of the marginals p(x1) and p(x2) are also eigenfunctions of the joint density, with the same eigenvalues [Nadler et al. 06, Weiss et al. 08]

  16. Numerical Approximations to Eigenfunctions in 1D • 300,000 points drawn from distribution p(x) • Consider the marginal p(x1), approximated by a histogram h(x1) of the data (figure panels: Data, p(x), p(x1), Histogram h(x1))

  17. Numerical Approximations to Eigenfunctions in 1D • Solve for the values of the eigenfunction at a set of discrete locations (histogram bin centers) and for the associated eigenvalues • B x B system (B = # histogram bins, e.g. 50)
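A hedged numerical sketch of this step. The generalized eigenproblem below is one reasonable discretization of the density-weighted smoothness penalty over B histogram bins; the exact normalization and kernel width used in the talk may differ:

```python
import numpy as np
from scipy.linalg import eigh

def eigenfunctions_1d(x, B=50, k=10, eps=None):
    """x: (n,) samples of one input dimension. Returns histogram bin
    centers, the k smoothest eigenfunction values at those centers,
    and the associated eigenvalues (a B x B solve)."""
    h, edges = np.histogram(x, bins=B, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    if eps is None:
        eps = 2.0 * (centers[1] - centers[0])        # assumed kernel width
    # Affinity between bin centers, weighted by the density h at each bin
    W = np.exp(-(centers[:, None] - centers[None, :]) ** 2 / (2 * eps ** 2))
    P = np.diag(h)
    D = np.diag(h * (W @ h))                          # D_kk = h_k * sum_l W_kl h_l
    # Generalized eigenproblem (D - P W P) g = sigma P g; the small jitter
    # keeps the right-hand side positive definite when some bins are empty
    evals, evecs = eigh(D - P @ W @ P, P + 1e-10 * np.eye(B))
    return centers, evecs[:, :k], evals[:k]
```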

  18. 1D Approximate Eigenfunctions (panels: 1st, 2nd, and 3rd eigenfunctions of h(x1))

  19. Separability over Dimension • Build a histogram over dimension 2: h(x2) • Now solve for the eigenfunctions of h(x2) (panels: 1st, 2nd, and 3rd eigenfunctions of h(x2))

  20. From Eigenfunctions to Approximate Eigenvectors • Take each data point • Do a 1-D interpolation of its value in each dimension into the corresponding eigenfunction (look up between histogram bin centers) • Very fast operation
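In code, this step is a single np.interp call per (dimension, eigenfunction) pair, as in this small sketch:

```python
import numpy as np

def interpolate_eigenvector(x_dim, centers, eigfun):
    """x_dim: (n,) values of one input dimension for all data points;
    centers: (B,) histogram bin centers; eigfun: (B,) eigenfunction values.
    Returns one approximate eigenvector of length n."""
    return np.interp(x_dim, centers, eigfun)
```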

  21. Preprocessing • Need to make the data separable • Rotate using PCA (figure: not separable → PCA → separable)

  22. Overall Algorithm • Rotate data to maximize separability (currently use PCA) • For each of the d input dimensions: construct a 1D histogram, then solve numerically for eigenfunctions/values • Order the eigenfunctions from all dimensions by increasing eigenvalue and take the first k • Interpolate the data into the k eigenfunctions • Yields approximate eigenvectors of the Laplacian • Solve a k x k least-squares system to give the label function (see the end-to-end sketch below)
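An end-to-end sketch of the algorithm, reusing the hypothetical eigenfunctions_1d and interpolate_eigenvector helpers from the sketches above; the PCA rotation, the pooling of per-dimension eigenfunctions by eigenvalue, and the final k x k solve follow slides 21, 22 and 8. The label weight lam is again an assumed value:

```python
import numpy as np

def approximate_eigenvectors(X, k=100, B=50):
    """X: (n, d) data. Returns n x k approximate eigenvectors U and their
    eigenvalues sigma (smallest first)."""
    # 1. Rotate the data to (approximately) separate the dimensions (PCA)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xr = Xc @ Vt.T
    # 2. Per-dimension 1-D eigenfunctions, then interpolate every data
    #    point into each eigenfunction
    cands = []
    for d in range(Xr.shape[1]):
        centers, funs, vals = eigenfunctions_1d(Xr[:, d], B=B)
        for j in range(funs.shape[1]):
            vec = interpolate_eigenvector(Xr[:, d], centers, funs[:, j])
            cands.append((vals[j], vec))
    # 3. Order eigenfunctions from all dimensions by eigenvalue, keep first k
    cands.sort(key=lambda c: c[0])
    sigma = np.array([c[0] for c in cands[:k]])
    U = np.stack([c[1] for c in cands[:k]], axis=1)
    return U, sigma

def label_function(U, sigma, y, labeled_mask, lam=100.0):
    """4. Solve the k x k system (slide 8) with approximate eigenvectors."""
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    alpha = np.linalg.solve(np.diag(sigma) + U.T @ Lam @ U, U.T @ (Lam @ y))
    return U @ alpha
```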

  23. Experiments on Toy Data

  24. Nystrom Comparison • With Nystrom, too few landmark points result in highly unstable eigenvectors

  25. Nystrom Comparison • Eigenfunctions fail when data has significant dependencies between dimensions

  26. Experiments on Real Data

  27. Experiments • Images from 126 classes downloaded from Internet search engines, 63,000 images in total (example classes: Dump truck, Emu) • Labels (correct/incorrect) provided by Alex Krizhevsky, Vinod Nair & Geoff Hinton (CIFAR & U. Toronto)

  28. Input Image Representation • Pixels are not a convenient representation • Use the Gist descriptor (Oliva & Torralba, 2001) • L2 distance between Gist vectors is a rough substitute for human perceptual distance • Apply oriented Gabor filters over different scales, then average the filter energy in each bin of a spatial grid
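A rough Gist-like sketch, not the reference Oliva & Torralba implementation: oriented Gabor filter energy averaged over a coarse spatial grid, with assumed scales, orientations and grid size, using skimage.filters.gabor:

```python
import numpy as np
from skimage.filters import gabor

def gist_like(image, frequencies=(0.1, 0.2, 0.3), n_orient=4, grid=4):
    """image: 2-D float array. Returns a 1-D descriptor of
    len(frequencies) * n_orient * grid * grid dimensions."""
    h, w = image.shape
    feats = []
    for freq in frequencies:                    # scales (assumed values)
        for i in range(n_orient):               # orientations
            real, imag = gabor(image, frequency=freq, theta=np.pi * i / n_orient)
            energy = np.hypot(real, imag)       # Gabor filter energy
            # Average the energy inside each cell of a grid x grid layout
            for r in range(grid):
                for c in range(grid):
                    cell = energy[r * h // grid:(r + 1) * h // grid,
                                  c * w // grid:(c + 1) * w // grid]
                    feats.append(cell.mean())
    return np.asarray(feats)
```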

  29. Are Dimensions Independent? • Joint histograms for pairs of dimensions: raw 384-dimensional Gist vs. after PCA to 64 dimensions • MI is the mutual information score; 0 = independent
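The dependence score can be sketched as the mutual information of the joint histogram over a pair of dimensions; the number of bins is an assumed choice:

```python
import numpy as np

def mutual_information(x1, x2, bins=32):
    """MI between two descriptor dimensions, estimated from their joint
    histogram; 0 means the pair looks independent."""
    joint, _, _ = np.histogram2d(x1, x2, bins=bins)
    p = joint / joint.sum()
    p1 = p.sum(axis=1, keepdims=True)      # marginal of dimension 1
    p2 = p.sum(axis=0, keepdims=True)      # marginal of dimension 2
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (p1 @ p2)[mask])).sum())
```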

  30. Real 1-D Eigenfunctions of PCA'd Gist descriptors (figure: eigenfunction value plotted against each input dimension)

  31. Protocol • Task is to re-rank images of each class (class/non-class) • Use eigenfunctions computed on all 63,000 images • Vary the number of labeled examples • Measure precision @ 15% recall
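A small sketch of the evaluation metric: precision at 15% recall for one class, computed from re-ranking scores and binary ground-truth labels:

```python
import numpy as np

def precision_at_recall(scores, labels, recall_level=0.15):
    """scores: (n,) classifier outputs; labels: (n,) binary ground truth.
    Returns precision at the given recall level."""
    order = np.argsort(-scores)                   # rank by decreasing score
    hits = labels[order].astype(float)
    tp = np.cumsum(hits)
    recall = tp / hits.sum()
    precision = tp / np.arange(1, len(hits) + 1)
    # Smallest ranking depth at which the target recall is reached
    idx = np.searchsorted(recall, recall_level)
    return precision[min(idx, len(precision) - 1)]
```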

  32.–35. Results plots: re-ranking performance as the total number of images is varied (4800, 5000, 6000, 8000).

  36. 80 Million Images

  37. Running on 80 million images • PCA to 32 dims, k = 48 eigenfunctions • For each class, labels propagate through 80 million images • Precompute the approximate eigenvectors (~20 GB) • Label propagation is fast: < 0.1 secs/keyword

  38. Japanese Spaniel: re-ranking result using 3 positive and 3 negative labels from the CIFAR set

  39. Airbus, Ostrich, Auto

  40. Summary • Semi-supervised scheme that can scale to really large problems - linear in # points • Rather than sub-sampling the data, we take the limit of infinite unlabeled data • Assumes the input data distribution is separable • Can propagate labels in a graph with 80 million nodes in fractions of a second • Related paper in this NIPS by Nadler, Srebro & Zhou - see the spotlights on Wednesday

  41. Future Work • Can potentially use 2D or 3D histograms instead of 1D • Requires more data • Consider diagonal eigenfunctions • Sharing of labels between classes

  42. Comparison of Approaches

  43. Comparison on toy data: Data, Exact Eigenvectors, Approximate Eigenvectors, with eigenvalues (approximate : exact): 0.0531 : 0.0535, 0.1920 : 0.1928, 0.2049 : 0.2068, 0.2480 : 0.5512, 0.3580 : 0.7979

  44. Are Dimensions Independent? • Joint histograms for pairs of dimensions: raw 384-dimensional Gist vs. after PCA • MI is the mutual information score; 0 = independent

  45. Are Dimensions Independent? • Joint histograms for pairs of dimensions: raw 384-dimensional Gist vs. after ICA • MI is the mutual information score; 0 = independent

  46. Varying # Eigenfunctions

  47. Leveraging Noisy Labels • Images in the dataset have noisy labels: the keyword used to retrieve them from the Internet search engine • These can easily be incorporated into the SSL scheme • Give them 1/10th the weight of a hand-labeled example
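A sketch of how this weighting might enter the reduced solve from slide 8: noisy keyword labels get 1/10th of the weight of hand labels on the diagonal of Λ; the base weight lam is an assumption:

```python
import numpy as np

def label_weights(hand_labeled, noisy_labeled, lam=100.0):
    """Both arguments are (n,) boolean masks. Returns the diagonal of the
    label-weight matrix Lambda used in the k x k solve."""
    w = np.zeros(len(hand_labeled))
    w[noisy_labeled] = lam / 10.0   # search-engine keyword labels
    w[hand_labeled] = lam           # hand labels take precedence
    return w
```

np.diag(label_weights(...)) then takes the place of Lam in the earlier reduced-solve sketch.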

  48. Leveraging Noisy Labels

  49. Effect of Noisy Labels
