
Large Image Databases and Small Codes for Object Recognition



Presentation Transcript


  1. Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

  2. Object Recognition Pixels → Description of scene contents Banksy, 2006

  3. Large image datasets Internet contains billions of images • Amazing resource • Maybe we can use it for recognition?

  4. Web image dataset • 79.3 million images • Collected using image search engines • List of nouns taken from Wordnet • 32x32 resolution • Example first 10: a-bomb a-horizon a._conan_doyle a._e._burnside a._e._housman a._e._kennelly a.e. a_battery a_cappella_singing a_horizon See "80 Million Tiny Images" tech report

  5. Noisy Output from Image Search Engines

  6. Nearest-Neighbor methods for recognition [plot: performance vs. # images, 10^5 / 10^6 / 10^8]

  7. Issues • Need lots of data • How to find neighbors quickly? • What distance metric to use? • How to transfer labels from neighbors to query image? • What happens if labels are unreliable?

  8. Overview • Fast retrieval using compact codes • Recognition using neighbors with unreliable labels

  9. Overview • Fast retrieval using compact codes • Recognition using neighbors with unreliable labels

  10. Binary codes for images • Want images with similar content to have similar binary codes • Use Hamming distance between codes • Number of bit flips • E.g.: Ham_Dist(10001010, 10001110) = 1; Ham_Dist(10001010, 11101110) = 3 • Semantic Hashing [Salakhutdinov & Hinton, 2007] • Text documents
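The Hamming distance on the slide is just a bit-flip count; a minimal sketch (codes stored as plain Python ints) is XOR followed by a popcount:

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two binary codes: XOR the codes,
    then count the set bits of the result."""
    return bin(a ^ b).count("1")

# The two examples from the slide:
d1 = hamming_distance(0b10001010, 0b10001110)  # 1 bit flip
d2 = hamming_distance(0b10001010, 0b11101110)  # 3 bit flips
```

For codes up to a machine word this is a single XOR plus a popcount instruction in lower-level languages, which is what makes Hamming comparisons so much cheaper than L2 distances on full descriptors.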

  11. Binary codes for images • Permits fast lookup via hashing • Each code is a memory address • Find neighbors by exploring Hamming ball around query address • Lookup time depends on radius of ball, NOT on # data points [figure adapted from Salakhutdinov & Hinton '07: query image → semantic hash function → query address → semantically similar images in address space]
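The lookup the slide describes can be sketched with a hash table keyed by code: enumerate every address within a small Hamming radius of the query code and collect whatever is stored there. The cost grows with the ball radius and code length, not with the number of images (names below are illustrative, not from the talk):

```python
from itertools import combinations

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every address within `radius` bit flips of `code`."""
    yield code
    for r in range(1, radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p   # flip one chosen bit
            yield flipped

def lookup(table: dict, query_code: int, n_bits: int = 8, radius: int = 1):
    """Collect images stored at any address in the Hamming ball."""
    hits = []
    for addr in hamming_ball(query_code, n_bits, radius):
        hits.extend(table.get(addr, []))
    return hits
```

With radius 1 and 8-bit codes the ball has only 9 addresses to probe, regardless of how many images the table holds.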

  12. Compact Binary Codes • Google has a few billion images (10^9) • Big PC has ~10 GBytes (10^11 bits) • Codes must fit in memory (disk too slow) ⇒ Budget of 10^2 bits/image • 1 Megapixel image is 10^7 bits • 32x32 color image is 10^4 bits • ⇒ Semantic hash function must also reduce dimensionality

  13. Code requirements • Preserves neighborhood structure of input space • Very compact (<10^2 bits/image) • Fast to compute Three approaches: • Locality Sensitive Hashing (LSH) • Boosting • Restricted Boltzmann Machines (RBMs)

  14. Image representation: Gist vectors • Pixels are not a convenient representation • Use the GIST descriptor instead • 512 dimensions/image • L2 distance between Gist vectors is not a bad substitute for human perceptual distance Oliva & Torralba, IJCV 2001

  15. 1. Locality Sensitive Hashing • Gionis, A. & Indyk, P. & Motwani, R. (1999) • Take random projections of data • Quantize each projection with few bits • For our N bit code: • Compute first N PCA components of data • Each random projection must be linear combination of the N PCA components
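A minimal sketch of the LSH variant on the slide: draw random directions inside the span of the first N PCA components of the data, project, and quantize each projection to one bit by thresholding at zero. This is an illustrative reconstruction, not the talk's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_codes(X, n_bits):
    """N-bit LSH codes: random projections constrained to the
    top-n_bits PCA subspace, each quantized to a single bit."""
    Xc = X - X.mean(axis=0)                      # center the data
    # First n_bits principal directions via SVD.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pca = Vt[:n_bits]                            # (n_bits, dim)
    # Each projection is a random linear combination of those components.
    coeffs = rng.standard_normal((n_bits, n_bits))
    directions = coeffs @ pca                    # (n_bits, dim)
    # One bit per projection: which side of zero the projection falls on.
    return (Xc @ directions.T > 0).astype(np.uint8)
```

Quantizing with a sign threshold is the simplest choice; the slide's "few bits" per projection would replace the single comparison with a small uniform quantizer.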

  16. 2. Boosting • Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003] • GentleBoost with exponential loss • Positive examples are pairs of neighbors • Negative examples are pairs of unrelated images • Each binary regression stump examines a single coordinate of the input pair, comparing it to a learnt threshold to check that the two images agree
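Each boosting round contributes one bit of the code; a stump thresholds one descriptor coordinate, and training checks whether a pair agrees on that bit. A toy sketch (the coordinate and threshold would be selected by GentleBoost, which is not shown here):

```python
def code_bit(x, coord, threshold):
    """One bit of the code: threshold a single descriptor coordinate.
    coord and threshold are assumed to have been learnt by boosting."""
    return int(x[coord] > threshold)

def pair_agrees(x1, x2, coord, threshold):
    """Training signal for one stump: positive pairs (neighbors) should
    agree on the bit, negative pairs (unrelated images) should not."""
    return code_bit(x1, coord, threshold) == code_bit(x2, coord, threshold)
```

Stacking the learnt stumps' outputs for a single image yields its N-bit code.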

  17. 3. Restricted Boltzmann Machine (RBM) • Network of binary stochastic units • Hinton & Salakhutdinov, Science 2006 [diagram: visible units v and hidden units h connected by symmetric weights w; parameters are the weights w and biases b]

  18. RBM architecture • Network of binary stochastic units • Hinton & Salakhutdinov, Science 2006 • Convenient conditional distributions between hidden units h and visible units v • Learn weights w and biases b using Contrastive Divergence
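The "convenient conditional distributions" of a binary RBM are p(h=1|v) = σ(vW + b_hid) and p(v=1|h) = σ(hWᵀ + b_vis). A minimal CD-1 training step, sketched under those standard definitions (not the talk's exact training code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One Contrastive Divergence (CD-1) update for a binary RBM.
    v0: batch of visible vectors, shape (batch, n_vis)."""
    # Positive phase: hidden probabilities given the data, then a sample.
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b_vis)
    ph1 = sigmoid(pv1 @ W + b_hid)
    # CD-1 parameter updates: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid
```

Greedy pre-training (slides 21–23) applies this layer by layer, feeding each layer's hidden activations to the next RBM as its visible data.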

  19. Multi-Layer RBM: non-linear dimensionality reduction Input Gist vector (512 dimensions) → Layer 1 (w1, 512 units) → Layer 2 (w2, 256 units) → Layer 3 (w3, N units) → Output binary code (N dimensions)

  20. Training RBM models • Two phases • Pre-training • Unsupervised • Use Contrastive Divergence to learn weights and biases • Gets parameters to right ballpark • Fine-tuning • Can make use of label information • No longer stochastic • Back-propagate error to update parameters • Moves parameters to local minimum

  21. Greedy pre-training (Unsupervised) Layer 1 (w1): Input Gist vector (512 dimensions) → 512 hidden units

  22. Greedy pre-training (Unsupervised) Layer 2 (w2): activations of hidden units from layer 1 (512 dimensions) → 256 hidden units

  23. Greedy pre-training (Unsupervised) Layer 3 (w3): activations of hidden units from layer 2 (256 dimensions) → N hidden units

  24. Backpropagation using Neighborhood Components Analysis objective Input Gist vector (512 dimensions) → Layer 1 (w1, 512 units) → Layer 2 (w2, 256 units) → Layer 3 (w3, N units) → Output binary code (N dimensions)

  25. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Points live in the output space (low dimensional); labels are defined in the input space (high dimensional)

  26. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Objective is computed on the output of the RBM; W are the RBM weights

  27. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Pulls nearby points of the SAME class closer • Pushes nearby points of a DIFFERENT class away

  28. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Pulls nearby points of the SAME class closer • Pushes nearby points of a DIFFERENT class away • Preserves the neighborhood structure of the original, high-dimensional space
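The slides do not reproduce the NCA formulas; for reference, the standard objective from Goldberger et al. (NIPS 2004), written here with f_W denoting the RBM mapping, defines a soft neighbor assignment in the code space and maximizes the probability of same-class neighbors:

```latex
% Soft probability that point i picks point j as its neighbour,
% computed on the RBM outputs f_W(x):
p_{ij} = \frac{\exp\!\big(-\lVert f_W(x_i) - f_W(x_j)\rVert^2\big)}
              {\sum_{k \neq i} \exp\!\big(-\lVert f_W(x_i) - f_W(x_k)\rVert^2\big)}

% Objective: expected number of points assigned to a same-class neighbour.
O_{\mathrm{NCA}} = \sum_i \; \sum_{j \,:\, c_j = c_i} p_{ij}
```

Maximizing O_NCA with backpropagation is what "pulls same-class points closer and pushes different-class points away": increasing p_ij for same-class pairs shrinks their output distance while the softmax denominator penalizes nearby different-class points.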

  29. Two test datasets • LabelMe • 22,000 images (20,000 train | 2,000 test) • Ground truth segmentations for all • Can define ground truth distance between images using these segmentations • Per-pixel labels [SUPERVISED] • Web data • 79.3 million images • Collected from the Internet • No labels, so use L2 distance between GIST vectors as ground truth distance [UNSUPERVISED] • Noisy image labels only

  30. Retrieval Experiments

  31. Examples of LabelMe retrieval • 12 closest neighbors under different distance metrics

  32. LabelMe Retrieval [plot: % of 50 true neighbors in retrieval set vs. size of retrieval set, 0 to 20,000]

  33. LabelMe Retrieval [plots: % of 50 true neighbors in retrieval set vs. size of retrieval set (0 to 20,000); % of 50 true neighbors in first 500 retrieved vs. number of bits]

  34. Web images retrieval [plot: % of 50 true neighbors in retrieval set vs. size of retrieval set]

  35. Web images retrieval [plots: % of 50 true neighbors in retrieval set vs. size of retrieval set, two panels]

  36. Examples of Web retrieval • 12 neighbors using different distance metrics

  37. Retrieval Timings

  38. LabelMe Recognition examples

  39. LabelMe Recognition

  40. Overview • Fast retrieval using compact codes • Recognition using neighbors with unreliable labels

  41. Label Assignment • Retrieval gives a set of nearby images • How to compute the label? • Issues: • Labeling noise • Keywords can be very specific • e.g. yellowfin tuna [figure: query image with neighbors labeled Grover Cleveland, Linnet, Birdcage, Chiefs, Casing]

  42. Wordnet – a Lexical Dictionary http://wordnet.princeton.edu/ Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun aardvark Sense 1 aardvark, ant bear, anteater, Orycteropus afer => placental, placental mammal, eutherian, eutherian mammal => mammal => vertebrate, craniate => chordate => animal, animate being, beast, brute, creature => organism, being => living thing, animate thing => object, physical object => entity

  43. Wordnet Hierarchy • Convert graph structure into tree by taking most common meaning Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun aardvark Sense 1 aardvark, ant bear, anteater, Orycteropus afer => placental, placental mammal, eutherian, eutherian mammal => mammal => vertebrate, craniate => chordate => animal, animate being, beast, brute, creature => organism, being => living thing, animate thing => object, physical object => entity

  44. Wordnet Voting Scheme Ground truth One image – one vote
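The voting scheme can be sketched as: each retrieved neighbor casts one vote for its own label and for every hypernym on the path to the root, so votes accumulate at all semantic levels at once. The miniature hypernym tree below is a hypothetical stand-in for the real Wordnet hierarchy:

```python
from collections import Counter

# Hypothetical child -> parent map standing in for the Wordnet tree.
HYPERNYMS = {
    "aardvark": "mammal", "mammal": "animal", "animal": "living thing",
    "person": "living thing", "living thing": "entity",
}

def path_to_root(label):
    """A label plus all its hypernyms up to the root."""
    path = [label]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

def vote(neighbor_labels):
    """One image, one vote -- counted at every level of the hierarchy."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(path_to_root(label))
    return votes

v = vote(["aardvark", "person", "person"])
```

Here a specific noisy label like "aardvark" still contributes a reliable vote at the coarser levels ("animal", "living thing"), which is what makes classification at multiple semantic levels robust to label noise.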

  45. Classification at Multiple Semantic Levels Votes: Living 44 Artifact 9 Land 3 Region 7 Others 10 Votes: Animal 6 Person 33 Plant 5 Device 3 Administrative 4 Others 22

  46. Wordnet Voting Scheme

  47. Person Recognition • 23% of all images in the dataset contain people • Wide range of poses: not just frontal faces

  48. Person Recognition – Test Set • 1016 images from Altavista using "person" query • High-res and 32x32 versions available • Disjoint from the 79 million tiny images

  49. Person Recognition • Task: person in image or not? • 64-bit RBM code trained on web data (weak image labels only) [plot: comparison against Viola-Jones]
