
Local Features and Bag of Words Models



Presentation Transcript


  1. Local Features and Bag of Words Models Jinxiang Chai Slides from Svetlana Lazebnik, Derek Hoiem, James Hays, Antonio Torralba, David Lowe, Fei Fei Li and others

  2. Previous Class • Overview and history of recognition

  3. Specific recognition tasks Svetlana Lazebnik

  4. Scene categorization or classification • outdoor/indoor • city/forest/factory/etc. Svetlana Lazebnik

  5. Image annotation / tagging / attributes • street • people • building • mountain • tourism • cloudy • brick • … Svetlana Lazebnik

  6. Object detection • find pedestrians Svetlana Lazebnik

  7. Image parsing sky mountain building tree building banner street lamp market people Svetlana Lazebnik

  8. Today’s class: features and bag of words models • Representation • Gist descriptor • Image histograms • Sift-like features • Bag of Words models • Encoding methods

  9. Image Categorization Training pipeline: Training Images → Image Features → Classifier Training (using Training Labels) → Trained Classifier Derek Hoiem

  10. Image Categorization Training: Training Images → Image Features → Classifier Training (using Training Labels) → Trained Classifier Testing: Test Image → Image Features → Trained Classifier → Prediction (e.g., “Outdoor”) Derek Hoiem

  11. Part 1: Image features Training Images → Image Features → Classifier Training (using Training Labels) → Trained Classifier Derek Hoiem

  12. Image representations • Templates • Intensity, gradients, etc. • Histograms • Color, texture, SIFT descriptors, etc.

  13. Image Representations: Histograms Global histogram • Represent distribution of features • Color, texture, depth, … Space Shuttle Cargo Bay Images from Dave Kauchak

  14. Image Representations: Histograms Histogram: Probability or count of data in each bin • Joint histogram • Requires lots of data • Loss of resolution to avoid empty bins • Marginal histogram • Requires independent features • More data/bin than joint histogram Images from Dave Kauchak
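The joint-versus-marginal trade-off on this slide can be sketched with a toy example (pure Python; the two-channel features, bin count, and sample size are illustrative assumptions, not from the slides):

```python
import random
from collections import Counter

random.seed(0)
# Hypothetical per-pixel features: two channels in [0, 1), e.g. hue and saturation.
features = [(random.random(), random.random()) for _ in range(1000)]
BINS = 8

def bin_index(v, bins=BINS):
    # Map a value in [0, 1) to one of `bins` equal-width bins.
    return min(int(v * bins), bins - 1)

# Joint histogram: one count per (bin_x, bin_y) pair -> 8 * 8 = 64 possible bins,
# so the bin count grows exponentially with dimension ("requires lots of data").
joint = Counter((bin_index(x), bin_index(y)) for x, y in features)

# Marginal histograms: one 1-D histogram per channel -> only 8 + 8 = 16 bins.
# More data per bin, but only faithful if the channels are (roughly) independent.
marg_x = Counter(bin_index(x) for x, _ in features)
marg_y = Counter(bin_index(y) for _, y in features)
```

With the same 1000 samples, each joint bin averages ~16 samples while each marginal bin averages ~125, which is the "more data/bin" point on the slide.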

  15. Image Representations: Histograms Clustering: use the same cluster centers for all images (example images: EASE Truss Assembly, Space Shuttle Cargo Bay) Images from Dave Kauchak

  16. Computing histogram distance • Histogram intersection (assuming normalized histograms) • Chi-squared histogram matching distance Example: cars found by color histogram matching using chi-squared
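The two distances named on this slide are standard; a minimal sketch, assuming normalized histograms of equal length and the common 0.5 factor in the chi-squared form:

```python
def intersection_similarity(h1, h2):
    # Histogram intersection: sum of bin-wise minima.
    # For normalized histograms this lies in [0, 1]; 1 means identical.
    return sum(min(a, b) for a, b in zip(h1, h2))

def chi_squared_distance(h1, h2):
    # Chi-squared distance: 0.5 * sum((a - b)^2 / (a + b)), skipping empty bin pairs.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]
# intersection_similarity(uniform, uniform) == 1.0 (identical histograms)
# chi_squared_distance(uniform, peaked) == 0.6
```

Note that intersection is a similarity (higher is more alike) while chi-squared is a distance (lower is more alike).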

  17. Histograms: Implementation issues • Quantization • Grids: fast but applicable only with few dimensions • Clustering: slower but can quantize data in higher dimensions • Matching • Histogram intersection or Euclidean may be faster • Chi-squared often works better • Earth mover’s distance is good for when nearby bins represent similar values Few Bins Need less data Coarser representation Many Bins Need more data Finer representation

  18. What kind of things do we compute histograms of? • Color • Texture (filter banks or HOG over regions) L*a*b* color space HSV color space

  19. What kind of things do we compute histograms of? • Histograms of oriented gradients SIFT – Lowe IJCV 2004

  20. SIFT vector formation • Computed on rotated and scaled version of window according to computed orientation & scale • resample the window • Based on gradients weighted by a Gaussian of variance half the window (for smooth falloff)

  21. SIFT vector formation • 4x4 array of gradient orientation histograms • not really a histogram: each entry is weighted by gradient magnitude • 8 orientations x 4x4 array = 128 dimensions • Motivation: some sensitivity to spatial layout, but not too much (figures often show only a 2x2 array, but the actual descriptor uses 4x4)
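A heavily simplified sketch of the 4x4 × 8 = 128-dimensional layout described above. Assumptions: no Gaussian weighting, no trilinear interpolation, and no normalization (all of which the real SIFT descriptor uses); the synthetic patch is illustrative:

```python
import math

CELLS, BINS = 4, 8  # 4x4 spatial grid x 8 orientation bins = 128 dimensions

def sift_like_descriptor(gradients, patch=16):
    # gradients: patch x patch grid of (magnitude, orientation in [0, 2*pi)) pairs.
    desc = [0.0] * (CELLS * CELLS * BINS)
    cell = patch // CELLS  # pixels per cell side
    for r in range(patch):
        for c in range(patch):
            mag, ori = gradients[r][c]
            # Orientation bin; entries are weighted by gradient magnitude,
            # which is why this is "not really a histogram" of counts.
            b = min(int(ori / (2 * math.pi) * BINS), BINS - 1)
            cell_idx = (r // cell) * CELLS + (c // cell)
            desc[cell_idx * BINS + b] += mag
    return desc

# Synthetic patch: every gradient points "right" (orientation 0) with magnitude 1.
patch = [[(1.0, 0.0)] * 16 for _ in range(16)]
d = sift_like_descriptor(patch)
# len(d) == 128; exactly one nonzero bin per cell (16 nonzero entries)
```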

  22. Local Descriptors: Shape Context Count the number of points inside each bin, e.g.: Count = 4 ... Count = 10 Log-polar binning: more precision for nearby points, more flexibility for farther points. Belongie & Malik, ICCV 2001 K. Grauman, B. Leibe
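A rough sketch of the log-polar binning described above (the bin counts, radius limits, and point set are illustrative assumptions, not Belongie & Malik's exact parameters):

```python
import math

def shape_context(points, center, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    # Log-polar histogram of point positions relative to `center`:
    # log-spaced radial bins give more precision for nearby points and
    # more flexibility (coarser bins) for farther points.
    hist = [[0] * n_theta for _ in range(n_r)]
    log_min, log_max = math.log(r_min), math.log(r_max)
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r == 0 or r > r_max:
            continue  # skip the center point itself and points beyond the outer ring
        rb = min(int((math.log(max(r, r_min)) - log_min) / (log_max - log_min) * n_r),
                 n_r - 1)
        tb = int((math.atan2(dy, dx) % (2 * math.pi)) / (2 * math.pi) * n_theta) % n_theta
        hist[rb][tb] += 1
    return hist

hist = shape_context([(1, 0), (0, 1), (-1, 0)], center=(0, 0))
# All three points land at radius 1, in three different angular bins.
```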

  23. Shape Context Descriptor

  24. Things to remember about representation • Most features can be thought of as templates, histograms (counts), or combinations • Think about the right features for the problem • Coverage • Concision • Directness

  25. Bag-of-features models Svetlana Lazebnik

  26. Origin 1: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) - A document is a bag of words - The bag contains some of the words from the dictionary

  27. Origin 1: Bag-of-words models US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/ • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)


  30. Origin 1: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) D1: “John likes to watch movies. Mary likes too.” D2: “John also likes to watch football games.”

  31. Origin 1: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) D1: “John likes to watch movies. Mary likes too.” D2: “John also likes to watch football games.” Dictionary={1:"John", 2:"likes", 3:"to", 4:"watch", 5:"movies", 6:"also", 7:"football", 8:"games", 9:"Mary", 10:"too"},

  32. Origin 1: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) D1: “John likes to watch movies. Mary likes too.” D2: “John also likes to watch football games.” Dictionary={1:"John", 2:"likes", 3:"to", 4:"watch", 5:"movies", 6:"also", 7:"football", 8:"games", 9:"Mary", 10:"too"}, R1: [1, 2, 1, 1, 1, 0, 0, 0, 1, 1] R2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
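The D1/D2 example above can be reproduced directly (the tokenizer is a simplifying assumption: it splits on runs of letters and is case-sensitive):

```python
import re
from collections import Counter

def bag_of_words(doc, dictionary):
    # Orderless representation: count how often each dictionary word occurs.
    counts = Counter(re.findall(r"[A-Za-z]+", doc))
    return [counts[w] for w in dictionary]

dictionary = ["John", "likes", "to", "watch", "movies",
              "also", "football", "games", "Mary", "too"]
d1 = "John likes to watch movies. Mary likes too."
d2 = "John also likes to watch football games."

r1 = bag_of_words(d1, dictionary)  # [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
r2 = bag_of_words(d2, dictionary)  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

Word order is discarded entirely: only the per-word frequencies survive, which is exactly the property the visual analogue exploits.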

  33. Origin 1: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) How can we extend this idea for image search and object recognition? - a document = an image - a word = a feature - dictionary = feature sets (visual code book)

  34. Origin 2: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  35. Origin 2: Texture recognition histogram Universal texton dictionary Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  36. Bags of features for object recognition Works pretty well for image-level classification (face, flowers, building) Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

  37. Bags of features for object recognition Caltech6 dataset: bag-of-features models compared with a parts-and-shape model

  38. Bag-of-features steps • Extract features • Learn “visual vocabulary” • Quantize features using visual vocabulary • Represent images by frequencies of “visual words”

  39. 1. Feature extraction • Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005)

  40. 1. Feature extraction • Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005) • Interest point detector (Csurka et al., 2004; Fei-Fei & Perona, 2005; Sivic et al., 2005)

  41. 1. Feature extraction: detect patches → normalize patch → compute descriptor Slide credit: Josef Sivic

  42. 1. Feature extraction Slide credit: Josef Sivic

  43. 2. Learning the visual vocabulary Slide credit: Josef Sivic

  44. 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic

  45. 2. Learning the visual vocabulary Visual vocabulary Clustering Slide credit: Josef Sivic

  46. K-means clustering • Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk • Algorithm: • Randomly initialize K cluster centers • Iterate until convergence: • Assign each data point to the nearest center • Recompute each cluster center as the mean of all points assigned to it
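A minimal sketch of the k-means steps listed above (Lloyd's algorithm; the toy 2-D points and fixed iteration cap are illustrative assumptions):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    # Lloyd's algorithm, following the steps on the slide.
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # 1. randomly initialize K cluster centers
    for _ in range(iters):
        # 2a. assign each point to the nearest center (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # 2b. recompute each center as the mean of its assigned points
        new_centers = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable: converged
        centers = new_centers
    return centers

# Two well-separated toy clusters in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans(points, 2)
```

Note that k-means only finds a local minimum of the sum of squared distances, so in practice it is run several times with different random initializations.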

  47. Clustering and vector quantization • Clustering is a common method for learning a visual vocabulary or codebook • Unsupervised learning process • Each cluster center produced by k-means becomes a codevector • Codebook can be learned on separate training set • Provided the training set is sufficiently representative, the codebook will be “universal” • The codebook is used for quantizing features • A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook • Codebook = visual vocabulary • Codevector = visual word
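A sketch of the quantization step just described: map each local feature to the index of its nearest codevector, then histogram the resulting visual-word indices (the toy codebook and feature vectors are illustrative):

```python
def quantize(feature, codebook):
    # Vector quantizer: map a feature to the index of the nearest codevector,
    # i.e. assign it to a "visual word".
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(feature, codebook[i])))

def bag_of_visual_words(features, codebook):
    # Image representation: histogram of visual-word occurrences.
    hist = [0] * len(codebook)
    for f in features:
        hist[quantize(f, codebook)] += 1
    return hist

codebook = [(0.0, 0.0), (10.0, 10.0)]             # toy 2-word visual vocabulary
features = [(0.1, 0.2), (9.8, 10.1), (0.3, 0.0)]  # toy local descriptors
# bag_of_visual_words(features, codebook) == [2, 1]
```

In a real pipeline the codevectors would come from k-means over training descriptors (e.g. SIFT), and the histogram would then be fed to the classifier.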

  48. Example codebook Appearance codebook Source: B. Leibe

  49. Another appearance codebook Source: B. Leibe

  50. Visual vocabularies: Issues • How to choose vocabulary size? • Too small: visual words not representative of all patches • Too large: quantization artifacts, overfitting • Computational efficiency • Vocabulary trees (Nister & Stewenius, 2006)
