
Mobile Visual Search

Bernd Girod, Stanford University, bgirod@stanford.edu. Topics: mobile augmented reality; computer vision with “Bag of Words” matching; scalability for large databases; feature compression; phone-based vs. server-based processing.


Presentation Transcript


  1. Mobile Visual Search. Bernd Girod, Stanford University, bgirod@stanford.edu

  2. Mobile Visual Search

  3. Mobile Augmented Reality

  4. Outline
     • Computer vision: “Bag of Words” matching
     • Scalability for large databases
     • Feature compression
     • Phone-based vs. server-based processing

  5. Computing Visual Words [Figure: Color → Gray → Filters (Dxx, Dxy, Dyy) → Blob Response DxxDyy − (0.9 Dxy)² → Maxima over (x, y, scale) → Orient along dominant gradient → Oriented Patch → Gradient Field (dx, dy) → SIFT Descriptor or SURF Descriptor (Σdx, Σdy, Σ|dx|, Σ|dy| per spatial cell)]
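
The blob response DxxDyy − (0.9 Dxy)² and the per-cell sums Σdx, Σdy, Σ|dx|, Σ|dy| named on slide 5 can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the function names `blob_response` and `surf_cell_vector` are hypothetical, and the inputs are assumed to be precomputed filter responses and gradient samples.

```python
def blob_response(dxx, dyy, dxy):
    """Approximate Hessian determinant from the slide: Dxx*Dyy - (0.9*Dxy)^2."""
    return dxx * dyy - (0.9 * dxy) ** 2


def surf_cell_vector(dx, dy):
    """Return (sum dx, sum dy, sum |dx|, sum |dy|) for one spatial cell.

    dx and dy are assumed to be equal-length lists of gradient samples
    taken inside the cell of an already oriented patch.
    """
    return (
        sum(dx),
        sum(dy),
        sum(abs(v) for v in dx),
        sum(abs(v) for v in dy),
    )


# Hypothetical sample values, chosen only to exercise the functions.
response = blob_response(2.0, 3.0, 1.0)
cell = surf_cell_vector([1.0, -2.0, 0.5], [0.0, 1.0, -1.0])
print(response, cell)
```

Concatenating such four-element vectors over all cells of the patch gives a SURF-style descriptor.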

  6. “Bag of Words” Matching & Geometric Consistency Check • Pairwise Comparison

  7. Growing Vocabulary Tree [Nistér and Stewenius, 2006]

  8. Growing Vocabulary Tree [Nistér and Stewenius, 2006]

  9. Growing Vocabulary Tree [Nistér and Stewenius, 2006]

  10. Growing Vocabulary Tree k = 3 [Nistér and Stewenius, 2006]

  11. Growing Vocabulary Tree k = 3 [Nistér and Stewenius, 2006]

  12. Querying Vocabulary Tree Query
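
Querying a vocabulary tree can be sketched as a greedy descent: at each level the descriptor moves to the child whose cluster center is nearest, and the leaf it reaches is its visual word. The sketch below is an assumption-laden toy, not the system from the talk: the dictionary-based node layout, the helper names, and the two-dimensional centers are hypothetical, and the branch factor k = 3 simply mirrors slides 10 and 11.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_tree(node, descriptor):
    """Descend greedily and return the child-index path of the reached leaf."""
    path = []
    while node["children"]:
        children = node["children"]
        idx = min(range(len(children)),
                  key=lambda i: sq_dist(children[i]["center"], descriptor))
        path.append(idx)
        node = children[idx]
    return tuple(path)


def leaf(center):
    """Hypothetical helper: a node without children."""
    return {"center": center, "children": []}


# Toy tree with branch factor k = 3 and depth 2 (centers are made up).
tree = {"center": None, "children": [
    {"center": (0.0, 0.0),
     "children": [leaf((-1.0, 0.0)), leaf((1.0, 0.0)), leaf((0.0, 1.0))]},
    {"center": (10.0, 0.0),
     "children": [leaf((9.0, 0.0)), leaf((11.0, 0.0)), leaf((10.0, 1.0))]},
    {"center": (0.0, 10.0),
     "children": [leaf((0.0, 9.0)), leaf((1.0, 10.0)), leaf((0.0, 11.0))]},
]}

word = query_tree(tree, (10.8, 0.1))
print(word)  # (1, 1)
```

With branch factor k and depth L, the descent costs k·L distance computations, whereas a flat search over all k^L leaves would cost k^L.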

  13. Real-time System: Send Image [Diagram: Client (Camera) sends the Image over the Wireless Network (20 kbps, 20 sec) to the Server (Feature Extraction, VocTree, Image Matching); the Server returns Information]

  14. Real-time System: Send Features [Diagram: Client (Camera, Feature Extraction, Coding) sends Features over the Wireless Network to the Server (VocTree, Image Matching); the Server returns Information]

  15. Advanced Feature Compression
     • Transform Coding of SIFT/SURF descriptors [Chandrasekhar et al., VCIP 09]
     • Direct compression of oriented image patch [M. Makar et al., ICASSP 09]
     • Descriptor designed for compressibility: CHoG [Chandrasekhar et al., CVPR 09]
     • Tree-Structured Vector Quantization, Tree Histogram Coding [Chen et al., DCC 09]
     • Compression of Location Information [Tsai et al., Mobimedia 09]

  16. Transform Coding of SURF/SIFT Descriptors [Diagram: Client: Original Descriptor → Orthonormal Transform → Quantizer → Entropy Coder → Channel. Server: Channel → Entropy Decoder → Inverse Quantizer → Inverse Transform → Decoded Descriptor] [Chandrasekhar et al., VCIP 2009]
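
The transform-coding chain on slide 16 (orthonormal transform, quantizer, inverse quantizer, inverse transform) can be sketched as follows. Several things here are assumptions: the slide does not say which orthonormal transform is used, so a 2-point Haar butterfly stands in for it; the quantizer is a plain uniform one; the entropy coder and decoder are omitted; and all function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_pairs(v):
    """Orthonormal 2-point Haar butterfly on consecutive pairs.

    Assumes an even-length input. The butterfly is its own inverse,
    so the same function serves as forward and inverse transform.
    """
    out = []
    for a, b in zip(v[0::2], v[1::2]):
        out += [S * (a + b), S * (a - b)]
    return out


def quantize(coeffs, step):
    """Uniform quantizer: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Inverse quantizer: map each index back to a reconstruction level."""
    return [i * step for i in indices]


def encode(descriptor, step):
    """Client side (entropy coding of the indices is omitted)."""
    return quantize(haar_pairs(descriptor), step)


def decode(indices, step):
    """Server side (entropy decoding is omitted)."""
    return haar_pairs(dequantize(indices, step))


# Made-up 4-element descriptor and step size.
descriptor = [0.10, 0.30, -0.20, 0.05]
step = 0.05
indices = encode(descriptor, step)
decoded = decode(indices, step)
max_err = max(abs(a - b) for a, b in zip(descriptor, decoded))
print(indices, max_err)
```

Because the transform is orthonormal, the quantization error keeps its energy when mapped back, so the step size directly controls the distortion of the decoded descriptor.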

  17. Patch Compression [Diagram blocks between Client and Server: Detect Keypoints; Compute Descriptors; Compress Descriptors / Decompress Descriptors; Compress Patches / Decompress Patches; Matching] [M. Makar et al., ICASSP 2009]

  18. CHoG: Compressed Histogram of Gradients [Figure: Patch → Gradients (dx, dy) → Spatial binning → Gradient distributions for each bin → Histogram compression → CHoG Descriptor (one bit string per bin, e.g. 011101)]

  19. CHoG: Histogram Compression [Figure: Gradient binning → Gradient distribution (0.46, 0.21, 0.16, 0.09, 0.08) → Huffman tree approximates probabilities (1/2, 1/4, 1/8, 1/16, 1/16)]
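
The numbers on slide 19 can be reproduced with a short sketch: build a Huffman tree over the gradient distribution and read each probability back as 2 to the power of minus the leaf depth. The function name `huffman_code_lengths` is hypothetical, and the code is only an illustration of the idea that the tree shape approximates the distribution, not the CHoG encoder itself.

```python
import heapq


def huffman_code_lengths(probs):
    """Return the Huffman code length (leaf depth) of each symbol."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths


# Gradient distribution values taken from the slide.
probs = [0.46, 0.21, 0.16, 0.09, 0.08]
lengths = huffman_code_lengths(probs)
approx = [2.0 ** -n for n in lengths]
print(lengths)  # [1, 2, 3, 4, 4]
print(approx)   # [0.5, 0.25, 0.125, 0.0625, 0.0625]
```

The approximated probabilities 1/2, 1/4, 1/8, 1/16, 1/16 match the values shown next to the tree on the slide.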

  20. Enumerating Huffman Trees • Rooted binary trees with n leaf nodes
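
To get a feel for how many tree shapes such an enumeration covers, the sketch below counts rooted binary trees with n leaves. It rests on assumptions the slide does not state: trees are taken to be ordered (left and right children are distinguished), every internal node has exactly two children, and the assignment of histogram bins to leaves is ignored. Under those assumptions the count is the Catalan number C(n−1); the function name is hypothetical.

```python
from math import comb


def count_ordered_binary_trees(n_leaves):
    """Ordered full binary trees with n leaves: Catalan number C(n-1)."""
    m = n_leaves - 1
    return comb(2 * m, m) // (m + 1)


counts = [count_ordered_binary_trees(n) for n in range(1, 8)]
print(counts)  # [1, 1, 2, 5, 14, 42, 132]
```

For the five-bin distribution of slide 19 this gives 14 tree shapes, so the set of candidate shapes stays small for small n.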

  21. Feature Matching Performance • Ground truth data set of matching patches: Liberty data set [Winder & Brown, CVPR ’09]

  22. Location Histogram Coding [Diagram blocks: Feature Locations (x,y); Quantize; Spatial Binning; Context-based Arithmetic Coding; a +/− difference node feeding Refinement Bits] [Tsai et al., MobiMedia 2009]
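
The quantize, spatial binning, and refinement steps of slide 22 can be sketched as below. This is a hedged illustration only: integer pixel coordinates and a square block size are assumptions, the function name `bin_locations` is hypothetical, and the context-based arithmetic coding of the histogram is left out entirely.

```python
from collections import Counter


def bin_locations(locations, block):
    """Quantize (x, y) to block indices and keep the in-block remainder.

    Returns a histogram of features per block and, per feature, the
    residual (offset inside its block) that refinement bits would carry.
    """
    hist = Counter()
    residuals = []
    for x, y in locations:
        bx, by = x // block, y // block
        hist[(bx, by)] += 1
        residuals.append((x - bx * block, y - by * block))
    return hist, residuals


# Made-up feature locations on a small image, 8x8-pixel blocks.
locations = [(3, 5), (6, 2), (17, 9), (30, 30)]
hist, residuals = bin_locations(locations, block=8)
print(dict(hist))  # {(0, 0): 2, (2, 1): 1, (3, 3): 1}
print(residuals)   # [(3, 5), (6, 2), (1, 1), (6, 6)]
```

Most blocks of a real image hold no feature, which is what makes the block histogram a good target for entropy coding.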

  23. Compressed Feature Vector [Bar chart, size in bits; axis ticks 52, 59, 84, 1024, 1088] • SIFT + location (x,y): 1088 bits • CHoG + location (x,y): ~84 bits • CHoG + compressed (x,y): ~59 bits [Tsai et al., MobiMedia 2009]

  24. Timing Analysis [Bar chart: execution time (sec) on Nokia N95, 330 MHz ARM, 64 MB RAM. “Send Image”: Upload Image (40 kByte) + Server Delay. “Send Features”: Extract Features + Upload Features (2.2 kByte) + Server Delay]

  25. Timing Analysis [Bar chart: execution time (sec) on Nokia N95, 330 MHz ARM, 64 MB RAM. “Send Image”: Upload Image (40 kByte) + Server Delay. “Send Features”: Extract Features + Upload (2.2 kByte) + Server Delay]

  26. Timing Analysis [Bar chart: execution time (sec) on Nokia N95, 330 MHz ARM, 64 MB RAM. “Send Image”: Server Delay. “Send Features”: Extract Features + Server Delay]
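
A back-of-the-envelope calculation shows why upload size dominates on a slow link. The sketch assumes the 20 kbps rate quoted on slide 13 applies to both cases, takes 1 kByte as 1000 bytes and 1 kbps as 1000 bit/s, and ignores protocol overhead; the function name `upload_seconds` is hypothetical.

```python
def upload_seconds(size_kbyte, rate_kbps):
    """Raw transfer time: kByte * 8 bit/byte divided by kbit/s."""
    return size_kbyte * 8.0 / rate_kbps


image_s = upload_seconds(40, 20)      # 40 kByte JPEG image
features_s = upload_seconds(2.2, 20)  # 2.2 kByte of compressed features
print(image_s, features_s)
```

The image needs about 16 s of raw transfer time, the same order as the 20 sec quoted on slide 13, while the features need under 1 s, which leaves room for the roughly 1 sec of on-phone feature extraction.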

  27. Concluding Remarks
     • Mobile Visual Search is ready for prime time:
        - Bag of Words approach: > 95% recall for > 1M database
        - Vocabulary trees for fast retrieval (1 sec for > 1M database)
     • Feature compression helps:
        - Sending a JPEG image over a wireless link can be rather slow → send salient image features to the server
        - Small database: send compressed database features to the phone
        - CHoG outperforms SIFT and SURF in rate-constrained recognition performance: ~60 bits incl. location
     • Feature extraction ~1 sec on 330 MHz smartphone

  28. Acknowledgments: Radek Grzeszczuk, David Chen, Vijay Chandrasekhar, Yuriy Reznik, Jatinder Singh, Gabriel Takacs, Wei-Chao Chen, Natasha Gelfand, Yingen Xiong, Kari Pulli, Sam Tsai, Rahul Swaminathan, Thanos Bismpigiannis, Mina Makar
