Epitomic Location Recognition

K. Ni, A. Kannan, A. Criminisi and J. Winn. Epitomic Location Recognition: A generative approach for location recognition. In Proc. CVPR 2008, Anchorage, Alaska.

Presentation Transcript


  1. Epitomic Location Recognition: A generative approach for location recognition. K. Ni, A. Kannan, A. Criminisi and J. Winn. In Proc. CVPR 2008, Anchorage, Alaska.

  2. Outline: Goal, Introduction, Recognition, Enhancements, Evaluation (current section: Goal)

  3. Location Recognition • Where am I? • Instance recognition • Category recognition (more difficult): Lobby? Cubicle? Hallway? Kitchen?

  4. Outline: Goal, Introduction, Recognition, Enhancements, Evaluation (current section: Introduction)

  5. Geometry-Based Recognition • SLAM and structure from motion • Why do we need metric reconstruction? • Geometry-based methods lose the flexibility to do class recognition [Diagram labels: training images, local feature database, geometry & labels, testing image, features] (F. Schaffalitzky and A. Zisserman; G. Schindler, M. Brown and R. Szeliski)

  6. Appearance-Based Recognition • Captures global appearance information • Gaussian mixture model used by A. Torralba et al. [Diagram labels: training images, preprocessing, image vectors, training, appearance model (e.g. PCA)] (A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin; M. Cummins and P. Newman)

  7. Appearance or Geometry? • Can we do better by fusing both kinds of information? A small example with two location labels: cubicle and corridor

  8. The Simplest Model • Nearest-neighbor classification • Naive, but still effective with enough samples • A small shift may disrupt recognition • Does not capture uncertainty
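A minimal sketch of the nearest-neighbor baseline, assuming each image has already been flattened into a feature vector. The toy 1-D "images", labels, and function name are hypothetical. The second query is the first training image circularly shifted by two pixels, which is enough to flip the decision: the slide's point that a small shift can disrupt recognition.

```python
import numpy as np


def nearest_neighbor_label(train_vecs, train_labels, query):
    """Return the label of the training vector closest to `query` (squared Euclidean)."""
    dists = np.sum((train_vecs - query) ** 2, axis=1)
    return train_labels[int(np.argmin(dists))]


# Hypothetical toy "images": a step edge for the corridor, a flat wall for the cubicle.
corridor = np.array([0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
cubicle = np.full(8, 0.5)
train_vecs = np.stack([corridor, cubicle])
train_labels = ["corridor", "cubicle"]

print(nearest_neighbor_label(train_vecs, train_labels, corridor))  # corridor

# The same corridor view after a small camera pan (circular shift by 2 pixels).
shifted = np.roll(corridor, 2)
print(nearest_neighbor_label(train_vecs, train_labels, shifted))   # cubicle
```

The function returns a hard label with no confidence attached, which is the slide's last criticism.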

  9. How to Incorporate Translation Invariance? • We need something better than a “bag of frames” model [Figure: training images and a testing image]

  10. Panorama • Models both appearance and geometry • Adapts to camera rotation and focal-length changes • Generative • An image is a patch “extracted” from the panorama (M. Brown and D. G. Lowe)

  11. Cons of Panoramas • Hard to build because of parallax • Does not capture uncertainty • Only works for location instance recognition • No compact representation for repetitive scenes

  12. Gaussian Mixture Model • Six mixture components, trained as in Torralba et al.'s paper • Handles uncertainty but has no translation invariance [Figure: component means and variances, annotated “Remove boundaries” and “Much more blurred”]
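To make the GMM baseline concrete, here is a sketch of scoring an image vector under one diagonal-covariance Gaussian mixture per location and picking the most likely location. It assumes the mixture parameters are already trained (training is omitted), and the hand-set toy parameters and function names are hypothetical; the slide's models use six components following Torralba et al. Each component compares pixels at fixed positions, so a shifted view is not handled: the missing translation invariance noted above.

```python
import numpy as np


def diag_gmm_loglik(x, weights, means, variances):
    """log p(x) under a mixture of diagonal Gaussians.

    x: (D,) feature vector; weights: (K,); means, variances: (K, D).
    """
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_terms = np.log(weights) + log_norm + log_exp
    m = np.max(log_terms)  # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(log_terms - m)))


def classify(x, models):
    """models: dict label -> (weights, means, variances); return the max-likelihood label."""
    return max(models, key=lambda label: diag_gmm_loglik(x, *models[label]))


D = 4
models = {  # hypothetical, hand-set parameters (not trained)
    "corridor": (np.array([0.5, 0.5]), np.stack([np.ones(D), 2 * np.ones(D)]), np.ones((2, D))),
    "cubicle": (np.array([1.0]), np.zeros((1, D)), np.ones((1, D))),
}
print(classify(np.full(D, 1.2), models))  # corridor
```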

  13. A Weak Panorama • 3D motions can be roughly modeled by 2D translation + scaling [Figure: examples of 2D translation and scaling]

  14. Epitome = Panorama + GMM • Epitome • Generative model for image patches / video frames • Captures repetitive patterns in the original image • Mapping = 2D translation + scaling [Figure: a source image, image patches, and the epitome] (N. Jojic et al., ICCV 2003; N. Petrovic et al., CVPR 2006)
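The generative view can be sketched as follows: an image is explained by reading a patch out of the epitome's per-pixel means and variances under a mapping, and scored with an independent Gaussian per pixel. This is an illustrative sketch, not the authors' code. It assumes indices wrap around the epitome borders, and it approximates scaling by a hypothetical integer sampling `step` rather than proper resampling.

```python
import numpy as np


def extract_patch(epitome, top, left, height, width, step=1):
    """Read a (height, width[, channels]) patch from `epitome`.

    The mapping is a 2-D translation (top, left) plus a crude integer
    scaling `step` (assumption: sample every step-th epitome pixel).
    Indices wrap around the epitome borders (assumption).
    """
    He, We = epitome.shape[:2]
    rows = (top + step * np.arange(height)) % He
    cols = (left + step * np.arange(width)) % We
    return epitome[np.ix_(rows, cols)]


def patch_loglik(image, mean_epitome, var_epitome, top, left, step=1):
    """log p(image | mapping): independent Gaussians with epitome means/variances."""
    h, w = image.shape[:2]
    mu = extract_patch(mean_epitome, top, left, h, w, step)
    var = extract_patch(var_epitome, top, left, h, w, step)
    return -0.5 * np.sum((image - mu) ** 2 / var + np.log(2 * np.pi * var))


e = np.arange(16, dtype=float).reshape(4, 4)
print(extract_patch(e, 3, 3, 2, 2))          # wraps around: rows 3, 0 and columns 3, 0
print(extract_patch(e, 0, 0, 2, 2, step=2))  # every other pixel: rows 0, 2 and columns 0, 2
```

Because the means and variances live in one shared array, overlapping patches reuse the same parameters, which is what lets an epitome act like a panorama with uncertainty.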

  15. Epitome as Probabilistic Panorama • Models 3D scenes rather than a single 2D image • Environment = virtual panorama [Figure: location epitome means and variances]

  16. Learning the Location Epitome • Initialize the epitome randomly • EM iterations • E-step: infer the posteriors over all mappings • M-step: use the posteriors as weights to update the mean and variance of the epitome pixels [Plot: free energy vs. EM iterations]
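The EM loop can be sketched as below, under simplifying assumptions of our own: grayscale images of one fixed size, translation-only mappings that wrap around the epitome borders, a uniform prior over mappings, and a hypothetical variance floor `min_var`. The E-step is brute force over all translations, which is slow but easy to check. With exact posteriors the free energy equals the negative log-likelihood, so the returned log-likelihood trace should be non-decreasing, mirroring the falling free-energy curve.

```python
import numpy as np


def learn_epitome(images, He, We, n_iter=10, min_var=1e-3, seed=0):
    """EM for a grayscale epitome from equally sized (h, w) training images.

    Returns (mean, var, log_liks), where log_liks[t] is log p(images) before
    the t-th M-step. Assumptions: wrapped translations only, uniform prior.
    """
    h, w = images[0].shape
    assert h <= He and w <= We, "images must fit inside the epitome"
    rng = np.random.default_rng(seed)
    mean = rng.random((He, We))  # random initialization
    var = np.ones((He, We))
    # One index grid per translation; indices wrap around the epitome borders.
    grids = [np.ix_((r + np.arange(h)) % He, (c + np.arange(w)) % We)
             for r in range(He) for c in range(We)]
    log_liks = []
    for _ in range(n_iter):
        sw = np.zeros((He, We))    # sum of posterior weights per epitome pixel
        swx = np.zeros((He, We))   # weighted sum of pixel values
        swx2 = np.zeros((He, We))  # weighted sum of squared pixel values
        total = 0.0
        for x in images:
            # E-step: log p(x | T) for every translation T, then the posterior over T.
            ll = np.array([-0.5 * np.sum((x - mean[g]) ** 2 / var[g]
                                         + np.log(2 * np.pi * var[g])) for g in grids])
            top = ll.max()
            post = np.exp(ll - top)
            total += top + np.log(post.sum()) - np.log(len(grids))  # uniform prior
            post /= post.sum()
            # Accumulate posterior-weighted statistics for the M-step.
            for q, g in zip(post, grids):
                sw[g] += q
                swx[g] += q * x
                swx2[g] += q * x ** 2
        log_liks.append(total)
        # M-step: weighted mean and variance of the pixels mapped to each epitome pixel.
        seen = sw > 1e-12
        mean[seen] = swx[seen] / sw[seen]
        var[seen] = np.maximum(swx2[seen] / sw[seen] - mean[seen] ** 2, min_var)
    return mean, var, log_liks
```

Usage: pass a list of float arrays of shape (h, w) and an epitome size at least that large; the posteriors computed in the E-step are the same quantities the label maps are later built from.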

  17. Model Comparison • The epitome is a smart mixture-of-Gaussians model with parameters shared among components • For the same number of parameters, the epitome generalizes better
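A back-of-the-envelope count, under assumptions of our own, shows what parameter sharing buys. With translation-only, wrap-around mappings, every offset in an He × We epitome defines one Gaussian component over an h × w image, so the epitome encodes He·We components, while a plain diagonal GMM with the same parameter budget can afford only a handful. Mixing weights are ignored, and the sizes are hypothetical. The count does not by itself prove better generalization; it only shows how many components the shared parameters describe.

```python
def epitome_vs_gmm(He, We, h, w, channels=3):
    """Count mixture components for an epitome and a GMM with the same parameter budget."""
    epitome_params = 2 * He * We * channels         # one mean and one variance per epitome pixel
    epitome_components = He * We                     # one Gaussian per wrapped translation
    params_per_gmm_component = 2 * h * w * channels  # mean + diagonal variance per image pixel
    gmm_components = epitome_params // params_per_gmm_component
    return epitome_components, gmm_components


print(epitome_vs_gmm(64, 64, 32, 32))  # (4096, 4)
```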

  18. Outline: Goal, Introduction, Recognition, Enhancements, Evaluation (current section: Recognition)

  19. Building Label Maps • The label maps are the posterior of the label given the mapping [Figure: the epitome and its cubicle and corridor label maps]
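One plausible way to build the label maps, sketched under assumptions: for each training image we already have its posterior over wrapped, translation-only mappings from the E-step, and we spread that posterior mass over the epitome pixels each mapping covers, separately per label. Normalizing across labels gives, at each epitome pixel, an estimate of the label posterior. The function and argument names are hypothetical, and pixels that no training image reaches get a uniform label distribution.

```python
import numpy as np


def build_label_maps(posteriors, labels, n_labels, He, We, h, w):
    """Per-pixel label posteriors over an He x We epitome.

    posteriors[n]: (He*We,) posterior over translations for training image n,
                   ordered row-major over (top, left).
    labels[n]:     integer location label of training image n.
    """
    counts = np.zeros((n_labels, He, We))
    for q, y in zip(posteriors, labels):
        for t, qt in enumerate(q):
            top, left = divmod(t, We)
            g = np.ix_((top + np.arange(h)) % He, (left + np.arange(w)) % We)
            counts[y][g] += qt  # posterior mass lands on the pixels this mapping covers
    total = counts.sum(axis=0, keepdims=True)
    uniform = np.full_like(counts, 1.0 / n_labels)
    # Assumption: fall back to a uniform label distribution where no mass landed.
    return np.where(total > 0, counts / np.maximum(total, 1e-300), uniform)
```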

  20. Recognition from Location Epitomes • Fast correlation: infer the best mapping region • Sum the pixel-wise votes • Temporal smoothing using an HMM [Figure: input testing image, its best-matching patch in the location epitome, and the cubicle and corridor label maps]
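The three recognition steps can be sketched as below, assuming a single-channel epitome with wrap-around, translation-only mappings (scaling omitted). Here the "fast correlation" is realized with FFTs: expanding the Gaussian log-likelihood turns the score of every translation into three circular cross-correlations. The FFT formulation, the vote normalization, the causal forward filter standing in for "temporal smoothing", and the hypothetical transition parameter `stay` are our own assumptions, not details taken from the slides.

```python
import numpy as np


def circular_xcorr(patch, emap):
    """out[r, c] = sum_{i,j} patch[i, j] * emap[(r + i) % He, (c + j) % We], computed by FFT."""
    padded = np.zeros(emap.shape)
    padded[:patch.shape[0], :patch.shape[1]] = patch
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(padded)) * np.fft.fft2(emap)))


def all_offsets_loglik(x, mean, var):
    """log p(x | T) for every wrapped translation T of image x over the epitome."""
    inv = 1.0 / var
    # sum[(x - mu)^2/var + log(2 pi var)] = sum[x^2/var - 2 x mu/var + mu^2/var + log(2 pi var)]
    s = (circular_xcorr(x ** 2, inv)
         - 2.0 * circular_xcorr(x, mean * inv)
         + circular_xcorr(np.ones_like(x), mean ** 2 * inv + np.log(2 * np.pi * var)))
    return -0.5 * s


def recognize_frame(x, mean, var, label_maps):
    """Best mapping, then sum the label-map votes of the epitome pixels it covers."""
    ll = all_offsets_loglik(x, mean, var)
    top, left = np.unravel_index(np.argmax(ll), ll.shape)
    He, We = mean.shape
    g = np.ix_((top + np.arange(x.shape[0])) % He, (left + np.arange(x.shape[1])) % We)
    votes = np.array([lm[g].sum() for lm in label_maps])
    return votes / votes.sum()


def hmm_filter(frame_scores, stay=0.9):
    """HMM forward filtering of per-frame label scores.

    Assumption: a 'sticky' transition matrix with hypothetical self-transition `stay`.
    """
    K = frame_scores.shape[1]
    A = np.full((K, K), (1.0 - stay) / (K - 1))
    np.fill_diagonal(A, stay)
    belief = np.full(K, 1.0 / K)
    filtered = []
    for scores in frame_scores:
        belief = (A.T @ belief) * scores  # predict, then weight by this frame's evidence
        belief /= belief.sum()
        filtered.append(belief)
    return np.array(filtered)
```

With the sticky transitions, a single ambiguous frame inside a run of confident frames is pulled back toward the run's label instead of flipping the output.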

  21. Outline: Goal, Introduction, Recognition, Enhancements, Evaluation (current section: Enhancements)

  22. Color is not always the best feature • Other features besides RGB • For example, a stereo feature captures depth information • High stereo accuracy is not needed (efficient dynamic-programming stereo here) [Figure: corridor, cubicle and kitchen examples]

  23. Integrating Multiple Features • Stack multiple feature “channels” [Figure: R, G, B and stereo channels stacked]

  24. Local Histograms • Enable better translation invariance and more generalization • Error rate: 0.49 → 0.36 on a 4-class test dataset • Improve efficiency dramatically: 30 times speed-up
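A sketch of local histograms, assuming each pixel has already been quantized to one of `n_bins` codewords (the slides use 50-dimensional histograms but do not say how pixels are quantized) and that histograms are computed over non-overlapping cells, which matches the down-sampling by C1 and C2 on slide 35. The function name is hypothetical. Because a cell only counts which codewords occur, rearranging pixels inside a cell leaves its histogram unchanged, one source of the extra translation tolerance.

```python
import numpy as np


def local_histograms(codes, n_bins, cell_h, cell_w):
    """Normalized histograms of quantized feature indices over non-overlapping cells.

    codes: (H, W) integer array with values in [0, n_bins).
    Returns an array of shape (H // cell_h, W // cell_w, n_bins).
    """
    gh, gw = codes.shape[0] // cell_h, codes.shape[1] // cell_w
    cropped = codes[:gh * cell_h, :gw * cell_w]    # drop partial cells at the border
    one_hot = np.eye(n_bins)[cropped]                # (gh*cell_h, gw*cell_w, n_bins)
    cells = one_hot.reshape(gh, cell_h, gw, cell_w, n_bins)
    return cells.sum(axis=(1, 3)) / (cell_h * cell_w)


codes = np.array([[0, 1, 2, 3],
                  [0, 0, 2, 2]])
print(local_histograms(codes, n_bins=4, cell_h=2, cell_w=2))
```

The output is a smaller multi-channel "image" (one histogram per cell), so the same epitome machinery can run on it with far fewer positions to correlate.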

  25. Supervised Learning • Incorporates training image labels • Helps discriminate images with similar features but different location labels [Figure: an example epitome and an example label feature; discriminative features such as a microwave in the kitchen and a monitor in the cubicle]

  26. Outline: Goal, Introduction, Recognition, Enhancements, Evaluation (current section: Evaluation)

  27. MIT Image Database • Created by Antonio Torralba et al. • 17 sequences, 62 locations, 7 categories, 72,077 images

  28. Results on Recognizing Location Instances • Location epitome vs. GMM: 10% better on average

  29. Results on Recognizing Location Classes • Location Epitome vs. GMM, 10%-20% better

  30. MSRC Data Set • Captured with a stereo camera • 5,409 images collected at 4 fps • 11 sequences and 7 classes: corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st

  31. Integrating Depth Cues [Figure: examples for corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st]

  32. Instance Recognition with Multiple Features • RGB + stereo outperforms the other features • Learning: 5.7 fps • Recognition: 116 fps = 29 times the 4 fps capture speed

  33. Summary • A generative model for the recognition of both location instances and classes • Fast: capable of real-time applications • Flexible: capable of integrating various features • Probabilistic: capable of capturing uncertainties • Future applications • Navigation for visually impaired people • Appearance-based loop closing for SLAM problems

  34. Thank you!

  35. Local Histograms (2) • Improves efficiency (both training and testing) • The bottleneck: convolving the epitome with images • Compression rate: $3(C_1 C_2)^2/50 = 2400$ • Learning: 3 hours → 6 mins, 30 times faster [Diagram: an $M_e \times N_e$ epitome convolved with an $M \times N$ image on 3-dimensional RGB features, versus an $(M_e/C_1) \times (N_e/C_2)$ epitome convolved with an $(M/C_1) \times (N/C_2)$ image on 50-dimensional local histograms]
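The compression rate follows from counting multiply-adds in a direct (non-FFT) correlation, under our reading of the slide's diagram: an $M_e \times N_e$ epitome and an $M \times N$ image with 3 RGB channels, versus histogram maps down-sampled by $C_1$ vertically and $C_2$ horizontally with 50 channels. Each cost is taken as (epitome positions) × (image positions) × (feature dimension):

```latex
\text{cost}_{\text{RGB}} \propto 3\, M_e N_e M N,
\qquad
\text{cost}_{\text{hist}} \propto 50\, \frac{M_e}{C_1}\,\frac{N_e}{C_2}\,\frac{M}{C_1}\,\frac{N}{C_2},
\qquad
\frac{\text{cost}_{\text{RGB}}}{\text{cost}_{\text{hist}}} = \frac{3\,(C_1 C_2)^2}{50}.
```

Setting the ratio to 2400 gives $(C_1 C_2)^2 = 40000$, i.e. $C_1 C_2 = 200$; the slide does not state $C_1$ and $C_2$ individually.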
