1 / 49

A System for Observing and Recognizing Objects in the Real World

A System for Observing and Recognizing Objects in the Real World. J.O. Eklundh, M. Björkman, E. Hayman Computational Vision and Active Perception Lab Royal Institute of Technology (KTH). Motivation.

kaethe
Download Presentation

A System for Observing and Recognizing Objects in the Real World

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A System for Observing and Recognizing Objects in the Real World J.O. Eklundh, M. Björkman, E. Hayman Computational Vision and Active Perception Lab Royal Institute of Technology (KTH)

  2. Motivation An autonomous agent moving about in a dynamic indoor environment, performing tasks such as finding, picking up and delivering known objects or classes of objects. What should the vision system of such an agent be capable of? Vision in the Real World: Attending, Foveating and Recognizing Objects

  3. Capabilities “where”what attention segmentation recognition what Should be dealt with jointly. Bootstrapping? Vision in the Real World: Attending, Foveating and Recognizing Objects

  4. Recognition and CategorizationFeifei et al 03 Vision in the Real World: Attending, Foveating and Recognizing Objects

  5. A robot looking at a table at 1.5 m Objects subtend only a fraction of the scene and are not centered (no attentional step) Vision in the Real World: Attending, Foveating and Recognizing Objects

  6. Approach - themes • 3D cues relevant in the scene • Motion and stereo used for bootstrapping • Integration of multiple cues • A system interacting with the environment • Fast processes and anytime algorithms desirable Vision in the Real World: Attending, Foveating and Recognizing Objects

  7. The f-g-s problem • Segmenting 3D objects from the background • Computing motion, depth and ego-motion • Acquiring appearance models Issues: • Combining cues • Demonstrate simple algorithms that suffice Monocular as well as binocular cases Vision in the Real World: Attending, Foveating and Recognizing Objects

  8. Cue integration for f-g-s First, the dynamic monocular case • Problem: classifying pixels as being foreground or background (or into layers) • Cues: motion, colour, contrast + prediction (temporal continuity) Inference problem: observations from different spaces to be combined Vision in the Real World: Attending, Foveating and Recognizing Objects

  9. Integration • Two approaches: - probabilistic: likelihood of observing data, given a model of each layer - voting: each cue decides independently, form weighted combination Algorithm • Online initialization of colour + texture models • Use segmentation from motion to train distributions • Suppress unreliable cues • Sequential algorithm Vision in the Real World: Attending, Foveating and Recognizing Objects

  10. Related work • Khan & Shah CVPR’01 • Triesch & von der Malsburg ICAFGR’00 • Spengler & Schiele ICVS’01 • Toyama & Horvitz ACCV’00 • Kragic & Christensen ICRA’99 • Belongie et al ICCV’98 • … Similarity to tracking Vision in the Real World: Attending, Foveating and Recognizing Objects

  11. Voting The likelihood of observation from cue k at pixel i given model of layer j : Posterior probability of layer j : We set Vision in the Real World: Attending, Foveating and Recognizing Objects

  12. Probabilistic fusion • Assume independent observations • The total likelihood of the observations given the combined model : • The posterior estimate of layer membership: An independent opinion pool Vision in the Real World: Attending, Foveating and Recognizing Objects

  13. f.g. model 8 f ( Z | f.g ) = 8 b.g. model f ( Z | b.g ) = 0.4 0.4 Observation: Z Illustrations • Assume a distribution for each layer for each cue Vision in the Real World: Attending, Foveating and Recognizing Objects

  14. Cue combination: Weighted voting • Each cue makes independent decision • Combine using weighted sum • Assuming equal weights: Vision in the Real World: Attending, Foveating and Recognizing Objects

  15. Cue combination: Probabilistic • Compute total likelihood of observations for each layer • Classify using Bayes’ Rule • Assuming uniform priors: Vision in the Real World: Attending, Foveating and Recognizing Objects

  16. Simple! Easy to combine observations from very different spaces Not obvious how cues interact Graded output Mathematically well-founded One cue can easily dominate over others Almost binary output Pros and cons Voting: Probabilistic: Vision in the Real World: Attending, Foveating and Recognizing Objects

  17. Training and adaptation • Start with segmentation just from motion • Use this to train colour and contrast distributions • Use EM algorithm to train Gaussian mixture models • Subsequently adapted online • Recompute models from current data • Update model as weighted sum over time window (Raja et al ECCV’98) Vision in the Real World: Attending, Foveating and Recognizing Objects

  18. Suppressing unreliable cues • Cues unreliable during training • No independent motion poor motion segmentation • Unreliable in the past probably unreliable now! (Triesch and von der Malsburg ICAFGR’00) • Mechanisms: Voting: weights Probabilistic:hyper-priors Vision in the Real World: Attending, Foveating and Recognizing Objects

  19. The effect of hyper-priors Orginal pdf’s After marginalization Vision in the Real World: Attending, Foveating and Recognizing Objects

  20. Results Degree of membership to foreground Original Probabilistic Voting Vision in the Real World: Attending, Foveating and Recognizing Objects

  21. Cue combination Texture (contrast) Motion Colour Prediction Combined Probabilistic Combined Voting Vision in the Real World: Attending, Foveating and Recognizing Objects

  22. Results Original Foreground Vision in the Real World: Attending, Foveating and Recognizing Objects

  23. Results: Probabilistic cue integration Original Foreground mask Vision in the Real World: Attending, Foveating and Recognizing Objects

  24. The cues Texture (contrast) Motion Colour Prediction Combined Vision in the Real World: Attending, Foveating and Recognizing Objects

  25. Process overview Image features Epipolar geometry Non-retinal info. Disparity map Ego-motion Independent motion Additional retinal info. Regions of interest Top-down control Fixation point Vision in the Real World: Attending, Foveating and Recognizing Objects

  26. Calibration • Relative orientations has to be known to • relate disparities to depths • simplify estimation of disparities Using corner features and optical flow model 2 (1+x )a – yr xy a + r + x r Dx Dy 1 1 – x t - y t = z + z y z Unstable process => be careful We first assume r and r to be zero. z y Vision in the Real World: Attending, Foveating and Recognizing Objects

  27. More examples Vision in the Real World: Attending, Foveating and Recognizing Objects

  28. Final output Vision in the Real World: Attending, Foveating and Recognizing Objects

  29. 3D Cues: stereo and motion • Many objects of interest static. Harder! • Motion included in full system; now only stereo • The cues in these examples • Stereo data - exist along contours • Color data/appearance between contours Vision in the Real World: Attending, Foveating and Recognizing Objects

  30. Proposed system structure A wide field for attention, recognition in foveated view Technically by a combination of wide field and foveal cameras • Problem: transfer from wide field to foveal view • Steps: • Divide scene into 3D objects • Select objects through attention (e.g.hue and expected size) • Fixate (and track)object of interest • Recognize objects in foveal view Vision in the Real World: Attending, Foveating and Recognizing Objects

  31. Processes Where Hypotheses Segmen- tation Attention Adaptation Shape and size Knowledge Recognition Knowledge Region of Interest What Vision in the Real World: Attending, Foveating and Recognizing Objects

  32. Flow of information Wide field Foveal Left Right Left Right Calibration Registration Registration Fixation Segmentation Global hue SIFT features Local hue Recognition Attention Gaze direction Vision in the Real World: Attending, Foveating and Recognizing Objects

  33. Figure-ground segmentation Disparity map is sliced into layers. Widths are set to that of requested object. Vision in the Real World: Attending, Foveating and Recognizing Objects

  34. Figure-ground segmentation BinoCues BinoAttn • Disparities using SAD correlations. • Segmentation based on slicing the 3D world. Vision in the Real World: Attending, Foveating and Recognizing Objects

  35. Hue based attention Local hue histograms correlated with that of requested object. Fast implementation using rotating sums. Vision in the Real World: Attending, Foveating and Recognizing Objects

  36. Saliency peaks Peaks from blob detection of depth slices. Based on Differences of Gaussians. Hue saliency map used for weighting. Random value added before selection. Vision in the Real World: Attending, Foveating and Recognizing Objects

  37. Fixation • The foveal system continuously tries to fixate • done using corner features • and affine essential matrix 0 0 c 0 0 d a b e F = Zero disparity filters won’t work Vision in the Real World: Attending, Foveating and Recognizing Objects

  38. Foveated segmentation • To boost recognition • Foveal segmentation based on disparities • Rectification using affine fundamental matrix • Only search for disparities around zero => • Large number of false positives • Points clustered in 3D using mean shift Vision in the Real World: Attending, Foveating and Recognizing Objects

  39. Foveated segmentation Vision in the Real World: Attending, Foveating and Recognizing Objects

  40. Foveated segmentation Vision in the Real World: Attending, Foveating and Recognizing Objects

  41. Small object database in real-time experiments Models of SIFT features and hue histograms Vision in the Real World: Attending, Foveating and Recognizing Objects

  42. Visual scene search Vision in the Real World: Attending, Foveating and Recognizing Objects

  43. Segmentation robustness Vision in the Real World: Attending, Foveating and Recognizing Objects

  44. Segmentation robustness Vision in the Real World: Attending, Foveating and Recognizing Objects

  45. Effect of occlusions Vision in the Real World: Attending, Foveating and Recognizing Objects

  46. Effect of rotations Vision in the Real World: Attending, Foveating and Recognizing Objects

  47. Recognition in w-f-o-v Vision in the Real World: Attending, Foveating and Recognizing Objects

  48. Recognition after foveation Vision in the Real World: Attending, Foveating and Recognizing Objects

  49. Conclusions • We have a running system. • Objects normally found within three saccades • Concern: dependency on corner features • Current work: • Focus on recognition and categorization • More robust foveal segmentation • Additional cues e.g. texture • Learning and adaptation on all levels Vision in the Real World: Attending, Foveating and Recognizing Objects

More Related