
NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features



Presentation Transcript


  1. NIPS 2003 Tutorial: Real-time Object Recognition using Invariant Local Image Features. David Lowe, Computer Science Department, University of British Columbia

  2. Object Recognition • Definition: Identify an object and determine its pose and model parameters • Commercial object recognition • Currently a $4 billion/year industry for inspection and assembly • Almost entirely based on template matching • Upcoming applications • Mobile robots, toys, user interfaces • Location recognition • Digital camera panoramas, 3D scene modeling

  3. Invariant Local Features • Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters • SIFT features

  4. Advantages of invariant local features • Locality: features are local, so robust to occlusion and clutter (no prior segmentation) • Distinctiveness: individual features can be matched to a large database of objects • Quantity: many features can be generated for even small objects • Efficiency: close to real-time performance • Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness

  5. Zhang, Deriche, Faugeras, Luong (95) • Apply Harris corner detector • Match points by correlating only at corner points • Derive epipolar alignment using robust least-squares

  6. Cordelia Schmid & Roger Mohr (97) • Apply Harris corner detector • Use rotational invariants at corner points • However, not scale invariant. Sensitive to viewpoint and illumination change.

  7. Scale invariance Requires a method to repeatably select points in location and scale: • The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994) • An efficient choice is to detect peaks in the difference of Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales) • Difference-of-Gaussian with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)
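The construction above can be sketched in a few lines: adjacent Gaussian levels differ by a constant ratio k, and their differences approximate the scale-normalized Laplacian. This is an illustrative sketch, not Lowe's implementation; the function name `dog_octave` and the use of scipy's `gaussian_filter` are my choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, scales_per_octave=3):
    """Build one octave of the difference-of-Gaussian pyramid.

    Adjacent Gaussian levels differ by a constant ratio k, so each DoG
    level approximates the scale-normalized Laplacian.
    """
    k = 2.0 ** (1.0 / scales_per_octave)
    # scales_per_octave + 3 Gaussian images are needed so that DoG
    # extrema can be detected at scales_per_octave levels per octave
    gaussians = [gaussian_filter(image.astype(float), sigma * k ** i)
                 for i in range(scales_per_octave + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs
```

For the next octave, the Gaussian image at twice the initial sigma would be downsampled by 2 and the process repeated, matching the "one octave at a time" processing on the following slide.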

  8. Scale space processed one octave at a time

  9. Key point localization • Detect maxima and minima of the difference-of-Gaussian in scale space • Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002) • Taylor expansion around the sample point: D(x) = D + (∂D/∂x)ᵀx + ½xᵀ(∂²D/∂x²)x • Offset of the extremum (use finite differences for the derivatives): x̂ = −(∂²D/∂x²)⁻¹(∂D/∂x)
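The sub-pixel/sub-scale refinement above can be sketched directly from the slide: estimate the gradient and Hessian of the DoG function by central finite differences and solve for the offset x̂ = −H⁻¹g. The function name `localize_extremum` and the (scale, row, col) indexing convention are mine, not Lowe's.

```python
import numpy as np

def localize_extremum(dog, s, y, x):
    """Sub-pixel/sub-scale refinement of a DoG extremum (Brown & Lowe, 2002).

    dog is a 3D array indexed as [scale, row, col]. Returns the offset
    x_hat = -H^{-1} g from the sampled point, where g and H are the
    gradient and Hessian of D estimated by central finite differences.
    """
    D = dog
    # Gradient by central differences
    g = 0.5 * np.array([
        D[s+1, y, x] - D[s-1, y, x],
        D[s, y+1, x] - D[s, y-1, x],
        D[s, y, x+1] - D[s, y, x-1],
    ])
    # Hessian by finite differences
    dss = D[s+1, y, x] - 2*D[s, y, x] + D[s-1, y, x]
    dyy = D[s, y+1, x] - 2*D[s, y, x] + D[s, y-1, x]
    dxx = D[s, y, x+1] - 2*D[s, y, x] + D[s, y, x-1]
    dsy = 0.25 * (D[s+1, y+1, x] - D[s+1, y-1, x] - D[s-1, y+1, x] + D[s-1, y-1, x])
    dsx = 0.25 * (D[s+1, y, x+1] - D[s+1, y, x-1] - D[s-1, y, x+1] + D[s-1, y, x-1])
    dyx = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1] - D[s, y-1, x+1] + D[s, y-1, x-1])
    H = np.array([[dss, dsy, dsx],
                  [dsy, dyy, dyx],
                  [dsx, dyx, dxx]])
    # Offset of the extremum; if any component exceeds 0.5, the true
    # extremum lies closer to a neighboring sample point
    return -np.linalg.solve(H, g)
```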

  10. Sampling frequency for scale More points are found as sampling frequency increases, but accuracy of matching decreases after 3 scales/octave

  11. Eliminating unstable keypoints • Discard points with DoG value below a threshold (low contrast) • However, points along edges may have high contrast in one direction but low in another • Compute the principal curvatures from the eigenvalues of the 2x2 Hessian matrix H, and limit their ratio (Harris approach): Tr(H)²/Det(H) < (r+1)²/r
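The edge test can be written without computing eigenvalues explicitly, since the trace and determinant of the 2x2 Hessian suffice. A minimal sketch, with a hypothetical function name and r = 10 taken from Lowe's 2004 paper rather than this slide:

```python
import numpy as np

def passes_edge_test(dog_level, y, x, r=10.0):
    """Reject edge-like keypoints using the 2x2 Hessian of the DoG image.

    Keeps the point only if Tr(H)^2 / Det(H) < (r+1)^2 / r, i.e. the
    ratio of principal curvatures is below r.
    """
    D = dog_level
    dxx = D[y, x+1] - 2*D[y, x] + D[y, x-1]
    dyy = D[y+1, x] - 2*D[y, x] + D[y-1, x]
    dxy = 0.25 * (D[y+1, x+1] - D[y+1, x-1] - D[y-1, x+1] + D[y-1, x-1])
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # principal curvatures have different signs: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```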

  12. Select canonical orientation • Create a histogram of local gradient directions computed at the selected scale • Assign the canonical orientation at the peak of the smoothed histogram • Each key specifies stable coordinates (x, y, scale, orientation)
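A minimal sketch of the orientation histogram. The function name is hypothetical, and two details come from Lowe's 2004 paper rather than this slide: the 36-bin histogram and the creation of an extra keypoint for any secondary peak within 80% of the maximum. Real SIFT also weights samples by a Gaussian window around the keypoint.

```python
import numpy as np

def canonical_orientations(magnitude, angle, n_bins=36, peak_ratio=0.8):
    """Magnitude-weighted histogram of gradient orientations.

    Returns the orientations of all peaks within peak_ratio of the
    global maximum (each becomes a separate keypoint in SIFT).
    """
    bins = (angle % (2 * np.pi)) / (2 * np.pi) * n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.astype(int) % n_bins, magnitude)
    # Smooth the histogram circularly before picking peaks
    hist = (np.roll(hist, 1) + hist + np.roll(hist, -1)) / 3.0
    peaks = np.where(hist >= peak_ratio * hist.max())[0]
    return peaks * 2 * np.pi / n_bins
```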

  13. Example of keypoint detection Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach) • (a) 233x189 image • (b) 832 DoG extrema • (c) 729 left after peak value threshold • (d) 536 left after testing ratio of principal curvatures

  14. Creating features stable to viewpoint change • Edelman, Intrator & Poggio (97) showed that complex cell outputs are better for 3D recognition than simple correlation

  15. Stability to viewpoint change • Classification of rotated 3D models (Edelman 97): complex cells 94% vs. simple cells 35%

  16. SIFT vector formation • Thresholded image gradients are sampled over 16x16 array of locations in scale space • Create array of orientation histograms • 8 orientations x 4x4 histogram array = 128 dimensions
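The 8 x 4 x 4 = 128-dimension structure can be sketched as below. This is a simplification with a hypothetical function name: real SIFT first rotates the patch to the canonical orientation, Gaussian-weights the magnitudes, and distributes each sample with trilinear interpolation; the normalize-clip-renormalize step at the end follows Lowe's 2004 paper rather than this slide.

```python
import numpy as np

def sift_descriptor(patch):
    """Simplified 128-D SIFT-style descriptor from a 16x16 patch.

    Keeps only the core idea: a 4x4 grid of 8-bin orientation
    histograms, weighted by gradient magnitude.
    """
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            b = int(ori[i, j] / (2 * np.pi) * 8) % 8
            desc[i // 4, j // 4, b] += mag[i, j]
    desc = desc.ravel()
    # Normalize, clip large components, renormalize (reduces the
    # influence of large gradient magnitudes, e.g. from illumination)
    desc /= max(np.linalg.norm(desc), 1e-12)
    desc = np.minimum(desc, 0.2)
    desc /= max(np.linalg.norm(desc), 1e-12)
    return desc
```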

  17. Feature stability to noise • Match features after random change in image scale & orientation, with differing levels of image noise • Find nearest neighbor in database of 30,000 features

  18. Feature stability to affine change • Match features after random change in image scale & orientation, with 2% image noise, and affine distortion • Find nearest neighbor in database of 30,000 features

  19. Distinctiveness of features • Vary size of database of features, with 30 degree affine change, 2% image noise • Measure % correct for single nearest neighbor match

  20. Nearest-neighbor matching to feature database • Hypotheses are generated by matching each feature to its nearest-neighbor vectors in the database • No fast method exists for always finding the exact nearest neighbor of a 128-element vector in a large database • Therefore, use approximate nearest neighbors: • We use the best-bin-first modification to the k-d tree algorithm (Beis & Lowe, 97) • Use a heap data structure to identify bins in order of their distance from the query point • Result: can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time
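A minimal sketch of best-bin-first search, assuming nothing beyond the slide's description: a k-d tree whose branches are explored in order of their distance from the query via a heap, with a cutoff on the number of nodes examined. Function names and the dict-based tree representation are mine; production code (and Beis & Lowe's) is considerably more optimized.

```python
import heapq
import numpy as np

def build_kdtree(points, indices=None, depth=0):
    """Simple k-d tree; splits on the dimension that cycles with depth."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % points.shape[1]
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {'index': indices[mid], 'axis': axis,
            'left': build_kdtree(points, indices[:mid], depth + 1),
            'right': build_kdtree(points, indices[mid + 1:], depth + 1)}

def bbf_search(tree, points, query, max_checks=200):
    """Best-bin-first: visit branches in order of their distance to the
    query using a priority queue, stopping after max_checks nodes."""
    best, best_d = None, float('inf')
    heap = [(0.0, 0, tree)]  # (lower bound on distance, tiebreak id, node)
    checks, counter = 0, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if node is None or bound >= best_d:
            continue
        p = points[node['index']]
        d = float(np.sum((p - query) ** 2))
        checks += 1
        if d < best_d:
            best, best_d = node['index'], d
        diff = query[node['axis']] - p[node['axis']]
        near, far = ((node['left'], node['right']) if diff < 0
                     else (node['right'], node['left']))
        counter += 1
        heapq.heappush(heap, (bound, counter, near))       # same bound as parent
        counter += 1
        heapq.heappush(heap, (diff * diff, counter, far))  # must cross the split
    return best
```

With a small `max_checks` the result is approximate; with an unlimited budget the bounds make the search exact.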

  21. Detecting 0.1% inliers among 99.9% outliers • Need to recognize clusters of just 3 consistent features among 3000 feature match hypotheses • LMS or RANSAC would be hopeless! • Generalized Hough transform • Vote for each potential match according to model ID and pose • Insert into multiple bins to allow for error in similarity approximation • Using a hash table instead of an array avoids need to form empty bins or predict array size
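The hash-table Hough transform can be sketched as below. The function name, bin widths, and pose-tuple format are illustrative assumptions; and where the slide inserts each vote into multiple bins to allow for quantization error, this sketch votes into a single bin for brevity.

```python
import math
from collections import defaultdict

def hough_vote(matches, loc_bin=50.0, scale_base=2.0, ori_bin=30.0):
    """Pose clustering with a generalized Hough transform.

    Each match votes for a (model_id, quantized x, y, log-scale,
    orientation) bin; a dict used as a hash table avoids allocating
    the many bins that receive no votes.
    matches: list of (model_id, x, y, scale, orientation_deg) predictions.
    """
    bins = defaultdict(list)
    for m in matches:
        model, x, y, s, ori = m
        key = (model,
               int(x // loc_bin),
               int(y // loc_bin),
               int(math.log(s, scale_base)),
               int(ori // ori_bin) % int(360 / ori_bin))
        bins[key].append(m)
    # Clusters with >= 3 consistent votes go on to geometric verification
    return {k: v for k, v in bins.items() if len(v) >= 3}
```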

  22. Probability of correct match • Compare distance of nearest neighbor to second nearest neighbor (from different object) • Threshold of 0.8 provides excellent separation
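The ratio test above is simple to state in code. Brute-force distances are used here for clarity (the function name is mine); in practice the two nearest neighbors would come from the approximate k-d tree search.

```python
import numpy as np

def ratio_test_match(query, database, threshold=0.8):
    """Accept a match only if the nearest neighbor is much closer than
    the second nearest (distance ratio below 0.8, per the slide)."""
    dists = np.linalg.norm(database - query, axis=1)
    i1, i2 = np.argsort(dists)[:2]
    if dists[i1] < threshold * dists[i2]:
        return int(i1)
    return None  # ambiguous match: reject
```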

  23. Model verification • Examine all clusters in the Hough transform with at least 3 features • Perform a least-squares affine fit to the model • Discard outliers and perform a top-down check for additional features • Evaluate the probability that the match is correct • Use a Bayesian model with the probability that the features would arise by chance if the object were not present • Takes account of object size in image, textured regions, model feature count in database, accuracy of fit (Lowe, CVPR 01)

  24. Solution for affine parameters • Affine transform of [x, y] to [u, v]: u = m1·x + m2·y + tx, v = m3·x + m4·y + ty • Rewrite to solve for the transform parameters: each correspondence contributes the rows [x y 0 0 1 0] and [0 0 x y 0 1] to a linear system A·p = b with unknowns p = [m1 m2 m3 m4 tx ty]ᵀ and the measured [u, v] values stacked in b, solved by least squares
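The stacked linear system can be solved in a few lines with an off-the-shelf least-squares routine (the function name is mine):

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine transform mapping [x, y] -> [u, v].

    Builds the system A p = b with p = [m1, m2, m3, m4, tx, ty];
    needs at least 3 point correspondences.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return p
```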

  25. Planar texture models • Models for planar surfaces built from SIFT keys

  26. Planar recognition • Planar surfaces can be reliably recognized at a rotation of 60° away from the camera • Affine fit approximates perspective projection • Only 3 points are needed for recognition

  27. 3D Object Recognition • Extract outlines with background subtraction

  28. 3D Object Recognition • Only 3 keys are needed for recognition, so extra keys provide robustness • Affine model is no longer as accurate

  29. Recognition under occlusion

  30. Test of illumination invariance • Same image under differing illumination • 273 keys verified in final match

  31. Examples of view interpolation

  32. Recognition using View Interpolation

  33. Location recognition

  34. Robot Localization • Joint work with Stephen Se, Jim Little

  35. Map continuously built over time

  36. Locations of map features in 3D

  37. Recognizing Panoramas Matthew Brown and David Lowe (ICCV 2003) • Recognize overlap from an unordered set of images and automatically stitch together • SIFT features provide initial feature matching • Image blending at multiple scales hides the seams • Panorama of our lab automatically assembled from 143 images

  38. Why Panoramas? • Are you getting the whole picture? • Compact Camera FOV = 50 x 35°

  39. Why Panoramas? • Are you getting the whole picture? • Compact Camera FOV = 50 x 35° • Human FOV = 200 x 135°

  40. Why Panoramas? • Are you getting the whole picture? • Compact Camera FOV = 50 x 35° • Human FOV = 200 x 135° • Panoramic Mosaic = 360 x 180°

  41. RANSAC for Homography

  42. RANSAC for Homography

  43. RANSAC for Homography
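The RANSAC loop illustrated on these slides can be sketched as follows: repeatedly sample 4 feature matches, fit a homography by the direct linear transform (DLT), count inliers, and keep the model with the largest consensus set. Function names, the iteration count, and the pixel tolerance are illustrative assumptions, not values from the slides.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform (DLT) from >= 4 correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A, float))
    return Vt[-1].reshape(3, 3)  # null vector, up to scale

def ransac_homography(src, dst, n_iters=500, tol=3.0, seed=0):
    """Robust homography fit: sample 4 matches, fit, count inliers,
    keep the model with the largest consensus set, then refit."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = np.c_[src, np.ones(len(src))] @ H.T
        with np.errstate(divide='ignore', invalid='ignore'):
            proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = np.nan_to_num(err, nan=np.inf) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```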

  44. Finding the panoramas

  45. Finding the panoramas

  46. Finding the panoramas

  47. Bundle Adjustment • New images initialised with rotation, focal length of best matching image

  48. Bundle Adjustment • New images initialised with rotation, focal length of best matching image

  49. Multi-band Blending • Burt & Adelson 1983 • Blend frequency bands over range λ

  50. 2-band Blending • Low frequency (λ > 2 pixels) • High frequency (λ < 2 pixels)
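A minimal two-band sketch of the idea, assuming two already-aligned single-channel images and a binary seam mask (function name, Gaussian band split, and sigma are my choices): low frequencies are mixed smoothly across the seam, while high frequencies are taken wholesale from one image to avoid ghosting fine detail.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(img_a, img_b, mask, sigma=2.0):
    """Blend two aligned images in two frequency bands (after Burt &
    Adelson, 1983): smooth transition for low frequencies, hard
    selection for high frequencies, so seams stay hidden."""
    low_a, low_b = gaussian_filter(img_a, sigma), gaussian_filter(img_b, sigma)
    high_a, high_b = img_a - low_a, img_b - low_b
    soft = gaussian_filter(mask.astype(float), sigma)  # feathered seam
    low = soft * low_a + (1 - soft) * low_b
    high = np.where(mask, high_a, high_b)
    return low + high
```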
