4.1 Feature: Point and Patches


  1. 4.1 Feature: Point and Patches. Xuejin Chen. Reading: Szeliski, Chap. 4, pp. 222–238

  2. 4.1.2 Feature Descriptor

  3. Feature Descriptors. We know how to detect good points. Next question: how do we match them?

  4. Feature Descriptors. Lots of possibilities (this is a popular research area) • Simple option: match square windows around the point • State-of-the-art approach: SIFT • David Lowe, UBC, http://www.cs.ubc.ca/~lowe/keypoints/

  5. Feature Descriptors • Sum of Squared Differences (SSD) • Normalized Cross-Correlation (NCC) • Work well for small motions: video stereo, tracking… • But time-consuming and sensitive to noise and geometric transforms…
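
As a concrete reference, here is a minimal NumPy sketch of both scores for two equal-sized grayscale patches (the patch extraction itself is assumed to happen elsewhere):

```python
import numpy as np

def ssd(patch1, patch2):
    """Sum of squared differences: lower means more similar."""
    d = patch1.astype(np.float64) - patch2.astype(np.float64)
    return np.sum(d * d)

def ncc(patch1, patch2):
    """Normalized cross-correlation in [-1, 1]: higher means more similar.
    Subtracting the mean and dividing by the standard deviation makes the
    score robust to gain/bias changes in intensity."""
    a = patch1.astype(np.float64)
    b = patch2.astype(np.float64)
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return np.mean(a * b)
```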

  6. Invariance Suppose we are comparing two images I1 and I2 • I2 may be a transformed version of I1 • What kinds of transformations are we likely to encounter in practice? We’d like to find the same features regardless of the transformation • This is called transformational invariance • Most feature methods are designed to be invariant to • Translation, 2D rotation, scale • They can usually also handle • Limited 3D rotations (SIFT works up to about 60 degrees) • Limited affine transformations (some are fully affine invariant) • Limited illumination/contrast changes

  7. How to Achieve Invariance Need both of the following: 1. Make sure your detector is invariant • Harris is invariant to translation and rotation • Scale is trickier • A common approach is to detect features at many scales using a Gaussian pyramid (e.g., MOPS) • More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT) 2. Design an invariant feature descriptor • A descriptor captures the information in a region around the detected feature point • The simplest descriptor: a square window of pixels • What’s this invariant to? • Let’s look at some better approaches…

  8. Rotation Invariance for Feature Descriptors Find the dominant orientation of the image patch • This is given by x+, the eigenvector of H corresponding to λ+ • λ+ is the larger eigenvalue of H • Rotate the patch according to this angle. Figure by Matthew Brown
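
A minimal sketch of that computation, assuming a square grayscale patch, simple finite-difference gradients, and no Gaussian weighting of the window (the helper name is ours):

```python
import numpy as np

def dominant_orientation(patch):
    """Dominant orientation from the second-moment matrix H.

    Returns the angle (radians) of the eigenvector x+ associated with
    the larger eigenvalue lambda+.
    """
    Iy, Ix = np.gradient(patch.astype(np.float64))
    H = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
    x_plus = eigvecs[:, -1]                # eigenvector for lambda+
    return np.arctan2(x_plus[1], x_plus[0])
```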

  9. Multiscale Oriented Patches (MOPS) descriptor Take a 40x40 square window around the detected feature • Scale to 1/5 size (using prefiltering) • Rotate to horizontal • Sample an 8x8 square window centered at the feature • Intensity-normalize the window by subtracting the mean and dividing by the standard deviation of the window. Adapted from slide by Matthew Brown
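
Those steps map almost one-to-one onto OpenCV calls. The sketch below is only illustrative: it glosses over subpixel sampling and image-boundary checks, and the function name and blur parameters are our assumptions:

```python
import cv2
import numpy as np

def mops_descriptor(gray, x, y, angle_deg):
    """Sketch of a MOPS-style descriptor for one feature at (x, y)."""
    # Rotate about the feature so the dominant orientation becomes horizontal.
    M = cv2.getRotationMatrix2D((x, y), angle_deg, 1.0)
    rotated = cv2.warpAffine(gray, M, (gray.shape[1], gray.shape[0]))
    # 40x40 window around the feature (assumes it lies inside the image).
    win = rotated[int(y) - 20:int(y) + 20, int(x) - 20:int(x) + 20]
    # Prefilter (blur), then subsample to 8x8, i.e., 1/5 size.
    win = cv2.GaussianBlur(win, (5, 5), 2.0)
    desc = cv2.resize(win, (8, 8), interpolation=cv2.INTER_AREA).astype(np.float64)
    # Intensity normalization: robust to affine brightness changes.
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```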

  10. Detections at Multiple Scales

  11. Scale Invariant Feature Transform • Basic idea: • Take a 16x16 square window around the detected feature • Compute the edge orientation (angle of the gradient minus 90°) for each pixel • Throw out weak edges (threshold on gradient magnitude) • Create a histogram of the surviving edge orientations. [Figure: angle histogram over 0 to 2π] Adapted from slide by David Lowe
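
A small NumPy sketch of that histogram step; the bin count and the magnitude threshold are illustrative assumptions, not values from the slide:

```python
import numpy as np

def orientation_histogram(patch, n_bins=8, mag_thresh=10.0):
    """Histogram of gradient orientations over a window, discarding
    weak edges (pixels with small gradient magnitude)."""
    Iy, Ix = np.gradient(patch.astype(np.float64))
    mag = np.hypot(Ix, Iy)
    ang = np.arctan2(Iy, Ix)              # orientation in (-pi, pi]
    keep = mag > mag_thresh               # threshold out weak edges
    hist, _ = np.histogram(ang[keep], bins=n_bins, range=(-np.pi, np.pi))
    return hist
```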

  12. SIFT Descriptor • Full version • Divide the 16x16 window into a 4x4 grid of cells (a 2x2 case is shown in the slide figure) • Compute an orientation histogram for each cell • 16 cells * 8 orientations = 128-dimensional descriptor. Adapted from slide by David Lowe
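
Putting the grid and the per-cell histograms together gives the 128-D layout. This sketch omits the Gaussian weighting, trilinear interpolation, and clamped renormalization that full SIFT adds:

```python
import numpy as np

def sift_like_descriptor(window16):
    """128-D SIFT-style layout over a 16x16 window: a 4x4 grid of
    4x4-pixel cells, each with an 8-bin, magnitude-weighted
    orientation histogram."""
    Iy, Ix = np.gradient(window16.astype(np.float64))
    mag = np.hypot(Ix, Iy)
    ang = np.arctan2(Iy, Ix)
    desc = []
    for ci in range(4):
        for cj in range(4):
            sl = (slice(4 * ci, 4 * ci + 4), slice(4 * cj, 4 * cj + 4))
            hist, _ = np.histogram(ang[sl], bins=8, range=(-np.pi, np.pi),
                                   weights=mag[sl])
            desc.append(hist)
    desc = np.concatenate(desc)           # 16 cells * 8 bins = 128-D
    return desc / (np.linalg.norm(desc) + 1e-8)
```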

  13. Properties of SIFT Extraordinarily robust matching technique • Can handle changes in viewpoint • Up to about 60 degrees of out-of-plane rotation • Can handle significant changes in illumination • Sometimes even day vs. night (below) • Fast and efficient; can run in real time • Lots of code available • http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

  14. PCA-SIFT Ke and Sukthankar 2004 • 39x39 gradient patches (Ix, Iy) • 39x39x2 = 3042-D vector → 36-D using Principal Component Analysis

  15. Gradient Location-Orientation Histogram (GLOH) Mikolajczyk and Schmid 2005 • Log-polar binning structure instead of four quadrants • 17 spatial bins * 16 orientation bins = 272-D • Reduced to 128-D via PCA • Best overall performance

  16. Maximally Stable Extremal Regions J. Matas et al., “Distinguished Regions for Wide-Baseline Stereo,” BMVC 2002 • Maximally Stable Extremal Regions (MSERs) • Threshold image intensities: I > thresh for several increasing values of thresh • Extract connected components (“extremal regions”) • Find the threshold at which a region is “maximally stable,” i.e., a local minimum of its relative growth • Approximate each region with an ellipse
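
OpenCV ships an MSER detector, so a hedged sketch of the whole pipeline is short; the filename is a placeholder:

```python
import cv2

# Detect MSERs and approximate each region with an ellipse,
# mirroring the bullets above.
gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
mser = cv2.MSER_create()
regions, _bboxes = mser.detectRegions(gray)
# cv2.fitEllipse needs at least 5 points per region.
ellipses = [cv2.fitEllipse(r) for r in regions if len(r) >= 5]
```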

  17. Performance of Local Descriptors • In the comparison of Mikolajczyk and Schmid (2005): GLOH > SIFT > others

  18. Many New Descriptors • Many newer techniques: tuned parameters… • Descriptors trained from large databases • Class- or instance-specific features…

  19. 4.1.3 Feature Matching Given a feature in I1, how do we find the best match in I2? • Define a distance function that compares two descriptors • Test all the features in I2 and find the one with minimum distance • Use efficient data structures and algorithms
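
In its simplest form this is a brute-force scan; a minimal NumPy sketch (the array layout and function name are our assumptions):

```python
import numpy as np

def match_features(desc1, desc2):
    """Brute-force matching: for each descriptor in image 1, scan all
    descriptors in image 2 and keep the one at minimum SSD distance.
    desc1: (N1, D) array, desc2: (N2, D) array."""
    matches = []
    for i, f1 in enumerate(desc1):
        d = np.sum((desc2 - f1) ** 2, axis=1)  # SSD to every feature in I2
        j = int(np.argmin(d))
        matches.append((i, j, d[j]))
    return matches
```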

  20. Feature Distance How do we define the difference between features f1 and f2? • A simple approach is SSD(f1, f2) • Sum of squared differences between the entries of the two descriptors • But it can give good scores to very ambiguous (bad) matches

  21. Feature Distance How do we define the difference between features f1 and f2? • Better approach: ratio distance = SSD(f1, f2) / SSD(f1, f2’) • f2 is the best SSD match to f1 in I2 • f2’ is the 2nd-best SSD match to f1 in I2 • Gives large values (near 1) for ambiguous matches, so they can be rejected
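
A minimal sketch of matching with this ratio distance; the 0.8 cutoff is an illustrative assumption, not a value from the slides:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, max_ratio=0.8):
    """Keep a match only when the best SSD is clearly smaller than the
    second-best SSD, i.e., the ratio distance is small."""
    matches = []
    for i, f1 in enumerate(desc1):
        d = np.sum((desc2 - f1) ** 2, axis=1)
        j1, j2 = np.argsort(d)[:2]          # best and second-best match
        if d[j1] / (d[j2] + 1e-12) < max_ratio:
            matches.append((i, int(j1)))
    return matches
```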

  22. Evaluating the Results How can we measure the performance of a feature matcher? [Figure: candidate matches with feature distances 50, 75, and 200]

  23. True/False Positives The distance threshold affects performance • True positives = # of detected matches that are correct • Suppose we want to maximize these; how should we choose the threshold? • False positives = # of detected matches that are incorrect • Suppose we want to minimize these; how should we choose the threshold? [Figure: the match at distance 50 is a true match, the match at distance 200 is a false match; the threshold sits between them]

  24. Performance Evaluation • TP: true positives, i.e., number of correct matches; • FN: false negatives, matches that were not correctly detected; • FP: false positives, proposed matches that are incorrect; • TN: true negatives, non-matches that were correctly rejected.

  25. True/False Positive/Negative • With the dashed-circle threshold: green 1 is a true positive, blue 1 is a true positive, 3 is a false positive, 4 is a false positive • Black 1 and 2 are the detected features • With the (tighter) solid-circle threshold: green 1 is a true positive, blue 1 is a false negative, 3 is a false positive, 4 is a true negative

  26. Performance Evaluation • Unit rates: • True positive rate (TPR) = TP / (TP + FN) • False positive rate (FPR) = FP / (FP + TN) • Positive predictive value (PPV) = TP / (TP + FP) • Accuracy (ACC) = (TP + TN) / (TP + FN + FP + TN)
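
The same four rates as straightforward arithmetic, wrapped in a hypothetical helper for experimenting with thresholds:

```python
def unit_rates(tp, fn, fp, tn):
    """Compute TPR, FPR, PPV, and ACC from the four confusion counts."""
    return {
        "TPR": tp / (tp + fn),               # true positive rate (recall)
        "FPR": fp / (fp + tn),               # false positive rate
        "PPV": tp / (tp + fp),               # positive predictive value
        "ACC": (tp + tn) / (tp + fn + fp + tn),
    }
```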

  27. Confusion Matrix

                            True matches    True non-matches
    Predicted matches            TP               FP
    Predicted non-matches        FN               TN

Ideally, TPR → 1 and FPR → 0.

  28. Evaluating the Results How can we measure the performance of a feature matcher? • True positive rate = # true positives / # matching features (positives) • False positive rate = # false positives / # unmatched features (negatives) [Figure: for a specific threshold, the matcher yields a single operating point, e.g., a true positive rate of 0.7 at a false positive rate of 0.1]

  29. Evaluating the Results How can we measure the performance of a feature matcher? • True positive rate = # true positives / # matching features (positives) • False positive rate = # false positives / # unmatched features (negatives) • ROC curve (“Receiver Operating Characteristic”): plot TPR against FPR as the threshold varies; a good matcher stays close to the upper left corner [Figure: ROC curve passing through TPR 0.7 at FPR 0.1] • ROC curves are generated by counting the # of correct/incorrect matches for different thresholds • Want to maximize the area under the curve (AUC) • Useful for comparing different feature matching methods • For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
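
A compact sketch of that threshold sweep, assuming we already know which candidate matches are correct (e.g., from ground-truth homographies):

```python
import numpy as np

def roc_curve(distances, is_correct):
    """Sweep the distance threshold to trace the ROC curve, then
    integrate for the AUC. distances: match distances (smaller is
    better); is_correct: booleans marking the true matches."""
    distances = np.asarray(distances, dtype=np.float64)
    is_correct = np.asarray(is_correct, dtype=bool)
    tpr, fpr = [], []
    for t in np.sort(np.unique(distances)):
        accepted = distances <= t            # matches kept at this threshold
        tpr.append(np.sum(accepted & is_correct) / max(np.sum(is_correct), 1))
        fpr.append(np.sum(accepted & ~is_correct) / max(np.sum(~is_correct), 1))
    auc = np.trapz(tpr, fpr)                 # area under the curve
    return np.array(fpr), np.array(tpr), auc
```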

  30. Performance Evaluation • Given the distributions of positives and negatives over feature distance, look for the best threshold • However, a clean distribution function is rarely available in practice • Hence it is hard to set a single best threshold. [Figure: distributions of positives and negatives over inter-feature distance]

  31. The threshold should adapt to different regions of the feature space (Mikolajczyk and Schmid 2005) • Fixed threshold: DB is a false negative; DC and DE are false positives • Nearest neighbor: DB becomes a true positive, but DC is still a false positive • An appropriate threshold can be trained from a large database • Nearest neighbor distance ratio (NNDR): a small d1/d2 indicates a confident match • DB is a true positive; DC and DE are true negatives

  32. Performance Evaluation (Mikolajczyk and Schmid 2005) Fixed threshold

  33. Performance Evaluation (Mikolajczyk and Schmid 2005) Nearest neighbor

  34. Performance Evaluation (Mikolajczyk and Schmid 2005) NNDR

  35. Feature Space • Nearest neighbor distance ratio • Adaptive thresholds based on different regions of the feature space • Thresholds trained from a database • …

  36. Efficient Matching • Comparing all features against all others is quadratic and impractical for most applications • Instead, use an indexing structure • Built for an individual image, or globally for all images • Multi-dimensional search tree • Hash table • Vocabulary tree • …

  37. Efficient Matching • Multi-dimensional hashing • Map each descriptor into fixed-size buckets • When matching, a new feature is hashed into a bucket • Search nearby buckets to return potential candidates [Figure: descriptors hashed into buckets 1, 2, 3, …, N]

  38. Haar Wavelets MOPS (Brown, Szeliski, and Winder 2005) • Build a 3-D index by summing over quadrants of the patch (coarse Haar wavelet coefficients) • Normalize the 3 values by their expected standard deviations • Each value v_n → its two nearest bins (of 10) → 2^3 = 8 candidate bins • Query: check the primary 3-D bin, then refine with the k nearest neighbors
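
An illustrative sketch of those quadrant sums over an 8x8 MOPS descriptor; the exact coefficients and normalization in the paper differ, so treat this only as the idea:

```python
import numpy as np

def haar_index(desc8):
    """Three coarse Haar-like coefficients of an 8x8 descriptor patch,
    formed from sums over its halves and quadrants; the resulting
    3-vector is what gets quantized into bins for indexing."""
    top, bottom = desc8[:4, :].sum(), desc8[4:, :].sum()
    left, right = desc8[:, :4].sum(), desc8[:, 4:].sum()
    diag = (desc8[:4, :4].sum() + desc8[4:, 4:].sum()
            - desc8[:4, 4:].sum() - desc8[4:, :4].sum())
    return np.array([left - right, top - bottom, diag])
```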

  39. More Structures (pages 232–233) • Locality-sensitive hashing • More widely applicable • Uses unions of independently computed hashing functions to index the features • Parameter-sensitive hashing • More sensitive to the distribution of points in parameter space • High-dimensional vectors mapped to binary codes • Compared using Hamming distance • Can accommodate arbitrary kernel functions

  40. KD Tree • Multi-dimensional search tree • Recursively divide the multi-dimensional feature space along alternating axis-aligned hyperplanes • Choose the threshold along each axis so as to maximize some criterion (tree balance, keeping the maximum depth as small as possible…) • Query: descend to a leaf, then search nearby bins
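
A sketch using SciPy's kd-tree, with random placeholder descriptors. Note that exact kd-trees degrade in high dimensions, which is one reason approximate variants (e.g., FLANN) are used for 128-D SIFT in practice:

```python
import numpy as np
from scipy.spatial import cKDTree

# Index the descriptors of image 2 once, then answer each query from
# image 1 in roughly logarithmic time instead of scanning all of them.
desc2 = np.random.rand(10000, 128)        # placeholder descriptors for I2
desc1 = np.random.rand(500, 128)          # placeholder descriptors for I1
tree = cKDTree(desc2)
dist, idx = tree.query(desc1, k=2)        # two nearest neighbors per query
nndr = dist[:, 0] / (dist[:, 1] + 1e-12)  # ready for the NNDR test above
```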

  41. KD Tree

  42. Additional Data Structures (page 234) • Slicing: use a series of 1-D binary searches on point lists sorted along different dimensions to efficiently cull down a list of candidate points that lie within a hypercube of the query point • Reweighting matches at different levels of the indexing tree • Makes the search less sensitive to discretization errors in tree construction • Metric tree: a small number of prototypes at each level in a hierarchy • Visual words → classical information retrieval techniques, fast

  43. Indexing Structure • Muja and Lowe (2009) survey and compare indexing structures • They find that kd-trees work best • Rapid computation of image feature correspondences remains a challenging open research problem

  44. Verification and Densification • Verify candidate matches by geometric alignment • Classify matches into inliers and outliers • RANSAC (random sample consensus) • Will be discussed further in the next sections

  45. 4.1.4 Feature Tracking • Find features in all candidate images • Or detect in one image, then search in the others • Detect and track: for video tracking applications, where the expected amount of motion and appearance deformation between adjacent frames is small • Between subsequent frames • SSD works well • NCC is better if there are brightness changes • Use a hierarchical search strategy when the search range is large: matches in lower-resolution images provide better initial guesses and speed up the search

  46. 4.1.4 Feature Tracking • Long image sequences bring large appearance changes. Two options: • Continue matching against the originally detected patch • Fails when the original patch’s appearance changes, e.g., under foreshortening • Re-sample each subsequent frame at the matched location • Risk: features drift away from their original locations

  47. Feature Tracking (Shi and Tomasi 1994) • Affine motion model • Compare patches in neighboring frames using a translational model • Use the resulting location to initialize an affine registration between the patch in the current frame and the base frame • The area around the predicted feature location is searched with an incremental registration algorithm • This is the Kanade–Lucas–Tomasi (KLT) tracker; new features are detected in regions where tracking has failed
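
OpenCV's pyramidal Lucas–Kanade implementation covers both the coarse-to-fine search and the per-feature tracking status; a sketch with placeholder filenames:

```python
import cv2

# Track features between two consecutive frames with pyramidal KLT.
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
pts = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)
new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
tracked = new_pts[status.ravel() == 1]    # keep successfully tracked points
# Features with status == 0 failed; new features would be re-detected there.
```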

  48. Many Extensions (page 236) • Tracking combined with structure from motion • Tie the tracked corners together • Tracking in video with large numbers of moving objects or points • Special-purpose recognizers built with learning algorithms • Train classifiers on sample patches and their affine deformations; fast and reliable, and supports fast motion • Survey: Yilmaz, Javed, and Shah (2006)

  49. Real-time head tracking using the fast trained classifiers of Lepetit, Pilet, and Fua (2004). © 2004 IEEE.

  50. Application: Performance-Driven Animation • Expression and head tracking • Morph among a series of hand-drawn sketches (Buck, Finkelstein, Jacobs et al. 2000)
