
Presentation Transcript


    1. Ani – 13 to 26 and 40 to END; Mudit – 1 to 12 and 27 to 39

    2. Contents: Overall picture; Region detectors; Scale invariant detection; Localization; Orientation assignment; Region description; SIFT approach; Local Jet; Storing and matching; State of the art - Video Google

    3. Our goal: detecting repeatable image regions; obtaining reliable and distinctive descriptors; searching an image database for an object efficiently.

    4. Invariance is vital: scale, rotation, orientation, illumination, noise, affine transformations.

    5. Region detectors. Harris points: invariant to rotation; two significant eigenvalues indicate an interest point. Harris-Laplace: invariant to rotation and scale; uses the Laplacian-of-Gaussian operator. SIFT: scale-space extrema of the difference-of-Gaussian.

    6. Scale Invariant Detection. Consider regions (e.g. circles) of different sizes around a point; regions of corresponding sizes will look the same in both images.

    7. Scale Invariant Detection The problem: how do we choose corresponding circles independently in each image?

    8. Scale Invariant Detection A “good” function for scale detection has one stable sharp peak

    9. Scale-space. Definition: L(x, y, σ) = G(x, y, σ) ∗ I(x, y), where G(x, y, σ) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²). Keypoints are detected at scale-space extrema of the difference-of-Gaussian function D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ), which is efficient to compute and a close approximation to the scale-normalized Laplacian of Gaussian, σ²∇²G.
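
For concreteness, a minimal Python sketch of building a one-octave difference-of-Gaussian stack; the input image `img`, the base scale σ₀ = 1.6, and the number of levels are illustrative assumptions, with scipy's gaussian_filter standing in for the convolution G(x, y, σ) ∗ I(x, y):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(img, sigma0=1.6, k=2 ** (1 / 3), levels=6):
    """Difference-of-Gaussian images D(x, y, sigma) for one octave."""
    # L(x, y, sigma) at a geometric sequence of scales sigma0 * k^i.
    blurred = [gaussian_filter(img, sigma0 * k ** i) for i in range(levels)]
    # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma).
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])
```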

    10. Image space to scale space

    11. Relationship of D to σ²∇²G. Diffusion equation: ∂G/∂σ = σ∇²G. Approximate ∂G/∂σ by the finite difference (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ), giving G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ²∇²G. Therefore, when D has scales differing by a constant factor k, it already incorporates the σ² scale normalization required for scale invariance.

    12. Local extrema detection. Find the maxima and minima of D in scale space: each sample point is compared to its eight neighbors in the current image and its nine neighbors in the scales above and below, as sketched below.
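
A sketch of the 26-neighbor test on the stack produced above (interior indices assumed; border handling omitted):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is the max or min of its 3x3x3 neighborhood."""
    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    return dog[s, y, x] in (patch.max(), patch.min())
```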

    13. Frequency of sampling in scale

    14. Localization. A 3D quadratic function is fit to the local sample points. Start with the Taylor expansion of D with the sample point as the origin: D(x) = D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x, where x = (x, y, σ)ᵀ is the offset from the sample point. Taking the derivative with respect to x and setting it to zero gives the keypoint location x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x). This is a 3×3 linear system.

    15. Localization. The derivatives are approximated by finite differences between neighboring sample points; for example, the first derivative along σ is half the difference of the two neighboring scale samples. If x̂ is larger than 0.5 in any dimension, the sample point is moved and the process is repeated.
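
A sketch of one refinement step over the `dog` stack above, with the gradient and Hessian built from central differences; Lowe's implementation additionally iterates and bounds the number of steps, which is omitted here:

```python
import numpy as np

def refine_keypoint(dog, s, y, x):
    """One step of the quadratic fit: solve the 3x3 system x_hat = -H^-1 g."""
    D, c = dog, dog[s, y, x]
    # Gradient of D with respect to (sigma, y, x), central differences.
    g = 0.5 * np.array([D[s + 1, y, x] - D[s - 1, y, x],
                        D[s, y + 1, x] - D[s, y - 1, x],
                        D[s, y, x + 1] - D[s, y, x - 1]])
    # Hessian via second-order finite differences.
    H = np.empty((3, 3))
    H[0, 0] = D[s + 1, y, x] - 2 * c + D[s - 1, y, x]
    H[1, 1] = D[s, y + 1, x] - 2 * c + D[s, y - 1, x]
    H[2, 2] = D[s, y, x + 1] - 2 * c + D[s, y, x - 1]
    H[0, 1] = H[1, 0] = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                                - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    H[0, 2] = H[2, 0] = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                                - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    H[1, 2] = H[2, 1] = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                                - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    offset = -np.linalg.solve(H, g)   # x_hat = -H^{-1} g
    value = c + 0.5 * g @ offset      # D(x_hat), used by the contrast test
    return offset, value
```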

    16. Being picky! Contrast (using the previous expansion): if |D(x̂)| < 0.03, throw the keypoint out. Edge-iness: use the ratio of principal curvatures to throw out poorly defined peaks along edges. The curvatures come from the 2×2 Hessian H = [Dxx Dxy; Dxy Dyy]; if r is the ratio of its eigenvalues, then Tr(H)² / Det(H) = (r + 1)² / r, so there is no need to calculate the eigenvalues explicitly: we only need their ratio.
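
A sketch of both tests; the contrast threshold 0.03 (for image values in [0, 1]) and the curvature ratio r = 10 follow Lowe's paper, and `refine_keypoint` is the sketch above:

```python
def keep_keypoint(dog, s, y, x, r=10.0):
    """Apply the contrast test and the edge (curvature-ratio) test."""
    offset, value = refine_keypoint(dog, s, y, x)
    if abs(value) < 0.03:                       # low contrast: reject
        return False
    D = dog
    dxx = D[s, y, x + 1] - 2 * D[s, y, x] + D[s, y, x - 1]
    dyy = D[s, y + 1, x] - 2 * D[s, y, x] + D[s, y - 1, x]
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    # Tr(H)^2 / Det(H) grows with the eigenvalue ratio; no eigendecomposition.
    return det > 0 and tr * tr / det < (r + 1) ** 2 / r
```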

    17. Orientation assignment. Computing the descriptor relative to the keypoint's orientation achieves rotation invariance. Gradient orientation is precomputed along with magnitude for all levels (also useful in descriptor computation). One or more orientations are assigned to each keypoint from the peaks of an orientation histogram; multiple orientations significantly improve the stability of matching.
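
A sketch of the histogram step, assuming precomputed gradient magnitude `mag` and orientation `ori` arrays (radians in [0, 2π)); the window radius and Gaussian width here are illustrative stand-ins, while the 36 bins and the 80%-of-peak rule for multiple orientations follow the paper:

```python
import numpy as np

def keypoint_orientations(mag, ori, y, x, radius=8, sigma=4.0):
    """36-bin, Gaussian-weighted orientation histogram around (y, x)."""
    hist = np.zeros(36)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            b = int(ori[y + dy, x + dx] / (2 * np.pi) * 36) % 36
            hist[b] += w * mag[y + dy, x + dx]
    # Every peak within 80% of the highest yields its own keypoint orientation.
    return np.flatnonzero(hist >= 0.8 * hist.max()) * (2 * np.pi / 36)
```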

    18. Choosing the right image descriptors. Distribution-based: luminance approaches (histograms of pixel intensities and locations; SIFT, based on the gradient distribution in the region) and geometric approaches (shape context). Spatial-frequency techniques: Fourier-transform methods (no spatial information) and Gabor filters or wavelets (require a large number of filters).

    19. Choosing the right image descriptors. Differential descriptors: local jets (a set of image derivatives) and steerable filters (derivatives steered in the direction of the gradient). Miscellaneous: generalized moment invariants, which characterize shape and intensity distribution.

    20. Who wants to be a Millionaire?

    22. Local image descriptor - SIFT

    23. Local image descriptor - SIFT. Weight the gradient magnitude of each sample point by a Gaussian weighting function with σ equal to half the descriptor window width. Distribute each sample into adjacent histogram bins by trilinear interpolation; this avoids boundary effects and allows for significant shift in gradient positions.

    24. Illumination invariance for SIFT. Affine changes: normalizing the vector to unit length cancels overall brightness change. Non-linear changes (due to camera saturation or viewpoint changes): threshold the values in the unit feature vector at 0.2 and re-normalize, giving less importance to large gradients and more importance to the distribution of orientations.
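
The normalization chain is short enough to show directly; a sketch for a 128-component descriptor `vec`, with the 0.2 clipping threshold from the paper:

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    vec = vec / np.linalg.norm(vec)    # unit length: cancels brightness/contrast change
    vec = np.minimum(vec, clip)        # damp large gradients (non-linear changes)
    return vec / np.linalg.norm(vec)   # re-normalize: emphasizes orientation distribution
```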

    25. Width of the SIFT descriptor. Two parameters must be chosen: the number of orientations in each histogram and the size of the histogram array. The optimal sizes were obtained experimentally.

    26. Stability as a function of affine distortion. The approach is not truly affine invariant, since the initial features are located in a non-affine-invariant manner.

    27. Local Image Descriptor – Local Jet. The image in the neighborhood of a point can be described by the set of its derivatives. The local jet of order N at a point x = (x1, x2) is defined by convolving the image I with Gaussian derivatives up to order N. From the jet, a complete set of invariants that locally characterizes the signal is computed and stacked into a vector.
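
A sketch of computing the derivative set with scipy; the rotation invariants of the jet (e.g. the squared gradient magnitude Lx² + Ly²) would then be combined from these responses:

```python
from scipy.ndimage import gaussian_filter

def local_jet(img, sigma=1.0, n=2):
    """Gaussian derivatives of img up to total order n at scale sigma."""
    jet = {}
    for oy in range(n + 1):
        for ox in range(n + 1 - oy):
            # order=(oy, ox) convolves with the corresponding Gaussian derivative.
            jet[(oy, ox)] = gaussian_filter(img, sigma, order=(oy, ox))
    return jet
```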

    28. Local Image Descriptor – Multiscale approach. The vector of invariants is calculated at several scales, using half-octave quantization: the difference between consecutive sizes is under 20%, and σ varies from 0.48 to 2.07.

    29. Local Image Descriptor – Semilocal constraints. Longer vectors decrease the probability of repeatability, and global features are sensitive to extraneous features and partial visibility. Solution: for each interest point, select the p nearest features, and during matching add a constraint on the angles between the lines joining neighboring points. It is assumed that 50% of the points will match under these semi-local constraints.

    30. Semilocal Constraints

    31. Comparison of SIFT and Local Grayvalue Invariants

    32. Storing. A set of keypoints is obtained from each reference image. Each keypoint has a descriptor: a 128-component vector (4 × 4 histograms × 8 orientations). All (keypoint, vector) pairs from the set of reference images are stored.

    33. Matching. A test image gives a new set of (keypoint, vector) pairs. For each pair, we find the two nearest descriptors in our database.

    34. Acceptance of a match. A match is accepted if the ratio of the distance to the first nearest descriptor over the distance to the second is below a threshold, as sketched below.
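
A brute-force sketch of the ratio test; the 0.8 threshold is the value reported by Lowe, and `query`/`database` are assumed to be arrays of descriptor vectors:

```python
import numpy as np

def ratio_test_matches(query, database, threshold=0.8):
    """Keep (query_index, database_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        d = np.linalg.norm(database - q, axis=1)  # distance to every descriptor
        first, second = np.argsort(d)[:2]
        if d[first] < threshold * d[second]:
            matches.append((qi, first))
    return matches
```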

    35. Complexity. Initial complexity: (number of features in the query image) × (total number of features in the database), because each query feature must be compared against every database feature to find its best two matches. Solution: k-d trees!

    36. Storage using k-d trees. The descriptor set is stored in a k-d tree (in both the Schmid-Mohr and Lowe techniques).

    37. K-d trees. The elements are stored in the leaves; every other node splits the space along some dimension, and the dimensions are split sequentially. Fixed-size one-dimensional buckets are used. The depth of the tree is at most the number of dimensions of the stored vectors.
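
In practice the brute-force loop sketched earlier is replaced by a tree query. A sketch with scipy's cKDTree over hypothetical random descriptors (note this performs exact search; Lowe's best-bin-first variant is approximate):

```python
import numpy as np
from scipy.spatial import cKDTree

db = np.random.rand(10000, 128)        # hypothetical database descriptors
queries = np.random.rand(50, 128)      # hypothetical query descriptors

tree = cKDTree(db)                     # built once over the database
dist, idx = tree.query(queries, k=2)   # two nearest neighbors per query
good = dist[:, 0] < 0.8 * dist[:, 1]   # same ratio test as before
```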

    38. New complexity!

    40. Update and demo

    41. STATE OF THE ART: Video Google … NOT videos.google.com! "A text retrieval approach to object matching in videos", Josef Sivic and Andrew Zisserman.

    42. Text retrieval overview. Documents are parsed into words. Common words (the, an, etc.) are ignored; this is the 'stop list'. Words are represented by their stems: 'walk', 'walking' and 'walks' → 'walk'. Each word is assigned a unique identifier. The vocabulary contains K words, so each document is represented by a K-component vector of word frequencies.
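
A toy sketch of this pipeline; the stop list and the crude suffix-stripping "stemmer" are illustrative stand-ins for a real stemmer such as Porter's:

```python
from collections import Counter

STOP = {"the", "an", "a", "and", "are", "that", "to", "in", "for", "of"}

def stem(word):
    """Crude stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def doc_vector(text, vocabulary):
    """K-component vector of word frequencies for one document."""
    tokens = (w.strip(".,") for w in text.lower().split())
    words = [stem(w) for w in tokens if w and w not in STOP]
    counts = Counter(words)
    return [counts[v] for v in vocabulary]
```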

    43. Parse and clean. Original: "…… Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object. categories …….". After parsing: "Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories". After the stop list and stemming: "represent detect learn main issue need tackle design visual system recognize object category".

    44. Creating the database

    45. Querying. Parse the query to create the query vector: query "Representation learning" → query doc ID = (1, 0, 1, 0, 0, …). Retrieve all document IDs containing at least one of the query word IDs (using the inverted file index). Calculate the distance between the query and document vectors (the angle between the vectors) and rank the results.
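
A sketch of the query path with a plain inverted index and cosine similarity, over frequency vectors like those built above:

```python
import math
from collections import defaultdict

def build_inverted_index(doc_vectors):
    """word id -> set of document ids containing that word."""
    index = defaultdict(set)
    for doc_id, vec in enumerate(doc_vectors):
        for word_id, freq in enumerate(vec):
            if freq:
                index[word_id].add(doc_id)
    return index

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query_vec, doc_vectors, index):
    """Candidates share a word with the query; rank them by vector angle."""
    hits = set()
    for word_id, freq in enumerate(query_vec):
        if freq:
            hits |= index[word_id]
    return sorted(hits, key=lambda d: -cosine(query_vec, doc_vectors[d]))
```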

    46. Using the text search as an analogy. Basic idea: build a visual vocabulary from a large set of images; given a query image, search the database in a manner similar to text search.

    47. Again …. Detection and Description Detection – finding invariant regions Description – using the SIFT descriptor

    48. Building the "Visual Stems". Descriptors are clustered into K groups using the k-means clustering algorithm; each cluster represents a "visual word" in the "visual vocabulary". In the reported results, between 10,000 and 20,000 clusters are used.
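
A sketch of vocabulary construction with scipy's kmeans2 over hypothetical descriptors; the paper clusters its two region types separately and uses a Mahalanobis distance, which this plain k-means sketch omits:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
descriptors = rng.random((20000, 128))   # hypothetical pooled SIFT descriptors

K = 1000                                 # the paper uses 10,000-20,000
centres, labels = kmeans2(descriptors, K, minit="points", seed=0)
# `centres` are the visual words; `labels` maps each descriptor to its word.
```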

    49. Example clusters

    50. Visual "Stop List". The most frequent visual words, which occur in almost all images, are suppressed.

    51. Ranking Frames. Distance between vectors (as with word/document vectors), plus spatial consistency (the analogue of word order in text).

    52. The Visual Analogy

    55. Example searches Object query http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/results/bolle/bolle.html http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/results/poster/poster.html Scene Query http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/examples/example_scene.html

    56. Open issues. Automatic ways of building the vocabulary are needed. Ranking of retrieval results in the way Google does. Extension to non-rigid objects, such as faces. Using this method for higher-level analysis of movies.

    57. References.
    David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
    C. Schmid and R. Mohr, "Local Grayvalue Invariants for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), pp. 530-535, 1997.
    K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors", IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 257-263, 2003.
    Darya Frolova and Denis Simakov, "Matching with Invariant Features", The Weizmann Institute of Science, March 2004.
    Javier Ruiz-del-Solar and Patricio Loncomilla, "Object Recognition using Local Descriptors", Center for Web Research, Universidad de Chile.
    Lior Shoval and Rafi Haddad, "Approximate Nearest Neighbor - Applications to Vision and Matching".

    58. References (continued).
    Josef Sivic, Frederik Schaffalitzky and Andrew Zisserman, "Video Google Demo", http://www.robots.ox.ac.uk/~vgg/research/vgoogle
    David Lowe, "Demo Software: SIFT Keypoint Detector", http://www.cs.ubc.ca/~lowe/keypoints/
    http://cs223b.stanford.edu/

    59. Thanks !
