
Presentation Transcript


    1. Ani – 13 to 26 and 40 to END; Mudit – 1 to 12 and 27 to 39

    2. Contents: Overall picture; Region detectors; Scale invariant detection; Localization; Orientation assignment; Region description; SIFT approach; Local Jet; Storing and matching; State of the art - Video Google

    3. Our goal: detecting repeatable image regions; obtaining reliable and distinctive descriptors; searching an image database for an object efficiently.

    4. Invariance is vital: scale, rotation, orientation, illumination, noise, affine transformations.

    5. Region detectors. Harris points: invariant to rotation; two significant eigenvalues indicate an interest point. Harris-Laplace: invariant to rotation and scale; uses the Laplacian-of-Gaussian operator. SIFT: scale-space extrema of the difference-of-Gaussian.

    6. Scale Invariant Detection. Consider regions (e.g. circles) of different sizes around a point; regions of corresponding sizes will look the same in both images.

    7. Scale Invariant Detection The problem: how do we choose corresponding circles independently in each image?

    8. Scale Invariant Detection A “good” function for scale detection has one stable sharp peak

    9. Scale-space. Definition: L(x, y, σ) = G(x, y, σ) ∗ I(x, y), where G(x, y, σ) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²). Keypoints are detected at scale-space extrema of the difference-of-Gaussian function D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ), which is efficient to compute and a close approximation to the scale-normalized Laplacian of Gaussian, σ²∇²G.
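
For concreteness, a minimal Python sketch of building a one-octave difference-of-Gaussian stack; the input image `img`, the base scale σ₀ = 1.6, and the number of levels are illustrative assumptions, with scipy's gaussian_filter standing in for the convolution G(x, y, σ) ∗ I(x, y):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(img, sigma0=1.6, k=2 ** (1 / 3), levels=6):
    """Difference-of-Gaussian images D(x, y, sigma) for one octave."""
    # L(x, y, sigma) at a geometric sequence of scales sigma0 * k^i.
    blurred = [gaussian_filter(img, sigma0 * k ** i) for i in range(levels)]
    # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma).
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])
```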

    10. Image space to scale space

    11. Relationship of D to σ²∇²G. Diffusion equation: ∂G/∂σ = σ∇²G. Approximate ∂G/∂σ by the finite difference (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ), giving G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ²∇²G. Therefore, when D has scales differing by a constant factor k, it already incorporates the σ² scale normalization required for scale invariance.

    12. Local extrema detection. Find the maxima and minima of D in scale space: each sample point is compared to its eight neighbors in the current image and its nine neighbors in the scales above and below, as sketched below.
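
A sketch of the 26-neighbor test on the stack produced above (interior indices assumed; border handling omitted):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is the max or min of its 3x3x3 neighborhood."""
    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    return dog[s, y, x] in (patch.max(), patch.min())
```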

    13. Frequency of sampling in scale

    14. Localization. A 3D quadratic function is fit to the local sample points. Start with the Taylor expansion of D with the sample point as the origin: D(x) = D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x, where x = (x, y, σ)ᵀ is the offset from the sample point. Taking the derivative with respect to x and setting it to zero gives the keypoint location x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x). This is a 3×3 linear system.

    15. Localization. The derivatives are approximated by finite differences between neighboring sample points; for example, the first derivative along σ is half the difference of the two neighboring scale samples. If x̂ is larger than 0.5 in any dimension, the sample point is moved and the process is repeated.
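
A sketch of one refinement step over the `dog` stack above, with the gradient and Hessian built from central differences; Lowe's implementation additionally iterates and bounds the number of steps, which is omitted here:

```python
import numpy as np

def refine_keypoint(dog, s, y, x):
    """One step of the quadratic fit: solve the 3x3 system x_hat = -H^-1 g."""
    D, c = dog, dog[s, y, x]
    # Gradient of D with respect to (sigma, y, x), central differences.
    g = 0.5 * np.array([D[s + 1, y, x] - D[s - 1, y, x],
                        D[s, y + 1, x] - D[s, y - 1, x],
                        D[s, y, x + 1] - D[s, y, x - 1]])
    # Hessian via second-order finite differences.
    H = np.empty((3, 3))
    H[0, 0] = D[s + 1, y, x] - 2 * c + D[s - 1, y, x]
    H[1, 1] = D[s, y + 1, x] - 2 * c + D[s, y - 1, x]
    H[2, 2] = D[s, y, x + 1] - 2 * c + D[s, y, x - 1]
    H[0, 1] = H[1, 0] = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                                - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    H[0, 2] = H[2, 0] = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                                - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    H[1, 2] = H[2, 1] = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                                - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    offset = -np.linalg.solve(H, g)   # x_hat = -H^{-1} g
    value = c + 0.5 * g @ offset      # D(x_hat), used by the contrast test
    return offset, value
```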

    16. Being picky! Contrast (using the previous expansion): if |D(x̂)| < 0.03, throw the keypoint out. Edge-iness: use the ratio of principal curvatures to throw out poorly defined peaks along edges. The curvatures come from the 2×2 Hessian H = [Dxx Dxy; Dxy Dyy]; if r is the ratio of its eigenvalues, then Tr(H)² / Det(H) = (r + 1)² / r, so there is no need to calculate the eigenvalues explicitly: we only need their ratio.
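
A sketch of both tests; the contrast threshold 0.03 (for image values in [0, 1]) and the curvature ratio r = 10 follow Lowe's paper, and `refine_keypoint` is the sketch above:

```python
def keep_keypoint(dog, s, y, x, r=10.0):
    """Apply the contrast test and the edge (curvature-ratio) test."""
    offset, value = refine_keypoint(dog, s, y, x)
    if abs(value) < 0.03:                       # low contrast: reject
        return False
    D = dog
    dxx = D[s, y, x + 1] - 2 * D[s, y, x] + D[s, y, x - 1]
    dyy = D[s, y + 1, x] - 2 * D[s, y, x] + D[s, y - 1, x]
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    # Tr(H)^2 / Det(H) grows with the eigenvalue ratio; no eigendecomposition.
    return det > 0 and tr * tr / det < (r + 1) ** 2 / r
```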

    17. Orientation assignment. Computing the descriptor relative to the keypoint's orientation achieves rotation invariance. Gradient orientation is precomputed along with magnitude for all levels (also useful in descriptor computation). One or more orientations are assigned to each keypoint from the peaks of an orientation histogram; multiple orientations significantly improve the stability of matching.
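
A sketch of the histogram step, assuming precomputed gradient magnitude `mag` and orientation `ori` arrays (radians in [0, 2π)); the window radius and Gaussian width here are illustrative stand-ins, while the 36 bins and the 80%-of-peak rule for multiple orientations follow the paper:

```python
import numpy as np

def keypoint_orientations(mag, ori, y, x, radius=8, sigma=4.0):
    """36-bin, Gaussian-weighted orientation histogram around (y, x)."""
    hist = np.zeros(36)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            b = int(ori[y + dy, x + dx] / (2 * np.pi) * 36) % 36
            hist[b] += w * mag[y + dy, x + dx]
    # Every peak within 80% of the highest yields its own keypoint orientation.
    return np.flatnonzero(hist >= 0.8 * hist.max()) * (2 * np.pi / 36)
```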

    18. Choosing the right image descriptors. Distribution-based: luminance approaches (histograms of pixel intensities and locations; SIFT, based on the gradient distribution in the region) and geometric approaches (shape context). Spatial-frequency techniques: Fourier-transform methods (no spatial information) and Gabor filters or wavelets (require a large number of filters).

    19. Choosing the right image descriptors. Differential descriptors: local jets (a set of image derivatives) and steerable filters (derivatives steered in the direction of the gradient). Miscellaneous: generalized moment invariants, which characterize shape and intensity distribution.

    20. Who wants to be a Millionaire?

    22. Local image descriptor - SIFT

    23. Local image descriptor - SIFT. Weight the gradient magnitude of each sample point by a Gaussian weighting function with σ equal to half the descriptor window width. Distribute each sample into adjacent histogram bins by trilinear interpolation; this avoids boundary effects and allows for significant shift in gradient positions.

    24. Illumination invariance for SIFT. Affine changes: normalizing the vector to unit length cancels overall brightness change. Non-linear changes (due to camera saturation or viewpoint changes): threshold the values in the unit feature vector at 0.2 and re-normalize, giving less importance to large gradients and more importance to the distribution of orientations.
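
The normalization chain is short enough to show directly; a sketch for a 128-component descriptor `vec`, with the 0.2 clipping threshold from the paper:

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    vec = vec / np.linalg.norm(vec)    # unit length: cancels brightness/contrast change
    vec = np.minimum(vec, clip)        # damp large gradients (non-linear changes)
    return vec / np.linalg.norm(vec)   # re-normalize: emphasizes orientation distribution
```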

    25. Width of the SIFT descriptor. Two parameters must be chosen: the number of orientations in each histogram and the size of the histogram array. The optimal sizes were obtained experimentally.

    26. Stability as a function of affine distortion. The approach is not truly affine invariant, since the initial features are located in a non-affine-invariant manner.

    27. Local Image Descriptor – Local Jet. The image in the neighborhood of a point can be described by the set of its derivatives. The local jet of order N at a point x = (x1, x2) is defined by convolving the image I with Gaussian derivatives up to order N. From the jet, a complete set of invariants that locally characterizes the signal is computed and stacked into a vector.
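
A sketch of computing the derivative set with scipy; the rotation invariants of the jet (e.g. the squared gradient magnitude Lx² + Ly²) would then be combined from these responses:

```python
from scipy.ndimage import gaussian_filter

def local_jet(img, sigma=1.0, n=2):
    """Gaussian derivatives of img up to total order n at scale sigma."""
    jet = {}
    for oy in range(n + 1):
        for ox in range(n + 1 - oy):
            # order=(oy, ox) convolves with the corresponding Gaussian derivative.
            jet[(oy, ox)] = gaussian_filter(img, sigma, order=(oy, ox))
    return jet
```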

    28. Local Image Descriptor – Multiscale approach. The vector of invariants is calculated at several scales, using half-octave quantization: the difference between consecutive sizes is under 20%, and σ varies from 0.48 to 2.07.

    29. Local Image Descriptor – Semilocal constraints. Longer vectors decrease the probability of repeatability, and global features are sensitive to extraneous features and partial visibility. Solution: for each interest point, select the p nearest features, and during matching add a constraint on the angles between the lines joining neighboring points. It is assumed that 50% of the points will match under these semi-local constraints.

    30. Semilocal Constraints

    31. Comparison of SIFT and Local Grayvalue Invariants

    32. Storing. A set of keypoints is obtained from each reference image. Each keypoint has a descriptor: a 128-component vector (4 × 4 histograms × 8 orientations). All (keypoint, vector) pairs from the set of reference images are stored.

    33. Matching. A test image gives a new set of (keypoint, vector) pairs. For each pair, we find the two nearest descriptors in our database.

    34. Acceptance of a match. A match is accepted if the ratio of the distance to the first nearest descriptor over the distance to the second is below a threshold, as sketched below.
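
A brute-force sketch of the ratio test; the 0.8 threshold is the value reported by Lowe, and `query`/`database` are assumed to be arrays of descriptor vectors:

```python
import numpy as np

def ratio_test_matches(query, database, threshold=0.8):
    """Keep (query_index, database_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        d = np.linalg.norm(database - q, axis=1)  # distance to every descriptor
        first, second = np.argsort(d)[:2]
        if d[first] < threshold * d[second]:
            matches.append((qi, first))
    return matches
```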

    35. Complexity. Initial complexity: (number of features in the query image) × (total number of features in the database), because each query feature must be compared against every database feature to find its best two matches. Solution: k-d trees!

    36. Storage using k-d trees. The descriptor set is stored in a k-d tree (in both the Schmid-Mohr and Lowe techniques).

    37. K-d trees. The elements are stored in the leaves; every other node splits the space along some dimension, and the dimensions are split sequentially. Fixed-size one-dimensional buckets are used. The depth of the tree is at most the number of dimensions of the stored vectors.
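
In practice the brute-force loop sketched earlier is replaced by a tree query. A sketch with scipy's cKDTree over hypothetical random descriptors (note this performs exact search; Lowe's best-bin-first variant is approximate):

```python
import numpy as np
from scipy.spatial import cKDTree

db = np.random.rand(10000, 128)        # hypothetical database descriptors
queries = np.random.rand(50, 128)      # hypothetical query descriptors

tree = cKDTree(db)                     # built once over the database
dist, idx = tree.query(queries, k=2)   # two nearest neighbors per query
good = dist[:, 0] < 0.8 * dist[:, 1]   # same ratio test as before
```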

    38. New complexity!

    40. Update and demo

    41. STATE OF THE ART: Video Google … NOT videos.google.com! "A text retrieval approach to object matching in videos", Josef Sivic and Andrew Zisserman.

    42. Text retrieval overview. Documents are parsed into words. Common words (the, an, etc.) are ignored; this is the 'stop list'. Words are represented by their stems: 'walk', 'walking' and 'walks' → 'walk'. Each word is assigned a unique identifier. The vocabulary contains K words, so each document is represented by a K-component vector of word frequencies.
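
A toy sketch of this pipeline; the stop list and the crude suffix-stripping "stemmer" are illustrative stand-ins for a real stemmer such as Porter's:

```python
from collections import Counter

STOP = {"the", "an", "a", "and", "are", "that", "to", "in", "for", "of"}

def stem(word):
    """Crude stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def doc_vector(text, vocabulary):
    """K-component vector of word frequencies for one document."""
    tokens = (w.strip(".,") for w in text.lower().split())
    words = [stem(w) for w in tokens if w and w not in STOP]
    counts = Counter(words)
    return [counts[v] for v in vocabulary]
```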

    43. Parse and clean. Original: "…… Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object. categories …….". After parsing: "Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories". After the stop list and stemming: "represent detect learn main issue need tackle design visual system recognize object category".

    44. Creating the database

    45. Querying. Parse the query to create the query vector: query "Representation learning" → query doc ID = (1, 0, 1, 0, 0, …). Retrieve all document IDs containing at least one of the query word IDs (using the inverted file index). Calculate the distance between the query and document vectors (the angle between the vectors) and rank the results.
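
A sketch of the query path with a plain inverted index and cosine similarity, over frequency vectors like those built above:

```python
import math
from collections import defaultdict

def build_inverted_index(doc_vectors):
    """word id -> set of document ids containing that word."""
    index = defaultdict(set)
    for doc_id, vec in enumerate(doc_vectors):
        for word_id, freq in enumerate(vec):
            if freq:
                index[word_id].add(doc_id)
    return index

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query_vec, doc_vectors, index):
    """Candidates share a word with the query; rank them by vector angle."""
    hits = set()
    for word_id, freq in enumerate(query_vec):
        if freq:
            hits |= index[word_id]
    return sorted(hits, key=lambda d: -cosine(query_vec, doc_vectors[d]))
```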

    46. Using the text search as an analogy. Basic idea: build a visual vocabulary from a large set of images; given a query image, search the database in a manner similar to text search.

    47. Again …. Detection and Description Detection – finding invariant regions Description – using the SIFT descriptor

    48. Building the "Visual Stems". Descriptors are clustered into K groups using the k-means clustering algorithm; each cluster represents a "visual word" in the "visual vocabulary". In the reported results, between 10,000 and 20,000 clusters are used.
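
A sketch of vocabulary construction with scipy's kmeans2 over hypothetical descriptors; the paper clusters its two region types separately and uses a Mahalanobis distance, which this plain k-means sketch omits:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
descriptors = rng.random((20000, 128))   # hypothetical pooled SIFT descriptors

K = 1000                                 # the paper uses 10,000-20,000
centres, labels = kmeans2(descriptors, K, minit="points", seed=0)
# `centres` are the visual words; `labels` maps each descriptor to its word.
```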

    49. Example clusters

    50. Visual "Stop List". The most frequent visual words, which occur in almost all images, are suppressed.

    51. Ranking Frames. Distance between vectors (as with word/document vectors), plus spatial consistency (the analogue of word order in text).

    52. The Visual Analogy

    55. Example searches Object query http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/results/bolle/bolle.html http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/results/poster/poster.html Scene Query http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/examples/example_scene.html

    56. Open issues. Automatic ways of building the vocabulary are needed. Ranking of retrieval results in the way Google does. Extension to non-rigid objects, such as faces. Using this method for higher-level analysis of movies.

    57. References.
    David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
    C. Schmid and R. Mohr, "Local Grayvalue Invariants for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), pp. 530-535, 1997.
    K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors", IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 257-263, 2003.
    Darya Frolova and Denis Simakov, "Matching with Invariant Features", The Weizmann Institute of Science, March 2004.
    Javier Ruiz-del-Solar and Patricio Loncomilla, "Object Recognition using Local Descriptors", Center for Web Research, Universidad de Chile.
    Lior Shoval and Rafi Haddad, "Approximate Nearest Neighbor - Applications to Vision and Matching".

    58. References (continued).
    Josef Sivic, Frederik Schaffalitzky and Andrew Zisserman, "Video Google Demo", http://www.robots.ox.ac.uk/~vgg/research/vgoogle
    David Lowe, "Demo Software: SIFT Keypoint Detector", http://www.cs.ubc.ca/~lowe/keypoints/
    http://cs223b.stanford.edu/

    59. Thanks !
