1. Mudit – 1 to 12 and 27 to 39
Ani – 13 to 26 and 40 to END
2. Contents Overall picture
Region detectors
Scale invariant detection
Localization
Orientation assignment
Region description
SIFT approach
Local Jet
Storing and matching
State of the art - Video Google
3. Our goal Detecting repeatable image regions
Obtaining reliable and distinctive descriptors
Searching an image database for an object efficiently
4. Invariance vital Scale
Rotation
Orientation
Illumination
Noise
Affine
5. Region detectors Harris points - Invariant to rotation
Two significant eigenvalues indicate an interest point
Harris-Laplace – Invariant to rotation and scale
Uses Laplacian of Gaussian operator
SIFT - Scale space extrema using Difference of Gaussian
6. Scale Invariant Detection Consider regions (e.g. circles) of different sizes around a point
Regions of corresponding sizes will look the same in both images
7. Scale Invariant Detection The problem: how do we choose corresponding circles independently in each image?
8. Scale Invariant Detection A “good” function for scale detection has one stable sharp peak
9. Scale-space Definition:
L(x, y, σ) = G(x, y, σ) * I(x, y)
where G(x, y, σ) = (1 / 2πσ²) e^(−(x² + y²) / 2σ²)
Keypoints are detected at scale-space extrema of the difference-of-Gaussian function
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)
Efficient to compute
Close approximation to the scale-normalized Laplacian of Gaussian, σ²∇²G
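The DoG construction above can be sketched in a few lines of NumPy/SciPy. This is a minimal single-octave version (the values σ = 1.6 and k = √2 follow Lowe's paper; the full method also downsamples between octaves):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma=1.6, k=2 ** 0.5, levels=4):
    # L(x, y, sigma_i) for successive scales sigma * k^i
    blurred = [gaussian_filter(image, sigma * k ** i) for i in range(levels + 1)]
    # D = L(k*sigma) - L(sigma): subtract adjacent blurred images
    return [blurred[i + 1] - blurred[i] for i in range(levels)]

img = np.random.rand(64, 64)
dogs = dog_pyramid(img)
```

Each DoG level costs only one image subtraction once the Gaussian stack exists, which is why D is cheap to compute.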
10. Image space to scale space
11. Relationship of D to σ²∇²G Diffusion equation:
∂G/∂σ = σ∇²G
Approximate ∂G/∂σ by a finite difference:
∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)
giving, G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G
Therefore, D ≈ (k − 1)σ²∇²G
When D has scales differing by a constant factor k, it already incorporates the σ² scale normalization required for scale invariance; the factor (k − 1) is the same at every scale, so it does not affect extrema locations
12. Local extrema detection Find maxima and minima in scale space
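A sample is an extremum when it is larger (or smaller) than all 26 neighbours in the 3×3×3 cube spanning its own DoG level and the two adjacent scales. A minimal sketch (function name is ours, not from the slides):

```python
import numpy as np

def is_extremum(dog_prev, dog_cur, dog_next, r, c):
    # Gather the 3x3x3 cube around (r, c) across three adjacent DoG levels
    cube = np.stack([d[r - 1:r + 2, c - 1:c + 2]
                     for d in (dog_prev, dog_cur, dog_next)])
    centre = dog_cur[r, c]
    # Extremum if the centre sample is the max or the min of all 27 values
    return centre == cube.max() or centre == cube.min()
```

Most samples are rejected after only a few comparisons in practice, so the check is cheap despite the 26 neighbours.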
13. Frequency of sampling in scale
14. Localization A 3D quadratic function is fit to the local sample points
Start with the Taylor expansion of D with the sample point as the origin:
D(X) = D + (∂D/∂X)ᵀ X + ½ Xᵀ (∂²D/∂X²) X
where X = (x, y, σ)ᵀ is the offset from the sample point
Take the derivative with respect to X, and set it to 0, giving
X̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X)
X̂ is the location of the keypoint
This is a 3x3 linear system
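Solving the 3×3 system is one call in NumPy. The gradient and Hessian values below are illustrative placeholders, not real DoG measurements:

```python
import numpy as np

def refine_offset(grad, hess):
    # Sub-sample offset: x_hat = -(d^2 D / dX^2)^-1 (dD / dX)
    return -np.linalg.solve(hess, grad)

# toy gradient and Hessian at a sample point (illustrative values only)
g = np.array([0.2, -0.1, 0.05])
H = np.eye(3)
offset = refine_offset(g, H)
```

In the real detector both `g` and `H` come from finite differences of neighbouring DoG samples.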
15. Localization
Derivatives are approximated by finite differences of neighbouring sample points,
example: ∂D/∂x ≈ (D(x+1, y, σ) − D(x−1, y, σ)) / 2
If X̂ is > 0.5 in any dimension, the extremum lies closer to a different sample point, and the process is repeated from there
16. Being picky! Contrast (use the previous equation):
D(X̂) = D + ½ (∂D/∂X)ᵀ X̂
If |D(X̂)| < 0.03, throw it out (assuming pixel values in [0, 1])
Edge-iness:
Use the ratio of principal curvatures to throw out poorly defined peaks
Curvatures come from the 2x2 Hessian H = [Dxx Dxy; Dxy Dyy]
No need to explicitly calculate eigenvalues. We only need their ratio: reject if Tr(H)² / Det(H) ≥ (r + 1)² / r
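The curvature-ratio test can be written without ever computing eigenvalues, using only the trace and determinant of the Hessian (Lowe uses r = 10):

```python
import numpy as np

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    # Tr(H) = sum of eigenvalues, Det(H) = product of eigenvalues
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # principal curvatures have opposite signs: not a well-defined peak
        return False
    # Tr^2 / Det grows with the eigenvalue ratio, so a threshold on it
    # bounds the ratio of principal curvatures
    return tr * tr / det < (r + 1) ** 2 / r
```

A point on an edge has one large and one small curvature, making Tr²/Det large, so it fails the test.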
17. Orientation assignment Computing the descriptor relative to the keypoint's orientation achieves rotation invariance
Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation)
Multiple orientations may be assigned to a keypoint from peaks in an orientation histogram
Multiple orientations significantly improve the stability of matching
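The histogram step above can be sketched as follows: a 36-bin, magnitude-weighted histogram of gradient angles, with every bin within 80% of the highest peak yielding an orientation (the 80% rule follows Lowe's paper; function names are ours):

```python
import numpy as np

def orientation_histogram(mag, ang, bins=36):
    # Magnitude-weighted histogram of gradient angles in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist

def dominant_orientations(hist, thresh=0.8):
    # Every bin within 80% of the highest peak becomes a keypoint orientation
    return np.flatnonzero(hist >= thresh * hist.max())

mag = np.array([1.0, 2.0])
ang = np.array([0.1, 3.2])
h = orientation_histogram(mag, ang)
```

In the full method the samples are additionally Gaussian-weighted by distance from the keypoint, and peak positions are interpolated with a parabolic fit.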
18. Choosing the right image descriptors Distribution-Based
Luminance based approaches
Histograms of pixel intensities and location
SIFT – Based on gradient distribution in the region
Geometric based approaches
Shape context
Spatial-Frequency Techniques
Fourier transform based – No spatial information
Gabor filters and wavelets – Large number of filters
19. Choosing the right image descriptors Differential descriptors
Local Jets - Set of image derivatives
Steerable filters – steering derivatives in the direction of the gradient
Miscellaneous
Using generalized moment invariants – characterize shape and intensity distribution
20. Who wants to be a Millionaire?
22. Local image descriptor - SIFT
23. Local image descriptor - SIFT
Weight the magnitude of each sample point by a Gaussian weighting function with σ = 0.5 × descriptor window width
Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
Allows for significant shift in gradient positions
24. Illumination invariance for SIFT Affine changes
Normalizing vector to unit length – accounts for overall brightness change
Non-linear changes
Occur due to camera saturation / viewpoint changes
Thresholding values in the unit feature vector at 0.2
Re-normalizing
Less importance to large gradients
More importance to distribution of orientations
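The normalize–clip–renormalize recipe above is short enough to show directly (a minimal sketch; the function name is ours):

```python
import numpy as np

def normalize_descriptor(vec, cap=0.2):
    # Unit length -> invariance to overall (affine) brightness change
    v = vec / np.linalg.norm(vec)
    # Clip large gradient magnitudes -> less weight on non-linear changes
    v = np.minimum(v, cap)
    # Renormalize -> emphasis shifts to the distribution of orientations
    return v / np.linalg.norm(v)

v = normalize_descriptor(np.array([3.0, 4.0]))
```

After clipping, no single large gradient can dominate the distance between two descriptors.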
25. Width of SIFT descriptor 2 parameters to be obtained
Number of orientations in histogram
Size of histogram array
Optimal size obtained experimentally
26. Stability as a function of affine distortion The approach is not truly affine invariant
Initial features are located in a non-affine-invariant manner
27. Local Image Descriptor – Local Jet Image in a neighborhood of a point can be described by the set of its derivatives
Local jet of order N at a point x = (x1,x2) is defined using convolution of image I with the Gaussian derivatives
Complete set of invariants is computed that locally characterizes the signal
By stacking invariants in a vector
28. Local Image Descriptor – Multiscale approach Vector of invariants are calculated at different scales
Half-octave quantization is used
Difference between consecutive sizes < 20%
σ varies between 0.48 and 2.07
29. Local Image Descriptor – Semilocal Constraints Longer vectors decrease the probability of repeatability
Global features are sensitive to extraneous features or partial visibility
Solution
For each interest point, select the p nearest features
For matching, a constraint on the angle between lines joining neighboring points is added
Assumption: 50% of the points will match under these semi-local constraints
30. Semilocal Constraints
31. Comparison of SIFT and Local Grayvalue Invariants
32. Storing A set of keypoints is obtained from each reference image
Each keypoint has a descriptor – a 128-component vector (4×4 histograms × 8 orientations)
All such (keypoint, vector) pairs from the set of reference images are stored
33. Matching The test image gives a new set of (keypoint, vector) pairs
For each such pair, we find the two nearest descriptors in our database set
34. Acceptance of a match A match is accepted IF
the ratio of the distance to the first nearest descriptor to that of the second is below a threshold (Lowe uses 0.8)
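The ratio test above can be sketched in a few lines (a brute-force version for clarity; names are ours):

```python
import numpy as np

def ratio_test_match(query, database, ratio=0.8):
    # Euclidean distance from the query descriptor to every database descriptor
    d = np.linalg.norm(database - query, axis=1)
    order = np.argsort(d)
    best, second = d[order[0]], d[order[1]]
    # Accept only if the best match is clearly better than the runner-up
    if best < ratio * second:
        return order[0]
    return None
```

An ambiguous keypoint has several near-equidistant candidates, so its ratio is close to 1 and the match is rejected.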
35. Complexity Initial complexity:
O(number of features in the query image × total number of features in the database)
Reason: each keypoint (feature) must be compared with every feature in the database to find the best two matches
Solution: k-d trees!
36. Storage using k-d trees The feature set is stored using a k-d tree (in both the Schmid & Mohr and Lowe techniques)
37. K-d Trees The elements are stored in the leaves
The internal nodes split the space along some dimension
Fixed-size one-dimensional buckets are used
The dimensions are split sequentially, so the depth of the tree is at most the number of dimensions of the stored vectors
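With SciPy, building and querying a k-d tree over the descriptor database is a few lines (random vectors stand in for real descriptors):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.random((1000, 128))        # stand-in for stored 128-d descriptors
tree = cKDTree(db)                  # build once, query many times
query = rng.random(128)
dist, idx = tree.query(query, k=2)  # two nearest neighbours for the ratio test
```

Note that in 128 dimensions an exact k-d tree search degrades toward brute force; Lowe therefore uses an approximate best-bin-first variant in practice.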
38. New complexity!
40. Update and demo
41. STATE OF THE ART Video Google ………
NOT videos.google.com !!
A text retrieval approach to object matching in videos
Josef Sivic and Andrew Zisserman
42. Text retrieval overview Documents are parsed into words
Common words are ignored (the, an, etc.)
This list is called a 'stop list'
Words are represented by their stems
'walk', 'walking', 'walks' → 'walk'
Each word is assigned a unique identifier
The vocabulary contains K words
Each document is represented by a K-component vector of word frequencies
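The document-vector construction can be sketched as follows (the stop list and vocabulary here are tiny illustrative stand-ins, and counting raw frequencies is a simplification of the tf-idf weighting used in practice):

```python
from collections import Counter

# toy stop list (illustrative only)
STOP = {"the", "an", "a", "and", "are", "that", "to", "in", "for"}

def doc_vector(words, vocab):
    # K-component frequency vector over a fixed vocabulary, stop words removed
    counts = Counter(w for w in words if w not in STOP)
    return [counts[w] for w in vocab]

vocab = ["represent", "detect", "learn"]
vec = doc_vector(["represent", "the", "learn", "learn"], vocab)
```

Stemming (mapping 'walking' to 'walk') would be applied to each word before counting; it is omitted here for brevity.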
43. Parse and clean “…… Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object. categories …….”
Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories
Represent detect learn main issue need tackle design visual system recognize object category
44. Creating the database
45. Querying Parse the query to create a query vector. Query: "Representation learning" → query doc vector = (1,0,1,0,0,…)
Retrieve all document IDs containing one of the query word IDs (using the inverted file index)
Calculate the distance between the query and document vectors (angle between vectors)
Rank the results
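The ranking step (angle between vectors) amounts to cosine similarity; a minimal sketch with toy vectors:

```python
import numpy as np

def rank_documents(query_vec, doc_vecs):
    # Normalize so the dot product equals the cosine of the angle
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    # Highest similarity (smallest angle) first
    return np.argsort(-sims)

q = np.array([1.0, 0.0, 1.0])
docs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
ranking = rank_documents(q, docs)
```

The inverted file index keeps this cheap: only documents sharing at least one word with the query are scored.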
46. Using the text search as an analogy
Basic idea: Build a visual vocabulary based on a large set of images. Given a query image, search through the database in a manner similar to the text search.
47. Again …. Detection and Description Detection – finding invariant regions
Description – using the SIFT descriptor
48. Building the “Visual Stems”
Cluster descriptors into K groups using the K-means clustering algorithm
Each cluster represents a "visual word" in the "visual vocabulary"
Result: between 10,000 and 20,000 clusters used
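The vocabulary-building step can be sketched with SciPy's K-means (random vectors stand in for real SIFT descriptors, and K = 20 is a toy value; the paper used 10,000–20,000):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(1)
descriptors = rng.random((500, 128))   # stand-in for descriptors pooled from training frames
K = 20                                 # toy vocabulary size
# Each centroid is one "visual word"; labels assign descriptors to words
centroids, labels = kmeans2(descriptors, K, minit='points')
```

At query time, each new descriptor is quantized to its nearest centroid, turning an image into a bag of visual words.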
49. Example clusters
50. Visual “Stop List” The most frequent visual words that occur in almost all images are suppressed
51. Ranking Frames Distance between vectors (Like in words/Document)
Spatial consistency (= Word order in the text)
52. The Visual Analogy
55. Example searches Object query
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/results/bolle/bolle.html
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/results/poster/poster.html
Scene Query
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/how/examples/example_scene.html
56. Open issues Automatic ways of building the vocabulary are needed
A method for ranking retrieval results, as Google does
Extension to non-rigid objects, like faces
Using this method for higher-level analysis of movies
57. References David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
C. Schmid and R. Mohr (1997), “Local Grayvalue Invariants for Image Retrieval”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(5), 530-535.
K. Mikolajczyk and C. Schmid. “A performance evaluation of local descriptors”. In IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pages 257–263, 2003.
Darya Frolova, Denis Simakov, “Matching with Invariant Features”, The Weizmann Institute of Science, March 2004
Javier Ruiz-del-Solar, and Patricio Loncomilla, “Object Recognition using Local Descriptors”, Center for Web Research, Universidad de Chile
Lior Shoval and Rafi Haddad, “Approximate Nearest Neighbor – Applications to Vision and Matching”
58. References…continued Josef Sivic, Frederik Schaffalitzky and Andrew Zisserman, “Video Google Demo”, http://www.robots.ox.ac.uk/~vgg/research/vgoogle
David Lowe, “Demo Software: SIFT Keypoint Detector”, http://www.cs.ubc.ca/~lowe/keypoints/
http://cs223b.stanford.edu/
http://www.cs.ubc.ca/~lowe/keypoints/
59. Thanks!